WPS6009

Policy Research Working Paper 6009

How Accurate Are Recall Data? Evidence from Coastal India

Francesca de Nicola
Xavier Giné

The World Bank
Development Research Group
Finance and Private Sector Development Team
March 2012

Abstract

This paper investigates the accuracy of recall data by comparing administrative records with retrospective, self-reported survey responses to income and asset questions for a sample of self-employed households from coastal India. It finds that the magnitude of the recall error increases over time, in part because respondents rely less on memory and instead infer earnings based on past earnings. Individuals tend to recall monthly earnings more accurately when they are higher than the median. These results imply that the variance estimated from the self-reported earnings distribution will be lower than the real one. The paper also finds that data reported by income earners are more accurate than those by their wives. In addition, the use of time cues can worsen accuracy if they are not relevant to the respondent. Where the recall questions are placed in the two-hour long survey, however, does not affect accuracy.

This paper is a product of the Finance and Private Sector Development Team, Development Research Group. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://econ.worldbank.org. The author may be contacted at xgine@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly.
The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

How accurate are recall data? Evidence from coastal India*

Francesca de Nicola, IFPRI
Xavier Giné, The World Bank

JEL Classification Codes: C8, O12, Q12
Keywords: self-employment, recall error, measurement error, telescoping

* de Nicola: FdeNicola@cgiar.org, Giné (corresponding author): xgine@worldbank.org, +1 202 4730451. The views expressed in this paper are those of the authors and do not necessarily represent the views of IFPRI, the World Bank, its Executive Directors, or the countries they represent. We have benefitted from the comments of Jishnu Das, Aart Kray, David McKenzie, Giovanna Prennushi and several conference participants. Funding from IFPRI and the World Bank is gratefully acknowledged.

1 Introduction

Self-employment is a major source of income and employment in developing countries, especially among low-income households (OECD (2009)). Accurate data on employment, income and profits are therefore critical for measuring poverty and inequality and for making sound, evidence-based policy prescriptions. For example, Poterba and Summers (1986) find through audits of employment surveys that correcting employment self-reports can change the estimated duration of unemployment by a factor of two. Similarly, if consumption is regressed against income, and income is measured with error, attenuation bias may lead the policymaker to conclude that there is risk-sharing when in fact households may not be protected from idiosyncratic income fluctuations.
If accurate records on consumption, income and profits existed, there would be no reason for concern, but the majority of individuals engaged in self-employment do not keep records. In practice, income and consumption data typically come from self-reports collected in surveys that are subject to recall and measurement error.

Panel survey data are typically collected either by interviewing the same set of households multiple times, or by surveying households only once and asking questions about their current and past situations. The first method is regarded as more precise and reliable, but it requires tracking households over time, with potential attrition problems, and often re-hiring enumerators for each round of data collection because rounds tend to take place several months or even years apart. These factors can substantially increase the cost and thus motivate the growing interest in retrospective panel survey data. Being collected all at once, they abstract from attrition problems and minimize the cost of gathering information. However, their reliability may be compromised if respondents are asked about events that are recalled imprecisely. As researchers engage more and more in primary data collection, especially in developing countries, it is critical to assess how accurate retrospective data are in order to decide on the most appropriate way to collect reliable data.

In this paper, we investigate the accuracy of recall data by comparing administrative records with retrospective survey data from a developing country. Self-reported data come from a sample of self-employed households engaged in fishing in coastal India. To the best of our knowledge, this is the first study on recall error using actual and reported data on self-employment in a developing country, thus contributing to the large econometric literature concerned with measurement error. See Bound, Brown and Mathiowetz (2001) for a review.
We assess recall error using two different events in the lives of small-scale boat owners. The first is the date of purchase of the boat, which constitutes the single largest productive asset. The second is monthly gross earnings from fishing over the 34 months prior to the survey.

We focus on boat owners for two main reasons. First, because the two events of interest are relevant to the respondent and easy to elicit, allowing us to minimize reporting error not due to imperfect recall. Indeed, small-scale boat owners in coastal India are exclusively self-employed and their earnings account for most of the household income. In addition, earnings from fishing are perhaps easier to elicit than overall income because the concept is well defined and familiar to respondents. When monthly income is asked in surveys, its definition is typically left to the judgement of the respondent or is preceded by a cumbersome preamble that can be imperfectly understood.1 Since we only focus on one source of income, we also minimize the misreporting that arises when omitting other sources of income. The second reason why we use boat owners is access to unique administrative data that allow us to validate the survey data and to identify key patterns of recall error.

For both events considered, we find that the absolute value of recall error increases with the recall period, confirming a well-known fact in the cognitive science literature (for example, Tourangeau (2000)). Because the length of the recall period is correlated with the magnitude of the error, the assumption of classical measurement error (i.e. that the error term is white noise) does not hold in the data.

We suggest a plausible explanation for this correlation. As respondents are asked to remember events further back in time, they rely less on memory or recall and instead infer earnings based on the history of past earnings (see Tourangeau, Rips and Rasinski (2000) and references therein).
To test this hypothesis, we study whether self-reports revert to the boat owner’s mean or his most recent earnings (relative to the date of the interview) as the recall period increases. Consistent with evidence from the U.S. by Bound et al. (1989), using the Panel Study of Income Dynamics (PSID) data, and Angrist and Krueger (1999), using the Current Population Survey (CPS) data, we find evidence of convergence to the mean but not to the most recent value.

Because respondents rely less on memory and more on inference as the recall period increases, some moments of the self-reported income distribution will be error-ridden for long enough recall periods. If one is interested in the mean of the income distribution, then recall data are appropriate because the mean of the self-reported income process matches well that of the realized process. However, when estimating the volatility of the income process, recall data will yield a lower variance than the true one as the recall period increases. Using a moving window of 12 months to estimate changes in the mean and variance over time, we find that with a 2-month recall period, the variance of the self-reported income is indistinguishable from that of the actual income, but when the recall period is 24 months, the variance of the self-reported income process is 13 percent lower. In contrast, the mean of self-reported income is only 2 percent below the actual mean, irrespective of the recall period.

We also study whether months when earnings are higher than the median are better recalled relative to months when earnings are below the median. On the one hand, low earnings may be better recalled given that they are costlier in utility terms. On the other hand, individuals may dislike recalling unpleasant events (Wagenaar (1986), Thompson (1996), Holmes (1970) and Skowronski et al. (1991)). We test this potential asymmetry in recall in two ways.
First, we explore whether boat owners tend to remember correctly the month with the highest earnings relative to the month with lowest earnings. Second, we assess whether recall error is higher in months when earnings were below the median. We find that earnings higher than the median and the months when they happen are recalled better, suggesting that individuals tend to forget unpleasant events.

1 McFadden et al. (2005) provide an example of such a preamble: “Please state income before taxes, including fringes such as employer-paid health insurance, excluding income from sale of household goods or automobiles, excluding bonuses, etc.”.

Finally we explore several methodological issues related to the collection of retrospective data. First, we investigate who in the household provides more accurate information. While male boat owners are the ones directly earning income, their spouses handle the money received by their husbands as they are responsible for shopping and cooking. We find that boat owners provide more accurate responses and that these are not influenced by individual or household characteristics, unlike reports from their wives. Interestingly, however, we find no evidence of underreporting by females as would be the case if husbands secretly kept a part of total earnings to themselves before handing over the money to their wives. Second, we assess whether the provision of time cues improves the accuracy of recall. Perhaps unsurprisingly, we find that unless the time cues are relevant to the respondent, they can worsen accuracy because recall error is compounded with errors in the actual timing of the cue. Lastly, we study whether the position of the recall questions in the two-hour long survey influences the accuracy of recall and find no effect.

These results contribute to three related literatures.
First, they relate to the work of social and cognitive psychologists investigating how respondents answer questions, the mental processes that are activated by recall, and the personal and environmental factors that influence autobiographical memory (see Dex (1995) and Tourangeau, Rips and Rasinski (2000) for extensive reviews). Most of these psychological studies use lab experiments or compare diaries and survey answers (Sudman and Bradburn (1973)) from developed countries, mainly the U.S. and Europe.

Second, by using data from India, we also contribute to a small but growing literature that uses data from developing countries. Beegle et al. (2012) compare different approaches to collecting consumption data in Tanzania, including variations in the recall period. Beckett et al. (2001) assess the accuracy of recall by contrasting the retrospective information in the second Malaysian Family Life Surveys (MFLS) fielded in 1988-9 with the answers provided 12 years earlier during the first round of the MFLS. A similar approach is followed by Beegle, Carletto and Himelein (2012), who compare self-reports of African farmers collected at two different points in time, although not as far apart as in Beckett et al. (2001). Conversely, comparing different survey frequencies, Das, Hammer and Sánchez-Paramo (2012) study the prevalence and duration of sickness episodes and doctor visits. Finally, De Mel, McKenzie and Woodruff (2009) are concerned with measuring profits among micro entrepreneurs. All these studies lack administrative data that can be used as an independent source of information to validate the survey responses.
The third literature to which this paper contributes is related to validation studies that combine censuses or large-scale panel surveys (such as the PSID, the CPS or the Survey of Consumer Finances (SCF)) with administrative data records from the Social Security Administration (see for instance, Duncan and Hill (1985), Kennickell and Starr-McCluer (1997), Bound and Krueger (1991), Bound et al. (1994), Pischke (1995) for the U.S. and Akee (2011) for a developing country). This literature assesses the nature of the measurement error by comparing self-reported earnings data with administrative records in the cross-section or in first-differenced data if more than one year of data are available. The results refute the assumption of classical measurement error but in the U.S., the use of self-reported instead of administrative data leads to little loss of accuracy. In a developing country, however, Akee (2011) finds larger losses of accuracy, especially if first-differenced data are used. While this literature uses self-reported earnings elicited at different points in time to study “contemporaneous” measurement error, our paper is concerned with “longitudinal” measurement error by using self-reported earnings at different points in time elicited only once. Another key difference is that the literature uses wage workers while we use self-employed individuals. Recall errors may therefore be different because the income processes are different.

The rest of the paper is organized as follows. Section 2 describes the context and the data used. Section 3 examines the determinants of the recall error. Section 4 addresses some practical issues about the collection of retrospective data and Section 5 concludes.

2 Context and Data

During the study period, most of the small-scale fishing boats in coastal Tamilnadu were made with fiber-reinforced plastic (FRP) by a handful of local manufacturers.
FRP boats are fairly homogeneous in size (18x7 feet) and are typically powered by an 8 or 9 HP outboard engine.2 As mentioned, the vessel is the single largest productive investment that a boat owner makes. At the time of the study, an FRP boat cost around 60,000 Indian rupees (Rs) (roughly USD 1,276). In contrast, the daily gross earnings from fishing across boat owners and time of year ranged from Rs 200 to Rs 4,000 (about USD 4 to 86).

On a typical day, boats leave the shore around 1AM and come back between 7AM and 11AM, in time for the village fish market, which takes place at the beach. During the market, local fish auctioneers and staff from the fishermen societies market the catches to a group of sellers, which comprises local traders as well as agents from national and international fish-processing companies. About 90 percent of the daily catches are sold to the auctioneer or the fishermen society. The rest consists of fish that does not meet the criteria set by the merchants and agents due to size or variety. This fish is either sold in local markets or consumed at home.

We collected unique administrative and retrospective survey data from March 2005 to July 2007 from a sample of 239 small-scale fishing households (boat owner and spouse) in seven villages along the southeastern coast of India, in the state of Tamilnadu. In all but one village, boat owners are members of fishermen societies. In that village, boat owners still rely on local auctioneers that provide credit for fishing equipment or consumption in exchange for the right to market their catches. We have administrative records for 7 of the 10 auctioneers that work in that village, accounting for 85 percent of all boat owners. The percentage of boat owners that are members of the fishermen society and thus for whom we have administrative data in the remaining six villages is lower, at 63 percent.
2 Prior to the purchase of an FRP boat, boat owners used a catamaran, which is a raft-like vessel made of two Alphesia logs tied together with two crossbeams at the two ends. FRPs can cope with rough surf and are, at the same time, more comfortable, faster and more economical than catamarans. With the same number of crew, an FRP’s catches are about 50 percent bigger than those of a catamaran.

We have a total of 2,098 boat owner self-reports and 1,646 spouse self-reports of monthly earnings and 64 boat owner and spouse self-reports on the date of purchase of the boat.3 Given that each of the 239 boat owners was asked 12 recall questions, there should be a total of 2,868 self-reports. As it turns out, 504 boat owner reports are missing because they did not go fishing that month. During the low season, it is common for some boat owners to quit fishing and to work in a large trawler for a wage. The remaining 266 observations are missing due to lack of administrative data, possibly because the boat owner left the society or switched to an auctioneer that refused to share the data. The number of missing observations among wives is higher. About 507 observations are missing because the boat owner did not go fishing, 263 observations are missing because administrative data are missing, and the remaining 452 observations are missing because self-reports are missing. Interestingly, the number of missing wife self-reports does not increase with the recall period.

Appendix Table A2 reports the results of a pooled regression with the total number of missing observations per boat owner and wife (column 1), the number of missing observations because the boat owner did not go fishing (column 2), and the number of missing observations because either administrative data or self-reports (in the case of wives) are missing (column 3). We include as controls a boat owner dummy, age, literacy, wealth, and average earnings in 2005.
The coefficient on the boat owner dummy is negative and significant in columns 1 and 3, confirming that wives’ reports contain more missing values. Interestingly, in column 2 we find evidence of selection, since boat owners with lower average administrative earnings in 2005 during the high season were more likely to quit fishing during the low season.4

3 The value of catches is recorded for all the boat owners in the sample but we only have administrative data on the date of boat purchase from the records of auctioneers (moneylenders) in one village. As it turns out, we use the purpose of the loan to determine when the boat was purchased and this information is missing from the records of the fishermen societies.

4 In Section 3, we run the analysis using all available data to maximize the number of observations and restrict the sample to boat owners with data on all 12 recall questions.

2.1 Administrative data

Administrative data on earnings come from handwritten daily records from March 2005 to July 2007 kept by auctioneers and hired staff at the fishermen society. These records are updated daily in the presence of the boat owner during the sale to the fish merchants. As a result, misreporting is not a concern. At the end of the year, some auctioneers hand over the individualized history of sales to their boat owners. From the administrative data kept by auctioneers in one village, we infer the date of purchase of the fiber boat using the purpose of loans recorded since 2000.

2.2 Recall data

Recall data come from self-reports collected during a survey administered between September and December 2007. Boat owners and their wives were interviewed separately and asked information about a variety of topics including socio-economic characteristics, fishing practices and recall questions about their catches and the timing of important events. While boat owners and their wives had been interviewed before, this was the first time they were asked about earnings, given that these data had always been collected directly from the administrative records. In particular, respondents were asked to report the monetary value of fish sales (not the overall value of fish catches) in the months of March, July and November 2005, March, May, July, September, and November 2006, and January, March, May, and July 2007. Respondents were also asked to recall the months in 2006 with the highest and lowest values of catches.

Half the respondents were asked the recall questions at the beginning of the survey, after being asked about the household roster. The other half were asked the same questions toward the end of the survey, on average an hour and twenty minutes later than the first half. The survey lasted a bit more than two hours.

2.3 Summary Statistics

Table 1 reports descriptive statistics of the characteristics of the survey and the variables used in the analysis. Appendix Table A.1 describes each variable in detail. All boat owners are male, married and the heads of their households. They are 42 years old on average and have primary education. They make about 5.40 dollars per capita a day, similar to the per capita income of many individuals engaged in self-employment activities in developing countries. The median household has one income earner (the boat owner), which explains why most of the household income (93 percent), on average, comes exclusively from fishing. The next largest source of household income is the sale in local markets of the fish rejected by auctioneers and societies.

Figure 1 here

Table 1 reports (in logs) average monthly earnings of 19,836 Rs (507 USD) according to administrative records, 19,406 Rs according to boat owner self-reports and 19,327 Rs according to their spouses. These averages are very similar to each other. Indeed, Figure 1 reports the histograms of administrative data $V^A$ and boat owner self-reports $V^S$ in Indian rupees.
It suggests that the empirical distributions overlap almost perfectly while exhibiting the right skewness typical of income data.5 Table 1 also reports that boats were purchased on average between five and six years prior to the date of the interview.

5 Both distributions have skewness of 0.99 and kurtosis of 5. The Shapiro-Francia test strongly rejects normality (p-value is 0.000).

2.4 Definitions

We now define the measure of recall error that will be used in the analysis. Let the relationship between the true measure of earnings (administrative data) $V^A_{it}$ and the self-reported measure $V^S_{it}$ for individual $i$ at time $t$ be given by:

$$\log V^S_{it} = \log V^A_{it} + \epsilon^V_{it} \quad \text{so that} \quad \epsilon^V_{it} = \log V^S_{it} - \log V^A_{it}, \tag{1}$$

where the recall error $\epsilon^V_{it}$ need not be white noise. An analogous recall error for individual $i$ can be defined for the date when the FRP boat was purchased:

$$\epsilon^T_{i} = T^S_{i} - T^A_{i}, \tag{2}$$

where $T^A$ and $T^S$ are the number of months elapsed between the time of the survey and the date the FRP boat was purchased according to the administrative records and survey data, respectively.

Table 1 reports recall errors in earnings and the timing of purchase of the FRP for the boat owners and their spouses’ self-reports. While errors about earnings appear small, reports about the timing of the purchase suffer from backward telescoping, in that the reported dates of purchase precede the actual date by about six months on average. The standard deviation of all recall errors is also large. In the next sub-section we study these differences in accuracy in greater detail.

2.5 Correlations and Reliability Ratios

Following the validation studies in the U.S. (Bound, Brown and Mathiowetz (2001)) and in a developing country (Akee (2011)), Table 2 reports the correlation coefficients between the administrative and survey data.
We report the correlation for all the data and for “contemporaneous” data, which only include earnings generated one or two months before the survey, to make them comparable with those in validation studies. We find correlations that are very high, 0.98 for boat owners’ reports and 0.97 for spouses’ reports. Akee (2011) reports correlations of 0.48 and 0.60 depending on the year used. Despite the higher volatility of earnings among self-employed individuals relative to wage workers, the correlations are high because, as mentioned, earnings are a familiar concept to respondents and thus easy to elicit. The correlations using all the data are lower, as expected, due to the presence of recall error, but still high (0.96 for male reports and 0.93 for the spouses’ reports).

Table 2 also reports the correlation between the administrative data and the error term. As it turns out, using contemporaneous earnings data for boat owners (first row of Panel A), we cannot reject that measurement error is classical because the correlation is zero. For the spouses’ self-report, the correlation is negative and significant. When all the data are used, we also find a negative and significant correlation for both boat owners and their wives. Akee (2011) also reports a negative correlation although it is larger in magnitude (-0.40 or -0.58, depending on the year). Bound and Krueger (1991) refer to this negative correlation as “mean-reverting”, indicating that respondents tend to underreport earnings when they are relatively high and vice-versa. This finding is intuitive because it implies that other factors, in particular the length of the recall period, may affect the magnitude of the error. As we will see in the next section, this mean-reversion pattern does intensify with the length of the recall period and has implications for the accuracy of the moments of the earnings distribution.
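The two correlations discussed above can be illustrated with a minimal sketch. It assumes the recall error is the log difference from equation (1) and that both correlations are taken on logged earnings; the earnings values, the `focal` value and the helper names `recall_errors` and `pearson` are hypothetical and are not taken from the paper's data or code. The reports are constructed to be pulled toward a common value, so the error is negatively correlated with the administrative record, which is the "mean-reverting" pattern.

```python
import math

def recall_errors(self_reports, admin_records):
    """Recall error as in equation (1): log V^S minus log V^A, one per report."""
    return [math.log(s) - math.log(a) for s, a in zip(self_reports, admin_records)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical monthly earnings in Rs (illustrative values, not the paper's data).
admin = [8000.0, 12000.0, 20000.0, 30000.0, 45000.0]

# Assumption for illustration: each report is pulled halfway (in logs) toward a
# common focal value, which mimics mean reversion in the self-reports.
focal = 20000.0
reported = [math.exp(0.5 * math.log(a) + 0.5 * math.log(focal)) for a in admin]

errors = recall_errors(reported, admin)
log_admin = [math.log(a) for a in admin]
log_reported = [math.log(s) for s in reported]

corr_reports = pearson(log_reported, log_admin)  # self-reports vs. records
corr_error = pearson(log_admin, errors)          # records vs. recall error

print("corr(self-report, admin):", corr_reports)
print("corr(admin, error):", corr_error)
```

In this constructed example high earners underreport and low earners overreport, so the second correlation is negative even though self-reports and records remain strongly correlated.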
In Panel B of Table 2, the correlation between self-reports and administrative data on the timing of purchase of the FRP is also high for boat owners and their wives (both 0.81). The correlation between the error term and administrative data is 0.07 for boat owners and 0.08 for their wives, yet it is not statistically significant, possibly due to lack of power given that we only have 32 observations.

Table 2 also reports the reliability ratio defined as the coefficient of administrative data in a regression with self-reported data as the dependent variable (Bound and Krueger (1991)).6 For earnings data, we find a reliability ratio of 0.91 for men and 0.88 for women, using all the data. Akee (2011) finds reliability ratios of 0.42 and 0.70 depending on the year. For the timing of the purchase of the FRP, we find a reliability ratio of 0.62 for both men and women.

3 Determinants of Recall Error

In this section we assess the determinants of recall error given that the assumption of classical measurement error does not hold. We focus on the accuracy of recall over time and the ability of respondents to recall high and low earnings.

3.1 Recall error over time

One of the main concerns when using retrospective data is that accuracy may worsen with the length of the recall period. This is indeed supported by most evidence from the psychological and economic literature. For example, Sudman and Bradburn (1973) develop and validate a theoretical model showing that aided recall becomes more and more important as the recall period increases.
Similarly, Duncan and Hill (1985) show that the ratio of error-to-true variance of earnings and unemployment hours more than doubles in size after one year, hence providing evidence of hyperbolic decay in accuracy.7 Using our earnings data, we find that the error-to-true variance ratio increases over time from 0.10 using data with a recall period of less than a year, to 0.16 when the recall period is more than one year.8

Figure 2 here

Figure 2 displays boxplots of the recall error in earnings using the boat owner self-reports defined in (1) for 5 different recall periods: 2 to 5, 6 to 12, 13 to 18, 19 to 24 and more than 24 months. It is clear that the interquartile range increases with the recall period and so does the number of outliers, even when using winsorized data.9

6 The reliability ratio is $r = \frac{\mathrm{Cov}(\log(V^S), \log(V^A))}{\mathrm{Var}(\log(V^A))}$. Under the assumption of classical measurement error, $r$ can be written as $r = \frac{\mathrm{Var}(\log(V^A))}{\mathrm{Var}(\log(V^A)) + \mathrm{Var}(\log(V^S) - \log(V^A))}$. Given that the assumption of classical measurement error does not hold in the earnings data, the reliability ratio can be written as $r = \frac{\mathrm{Var}(\log(V^A)) + \mathrm{Cov}(\log(V^S) - \log(V^A), \log(V^A))}{\mathrm{Var}(\log(V^A)) + \mathrm{Var}(\log(V^S) - \log(V^A)) + 2\,\mathrm{Cov}(\log(V^S) - \log(V^A), \log(V^A))}$.

7 In contrast, Rubin and Baddeley (1989) find a linear negative relation between the absolute errors in dating and time elapsed since the event.

8 The error-to-true variance ratio is defined as $\frac{\mathrm{Var}(V^S - V^A)}{\mathrm{Var}(V^A)}$. The ratios we find are lower than those reported by Bound and Krueger (1991) (0.11 for men and 0.16 for women) because, as already mentioned, it may be easier to elicit monthly earnings from respondents in our data than it is to elicit annual earnings from salaried employees in the CPS.

Figure 3 here

Similarly, Figure 3 plots the same recall error in absolute value on the y-axis and the recall period in months on the x-axis. Data are also winsorized to reduce the effect of spurious outliers.
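The winsorization applied to the data behind Figures 2 and 3 can be sketched as follows, using the 1st and 99th percentile cutoffs that the paper gives in its footnote on winsorization. The percentile rule (linear interpolation between order statistics), the helper names `percentile` and `winsorize`, and the sample values are assumptions made for illustration; the paper does not say how its percentiles are computed.

```python
def percentile(sorted_values, q):
    """Percentile q (0 to 100) by linear interpolation between order statistics.

    The interpolation rule is an assumption; other percentile definitions
    would give slightly different cutoffs.
    """
    if not sorted_values:
        raise ValueError("need at least one observation")
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1.0 - weight) + sorted_values[upper] * weight

def winsorize(values, lower_q=1.0, upper_q=99.0):
    """Replace values outside the [lower_q, upper_q] percentiles by the cutoffs."""
    ordered = sorted(values)
    low = percentile(ordered, lower_q)
    high = percentile(ordered, upper_q)
    return [min(max(v, low), high) for v in values]

# Hypothetical recall errors: 101 values from -0.50 to 0.50, then two
# spurious outliers planted at the extremes.
errors = [0.01 * k for k in range(-50, 51)]
errors[0] = -9.0
errors[-1] = 7.0

clean = winsorize(errors)
print(min(clean), max(clean))
```

Unlike trimming, this keeps every observation in the sample and only caps the extremes, so the number of self-reports per recall period is unchanged.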
Each dot represents the average recall error in absolute value across boat owners at a point in time. The solid line indicates the local polynomial smooth line across the dots, and the shaded area captures the 95 percent confidence interval around them. The dashed line at zero provides the benchmark of perfectly accurate recall and so the closer the dots are to the dashed line, the smaller the recall error is. We again find that recall of recent earnings is fairly accurate but that it significantly worsens over time. When less than a year has elapsed between the survey and time recalled, the administrative and survey data differ by less than 5 percent on average, but after one year the difference increases to over 15 percent on average (across boat owners and across time). This is well captured by the widening of the shaded area indicating the 95 percent confidence interval.

Alternatively, we compute the Spearman’s rank correlation between the monthly earnings from administrative records and survey data. When only 2007 data are used, the correlation is 0.97 but it drops to 0.92 when data from 2005 and 2006 are included.

More formally, we test the decay in accuracy using a regression specification that controls for other factors such as boat owner characteristics and features of the survey (e.g. the position of the recall questions in the questionnaire). Specifically, we run the following regression:

$$|\epsilon_{it}| = f(M_t) + X_{it}\beta + u_{it} \tag{3}$$

where the error is defined in equations (1) and (2), $f(M_t)$ is a function of the number of months elapsed since the survey (recall period) and $X_{it}$ is a set of covariates that include the age, education and wealth of the respondent and a dummy indicating whether the recall questions were asked early in the survey. The inclusion of boat owner fixed effects instead of these covariates, or the inclusion of village fixed effects, does not affect the estimates of the function $f(M_t)$.
We use the absolute value of the error $\epsilon_{it}$ to isolate deviations from the true value from systematic under- or over-reporting. Because we use the absolute value of the error, the dependent variable is right skewed and thus not normally distributed.10 As a result, we use least absolute deviations (LAD) instead of OLS and bootstrap the standard errors.

9 Winsorization entails replacing all observations above and below a certain percentile by the value in those two percentiles. We use 98 percent winsorization: all observations above the 99th percentile (below the 1st percentile) are replaced by the 99th percentile (1st percentile). Results without winsorization and from trimmed data are qualitatively very similar.

10 In fact, the error term is not normally distributed either. Its distribution has skewness -13 and kurtosis 229 and the Shapiro-Francia test for normality is rejected with a p-value=0.000.

Table 3 reports the results. In columns 1 to 4 the dependent variable is the absolute value of the recall error in earnings data, while in columns 5 and 6 the dependent variable is the absolute value of the recall error in the timing of the purchase of the FRP boat. Columns 1 and 2 use data from all boat owners, while columns 3 and 4 restrict the sample to boat owners that answered all the recall questions in the survey. Columns 1, 3 and 5 use a linear function of the months elapsed since the survey, while columns 2, 4 and 6 use a quadratic function. Higher-order terms were not significant.

In the case of earnings, both the linear term in columns 1 and 3 and the quadratic term in columns 2 and 4 are positive and significant, indicating that the recall error increases over time. Columns 2 and 4 confirm the pattern shown in Figure 3 of exponential increase in the recall error and are our preferred specifications. A pattern of linear increase (rather than quadratic) is found for the purchase of FRP data in column 5.
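To make the LAD idea concrete, here is a minimal sketch for the simplest case of equation (3): a linear $f(M_t)$ with an intercept and no covariates. It is a deliberate simplification of the specifications in Table 3, and it omits the bootstrap of the standard errors. The function names and the data are hypothetical. The sketch relies on a standard property of LAD: some optimal line passes through at least two data points, so for a small sample it is enough to check every pair of points.

```python
from itertools import combinations

def sum_abs_residuals(xs, ys, intercept, slope):
    """LAD objective: sum of absolute residuals of y = intercept + slope * x."""
    return sum(abs(y - intercept - slope * x) for x, y in zip(xs, ys))

def lad_line(xs, ys):
    """Exact LAD fit of a line by checking every pair of points.

    Some optimal LAD line passes through at least two observations, so the
    search over pairs with distinct x finds a minimizer. The cost grows
    roughly with the cube of the sample size, so this suits small samples only.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical pair does not define a regression line
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = sum_abs_residuals(xs, ys, intercept, slope)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two observations with distinct x")
    return best[1], best[2]

# Hypothetical data: absolute recall error rising linearly with the recall
# period in months, plus one large outlier at the longest recall period.
months = [2, 4, 6, 8, 10, 14, 18, 24]
abs_error = [0.02 + 0.005 * m for m in months]
abs_error[-1] = 1.5

intercept, slope = lad_line(months, abs_error)
print(intercept, slope)
```

Because LAD minimizes absolute rather than squared residuals, the single outlier does not move the fitted line away from the other seven points, which is the robustness that motivates using it on a right-skewed dependent variable.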
This increase in recall error is consistent with the theoretical (Rubin and Baddeley (1989)) and empirical evidence from psychology (for example, Baddeley, Lewis and Nimmo-Smith (1978), Burt (1992), Linton (1975) and Thompson (1996)).

When we compare columns 1 and 2 with columns 3 and 4, we find no significant differences between using all the available data and using the restricted sample of boat owners without missing observations. As a result, we henceforth report the results using all available data. In addition, since recall periods longer than one year are infrequent, we split the sample into recall periods less than a year long and more than a year long (see Appendix Table A.3). The results suggest a linear increase when the recall period is shorter than one year and a convex (accelerating) increase when the recall period is longer than a year. Given the small number of observations in the purchase of FRP data and the fact that most purchases took place more than a year prior to the survey, we cannot split that sample.

3.1.1 Inference vs recall

One reason why accuracy may worsen as the recall period increases is that respondents answer the questions by resorting more and more to estimation rather than retrieval from memory. This may happen because the likelihood of remembering the actual earnings in a given month declines as the number of subsequent months going fishing increases (see Bradburn, Rips and Shevell (1987)). Similarly, Thompson (1996), among others, suggests that repeated experiences are more difficult to remember than unique ones. To test this hypothesis, we run the following non-linear regression:

$$ V^S_{it} = \beta_\infty \left(1 - e^{-at}\right) V^A_{\infty,it} + \beta_A e^{-at} V^A_{it} + u_{it} \tag{4} $$

where $V^A_{\infty,it}$ is the focal value to which self-reported data converge in period t if $\beta_\infty$ and $a$ are non-zero, computed using administrative data from boat owner i. Notice that the focal value may depend on the calendar month, hence the subscript t.
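The mechanics of this specification can be seen in a short sketch. The function below evaluates the systematic part of equation (4), that is, a weight exp(-a t) on the actual value and the remaining weight on the focal value. The parameter values and earnings figures are hypothetical and chosen only to make the convergence visible; they are not estimates.

```python
import math

def predicted_report(v_actual, v_focal, months, beta_inf, beta_a, a):
    """Self-report implied by equation (4), ignoring the error term.

    The weight on the actual value decays as exp(-a * months); the
    remaining weight shifts to the focal value.
    """
    w = math.exp(-a * months)
    return beta_inf * (1.0 - w) * v_focal + beta_a * w * v_actual

# Hypothetical parameter values chosen for illustration, not estimates.
BETA_INF, BETA_A, A = 1.0, 1.0, 0.05

actual, focal = 30000.0, 20000.0  # an unusually good month vs. its typical level
path = [predicted_report(actual, focal, m, BETA_INF, BETA_A, A) for m in (0, 12, 34)]
```

With a recall period of zero the predicted report equals the actual value; as the recall period grows, the prediction drifts toward the focal value. If a were zero, the weight would stay at one and the report would track the actual value regardless of the recall period, which is why the sign and significance of a is the object of interest.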
We start with two different measures for this focal value $V^A_{\infty,it}$. First, the value of boat owner i’s most recent earnings prior to the survey ($V^A_{i,Last}$), and second, the average value of boat owner i’s past earnings in that calendar month ($\bar{V}^A_{i}$). As an example, the focal value when $V^S_{it}$ denotes the earnings of boat owner i for March 2007, say, will be either the earnings for the month prior to the survey or the average of the respondent’s earnings from administrative data for the month of March for all available years.

Columns 1 and 2 of Table 4 report the results. Regardless of the focal value $V^A_{\infty,it}$, the coefficient $\beta_A$ is always close to one and statistically significant at 1 percent, indicating, not surprisingly, that the self-reported value is correlated with the actual value $V^A_{it}$. The coefficient $a$ in column 1 is insignificant, indicating that self-reports do not converge to the most recent value of catches, perhaps because of the significant seasonality in earnings.

In contrast, the coefficient $a$ is positive and statistically significant in column 2, when $V^A_{\infty,it}$ corresponds to average earnings.[11] This suggests that as the recall period increases, boat owners tend to report average earnings for that month rather than the true earnings. Put differently, respondents revert to the mean as the recall period increases, providing a rationale for the negative and significant correlation between $\log V^A_{it}$ and the error term $\epsilon^V_{it}$ using retrospective data, which is also found in Bound and Krueger (1991) and Akee (2011) using cross-sectional data.

So far we have shown that as the recall period increases, respondents tend to estimate earnings using their past history of earnings. However, the recall of a given boat owner could also be influenced by the histories of other boat owners with whom the respondent usually interacts. We asked boat owners to identify individuals with whom they typically discuss fishing issues such as weather, catchment points, etc.
(“friends”) as well as other boat owners that live within a five-minute walk of their house (“neighbors”). We define as peers the set of all friends and neighbors.

Column 3 of Table 4 uses the average earnings of boat owner i’s peers as the focal value $V^A_{i,Peers}$. Continuing with the example above, this focal value $V^A_{i,Peers}$ for boat owner i in March 2007 is the average earnings for all boat owner i’s peers for all available data for the month of March. Boat owners identified on average six neighbors and one friend, but since 20 boat owners did not identify a neighbor and 110 did not identify a friend, the number of observations in column 3 drops from 2,098 to 1,903. Similar to column 2, we find that as the recall period increases, the earnings self-report converges to the earnings of peers. Of course, there is no claim to causality. Peers could talk to each other or they could be similar and thus have similar incomes. Finally, column 4 uses data from boat owners in the village not identified as friends or neighbors by boat owner i to construct the focal value $V^A_{i,NonPeers}$. Surprisingly, the results are similar to those of column 3, suggesting a correlation between own fishing history and that of other boat owners in the village.

While boat owners may use information from non-peers because earnings are publicly observable during the sale of fish on the shore, in Table 5 we test whether $\bar{V}^A_{it}$ or $V^A_{it,Group}$ has more explanatory power. In particular, we run a regression similar to equation (4) above, but with two focal values:

$$ V^S_{it} = \beta_\infty \left(1 - e^{-at}\right) \left[\alpha \bar{V}^A_{it} + (1-\alpha) V^A_{it,Group}\right] + \beta_A e^{-at} V^A_{it} + u_{it} \tag{5} $$

where “Group” refers to “Peers” in column 1 and to “NonPeers” in column 2. Similar to columns 3 and 4 of Table 4, the coefficients $\beta_\infty$ and $a$ are positive and significant but, more importantly, the coefficient $\alpha$ is high at 0.7 but not significant in column 1 and high and significant at 0.9 in column 2. This means that the relevant focal value is $\bar{V}^A$, at least in column 2, and, as a result, respondents use their own history of catches to infer past earnings rather than that of other boat owners in the village.

[11] Results are similar when median earnings are used as a focal value. Indeed, the correlation between average and median earnings is large at 0.89.

The finding that earnings self-reports converge to the mean earnings of the respondent has clear implications for the accuracy of the self-reported earnings distribution. If one is interested in the mean of the earnings distribution, then retrospective data are appropriate. However, if one is interested in the volatility of the earnings distribution, retrospective data will tend to underestimate the true volatility.

To see this more formally, we compute the mean and variance of administrative and survey data using 12-month windows. The first observation uses 2 to 13 months prior to the survey, the second uses 3 to 14 months prior to the survey and so on. We choose 12 months because of seasonality in the earnings data. The last observation uses months 23 to 34, so we have 21 rolling windows per respondent or 42 observations per individual after pooling the administrative and survey records.[12] We then run the following regression:

$$ M_{iw} = \alpha_1 \mathbb{1}(V^A) + \alpha_2 \text{Trend}_w + \alpha_3 \text{Trend}_w \times \mathbb{1}(V^A) + u_{iw} \tag{6} $$

where $M_{iw}$ is the moment of the earnings distribution, that is, either the mean or the variance for individual i and window w, $\mathbb{1}(V^A)$ is an indicator for whether the data come from administrative records and $\text{Trend}_w$ is a trend variable that takes value 1 for the first window, 2 for the second window, etc.

Column 1 of Table 6 reports the results when the mean is the dependent variable and column 2 reports the results for the variance. According to column 1, the mean of the administrative data is higher by Rs 430. The difference is statistically significant but economically small because it amounts to an underreporting in the self-reported data of around 2 percent.
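The rolling-window comparison described above can be sketched as follows. The example builds hypothetical "survey" reports by shrinking made-up administrative earnings toward their mean, which is only a stylised stand-in for the mean reversion discussed in the text, and then computes window means and variances for both series. Function names and the shrinkage weight are assumptions of the sketch.

```python
from statistics import mean, pvariance

def rolling_moments(series, window=12):
    """Mean and (population) variance of every run of `window` consecutive values."""
    out = []
    for start in range(len(series) - window + 1):
        chunk = series[start:start + window]
        out.append((mean(chunk), pvariance(chunk)))
    return out

def shrink_toward_mean(series, weight):
    """Replace each value by weight * value + (1 - weight) * series mean."""
    m = mean(series)
    return [weight * v + (1 - weight) * m for v in series]

# Made-up "administrative" earnings with a seasonal pattern (Rs '000).
admin = [10, 14, 22, 30, 26, 18, 12, 9, 15, 24, 28, 20, 11, 13, 21]
survey = shrink_toward_mean(admin, weight=0.8)  # hypothetical mean-reverting reports

admin_m = rolling_moments(admin)
survey_m = rolling_moments(survey)
```

In this stylised case the overall mean of the reports is unchanged, while every window variance is scaled by the square of the shrinkage weight (0.64 here). That is the sense in which mean-reverting reports can leave the estimated mean nearly intact and still understate volatility.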
Indeed, while more than 65 percent of self-reports fall within a 5 percent margin of their administrative counterparts, about a quarter of self-reports underreport by more than 5 percent, while only 10 percent overreport. Because lower values of the “Trend” variable refer to dates that are closer to the date of the survey while larger values refer to earlier dates, the negative and significant coefficient on the “Trend” variable suggests that average earnings increase over time, in part due to inflation. More importantly, the interaction between the Trend variable and the indicator for administrative data is not significant. This suggests that retrospective data can be used to estimate the mean of the earnings distribution (notwithstanding the small bias), irrespective of the recall period.

The results in column 1 can also be used to refute an alternative explanation for the growth in recall error. If the respondent perceived that earnings had a different time trend than the actual one, recall error would increase with the recall period. The fact that the interaction term in column 1 is not significant suggests that there is no discrepancy between the perceived time trend in self-reported data and the actual one in the administrative data.

In contrast, apart from the constant, the only coefficient in column 2 that is significant is that of the interaction. Given that it is positive, it suggests that as the recall period (“Trend”) increases, the administrative data have higher variance. This coefficient is large. After two months, the difference in variance is negligible but it rises to 7 percent with a one-year recall period. As a result, the variance estimated with retrospective data will be downward biased for long enough recall periods.

[12] The results are robust to choosing other window lengths, for example 6 months.

3.2 High vs low earnings

We now investigate whether a month with high (or low) earnings is more accurately recalled.
According to the assumption of diminishing marginal utility, high earnings may be remembered less precisely because they are less costly in terms of foregone consumption.[13] In contrast, there exists a psychological literature that supports the view that months with high earnings should be recalled more accurately because individuals tend to forget unpleasant events. For example, based on six years of log entries in his personal diary, Wagenaar (1986) argues that pleasant events are recalled more precisely than the ones he rated as neutral or unpleasant. Using the same method of extracting information from personal diaries but from a larger sample, Thompson (1996) and Holmes (1970) document similar findings. In addition, Skowronski et al. (1991) show that pleasant events are better remembered using survey data collected from undergraduate students.

We test these two competing views in two ways. First, we assess whether boat owners are more accurate at identifying the month with the highest earnings compared with that with the lowest earnings. In particular, we compare the administrative data to the answer to the following question from the survey: “In the past 12 months, which was the month with the highest/lowest catches?”. According to Table 1 and consistent with the psychological literature, boat owners appear to recall better the months with the highest earnings (52 percent) relative to those with the lowest earnings (26 percent). More formally, we run a regression with a dummy as dependent variable that takes the value 1 if the month with highest or lowest earnings in 2006 was correctly identified. The number of observations in the regressions is 478 because there are two observations per boat owner (one for highest and one for lowest earnings). Table 7 reports the results. In column 1, the probability that boat owners correctly recall the month with the highest earnings is 27 percentage points higher relative to the recollection of the month with lowest earnings.
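The construction of the dependent variable in this test can be sketched in a few lines. The records below are invented and cover only three months per owner to keep the example short; the function names are hypothetical. Each boat owner contributes two 0/1 observations, one for the highest-earning month and one for the lowest.

```python
def extreme_months(earnings_by_month):
    """Return (month with highest earnings, month with lowest earnings)."""
    highest = max(earnings_by_month, key=earnings_by_month.get)
    lowest = min(earnings_by_month, key=earnings_by_month.get)
    return highest, lowest

def recall_dummies(earnings_by_month, reported_highest, reported_lowest):
    """Two 0/1 observations per boat owner: (highest correct, lowest correct)."""
    highest, lowest = extreme_months(earnings_by_month)
    return int(reported_highest == highest), int(reported_lowest == lowest)

# Made-up records for three boat owners:
# (administrative month -> earnings in Rs, reported highest, reported lowest).
owners = [
    ({"Jan": 20000, "Feb": 31000, "Mar": 12000}, "Feb", "Jan"),
    ({"Jan": 15000, "Feb": 9000, "Mar": 27000}, "Mar", "Feb"),
    ({"Jan": 22000, "Feb": 18000, "Mar": 25000}, "Jan", "Jan"),
]

rows = [recall_dummies(*o) for o in owners]
share_high = sum(r[0] for r in rows) / len(rows)
share_low = sum(r[1] for r in rows) / len(rows)
gap = share_high - share_low  # difference in shares, in probability units
```

Regressing the stacked dummy on a constant and an indicator for the "highest" observation, with no other controls, returns exactly this difference in shares as the slope, which is why the coefficient is read in percentage points.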
Columns 2 and 3 include the percentage of household income from fishing and either the number of adults or the number of male adults involved in the fishing sector in the household. In both columns the result holds. Interestingly, households with a higher percentage of income from fishing are also more accurate in identifying the months with highest and lowest earnings. This suggests that accuracy is related to how relevant income from fishing is to the household, echoing the findings in Giné, Townsend and Vickery (2009), which studies the accuracy of rainfed farmers’ expectations about the onset of the monsoon.

[13] The reference-dependent utility framework of Tversky and Kahneman (1991) is also consistent with this prediction. They argue that the value function is concave in the domain of gains, relative to a reference point, and convex in the domain of losses. Hence, boat owners would suffer more from a small loss than they would benefit from a gain of the same magnitude and thus would remember better events associated with losses.

We also assess whether high earnings are better recalled by running the same regression in equation (3) but including a dummy that takes the value one if monthly catches for a given month, say March, are lower than the median of all available years for the month of March in the administrative data. Consistent with the findings from the previous table, column 1 in Table 8 reports a positive and statistically significant coefficient of “Monthly Earnings < median (1=Yes)”.[14] Higher earnings in a given month, however, could be driven by higher daily earnings for days when the boat owner goes fishing or a higher number of days going fishing. If higher earnings are due to more fishing days, then accuracy may be higher because income from fishing that month is more important to the household, rather than because pleasant events such as months with high earnings are better recalled.
To disentangle the two explanations, we control for “Num. days worked in a month” in column 2. The coefficient is negative and statistically significant, suggesting higher accuracy in months when income from fishing is more important. However, the coefficient associated with the dummy “Monthly Earnings [...] 1), then boat owners (wives) are more accurate. We find that β = 0.32 and the coefficient is statistically significant at 1 percent. In addition, the p-value of the t-test that β = 1 is zero, indicating that the slope of the dashed line is indeed lower than one, and that boat owners are overall more accurate than their wives.

A more formal assessment of whether boat owners are more accurate than their wives is to run a LAD regression similar to equation (3) using as dependent variable the errors defined in equations (1) and (2). In particular, we run the following pooled LAD regression:

$$ \epsilon_{it} = \alpha + \beta BO_i + \gamma_0 X_{it} + \gamma_1 X_{it} \times BO_i + u_{it} $$

where $BO_i$ is a dummy equal to one if the respondent is a boat owner and zero if she is the wife, $X_{it}$ is the usual set of covariates and $X_{it} \times BO_i$ is the interaction of each covariate with the boat owner dummy to assess differences in the impact of covariates by gender.

Table 9 reports the results. The dependent variable in columns 1 and 2 is the absolute value of the error in earnings $\epsilon^V_{it}$, in columns 3 and 4 it is the error $\epsilon^V_{it}$, in column 5 it is the absolute value of the error in the timing of the investment purchase $\epsilon^T_{it}$, while in column 6 it is the error in the timing of the investment $\epsilon^T_{it}$. The coefficient β is negative and significant in column 1, suggesting that boat owners are more accurate than their wives. The coefficient β in column 2 is negative but no longer significant once we control for the coefficient of variation of earnings using the administrative data, “CV($V^A$)”.
In columns 3 and 4, the coefficient β is again not significant, leading to the conclusion that boat owners do not systematically lie to their spouses, since wives do not seem to underreport relative to boat owners.

The linear term “Months elapsed since survey” is positive and significant in columns 1 and 2, while its interaction with the boat owner dummy is negative and significant. This suggests that the recall period worsens accuracy among wives but not among boat owners. Table 9 also reports the p-value of the t-test that the sum of the coefficient on each covariate and the coefficient on its interaction with the boat owner dummy is zero. The p-value for “Months elapsed since survey” in column 2 is 0.375. The quadratic term “Months elapsed² since survey” is positive and significant in both columns 1 and 2, but the interaction is not significant, indicating that there is no gender difference. In columns 3 and 4, while the coefficient on “Months elapsed since survey” and its interaction with the boat owner dummy are not significant, the coefficient on “Months elapsed² since survey” is negative and significant, suggesting that as the recall period increases, both the boat owners and their wives tend to underreport more. The interaction with the boat owner dummy is insignificant, suggesting again no systematic gender differences.

In columns 5 and 6, the coefficient of the linear variable “Months elapsed since survey” is significant but not its interaction with the boat owner dummy. The quadratic term and its interaction with the boat owner dummy are not significant. In column 5, this is evidence that accuracy about the date when the boat was purchased worsens linearly with the recall period. In column 6, the result suggests that as the recall period increases, boat owners and their wives tend to report dates of purchase that are closer to the survey date than the actual dates.
4.1.1 Respondent characteristics

Table 9 also reports the impact of personal characteristics (age, education and wealth) on the recall error. At first glance, columns 1 and 2 reveal an impact of characteristics among wives, but not among boat owners. We now describe the impact of each characteristic in turn.

According to the literature, age and accuracy of recall are positively correlated. For example, in Mathiowetz and Duncan (1988), younger respondents display higher recall errors in the number of unemployment spells. Similarly, in Kennickell and Starr-McCluer (1997), older respondents report their wealth more accurately, perhaps because they are able to monitor their assets more carefully, thus offsetting the potentially negative effect of age on memory. Interestingly, we find in columns 1 and 2 that older wives are more accurate, but there is no such correlation for boat owners. In columns 3 to 6, we fail to find a correlation between age and accuracy for either boat owners or their wives.

Concerning education, the literature suggests that more educated respondents are in general more likely to provide accurate answers (see for instance Ferber (1965), Lansing (1961) and Beckett et al. (2001)), although the reasons vary depending on the context. For example, Beckett et al. (2001) study outcomes ranging from fertility to migration, and argue that more educated women are more accurate because they are more conscious about their health. Studies of income reporting in developed countries suggest an alternative explanation for the positive correlation between education and accuracy of recall. More educated individuals typically have formal, steady jobs with predictable and smooth income profiles. Less educated individuals tend to be self-employed with higher income volatility. In this case, education is masking the underlying volatility of the income process that is driving accuracy.
Consistent with the literature, in columns 1 and 2 we find that more educated wives are more accurate but there is no correlation among boat owners (the p-value is 0.527). The result is robust to changes to the definition of education, whether it is the number of years of schooling or an indicator for the ability to read and write. The coefficient on income volatility proxied by the coefficient of variation (CV) in columns 2 and 4 is positive and significant, while the interaction of income volatility with the boat owner dummy is negative and significant only in column 2. This indicates that wives in households with more volatile earnings are less accurate, perhaps because volatile earnings are harder to recall (column 2), and that both boat owners and wives tend to overreport their earnings (column 4).

Finally, we consider wealth. Ferber (1966) and Lansing (1961) find that richer respondents tend to remember more precisely, but poorer respondents may report more accurately their earnings from fishing if it represents their only source of income. As mentioned, Giné, Townsend and Vickery (2009) show that farmers whose income depends more heavily on the monsoon have more accurate expectations about its arrival, suggesting that individuals pay attention to the events about which they care most. Similarly, Kurbat, Shevell and Rips (1998) show that accuracy of recall improves with the relevance and emotional impact of the event to the respondent. We find wealth to be uncorrelated with accuracy of recall for both boat owners and wives.

4.2 Telescoping and time cues

We investigate the presence of telescoping using the timing of when boat owners bought their first FRP boat. In this analysis, we use only data from auctioneers in one village because, as mentioned, we use as date of purchase the date of loan disbursement for the purchase of a boat and only auctioneers recorded the purpose of the loan.
Telescoping is an important type of recall error that consists in mistaking the timing of an event. In cognitive psychology, forward telescoping indicates the tendency to date remote events closer in time, while backward telescoping designates the opposite, dating earlier in time more recent events. The empirical evidence on telescoping in the psychological literature is also mixed. While Brown, Shevell and Rips (1986) associate forward telescoping with more vividly remembered (“accessible”) events, Kemp (1988) finds it is greater for less well-remembered facts, and Thompson, Skowronski and Lee (1988) document that telescoping may occur for any of the events considered in their study.

Figure 5 here

Figure 5 plots on the y-axis the date of purchase reported in the 2007 survey by boat owners (left panel) and their wives (right panel), and on the x-axis the date recorded by the auctioneer. If the scatter points were perfectly aligned along the 45° solid line, then boat owners (wives) would correctly remember the exact month and year of the purchase, and therefore there would be no recall error and hence no telescoping. If instead the scatter points were above the 45° solid line, there would be forward telescoping, since respondents would date the event further in the future relative to when it actually happened. The dashed line interpolates the scatter points (circles) linearly and provides evidence of forward telescoping (above the 45° line) if the FRP boat was purchased more than 5 years ago (60 months) and of backward telescoping (below the 45° line) if it was purchased more recently.

We also study differences in accuracy when respondents are provided with time cues. In particular, respondents were asked to count the number of months before and after the following events (the time cue) took place: (i) the date the village priest was replaced, (ii) the date of local elections and (iii) the 2004 tsunami.
In total we have six dates of FRP purchase for each respondent: one without time cues, two with a change of priest time cue (since there were two replacements from 2000 to 2007), two with an election time cue (there were elections in 2001 and 2006) and one with the tsunami time cue.

The psychological literature suggests that time cues lower recall error, since providing time reference boundaries should improve precision. For instance, Conway and Bekerian (1987) and Dex (1995) find evidence that using other known events is an effective strategy to help respondents date a specific episode. Kurbat, Shevell and Rips (1998) argue that respondents may in fact spontaneously use boundary references to help recall.

Figure 5 shows that the dashed line, which interpolates the reports of the date of purchase without time cues (in circles), is closest to the 45° solid line. This suggests that asking directly for the date of purchase yields the most accurate answers. By giving time cues that are important to the respondent (long dot-dashed line), enumerators are able to elicit answers (shown as squares) of similar quality. However, if the time cues are not important to the respondent, the answers (shown as triangles) are far worse (short dot-dashed line). To test the robustness of this descriptive evidence, we run the following OLS regression:

$$ \epsilon^{T}_{it} - \epsilon^{T\,cues}_{it} = \sum_e D_e + \sum_e D_e \times BO_i + \text{Months elapsed}_i + \epsilon_{ie} $$

where $\epsilon^{T}_{it}$ is the recall error without cues defined in equation (2), and $\epsilon^{T\,cues}_{it}$ is the error made when recall is aided by cues. The dummy $D_e$ takes the value one when event e is used as a time cue and $BO_i$ is a dummy that takes the value one if the respondent is boat owner i (and not his wife). The variable Months elapsed captures the number of months elapsed between the time of the survey and the purchase of the boat. Table 10 reports the results.
We find that boat owners and their spouses make significantly larger recall errors when they receive as time cues the date of the replacement of the village priest. When the time cues are the local elections or the tsunami, the recall errors are not statistically different from those made when no time cues are used. In a follow-up survey in 2009, boat owners and their wives were asked to rank the events used as time cues in order of importance. The replacement of priests systematically appeared in the last and next-to-last positions. In contrast, the tsunami and the elections ranked among the top.

To conclude, time cues may be helpful to the extent that the event cued is relevant for the respondent, a finding consistent with Conway and Bekerian (1987), who find that some time cues are better than others. When the event is not relevant, the recall error may compound mistakes in the exact timing of the cue with the recall error for the event. In our case, given that the replacement of the priest was not perceived by the respondent to be an important event, he or she wrongly remembered when such event occurred.

4.3 Position in the survey

The last methodological issue we explore concerns the position of the recall questions in the questionnaire. Two competing forces are potentially at play. First, the rapport with the enumerator improves throughout the interview, inducing the respondent to provide more accurate responses toward the end of the interview compared with the beginning (Sudman and Bradburn (1973)). Second, the level of fatigue and lack of concentration also increases during the interview, thus decreasing accuracy along the survey. In order to test whether rapport or fatigue dominates, we designed two versions of the questionnaire that only differ in the order of the recall questions. One version asks them toward the beginning while the other asks them toward the end.
Because certain sections only appear in the boat owner survey and others in the wife survey, the recall questions in the wife survey were closer together in time. Couples were randomly assigned to receive one or the other version of the survey in a way that both the boat owner and his wife were administered the same version.

The dummy variable Early in survey(Yes=1) takes the value one if the fishing module is asked at the beginning of the survey, and is included among the covariates in Table 9. In all specifications, the coefficient of Early in survey(Yes=1) and its interaction with the boat owner dummy are not significant and thus we conclude that the position of the question does not appear to influence the precision of recall.

5 Conclusions

This paper uses a unique validation data set from a sample of self-employed households in southern coastal India. We compare unique hand-written administrative records with data from a cross-sectional survey collected in 2007 that asked retrospective information about earnings of up to 34 months prior to the survey as well as the timing of a large productive investment.

We find that recall error in earnings increases with the recall period, in part because as the recall period increases respondents resort more and more to inference rather than memory. As the recall period increases, respondents tend to report the mean over past earnings in that month. This implies that the variance of the self-reported earnings distribution will be lower than the true one. One may worry that the recall period used here is too long, but the results show that the variance is significantly lower by 7 percent even when the recall period is one year. This finding has important implications for risk-sharing studies such as Meghir and Pistaferri (2004), because using survey data will understate the variability of income, and thus potentially overstate the ability of individuals to smooth consumption over the life cycle.
Instead of using recall data to measure actual earnings, we can use them to simply measure whether earnings increased or decreased in a year relative to the previous year. According to the administrative data, 68 percent of boat owners experienced an increase in monthly earnings in 2007 relative to 2006. Using their self-reports, virtually all boat owners (98 percent) that experienced an increase correctly recalled it. Among boat owners that experienced a decrease in earnings, 75 percent correctly recalled it. Although boat owners are more accurate in recalling increases in yearly earnings, recall data appear appropriate for constructing income measures that may be related to life evaluation measures (Kahneman and Deaton (2010)).

Relatedly, we find that boat owners recall higher monthly earnings more accurately. As it turns out, the inaccuracy in low-earning months stems from the tendency for some boat owners to overstate the earnings for those months. This is consistent with the finding in the literature that entrepreneurs tend to be optimistic, and has implications for the way people form expectations (Mullainathan (2002)). In fact, Delavande, Giné and McKenzie (2011) use the same sample of boat owners to elicit expectations about future earnings and do find that some boat owners systematically forecast higher average earnings.

We also find that anchoring the timing of the event of interest to events in the village that are not relevant to the respondent can actually worsen accuracy as the recall error is inflated by inaccuracy about the timing of the cue.

While we believe that these results are applicable to other populations, given that the sample is relatively small and exclusively engaged in a particular occupation, the results may be more relevant to other self-employed individuals with frequent cashflows, such as drivers or business owners, than to farmers or livestock herders who receive income rather infrequently during the year.
In addition to being marketed on a daily basis, catches are recorded in the presence of the boat owner, so earnings from fishing may be more salient, and thus a more familiar concept, than earnings in other occupations. An advantage of this is that we are able to study the properties of recall error and abstract from other sources of measurement error.

6 Figures

[Figure 1: Distribution of the value of catches: Administrative vs survey data. Series: V^A and V^S. Horizontal axis: value of catches (0 to 100,000). Vertical axis: fraction.]

[Figure 2: Distribution of recall error. Vertical axis: log difference between V^S and V^A. Horizontal axis categories: 02-05, 06-12, 13-18, 19-24, >24.]

[Figure 3: Recall error over time. Vertical axis: absolute value of the log difference between V^S and V^A. Horizontal axis: months elapsed (0 to 40).]

[Figure 4: Are boat owners more accurate than their wives at recalling earnings? Axes: log difference between V^S and V^A for the boat owner (BO) and for the wife.]

[Figure 5: Telescoping and time cues. Two panels (Wife, BO) plotting T^S against T^A with a 45° line. The squares (triangles) represent the time of the purchase when the most (least) important time cue is used to aid recall. The circles represent the time of the purchase without time cues.]

7 Tables

Table 1: Summary statistics

| Variable | N | Mean | SD | Min | Max |
|---|---|---|---|---|---|
| Boat owner characteristics | | | | | |
| Age | 239 | 42.08 | 9.15 | 19 | 62 |
| Literate (1=Yes) | 239 | 0.74 | 0.44 | 0 | 1 |
| Correct recall of month with highest catches (1=Yes) | 239 | 0.52 | 0.50 | 0 | 1 |
| Correct recall of month with lowest catches (1=Yes) | 239 | 0.26 | 0.44 | 0 | 1 |
| Num. days worked in a month | 2098 | 20.41 | 4.72 | 1 | 27 |
| Wife characteristics | | | | | |
| Age | 239 | 38.77 | 9.41 | 14 | 66 |
| Literate (1=Yes) | 239 | 0.84 | 0.37 | 0 | 1 |
| Household characteristics | | | | | |
| Number of income earners | 239 | 1.74 | 1.01 | 1 | 5 |
| Household size | 239 | 4.79 | 1.41 | 2 | 12 |
| Wealth in 2006 (in 1000 Rs) | 239 | 9.06 | 6.49 | 0 | 63 |
| Pct. income from fishing | 239 | 0.93 | 0.15 | 0 | 1 |
| Survey characteristics | | | | | |
| Length of interview, BO (min) | 238 | 134.21 | 18.00 | 55 | 211 |
| Length of interview, Wife | 239 | 135.24 | 13.16 | 103 | 191 |
| Timing of recall section in minutes (early), BO | 119 | 16.50 | 10.09 | 8 | 105 |
| Timing of recall section in minutes (late), BO | 119 | 81.87 | 14.50 | 45 | 125 |
| Months elapsed since survey | 2098 | 15.45 | 8.04 | 2 | 34 |
| Earnings | | | | | |
| log V^A_it | 2098 | 9.68 | 0.73 | 6 | 11 |
| log V^S_it, BO | 2098 | 9.62 | 0.93 | 0 | 11 |
| log V^S_it, Wife | 1646 | 9.65 | 0.85 | 0 | 11 |
| Timing of investment (FRP) | | | | | |
| T^A | 32 | 63.25 | 16.74 | 26 | 80 |
| T^S, BO | 32 | 69.19 | 22.29 | 27 | 96 |
| T^S, Wife | 32 | 69.75 | 22.45 | 27 | 96 |
| Recall error | | | | | |
| ϵ^V_it, BO | 2098 | -0.06 | 0.53 | -10 | 2 |
| ϵ^V_it, Wife | 1646 | -0.06 | 0.54 | -10 | 2 |
| ϵ^T_it, BO | 32 | 5.94 | 13.14 | -39 | 29 |
| ϵ^T_it, Wife | 32 | 6.50 | 13.23 | -39 | 30 |

Source: Administrative data from 2005 to 2007 and survey data collected in 2007. See Appendix Table A.1 for definitions of the variables.

Table 2: Correlations and reliability ratios

Panel A: Earnings

| | Corr(log V^S, log V^A), BO | Corr(log V^S, log V^A), Wife | Corr(ϵ^V, log V^A), BO | Corr(ϵ^V, log V^A), Wife | Reliability ratio (1), BO | Reliability ratio (1), Wife |
|---|---|---|---|---|---|---|
| Contemporaneous | 0.98*** | 0.97*** | -0.02 | -0.32*** | | |
| All | 0.96*** | 0.93*** | -0.05* | -0.11*** | 0.91 | 0.88 |

Panel B: Purchase FRP

| | Corr(T^S, T^A), BO | Corr(T^S, T^A), Wife | Corr(ϵ^T, T^A), BO | Corr(ϵ^T, T^A), Wife | Reliability ratio (2), BO | Reliability ratio (2), Wife |
|---|---|---|---|---|---|---|
| All | 0.81*** | 0.81*** | 0.07 | 0.08 | 0.62 | 0.62 |

Note: (1) Because Corr(ϵ^V, log V^A) ≠ 0, the reliability ratio is computed using the formula under non-classical measurement error. (2) In this case Corr(ϵ^T, T^A) = 0, and so the reliability ratio is computed using the formula under classical measurement error. The symbols ***, **, * represent significance at the 1%, 5%, and 10% level.

Table 3: Recall error over time

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Dependent variable | ϵ^V_it | ϵ^V_it | ϵ^V_it | ϵ^V_it | ϵ^T_it | ϵ^T_it |
| Months elapsed since survey | 0.231*** | -0.061 | 0.325*** | -0.171* | 0.242** | 0.688 |
| | (0.016) | (0.082) | (0.041) | (0.095) | (0.114) | (0.853) |
| Months elapsed² since survey | | 0.011*** | | 0.018*** | | -0.004 |
| | | (0.004) | | (0.004) | | (0.008) |
| Pseudo-R2 | 0.033 | 0.037 | 0.062 | 0.075 | 0.189 | 0.209 |
| Mean dep. var. | 0.11 | 0.11 | 0.11 | 0.11 | 10.69 | 10.69 |
| N | 2098 | 2098 | 708 | 708 | 32 | 32 |

LAD regression. The regression includes the following regressors (not displayed): Early in the questionnaire (1=Yes), Age, Literate (1=Yes), Wealth in 2006. The results are robust to controlling for boat owner fixed effects. The bootstrap standard errors are reported in brackets. The symbols ***, **, * represent significance at the 1%, 5%, and 10% level. In columns (1) to (4), “Months elapsed since survey” and “Months elapsed² since survey” are divided by 100.

Table 4: Inference vs recall I

| | V^A_Last (1) | V^A (2) | V^A_Peers (3) | V^A_Non Peers (4) |
|---|---|---|---|---|
| β∞ | -0.097 | 0.734*** | 0.748*** | 0.630*** |
| | (0.499) | (0.156) | (0.196) | (0.180) |
| βA | 0.992*** | 1.000*** | 0.994*** | 0.995*** |
| | (0.006) | (0.006) | (0.006) | (0.006) |
| a | 0.001 | 0.005*** | 0.003*** | 0.003*** |
| | (0.001) | (0.002) | (0.001) | (0.001) |
| Adj-R2 | 0.979 | 0.979 | 0.979 | 0.979 |
| Mean dep. var. | 19406 | 19406 | 19435 | 19406 |
| N | 2098 | 2098 | 1903 | 2098 |

Non-linear least squares regression, based on equation (4). The dependent variable is V^S. The standard errors clustered at the boat owner level are reported in brackets. The symbols ***, **, * represent significance at the 1%, 5%, and 10% level.

Table 5: Inference vs recall II

| | V^A_Peers (1) | V^A_Non Peers (2) |
|---|---|---|
| β∞ | 0.797*** | 0.744*** |
| | (0.148) | (0.132) |
| α | 0.710 | 0.903* |
| | (0.371) | (0.433) |
| βA | 0.997*** | 1.000*** |
| | (0.006) | (0.006) |
| a | 0.005** | 0.005** |
| | (0.002) | (0.002) |
| Adj-R2 | 0.979 | 0.979 |
| N | 1903 | 2098 |

Non-linear least squares regression results, based on equation (5). The dependent variable is V^S. The standard errors clustered at the boat owner level are reported in brackets. The symbols ***, **, * represent significance at the 1%, 5%, and 10% level.

Table 6: Moments of the earnings distribution

| | Mean (1) | Variance (2) |
|---|---|---|
| V^A (1=Yes) | 430*** | -2569 |
| | (115) | (2527) |
| Trend | -232*** | -80 |
| | (27) | (466) |
| Trend × V^A (1=Yes) | -3 | 657* |
| | (13) | (350) |
| Constant | 21075*** | 121553*** |
| | (518) | (8002) |
| R2 | 0.034 | 0.001 |
| Mean dep. var. | 18650 | 123073 |
| N | 9510 | 9176 |

OLS regression. The standard errors clustered at the boat owner level are reported in brackets.
The symbols ***, **, * represent significance at the 1%, 5%, and 10% level.

Table 7: High vs low earnings I

Columns: (1) (2) (3)
Highest catches (1=Yes): 0.269*** (0.041), 0.273*** (0.041), 0.272*** (0.041)
Pct. income from fishing: 0.393*** (0.152), 0.413*** (0.156)
Num. adults in family: 0.028 (0.020)
Num. fishermen in family: 0.001 (0.029)
Mean dep. var.: 0.39, 0.39, 0.39
Pseudo-R2: 0.078, 0.091, 0.087
N: 478, 478, 478

Marginal probit effects regression. The dependent variable is a dummy that takes value 1 if the month with highest (lowest) earnings is correctly identified. There are two observations per boat owner, one for highest and the other for lowest earnings. The regression includes the following regressors (not displayed): Early in the questionnaire (1=Yes), Age, Literate (1=Yes), Wealth in 2006. The results are robust to the inclusion of enumerator fixed effects. The standard errors clustered at the boat owner level are reported in brackets. The symbols ***, **, * represent significance at the 1%, 5%, and 10% level.

Table 8: High vs low earnings II

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Months elapsed since survey | -0.059 | -0.023 | -0.087 | -0.051 |
| | (0.077) | (0.069) | (0.092) | (0.064) |
| Months elapsed² since survey | 0.010*** | 0.008*** | 0.011*** | 0.009*** |
| | (0.003) | (0.003) | (0.004) | (0.003) |
| Monthly earnings < median (1=Yes) | 0.008*** | 0.005* | | |
| | (0.003) | (0.003) | | |
| Daily earnings < median (1=Yes) | | | 0.006** | 0.009*** |
| | | | (0.002) | (0.002) |
| Num. days worked in a month | | -0.003*** | | -0.003*** |
| | | (0.000) | | (0.000) |
| Pseudo-R2 | 0.038 | 0.050 | 0.037 | 0.051 |
| Mean dep. var. | 0.11 | 0.11 | 0.11 | 0.11 |
| N | 2098 | 2098 | 2098 | 2098 |

LAD regression. The dependent variable is ϵ^V_it. The regression includes the following regressors (not displayed): Early in the questionnaire (1=Yes), Age, Literate (1=Yes), Wealth in 2006. The results are robust to controlling for boat owner fixed effects. The bootstrap standard errors are reported in brackets. The symbols ***, **, * represent significance at the 1%, 5%, and 10% level.
In columns (1) to (4), “Months elapsed since survey” and “Months elapsed² since survey” are divided by 100.

Table 9: Features of the survey and characteristics of the respondents

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Dependent variable | Abs. value of ϵ^V_it | Abs. value of ϵ^V_it | ϵ^V_it | ϵ^V_it | Abs. value of ϵ^T_it | ϵ^T_it |
| BO (1=Yes) | -0.042* | -0.025 | -0.005 | -0.004 | -6.489 | -8.280 |
| | (0.024) | (0.017) | (0.008) | (0.009) | (14.480) | (23.365) |
| Months elapsed since survey | 0.298* | 0.320** | 0.000 | -0.001 | 0.229*** | 0.205* |
| | (0.163) | (0.128) | (0.060) | (0.048) | (0.084) | (0.106) |
| Months elapsed since survey × BO | -0.359* | -0.398** | 0.008 | 0.007 | 0.014 | 0.058 |
| | (0.186) | (0.179) | (0.066) | (0.073) | (0.143) | (0.157) |
| Months elapsed² since survey | 0.011** | 0.011** | -0.004* | -0.004* | | |
| | (0.006) | (0.004) | (0.002) | (0.002) | | |
| Months elapsed² since survey × BO | -0.000 | 0.001 | 0.000 | 0.000 | | |
| | (0.007) | (0.007) | (0.002) | (0.003) | | |
| CV(V^A) | | 0.032** | | 0.017*** | | |
| | | (0.013) | | (0.005) | | |
| CV(V^A) × BO | | -0.028** | | -0.003 | | |
| | | (0.014) | | (0.006) | | |
| Early in the survey (1=Yes) | 0.005 | 0.004 | 0.001 | 0.001 | 4.583 | 5.148 |
| | (0.005) | (0.005) | (0.002) | (0.002) | (3.616) | (5.252) |
| Early in the survey (1=Yes) × BO | -0.001 | -0.001 | -0.004 | -0.003 | -1.250 | -2.396 |
| | (0.005) | (0.006) | (0.003) | (0.003) | (4.691) | (8.360) |
| Age | -0.001*** | -0.001*** | -0.000 | -0.000 | 0.017 | -0.034 |
| | (0.000) | (0.000) | (0.000) | (0.000) | (0.234) | (0.335) |
| Age × BO | 0.001** | 0.001*** | 0.000* | 0.000 | 0.137 | 0.153 |
| | (0.000) | (0.000) | (0.000) | (0.000) | (0.285) | (0.460) |
| Literate (1=Yes) | -0.020*** | -0.018*** | -0.001 | -0.000 | -0.705 | -0.882 |
| | (0.008) | (0.007) | (0.003) | (0.002) | (4.397) | (3.999) |
| Literate (1=Yes) × BO | 0.021*** | 0.020*** | -0.005 | -0.005 | -0.080 | -0.893 |
| | (0.007) | (0.007) | (0.003) | (0.003) | (8.025) | (7.309) |
| Wealth in 2006 (in Rs 1000) | 0.000 | 0.000 | 0.000 | 0.000 | -0.159 | -0.154 |
| | (0.000) | (0.000) | (0.000) | (0.000) | (0.287) | (0.349) |
| Wealth in 2006 (in Rs 1000) × BO | -0.001 | -0.000 | -0.000 | -0.000 | -0.050 | -0.072 |
| | (0.000) | (0.000) | (0.000) | (0.000) | (0.383) | (0.637) |
| P-value of t-test | | | | | | |
| Months elapsed + Months elapsed × BO = 0 | | 0.375 | | 0.888 | 0.051 | 0.079 |
| Months elapsed² + Months elapsed² × BO = 0 | | 0.003 | | 0.029 | | |
| CV(V^A) + CV(V^A) × BO = 0 | | 0.405 | | 0.000 | | |
| Early in the survey + Early in the survey × BO = 0 | | 0.078 | | 0.096 | 0.231 | 0.692 |
| Age + Age × BO = 0 | | 0.909 | | 0.085 | 0.351 | 0.758 |
| Literate + Literate × BO = 0 | | 0.527 | | 0.013 | 0.902 | 0.784 |
| Wealth + Wealth × BO = 0 | | 0.395 | | 0.283 | 0.358 | 0.712 |
| Pseudo-R2 | 0.09 | 0.09 | 0.01 | 0.01 | 0.20 | 0.14 |
| Mean dep. var. | 0.141 | 0.141 | -0.051 | -0.051 | 10.813 | 10.813 |
| N | 3744 | 3744 | 3744 | 3744 | 64 | 64 |

LAD regression. The bootstrap standard errors are reported in brackets. The symbols ***, **, * represent significance at the 1%, 5%, and 10% level. In columns (1) to (6), “Months elapsed since survey” and “Months elapsed² since survey” are divided by 100.

Table 10: Time cues (purchase of FRP data)

| | (1) | (2) |
|---|---|---|
| Start of parish priest (1=Yes) | 45.451*** | 24.592*** |
| | (3.752) | (3.514) |
| Start of parish priest (1=Yes) × BO (1=Yes) | -41.719*** | |
| | (2.388) | |
| Tamil Nadu state election (1=Yes) | -2.591 | -1.940 |
| | (3.752) | (3.156) |
| Tamil Nadu state election (1=Yes) × BO (1=Yes) | 1.302 | |
| | (2.388) | |
| Tsunami (1=Yes) | -0.455 | -0.435 |
| | (4.115) | (2.793) |
| Tsunami (1=Yes) × BO (1=Yes) | 0.042 | |
| | (3.378) | |
| Months elapsed since survey | 0.057 | 0.057 |
| | (0.052) | (0.051) |
| R2 | 0.648 | 0.421 |
| Mean dep. var. | 12.64 | 12.64 |
| N | 480 | 480 |

OLS regression. The dependent variable is ϵ^T_it − ϵ^{T cues}_it. The standard errors clustered at the boat owner level are reported in brackets. The symbols ***, **, * represent significance at the 1%, 5%, and 10% level.

A Appendix

Table A.1: Definition of variables

| Variable | Definition |
|---|---|
| Recall period | Time elapsed between the event the respondent is asked to recall and the time of the survey. |
| Literate (1=Yes) | Dummy variable if respondent reports being able to read a newspaper and write a letter. |
| Wealth in 2006 | Value of savings and durable goods owned in the households in 2006, expressed in thousands of rupees. |
| Pct. income from fishing | Percentage of income from sales of own catches. Total income also includes monthly wages from working in fishing and non-fishing sectors, benefits from government schemes, migration income, dowry income, pension, and gambling. |
| Trend | It is a linear function of time that takes the value 1 when 2 to 13 months elapse since the time of the survey, value 2 when 3 to 14 months elapse, etc. |
| V^A | Value of earnings (catches) from administrative data. |
| V^S | Value of earnings (catches) reported in the survey. |
| T^A | Months elapsed between the time of the survey and that of the purchase of the FRP according to administrative records. |
| T^S | Months elapsed between the time of the survey and that of the purchase of the FRP according to survey responses. |
| Any missing | Total number of missing income observations, for any possible reason. |
| Missing, BO did not work | Total number of missing income observations because the boat owner did not go fishing that month. |
| Missing, other reasons | Total number of missing income observations because either the administrative or the survey record is missing. |

Table A.2: Attrition

| | Any missing (1) | Missing, BO did not work (2) | Missing, other reasons (3) |
|---|---|---|---|
| Sex (1=BO) | -1.965*** | -0.012 | -1.953*** |
| | (0.225) | (0.051) | (0.225) |
| Age | 0.000 | -0.001 | 0.001 |
| | (0.011) | (0.003) | (0.011) |
| Literate (1=Yes) | -0.424 | -0.027 | -0.398 |
| | (0.278) | (0.072) | (0.278) |
| Wealth in 2006 | 0.002 | 0.003 | -0.001 |
| | (0.016) | (0.003) | (0.016) |
| V^A_2005 | -0.000 | -0.000*** | -0.000 |
| | (0.000) | (0.000) | (0.000) |
| Village F.E. | Yes | Yes | Yes |
| R2 | 0.842 | 0.972 | 0.618 |
| Mean dep. var. | 4.13 | 2.07 | 2.05 |
| N | 470 | 470 | 470 |

OLS regression. The dependent variable is the number of missing V_it. The standard errors are clustered at the individual level and reported in brackets. The symbols ***, **, * represent significance at the 1%, 5%, and 10% level.

Table A.3: Recall error before vs after one year

| | Before (1) | Before (2) | After (3) | After (4) |
|---|---|---|---|---|
| Months elapsed since survey | 0.159*** | 0.184 | 0.373*** | -0.601* |
| | (0.045) | (0.209) | (0.055) | (0.348) |
| Months elapsed² since survey | | -0.002 | | 0.025*** |
| | | (0.016) | | (0.009) |
| Pseudo-R2 | 0.014 | 0.014 | 0.019 | 0.021 |
| Mean dep. var. | 0.04 | 0.04 | 0.15 | 0.15 |
| N | 725 | 725 | 1373 | 1373 |

LAD regression. The dependent variable is ϵ^V_it.
The regression includes the following regressors (not displayed): Early in the questionnaire (1=Yes), Age, Literate (1=Yes), Wealth in 2006. The results are robust to controlling for boat owner fixed effects. The bootstrap standard errors are reported in brackets. The symbols ***, **, * represent significance at the 1%, 5%, and 10% level. In columns (1) to (4), “Months elapsed since survey” and “Months elapsed² since survey” are divided by 100.

Table A.4: High vs low earnings II: Robustness checks

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Panel A: Median | | | | | | |
| Months elapsed since survey | -0.023 | -0.051 | -0.021 | -0.065 | -0.055 | -0.077 |
| | (0.069) | (0.064) | (0.061) | (0.067) | (0.049) | (0.060) |
| Months elapsed² since survey | 0.008*** | 0.009*** | 0.008*** | 0.009*** | 0.009*** | 0.009*** |
| | (0.003) | (0.003) | (0.003) | (0.003) | (0.002) | (0.003) |
| Monthly earnings < median (1=Yes) | 0.005* | | 0.004 | | 0.003 | |
| | (0.003) | | (0.003) | | (0.002) | |
| Daily earnings < median (1=Yes) | | 0.009*** | | 0.009*** | | 0.008*** |
| | | (0.002) | | (0.002) | | (0.002) |
| Num. days worked in a month | -0.003*** | -0.003*** | -0.003*** | -0.003*** | -0.004*** | -0.004*** |
| | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) |
| Pseudo-R2 | 0.11 | 0.11 | 0.11 | 0.11 | 0.11 | 0.11 |
| Panel B: Median + SD | | | | | | |
| Months elapsed since survey | -0.081 | -0.116* | -0.083 | -0.095* | -0.093* | -0.106** |
| | (0.060) | (0.063) | (0.062) | (0.055) | (0.056) | (0.050) |
| Months elapsed² since survey | 0.010*** | 0.011*** | 0.010*** | 0.011*** | 0.010*** | 0.011*** |
| | (0.003) | (0.003) | (0.003) | (0.002) | (0.002) | (0.002) |
| Monthly earnings < median + SD (1=Yes) | 0.010*** | | 0.010*** | | 0.008*** | |
| | (0.001) | | (0.002) | | (0.002) | |
| Daily earnings < median + SD (1=Yes) | | 0.012*** | | 0.012*** | | 0.011*** |
| | | (0.002) | | (0.002) | | (0.002) |
| Num. days worked in a month | -0.003*** | -0.003*** | -0.003*** | -0.003*** | -0.003*** | -0.003*** |
| | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) |
| Pseudo-R2 | 0.052 | 0.053 | 0.053 | 0.054 | 0.055 | 0.057 |
| Enumerator F.E. | No | No | Yes | Yes | No | No |
| Village F.E. | No | No | No | No | Yes | Yes |
| Mean dep. var. | 0.11 | 0.11 | 0.11 | 0.11 | 0.11 | 0.11 |
| N | 2098 | 2098 | 2098 | 2098 | 2098 | 2098 |

LAD regression. The dependent variable is the absolute value of ϵ_it.
The regressions include the following regressors (not displayed): Early in the questionnaire (1=Yes), Age, Literate (1=Yes), Wealth in 2006. The bootstrap standard errors are reported in brackets. The symbols ***, **, * represent significance at the 1%, 5%, and 10% level. In columns (1) to (6), “Months elapsed since survey” and “Months elapsed² since survey” are divided by 100.

References

Akee, R.K.Q. 2011. “Errors in Self-Reported Earnings: The Role of Previous Earnings Volatility.” Journal of Development Economics, 97(2): 409–421.

Angrist, J.D., and A.B. Krueger. 1999. “Empirical strategies in labor economics.” Handbook of Labor Economics, 3: 1277–1366.

Baddeley, A.D., V. Lewis, and I. Nimmo-Smith. 1978. “When did you last ... ?” Practical Aspects of Memory, 77–83.

Beckett, M., J. Da Vanzo, N. Sastry, C. Panis, and C. Peterson. 2001. “The quality of retrospective data: An examination of long-term recall in a developing country.” Journal of Human Resources, 36(3): 593–625.

Beegle, K., C. Carletto, and K. Himelein. 2012. “Reliability of recall in agricultural data.” Journal of Development Economics, 98(1): 34–41.

Beegle, K., J. De Weerdt, J. Friedman, and J. Gibson. 2012. “Methods of household consumption measurement through surveys: Experimental results from Tanzania.” Journal of Development Economics, 98(1): 3–18.

Bound, J., and A.B. Krueger. 1991. “The Extent of Measurement Error in Longitudinal Earnings Data: Do Two Wrongs Make a Right?” Journal of Labor Economics, 9(1): 1–24.

Bound, J., C. Brown, and N. Mathiowetz. 2001. “Measurement error in survey data.” Handbook of Econometrics, 5: 3705–3843.

Bound, J., C. Brown, G.J. Duncan, and W.L. Rodgers. 1994. “Evidence on the validity of cross-sectional and longitudinal labor market data.” Journal of Labor Economics, 12(3): 345–368.

Bound, J., C.C. Brown, G. Duncan, and W.L. Rodgers. 1989. “Measurement error in cross-sectional and longitudinal labor market surveys: Results from two validation studies.” NBER Working Papers, W2884.

Bradburn, N.M., L.J. Rips, and S.K. Shevell. 1987. “Answering autobiographical questions: The impact of memory and inference on surveys.” Science, 236(4798): 157.

Brown, N.R., S.K. Shevell, and L.J. Rips. 1986. Public memories and their personal context. Cambridge University Press.

Burt, C.D.B. 1992. “Reconstruction of the duration of autobiographical events.” Memory & Cognition, 20(2): 124–132.

Conway, M.A., and D.A. Bekerian. 1987. “Organization in autobiographical memory.” Memory & Cognition, 15(2): 119.

Das, J., J. Hammer, and C. Sánchez-Páramo. 2012. “The impact of recall periods on reported morbidity and health seeking behavior.” Journal of Development Economics, 98(1): 76–88.

Delavande, A., X. Giné, and D. McKenzie. 2011. “Measuring subjective expectations in developing countries: A critical review and new evidence.” Journal of Development Economics, 94(2): 151–163.

De Mel, S., D.J. McKenzie, and C. Woodruff. 2009. “Measuring microenterprise profits: Must we ask how the sausage is made?” Journal of Development Economics, 88(1): 19–31.

Dex, S. 1995. “The reliability of recall data: A literature review.” Bulletin de Methodologie Sociologique, 49(1): 58.

Duncan, G.J., and D.H. Hill. 1985. “An investigation of the extent and consequences of measurement error in labor-economic survey data.” Journal of Labor Economics, 3(4): 508–532.

Ferber, R. 1965. “The reliability of consumer surveys of financial holdings: Time deposits.” Journal of the American Statistical Association, 60(309): 148–163.

Ferber, R. 1966. The reliability of consumer reports of financial assets and debts. Bureau of Economic and Business Research, University of Illinois.

Giné, X., R.M. Townsend, and J. Vickery. 2009. “Forecasting when it matters: Evidence from Semi-Arid India.”

Holmes, D.S. 1970. “Differential change in affective intensity and the forgetting of unpleasant personal experiences.” Journal of Personality and Social Psychology, 15(3): 234.

Kahneman, D., and A. Deaton. 2010. “High income improves evaluation of life but not emotional well-being.” PNAS, 107(38): 16489–16493.

Kemp, S. 1988. “Dating recent and historical events.” Applied Cognitive Psychology, 2(3): 181–188.

Kennickell, A.B., and M. Starr-McCluer. 1997. “Retrospective reporting of household wealth: Evidence from the 1983-89 Survey of Consumer Finances.” Journal of Business and Economic Statistics, 15(4): 452–463.

Kurbat, M.A., S.K. Shevell, and L.J. Rips. 1998. “A year’s memories: The calendar effect in autobiographical recall.” Memory & Cognition, 26(3): 532.

Lansing, J.B. 1961. An investigation of response error. Bureau of Economic and Business Research, University of Illinois.

Larsen, S.F. 1988. Remembering without experiencing: Memory for reported events. Cambridge University Press.

Linton, M. 1975. “Memory for real-world events.” Explorations in Cognition, 376–404.

Mathiowetz, N.A., and G.J. Duncan. 1988. “Out of work, out of mind: Response errors in retrospective reports of unemployment.” Journal of Business & Economic Statistics, 6(2): 221–229.

McFadden, D.L., A.C. Bemmaor, F.G. Caro, J. Dominitz, B.H. Jun, A. Lewbel, R.L. Matzkin, F. Molinari, N. Schwarz, R.J. Willis, et al. 2005. “Statistical analysis of choice experiments and surveys.” Marketing Letters, 16(3): 183–196.

Meghir, C., and L. Pistaferri. 2004. “Income Variance Dynamics and Heterogeneity.” Econometrica, 72(1): 1–32.

Mullainathan, S. 2002. “A Memory-Based Model of Bounded Rationality.” Quarterly Journal of Economics, 117(3): 735–774.

OECD. 2009. “Self-employment.” OECD Factbook 2009: Economic, Environmental and Social Statistics.

Pischke, J.S. 1995. “Measurement error and earnings dynamics: Some estimates from the PSID validation study.” Journal of Business and Economic Statistics, 13(3): 305–314.
Poterba, J.M., and L.H. Summers. 1986. “Reporting errors and labor market dynamics.” Econometrica, 54(6): 1319–1338.

Rubin, D.C., and A.D. Baddeley. 1989. “Telescoping is not time compression: A model.” Memory & Cognition, 17(6): 653–661.

Skowronski, J.J., A.L. Betz, C.P. Thompson, and L. Shannon. 1991. “Social memory in everyday life: Recall of self-events and other-events.” Journal of Personality and Social Psychology, 60(6): 831–843.

Sudman, S., and N.M. Bradburn. 1973. “Effects of time and memory factors on response in surveys.” Journal of the American Statistical Association, 68(344): 805–815.

Thompson, C.P. 1996. Autobiographical memory: Remembering what and remembering when. Lawrence Erlbaum.

Thompson, C.P., J.J. Skowronski, and D.J. Lee. 1988. “Telescoping in dating naturally occurring events.” Memory & Cognition, 16(5): 461–468.

Tourangeau, R. 2000. “Remembering what happened: Memory errors and survey reports.” The science of self-report: Implications for research and practice, ed. A.A. Stone, J.S. Turkkan, C.A. Bachrach, J.B. Jobe, H.S. Kurtzman and V.S. Cain. Vol. 29, 47. Psychology Press.

Tourangeau, R., L.J. Rips, and K.A. Rasinski. 2000. The psychology of survey response. Cambridge University Press.

Tversky, A., and D. Kahneman. 1991. “Loss aversion in riskless choice: A reference-dependent model.” The Quarterly Journal of Economics, 106(4): 1039–1061.

Wagenaar, W.A. 1986. “My memory: A study of autobiographical memory over six years.” Cognitive Psychology, 18(2): 225–252.