WPS3932

WELFARE MEASUREMENT BIAS IN HOUSEHOLD AND ON-SITE SURVEYING OF WATER-BASED RECREATION: AN APPLICATION TO LAKE SEVAN, ARMENIA

Craig Meisner and Hua Wang
Development Research Group, The World Bank
and
Benoît Laplante
Independent Consultant, Montreal, Canada

Keywords: On and off-site sampling, recreation demand, zero-inflated models, truncated count data models, endogenous stratification, Armenia.

World Bank Policy Research Working Paper 3932, June 2006

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the view of the World Bank, its Executive Directors, or the countries they represent. Policy Research Working Papers are available online at http://econ.worldbank.org.

Correspondence should be addressed to: Craig Meisner, MC2-205, World Bank, 1818 H Street, NW, Washington, DC 20433, cmeisner@worldbank.org.

I. Introduction

Several recent travel cost studies have aimed to compare recreational benefits derived from household and on-site surveys (e.g. Loomis, 2003; Shaw, 2003). If it can be shown that welfare estimates derived from cost-effective on-site surveying techniques are similar to household survey results, this may justify using on-site surveys in lieu of large and costly population-based surveys. However, a robust comparison of estimates obtained from each sample requires addressing a number of important statistical issues. In particular, household survey demand is typically censored due to the possibility of observing a large number of zeros (or non-users of the site).
Simply treating all zeros in the sample as users of the site introduces an upward bias in the demand and welfare measures. On the other hand, on-site sample demand is truncated at one since it surveys only users at the site. In this case, estimates are prone to higher standard errors and an upward bias from over-sampling individuals whose characteristics may be correlated with higher trip frequencies (endogenous stratification, ES).

In the case of household surveys, it is possible to resolve the issue by separating the recreation 'participation' decision from the trip 'quantity' decision, thus reducing the bias introduced by non-users of the site. In the case of on-site surveys, it is possible to correct for the potential bias by providing adjustments to the distribution function (Shaw, 1988; Englin and Shonkwiler, 1995). To our knowledge, none of the existing travel cost studies have attempted to correct for both biases when conducting comparative analyses of estimates obtained from household and on-site surveys.[1]

[1] Loomis (2003) does not discuss the prevalence of zeros in his comparative household sample, and does not consider their relative influence on expected trip demand or welfare.

In this paper, we test whether household and on-site demand estimation yield similar welfare measures after accounting for both biases discussed above. For this purpose, we use a household and on-site survey conducted at Lake Sevan, Armenia. This single-site comparison has two advantages. First, as the site is unique, we avoid problems of having to incorporate substitute sites into the decision to recreate. Second, since we are not valuing a change in the quality of the lake, we also avoid any quality change impacts on expected trip demand.

The household survey consisted of 3,358 households across Armenia, and the on-site survey of 389 tourists recreating at Lake Sevan.
Travel cost models were constructed and estimated using travel expenditure and socio-demographic information contained in each survey. Because visitation in the household survey contained a large percentage of zeros and trip frequency was over-dispersed, a zero-inflated negative binomial model (ZINB) was estimated. For the on-site survey, two truncated negative binomial models were estimated, with and without an adjustment for endogenous stratification (ES). Likelihood ratio tests rejected the null hypothesis of no over-dispersion, favoring the negative binomial specification in both the household and on-site models. Results from the household model also reveal that the participation decision is indeed relevant to the household's recreation decision. However, in the case of the on-site sample, estimated coefficients for the ES and non-ES models were not significantly different. This may suggest that characteristics from the on-site sample are representative of the household sample. Other studies have found similar results where accounting for ES did not yield any significant differences in trip demand or welfare (Ovaskainen et al., 2001; Englin et al., 2003). Per trip consumers surplus was estimated to be $8.82 for the household sample, $8.73 for the on-site model without ES adjustment, and $8.21 with ES.

The remainder of this paper is structured as follows. The next section provides a description of travel cost and count data models utilized in this study along with recommendations of how to remedy several dependent variable issues typically encountered with household and on-site recreational surveys. In Section III, the two surveys are described in more detail. In Section IV, the results of estimation are presented, along with a comparison in expected trip demand and estimated welfare measures. Section V provides a brief summary and discussion of the findings.

II. Travel Cost Modeling

In travel cost modeling, the decision to recreate is typically modeled as a latent demand, $y_i^*$, representing the number of trips taken in one year as a function of travel cost (P), site quality attributes (Z) and individual demographic characteristics (X):

$$\text{Trips}_i = y_i^* = f(P_i, X_i, Z_i) + \varepsilon_i, \qquad i = 1, 2, \ldots, N \tag{1}$$

Travel cost modeling (TCM) can be implemented through household or on-site surveys. However, each sampling method involves a number of different statistical issues.

(i) Household survey

An important modeling issue when applying TCM pertains to the treatment of non-negative integers observed in individual recreational data, as one may encounter a large proportion of zeros in a general household survey (Shaw, 1988; Grogger and Carson, 1991; Hellerstein, 1991). Observing a zero implies that the services from the site do not enter into the utility function of the individual. In the utility maximization framework, it implies that the individual is currently at some choke price where he is consuming zero trips, and that if the current "market" price were to fall below the choke price, the individual would demand a positive number of trips. However, one may also observe a zero if for some reason (such as age, health-related reasons, etc.) services from the site would never enter an individual's utility function (Haab and McConnell, 1996). Thus, there is an important distinction between observing zeros for those who are participants and for those who are non-participants ('true zeros').

Standard count data models such as the Poisson or negative binomial assume that all individuals surveyed are potential users of the good in question, and that the same variables influence all potential users similarly. In the presence of a large number of zeros, and where the participation question is relevant, this assumption may not be valid and should be tested for its significance.
To account for the participation issue, we consider two augmented count data models which account for the presence of a large number of zeros: the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) (Mullahy, 1986; Lambert, 1992; Greene, 1994; Haab and McConnell, 1996). By distinguishing between participants and non-participants, the zero observations may contain valuable information, and a gain in efficiency will be achieved by including all of the observations (Haab and McConnell, pg. 90).[2] Empirically, zero-inflated count models change the mean structure to allow zeros to be generated by two distinct processes, one for the participation decision (logit or probit) and one for the mean number of trips (count model).[3] By expanding the standard count model to allow for individual-specific characteristics which may keep an individual from entering the recreation market, one can separate factors which influence the participation issue from those that influence the quantity of trips taken to a recreation site (Haab and McConnell, 1996).

In estimation, the ZIP model allows for over-dispersion in the Poisson data generating process by allowing a mass of zero observations independent of the true Poisson process. The distribution function for the ZIP model is:

$$\Pr(y_i \mid x_i) = \begin{cases} P_i + (1 - P_i)\,e^{-\lambda_i} & \text{if } y_i = 0, \\[4pt] (1 - P_i)\,\dfrac{e^{-\lambda_i}\lambda_i^{y_i}}{y_i!} & \text{otherwise,} \end{cases} \tag{2}$$

where $E(y_i) = (1 - P_i)\lambda_i$, $\mathrm{Var}(y_i) = (1 - P_i)(1 + P_i\lambda_i)\lambda_i$, and $P_i$ is the probability of zero visitation, with mean $\lambda_i = \exp(x_i\beta)$. Note that in this formulation, zeros can occur in either the binary process or the Poisson process, since $\exp(-\lambda_i)\lambda_i^{0}/0! = \exp(-\lambda_i)$. Again, $\lambda_i$ can be modeled as $\exp(x_i\beta)$, and $P_i$ as $g(z_i\gamma)$, where $\gamma$ is a vector of participation-decision parameters and $z_i$ is a vector of explanatory variables that may or may not be the same as those for the quantity decision, $x_i$.
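The two-process structure of the ZIP model can be checked numerically. The following is a minimal sketch, not the estimation code used in the paper: the function names and the values of `lam` and `p_zero` are illustrative assumptions. It evaluates the ZIP probabilities and compares the resulting mean and variance with the closed-form expressions $(1 - P_i)\lambda_i$ and $(1 - P_i)(1 + P_i\lambda_i)\lambda_i$.

```python
import math


def zip_pmf(y, lam, p_zero):
    """Zero-inflated Poisson probability of y trips.

    lam    -- Poisson mean for participants (hypothetical value below)
    p_zero -- probability of being a non-participant (hypothetical value below)
    """
    # Poisson probability computed in logs to avoid overflow for large y.
    poisson = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    if y == 0:
        # A zero can come from a non-participant or from a participant
        # who happens to take no trips.
        return p_zero + (1.0 - p_zero) * poisson
    return (1.0 - p_zero) * poisson


def zip_moments(lam, p_zero, y_max=100):
    """Total probability, mean, and variance by direct summation up to y_max."""
    probs = [zip_pmf(y, lam, p_zero) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * pr for y, pr in enumerate(probs))
    var = sum((y - mean) ** 2 * pr for y, pr in enumerate(probs))
    return total, mean, var


# Illustrative values only; they are not estimates from the Lake Sevan data.
lam, p_zero = 2.0, 0.6
total, mean, var = zip_moments(lam, p_zero)
closed_mean = (1.0 - p_zero) * lam
closed_var = (1.0 - p_zero) * (1.0 + p_zero * lam) * lam
```

With these values the variance exceeds the mean even though the underlying count process is Poisson, which is the sense in which zero inflation by itself generates over-dispersion.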
The function g(·) can be modeled using either a logit or probit (cumulative standard normal) function, as they both give similar results.

[2] In the past, one crude option was simply to drop the zeros from the sample.

[3] The zero-inflated models differ from the Heckman continuous two-stage model as they allow for zero observations in the second stage of the decision process (in the mean model).

In the presence of over-dispersion[4] (variance > mean), the participation decision can be similarly decomposed in a zero-inflated negative binomial model as:

$$\Pr(y_i \mid x_i) = \begin{cases} P_i + (1 - P_i)\left(\dfrac{1}{1 + \alpha\lambda_i}\right)^{1/\alpha} & \text{if } y_i = 0, \\[8pt] (1 - P_i)\,\dfrac{\Gamma(y_i + 1/\alpha)}{\Gamma(y_i + 1)\,\Gamma(1/\alpha)}\left(\dfrac{1}{1 + \alpha\lambda_i}\right)^{1/\alpha}\left(\dfrac{\alpha\lambda_i}{1 + \alpha\lambda_i}\right)^{y_i} & \text{otherwise,} \end{cases} \tag{3}$$

where $E(y_i) = (1 - P_i)\lambda_i$ and $\mathrm{Var}(y_i) = (1 - P_i)[1 + \lambda_i(\alpha + P_i)]\lambda_i$. The presence of the parameter $\alpha$ in the calculation of the conditional variance of y (if greater than 0) guarantees that the variance is greater than the mean. As $\alpha \to 0$, the moments of the distribution converge to those of the zero-inflated Poisson, so testing for $\alpha = 0$ provides a case for selecting the negative binomial over the Poisson, and indirectly for the presence of over-dispersion.

[4] An undesirable feature of Poisson count models is the assumption that the conditional mean and variance are equal (Yen and Adamowicz, 1993). This is especially problematic in empirical research because conditional variances are typically greater than conditional means in socio-economic data (also known as over-dispersion, a form of heteroskedasticity). Over-dispersion still allows the parameters of the conditional mean to be estimated consistently (Gourieroux et al., 1984), but causes the standard errors of these estimates to be biased downward, resulting in erroneous tests of their statistical significance (Cameron and Trivedi, 1986). The restrictive equality of the mean and the variance in Poisson count models led to the development of negative binomial models (Hausman et al., 1984). This model allows for over-dispersion by combining the Poisson distribution with a gamma distribution, hence allowing for heterogeneity to be gamma distributed.

The flexibility of modeling the participation decision in this manner has led to a number of interesting applications in recreational demand analysis, including beach trips (Shonkwiler and Shaw, 1996; Haab and McConnell, 1996), rock climbing (Shaw and Jakus, 1996), lake recreation (Gurmu and Trivedi, 1996), water-based recreation (Curtis, 2003), and angling site choice (Scrogin et al., 2004).

(ii) On-site sampling

Interview surveys conducted on-site obviously avoid the non-participation issue, but as the dependent variable $y_i$ is strictly positive, the truncated demand relationship measures only those with smaller error terms. In addition, because the sample is on-site, there is a higher likelihood of intercepting a person whose characteristics are correlated with higher trip frequencies, or what is known as 'endogenous stratification' in sampling. The implication is that the sample is not representative of the population at large, and in measuring welfare effects, consumers surplus estimates will be biased upwards as they capture only the effect of avid recreationists.

Truncation and endogenous stratification were first explored by Shaw (1988) in the case of the Poisson distribution and extended by Englin and Shonkwiler (1995) to the negative binomial distribution. The basic implication is to weight individual observations by the inverse of the expected value of trips.
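The weighting idea can be made concrete with a small numerical sketch. It assumes a negative binomial population density with mean `lam` and variance `lam * (1 + alpha * lam)`, and forms the on-site density by multiplying the population probability of `y` trips by `y` and dividing by the population mean. The function names and parameter values are illustrative assumptions, and `alpha` is treated as a scalar; this is not the routine used by the authors.

```python
import math


def nb_logpmf(y, lam, alpha):
    """Log negative binomial probability with mean lam and
    variance lam * (1 + alpha * lam)."""
    inv = 1.0 / alpha
    return (math.lgamma(y + inv) - math.lgamma(y + 1) - math.lgamma(inv)
            - inv * math.log1p(alpha * lam)
            + y * (math.log(alpha * lam) - math.log1p(alpha * lam)))


def onsite_logpmf(y, lam, alpha):
    """Log on-site probability for y >= 1: the population probability
    weighted by y and normalised by the population mean lam."""
    return math.log(y) + nb_logpmf(y, lam, alpha) - math.log(lam)


def onsite_loglik(trips, lams, alpha):
    """Sample log-likelihood; lams holds each respondent's mean exp(x_i * beta)."""
    return sum(onsite_logpmf(y, lam, alpha) for y, lam in zip(trips, lams))


# Illustrative values only; they are not estimates from the Lake Sevan data.
lam, alpha = 1.5, 0.8
probs = {y: math.exp(onsite_logpmf(y, lam, alpha)) for y in range(1, 501)}
total = sum(probs.values())
onsite_mean = sum(y * p for y, p in probs.items())
```

Because frequent visitors are more likely to be intercepted, the on-site mean exceeds the population mean: it equals the population second moment divided by the population mean, which for this density is `lam + 1 + alpha * lam`. A function such as `onsite_loglik` could be passed (negated) to a general-purpose optimiser, with `lams` computed from the covariates at each iteration.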
Assuming that the density function of the ith person in the population is $f(y_i^* \mid x_i)$, Shaw (1988) shows that the density function of the same person in the on-site population is:

$$\Pr(y_i \mid x_i) = \frac{y_i\, f(y_i \mid x_i)}{\sum_{t=1}^{\infty} t\, f(t \mid x_i)} \tag{4}$$

If the conditional density $f(y_i^* \mid x_i)$ is chosen to be Poisson with the location parameter $\lambda_i$, then the on-site sample's density function is:

$$\Pr(y_i \mid x_i) = \frac{e^{-\lambda_i}\lambda_i^{\,y_i - 1}}{(y_i - 1)!} \tag{5}$$

where $E(y_i \mid x_i) = \lambda_i + 1$ and $\mathrm{Var}(y_i \mid x_i) = \lambda_i$. Defining $w_i = y_i - 1$, the standard Poisson model can be estimated, substituting $w_i$ for $y_i$ in (5) above. In the presence of over-dispersion, the equality of the mean and variance is violated and thus the negative binomial model is preferred, with the following density function (Englin and Shonkwiler, 1995):

$$\Pr(y_i \mid x_i) = \frac{y_i\,\Gamma(y_i + 1/\alpha_i)}{\Gamma(y_i + 1)\,\Gamma(1/\alpha_i)}\;\alpha_i^{\,y_i}\,\lambda_i^{\,y_i - 1}\,(1 + \alpha_i\lambda_i)^{-(y_i + 1/\alpha_i)} \tag{6}$$

where $E(y_i \mid x_i) = \lambda_i + 1 + \alpha_i\lambda_i$ and $\mathrm{Var}(y_i \mid x_i) = \lambda_i(1 + \alpha_i + \alpha_i\lambda_i + \alpha_i^2\lambda_i)$. As the specification in (6) cannot be transformed into any simpler form as in the case of the truncated Poisson, the likelihood function must be programmed directly into a likelihood maximization routine. The log likelihood function used in this context is:[5]

$$\ln L = \sum_{i=1}^{N}\Big[\ln y_i + \ln\Gamma(y_i + 1/\alpha_i) - \ln\Gamma(y_i + 1) - \ln\Gamma(1/\alpha_i) + y_i\ln\alpha_i + (y_i - 1)\ln\lambda_i - (y_i + 1/\alpha_i)\ln(1 + \alpha_i\lambda_i)\Big] \tag{7}$$

Defining $\lambda_i$ as the expected number of person-day-trips[6] individual i takes to the site in a year, the empirical demand relationship can be defined as:

$$\lambda_i = \exp(X_i\beta + \varepsilon_i) = \exp(\beta_p p_i + x_i\beta_x + \varepsilon_i), \qquad i = 1, \ldots, n \tag{8}$$

where $\beta$ is a K x 1 vector of parameters, $X_i$ is a 1 x K vector of explanatory variables for individual i, $p_i$ is the travel cost for individual i to the site, $x_i$ is the 1 x (K - 1) vector of explanatory variables after $p_i$ is removed from $X_i$, $\beta_p$ is the parameter on travel cost, and $\beta_x$ is the remaining vector of parameters corresponding to $x_i$.

(iii) Welfare measures

The benefit (consumer surplus) of access to the site is defined as the area under the estimated Marshallian demand curve specified in (8) and above the current price level.
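For a semi-log demand curve of this kind, the area above the current price has a simple closed form, which the following sketch checks by numerical integration. The names `intercept`, `beta_p`, and `price`, and their values, are hypothetical: `intercept` stands in for the contribution of all non-price covariates, and none of the numbers are estimates from the paper.

```python
import math


def demand(price, intercept, beta_p):
    """Expected trips at a given travel cost under semi-log demand."""
    return math.exp(intercept + beta_p * price)


def cs_closed_form(price, intercept, beta_p):
    """Area under the demand curve above `price`; requires beta_p < 0."""
    return -demand(price, intercept, beta_p) / beta_p


def cs_numeric(price, intercept, beta_p, upper=2000.0, steps=200_000):
    """Trapezoid-rule approximation of the same area, truncated at `upper`."""
    h = (upper - price) / steps
    area = 0.5 * (demand(price, intercept, beta_p)
                  + demand(upper, intercept, beta_p))
    for k in range(1, steps):
        area += demand(price + k * h, intercept, beta_p)
    return area * h


# Hypothetical values for illustration.
intercept, beta_p, price = 1.0, -0.05, 10.0
closed = cs_closed_form(price, intercept, beta_p)
numeric = cs_numeric(price, intercept, beta_p)
per_trip = closed / demand(price, intercept, beta_p)  # equals -1 / beta_p
```

Dividing the surplus by expected trips shows that, under this functional form, surplus per trip depends only on the travel cost coefficient.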
By integrating the demand function over travel costs (prices) faced by individuals, we calculate expected consumers surplus as:

$$E(CS_i) = \int_{p_i}^{\infty} \lambda_i \, dP = -\frac{\lambda_i}{\beta_p} \tag{9}$$

where $\lambda_i$ is as defined in (8) and $\beta_p$ is the estimated parameter on travel cost. Summed across all i, the area measures the total per trip willingness-to-pay by all individuals to recreate at the site. In the case of the ZINB model, expected consumers surplus must be weighted by the probability of participation, $(1 - P_i)$, where $P_i$, the probability of zero visitation, is a function of variables that affect the participation decision.

[5] The likelihood function in (7) was entered into a modified zero-truncated negative binomial maximum likelihood routine provided by Hilbe (1999).

[6] Person-day-trips were defined as the number of trips taken by the respondent in one year. All cost information was then divided by the number of days to form per-day trip costs.

Compensating and equivalent variation measures can also be calculated from the expenditure function implied by the Marshallian demand relationship specified above. From a welfare perspective, CV and EV may be of interest as measures of potential compensation from those who degrade the resource. Table 1 summarizes the welfare measures used in the analysis.

Table 1: Welfare measures

| Model | Consumers surplus | Compensating variation | Equivalent variation |
|---|---|---|---|
| Household sample: negative binomial | $-\dfrac{\bar{\lambda}}{\beta_p} = -\dfrac{e^{\bar{X}\beta}}{\beta_p}$ | $\dfrac{1}{\beta_i}\ln\left(1 + \dfrac{\beta_i\bar{\lambda}}{\beta_p}\right)$ | $-\dfrac{1}{\beta_i}\ln\left(1 - \dfrac{\beta_i\bar{\lambda}}{\beta_p}\right)$ |
| Household sample: zero-inflated negative binomial | $-(1 - P)\dfrac{e^{\bar{X}\beta}}{\beta_p}$ | $(1 - P)\dfrac{1}{\beta_i}\ln\left(1 - \dfrac{\beta_i\bar{\lambda}}{\beta_p}\right)$ | $-(1 - P)\dfrac{1}{\beta_i}\ln\left(1 + \dfrac{\beta_i\bar{\lambda}}{\beta_p}\right)$ |
| On-site sample: trunc. negative binomial / trunc. negative binomial w/ endogenous stratification | $-\dfrac{\bar{\lambda}}{\beta_p} = -\dfrac{e^{\bar{X}\beta}}{\beta_p}$ | $\dfrac{1}{\beta_i}\ln\left(1 + \dfrac{\beta_i\bar{\lambda}}{\beta_p}\right)$ | $-\dfrac{1}{\beta_i}\ln\left(1 - \dfrac{\beta_i\bar{\lambda}}{\beta_p}\right)$ |

Note: $\bar{\lambda} = \exp(\bar{X}\beta)$ from equation (8), where $\bar{X}$ represents the sample means; $\beta_i$ is the coefficient on income.

III. Application to Lake Sevan, Armenia

Lake Sevan is the largest high altitude reservoir of freshwater in the Transcaucasus, and is one of the highest lakes in the world.
However, over the course of the last 50 years, the level of the lake has dropped by 18 m, its surface area has decreased by 15%, and the volume of water in Lake Sevan has fallen by more than 40% (from 58.5 to 34.6 km³). These changes had various significant adverse impacts on Lake Sevan's ecology. As it is located only 70 km away from the capital city Yerevan, Lake Sevan is the preferred and most accessible recreational site of most Armenians.

The Government of Armenia has been working on a Lake Sevan protection action plan. The objectives under consideration by the Government of Armenia include preventing a further lowering of the level of Lake Sevan, and raising the level of the lake by at least 3 meters as quickly as possible. However, to date there has not been a thorough measurement of the current recreational benefits to include in benefit-cost analysis. Welfare measurement would be useful to policymakers tasked with weighing the alternative options for restoring Lake Sevan. Our model and welfare comparison is also useful in this context as Lake Sevan is a single site, with no substitutes, so comparing the two samples is not confounded by alternative sites that may enter into an individual's water-based recreation decision. Also, since we are measuring current recreational benefits, we avoid having to predict the impact that improvements would have on expected trip demand.

To estimate benefits by the general population and users of the site, two surveys were conducted: one comprising 3,358 households across Armenia and the other an intercept survey of 389 on-site tourists recreating at Lake Sevan.[7] Both were conducted in the year 2000, with the tourist survey during the summer to better capture the high season of annual recreational use at the lake. The household sample was selected and stratified by the 1996 Population Census of Armenia, while the on-site survey relied on tourist interception at the lake.
Annual visitation to Lake Sevan by these two groups is reported in Table 2. Household survey responses indicate that nearly 75% did not visit the lake in the past year, with a sample mean of 0.81 day-trips. The tourist survey, obviously truncated at one as interviews took place at the lake, averaged 3.17 day-trips per year. The average person from the household survey was 44 years old, earned the equivalent of $1,383 USD per annum, had 10 years of formal education, and a household size of 4. The average person from the on-site survey was 36 years old, earned $2,933 USD per annum, had 10 years of education and a household size of 5 (see Appendix I for details).

[7] The detailed questionnaires included six major parts: (1) environmental attitudes and perceptions; (2) a Lake Sevan action plan for restoration; (3) contingent valuation questions; (4) socio-economic characteristics; (5) recreational use of Lake Sevan; and (6) interview debriefing questions. For the purposes of this paper, only sections (4) and (5) are used.

In Table 2 we also note that the standard deviation of visitation in each sample exceeds its mean; thus we suspect the presence of over-dispersion, and therefore formally test the negative binomial counterpart of the Poisson distribution. In addition, the large number of zeros in the household survey leads us to formally test the use of the zero-inflated negative binomial model for the household survey.

Table 2: Frequency of visitation

| Person-day-trips | Household frequency | Household percent | Tourist frequency | Tourist percent |
|---|---|---|---|---|
| 0 | 2516 | 74.93 | 0 | 0.00 |
| 1 | 455 | 13.55 | 185 | 47.56 |
| 2 | 152 | 4.53 | 94 | 24.16 |
| 3 | 84 | 2.50 | 41 | 10.54 |
| 4 | 30 | 0.89 | 25 | 6.43 |
| 5 | 37 | 1.10 | 14 | 3.60 |
| 6 | 12 | 0.36 | 5 | 1.29 |
| 7 | 7 | 0.21 | 0 | 0.00 |
| 8 | 5 | 0.15 | 0 | 0.00 |
| 9 | 0 | 0.00 | 0 | 0.00 |
| 10 | 26 | 0.77 | 5 | 1.29 |
| 10 to 15 | 12 | 0.36 | 6 | 1.54 |
| 15 to 20 | 10 | 0.30 | 6 | 1.54 |
| 20 to 30 | 3 | 0.09 | 4 | 1.03 |
| 30 to 40 | 3 | 0.09 | 2 | 0.51 |
| 40 to 50 | 1 | 0.03 | 2 | 0.51 |
| 50 to 100 | 5 | 0.15 | 0 | 0.00 |
| Total | 3358 | 100.00 | 389 | 100.00 |
| Mean | 0.81 | | 3.17 | |
| Standard deviation | 3.95 | | 5.75 | |

IV. Estimation Results

(i) Determinants of visitation

The household sample was initially modeled using the Poisson, negative binomial (NB), zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB). The on-site sample was modeled using the truncated Poisson, truncated negative binomial (TRNB) and the truncated negative binomial with endogenous stratification (TRNBES). Comparative tests between each model were performed and are reported below. For brevity, only the estimation results for the household (NB and ZINB) and on-site models (TRNB and TRNBES) are reported in Table 3, with marginal effects for the ZINB and TRNBES models listed in Table 4.

From the empirical demand relationship in equation (8), we model the participation and trip quantity decisions using travel cost and several individual-specific variables that may co-vary with each decision: income, age, household size, education, and a Yerevan city dummy.[8] Travel costs included: (1) transport costs; (2) on-site costs (per day); and (3) the value of time traveling to and at Lake Sevan. The value of time was elicited from respondents by asking them how much they would have earned had they not traveled to Lake Sevan. This amount was then divided by the number of days they were at the lake to arrive at a trip-per-day cost. Note that for the household model, each equation (logit and mean) contains the same explanatory variables, as they may contribute to either the participation or the quantity decision.

Beginning with the household survey results in the second and third columns of Table 3, we note that the likelihood ratio (LR) test of $\alpha = 0$ is rejected, indicating the significance of over-dispersion and thus the selection of the negative binomial specification over the Poisson. A further formal specification test between the NB and ZINB is possible (Vuong, 1989). The test statistic V is directional and distributed standard normal: for values V > 1.96 the zero-inflated version is supported, whereas values V < -1.96 favor the standard count model.
With a value of 4.86, the ZINB specification is favored over the NB.

Parameter estimates of the household ZINB model reveal that income, age and education, along with residence in Yerevan, significantly determine the household participation decision to recreate at Lake Sevan (see logit inflation model). The coefficients are interpreted relative to observing a zero count; thus the positive coefficient on age implies that older respondents are more likely to record zero participation, whereas individuals with higher income or education are less likely to report zero trips to Lake Sevan. Those who reside in Yerevan city are also more likely to report zero visitation in the past year. Among those who do choose to participate (see mean model), increases in income and household size increase trip demand, while increases in travel costs and education decrease trip demand.

[8] A dummy variable to capture previous visitation to the lake was also initially considered for each model; however, over 94% of respondents in the household survey and over 95% in the tourist survey visited Lake Sevan at least once in the past three years, leaving insufficient statistical variation.

For the on-site survey, first an LR test between a truncated Poisson and truncated negative binomial (TRNB) rejected the Poisson restriction, indicating that over-dispersion in visitation is significant and leading us to favor the TRNB specification. Second, the TRNBES model was estimated to see whether higher trip frequencies have any systematic association with an individual's characteristics. Estimation results for both TRNB and TRNBES show that increases in travel costs, age and education decrease visitation, whereas increases in household size increase trip demand. In the TRNB model, estimated coefficients and standard errors are higher, leading to a lower significance across each explanatory variable.
By correcting for ES, the magnitude of estimated coefficients falls, and standard errors fall by a greater extent such that significance rises among the major determinants of visitation. In the next section, we explore the consequences of these differences on expected trip demand as well as the implications on welfare estimates.

Table 3: Household and on-site model estimates of visitation to Lake Sevan

| Variable | HH: NB | HH: ZINB | On-site: TRNB | On-site: TRNBES |
|---|---|---|---|---|
| Mean model | | | | |
| Travel costs | -0.0256*** (-5.41) | -0.0153*** (-3.46) | -0.0521*** (-3.37) | -0.0519*** (-4.79) |
| Income | 0.00035*** (7.54) | 0.00015*** (3.63) | 0.000040 (0.60) | 0.000013 (0.32) |
| Age | -0.0233*** (-6.36) | 0.0035 (0.78) | -0.0313*** (-3.45) | -0.0263*** (-4.58) |
| Household size | 0.1219*** (4.02) | 0.0974*** (2.64) | 0.2969*** (3.57) | 0.2711*** (5.26) |
| Education | -0.0094 (-0.43) | -0.0686*** (-2.66) | -0.0912* (-1.66) | -0.0926*** (-2.79) |
| Constant | -0.0392 (-0.11) | 0.2174 (0.56) | -10.7080 (-0.33) | -15.4955 (-0.12) |
| Logit inflation model | | | | |
| Travel costs | | 0.0109 (0.91) | | |
| Income | | -0.0012*** (-4.77) | | |
| Age | | 0.0903*** (8.47) | | |
| Household size | | 0.0313 (0.43) | | |
| Education | | -0.2768*** (-4.80) | | |
| Yerevan city | | 0.8631*** (2.68) | | |
| Constant | | -1.5611* (-1.83) | | |
| $\alpha$ | 5.8005 | 3.7079 | 13.2317 | 17.0166 |
| Log-likelihood | -3,334.71 | -3,249.60 | -656.48 | -679.79 |
| LR test ($\alpha = 0$) ~ $\chi^2$ (d.f.) | 6,469.23 (1) | 3,271.69 (1) | 846.11 (1) | 799.49 (1) |
| Vuong test ~ N(0,1) | - | 4.86 | - | - |
| Number of observations | 3,358 | 3,358 | 389 | 389 |
| Non-zero observations | 842 | 842 | 389 | 389 |
| Zero observations | 2,516 | 2,516 | | |

t-statistics in parentheses; * significant at the 10% level; ** significant at the 5% level; *** significant at the 1% level.

(ii) Visitation sensitivity

The sensitivity of trip demand for the household ZINB and tourist TRNBES models to changes in the parameter values is summarized in Table 4. Beginning with the household survey and under the binary participation equation, estimated coefficients from the regression are interpreted as increasing or decreasing the odds of non-participation (or observing a zero).
As this may be counter-intuitive, we reverse the signs on the estimated coefficients and re-interpret the results in terms of the odds of participation in Table 4. A unitary increase in age or household size of the respondent leads to a decrease in the likelihood of participation by 9.5% and 3.2%, respectively, whereas an additional year of education increases the odds of participation by 75%. Income only marginally affects participation, with an increase of $1 USD leading to an increase in participation of 0.12%. This relative insensitivity to income changes is a common finding among recreational demand studies. If the respondent lives in Yerevan, the likelihood of participation is decreased by an overwhelming 137%. This may be owing to the fact that in the household sample, over 80% of the sampled households are from Yerevan, the capital city.

For the trip count equation, a one unit increase in travel costs or education decreases the number of trips by 1.5% and 6.6%, respectively. Thus, although travel costs are not a significant determinant in the decision to recreate, they do impact the number of trips a person decides to take. Also, a person's education appears to be important in both decisions, but in opposite directions. Those with higher education tend to participate more often, but as one frequents the site more often this effect diminishes. Greater household size also works in opposite directions for the participation and quantity decisions. A one unit change in household size decreases participation by 3.2%, but for those who do go, it increases the number of trips by 10.2%. Upon closer inspection of the data, it was found that households with more children were associated with higher trip frequencies. The impact of income on trip frequency was found to be negligible.
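The percentage effects quoted for the trip count equation can be recovered from the ZINB mean-model coefficients reported in Table 3. The sketch below assumes the usual transformation for an exponential mean function, 100 x (exp(coefficient) - 1); the paper does not state its formula, so this is an assumption that happens to match the quoted figures, and the function and dictionary names are illustrative.

```python
import math


def pct_change_in_trips(coef):
    """Percent change in expected trips for a one-unit increase in a covariate,
    assuming an exponential mean function (assumed transformation)."""
    return 100.0 * (math.exp(coef) - 1.0)


# ZINB mean-model coefficients as reported in Table 3.
zinb_count = {
    "travel_costs": -0.0153,
    "household_size": 0.0974,
    "education": -0.0686,
}

effects = {name: pct_change_in_trips(b) for name, b in zinb_count.items()}
for name, value in effects.items():
    print(f"{name}: {value:.2f}%")
```

For small coefficients the transformation is close to 100 times the coefficient itself, which is why the travel cost effect is nearly identical to its coefficient, while the household size effect is visibly larger than 9.74%.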
Table 4: Marginal effects on trip demand

| Variable | Household ZINB: coefficient | Household ZINB: % change | On-site TRNBES: coefficient | On-site TRNBES: % change |
|---|---|---|---|---|
| Count equation (visits; % change in trips) | | | | |
| Travel costs ($USD) | -0.0153*** | -1.52 | -0.0519*** | -5.06 |
| Income ($USD) | 0.00015*** | 0.00 | 0.000013 | 0.00 |
| Age (years) | 0.0035 | 0.35 | -0.0263*** | -2.59 |
| Household size (number) | 0.0974*** | 10.23 | 0.2711*** | 31.13 |
| Education (years) | -0.0686*** | -6.63 | -0.0926*** | -8.85 |
| Binary participation equation (% change in Pr(participation)) | | | | |
| Travel costs ($USD) | -0.0109 | -1.10 | | |
| Income ($USD) | 0.0012*** | 0.12 | | |
| Age (years) | -0.0903*** | -9.45 | | |
| Household size (number) | -0.0313 | -3.18 | | |
| Education (years) | 0.2768*** | 75.82 | | |
| Yerevan (1 = lives in Yerevan) | -0.8631*** | -137.06 | | |

* significant at the 10% level; ** significant at the 5% level; *** significant at the 1% level

For on-site trip demand, unitary increases in travel costs, age and education decrease the number of trips by 5.1%, 2.6% and 8.9%, respectively, and an increase in household size significantly increases trip frequency by 31%. With the exception of age, each impact has a similar interpretation as in the household model, but the effects are much larger. In the case of age, the association with visitation is significant and negative: older individuals take fewer trips.

(iii) Estimated trip demand and welfare measures

Using the parameter estimates from the four models in Table 3, the expected number of trips, $E(y_i \mid \bar{X})$, and consumers surplus (CS) measures were calculated (Table 5).[9] The expected number of trips was estimated for each model using sample means of the independent variables. Comparing the NB with the ZINB, note that the expected number of trips falls once we account for the inflation of zeros (participation). Indeed, since the NB model is treating every zero as being a part of the quantity decision, this biases the estimates upwards, whereas the ZINB recognizes that the zeros may come from different stochastic processes (participation or quantity).

[9] Although the CV and EV measures are not formally reported above, as the estimated coefficient on income, $\beta_i$, in both the ZINB and TRNBES models is small, CS is tightly bounded by CV and EV; for the ZINB model CV = $8.7984, EV = $8.8478 and for the TRNBES model CV = $8.2137, EV = $8.2123.

For the on-site model, TRNB, the expected number of trips far exceeds the demand estimated by the household survey. This seems reasonable since we are comparing casual versus avid users of the site. However, the expected number of trips is even higher after accounting for ES (TRNBES). At first glance this may seem counter-intuitive, but recall that expected trip demand is calculated as $E(y_i \mid x_i) = \lambda_i + 1 + \alpha_i\lambda_i$, and note that the only substantial difference between the estimated parameters of TRNB and TRNBES is the value of the over-dispersion parameter, $\alpha$ (see Table 3). Thus it is the over-dispersion that is driving this result. This finding is similar to that found by Englin and Shonkwiler (1995), where expected trip demand is 1% higher for their sample-based 'restricted negative binomial model' (analogous to our TRNBES model) and 63% higher for their population-based trip demand. Martinez-Espineira and Amoako-Tuffour (2005) also find an 18% higher expected trip demand in their ES model.

Estimated household consumers surplus was $8.82 per trip, whereas for the on-site sample CS was calculated as $8.73 without compensating for ES and $8.21 per trip with ES. Although all three results are close, it is rather surprising to find the closest estimate to be between the TRNB and ZINB models. One would initially expect the TRNBES to be the closest if ES were present in the on-site sample.
The most plausible explanation is rooted in the very reason why one argues for ES adjustment: if adjustments for ES yield only small differences in expected demand or consumers surplus, this suggests that those surveyed at Lake Sevan possess characteristics similar to those in the household sample. This implies that either the TRNB or the TRNBES model is sufficient for estimation. This can be seen more clearly from the mean function, \(\lambda_i\), and the similarity of the estimated coefficients between the TRNB and TRNBES models (especially the estimated coefficient on travel cost, \(\beta_p\), which is the denominator in the CS calculation, \(-\lambda_i / \beta_p\)). Ovaskainen et al. (2001) and Englin et al. (2003) find similar results, where the ES adjustment had little effect on coefficient and benefit estimates.

Table 5: Expected visitation and benefit estimates

Measure                     Household: NB   Household: ZINB   On-site: TRNB   On-site: TRNBES
\(E(y_i \mid \bar{X})\)     0.8926          0.5787            5.8822          6.9664
CS ($USD per day-trip)      8.16            8.82              8.73            8.21
Total WTP1 ($USD)           6,362,295       6,875,160         6,802,126       6,399,840

Note: \(E(y_i \mid \bar{X})\) is evaluated at the sample means of the independent variables.
1 Calculated for households as CS * 779,230 households in 2001.

V. Conclusion

In this paper, a population-based household sample and an on-site sample are modeled in a travel cost framework to compare estimated consumers surplus for the value of site access. If each model is corrected for its dependent variable issues, we expect the models to produce similar welfare estimates. In the household model, we account for potential over-dispersion (variance > mean) by using a negative binomial distribution function, and for the possibility of observing a large number of zero visits (a recreation participation decision) by splitting the participation and quantity decisions directly in one censored model, the zero-inflated negative binomial (ZINB).
For the on-site survey, there is a possibility of over-sampling those who recreate often, so the truncated distribution function is augmented for endogenous stratification (i.e., the likelihood of surveying respondents whose characteristics are associated with higher trip frequencies). To assess the effect of ES, we model the on-site sample as a truncated negative binomial with and without endogenous stratification (TRNBES and TRNB, respectively).

Each of these models is then applied to a unique water-based recreational site in Armenia, Lake Sevan. The site has few, if any, substitutes, which facilitates a comparative welfare exercise. In addition, as the surveys measured only current revealed preference behavior, no quality changes are present to confound the measurement of expected trips outside the current experience.

Results from the zero-inflated negative binomial model (ZINB) for households suggest that separating the participation and quantity decisions is significant in modeling household behavior. In this application, explanatory variables such as age, education and income were found to be significant factors in the binary decision to recreate at Lake Sevan. The quantity of trips was determined by travel costs, income, household size and education. Expected trip demand was found to be 0.58 trips per individual per annum, and the welfare measure calculated from the underlying demand function reveals a per-trip consumers surplus of $8.82.

From the on-site sample, the TRNB and TRNBES models yielded expected trip demands of 5.9 and 7.0 trips per person per year, with consumers surplus values of $8.73 and $8.21 per trip, respectively. Expected trip demand from the on-site models is higher than from the household sample because the on-site survey samples more avid users of the site, whereas the household survey captures more casual users.
However, an even higher trip demand is found in the TRNBES model because of its higher estimated overdispersion parameter, which enters the calculation of expected trip demand.

All three models appear to yield similar welfare measures, but accounting for endogenous stratification in the TRNBES model did not yield a significantly different estimate from the TRNB model. In fact, consumers surplus from the TRNB model is slightly closer to the household result than that from the TRNBES model. One possible explanation is that individual characteristics of the on-site sample are not correlated with higher trip frequencies (which removes the precise reason for factoring in ES). This does not imply that ES is unimportant in modeling on-site behavior; rather, the results found here suggest that the on-site sample was representative of the population-based household survey. This finding is contrary to other studies in which the ES bias in welfare measurement has been found to be quite significant (Shaw, 1988; Englin and Shonkwiler, 1995; Loomis, 2003; Martinez-Espineira and Amoako-Tuffour, 2005).

Although we did not find any significant difference from accounting for ES, this does not negate the main result: when comparing household and on-site samples, either can be used to derive a consistent welfare measure of access to the site after accounting for each dependent variable problem. As previously mentioned, the method of surveying is quite often a constrained choice, usually by cost or time. It is therefore reassuring that, if one is truly constrained in some sense, implementing the proper technique means the quality of the measure need not be in question.

References

Cameron, A. C. and P. K. Trivedi. 1986. Econometric models based on count data: comparisons and application of some estimators and tests. Journal of Applied Econometrics. 1: 29-53.
Curtis, J. 2003. Demand for water-based leisure activity. Journal of Environmental Planning and Management. 46(1): 65-77.
Englin, J. and J. S. Shonkwiler. 1995. Estimating social welfare using count data models: an application to long-run recreational demand under conditions of endogenous stratification and truncation. The Review of Economics and Statistics. 77(1): 104-112.
Englin, J., T. Holmes and E. Sills. 2003. Estimating forest recreation demand using count data models. In E. Sills (Ed.), Forests in a Market Economy, Chapter 19, pp. 341-359. Dordrecht, The Netherlands: Kluwer Academic Publishers.
Gourieroux, C. A., A. Monfort and A. Trognon. 1984. Pseudo maximum likelihood methods: applications. Econometrica. 52: 701-720.
Greene, W. 1994. Accounting for excess zeros and sample selection in Poisson and negative binomial regression models. Working Paper EC-94-10, Department of Economics, Stern School of Business, New York University, New York, N.Y.
Grogger, J. and R. Carson. 1991. Models for truncated counts. Journal of Applied Econometrics. 6: 225-238.
Gurmu, S. and P. K. Trivedi. 1996. Excess zeros in count models for recreational trips. Journal of Business & Economic Statistics. 14: 469-477.
Haab, T. C. and K. E. McConnell. 1996. Count data models and the problem of zeros in recreation demand analysis. American Journal of Agricultural Economics. 78: 89-102.
Hausman, J., B. Hall and Z. Griliches. 1984. Econometric models for count data with an application to the patents-R&D relationship. Econometrica. 52: 909-938.
Hellerstein, D. M. 1991. Using count data models in travel cost analysis with aggregate data. American Journal of Agricultural Economics. 73: 860-866.
Hilbe, J. 1999. sg102: Zero-truncated Poisson and negative binomial regression. STATA Technical Bulletin No. 47.
Lambert, D. 1992. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics. 34: 1-14.
Loomis, J. 2003. Travel cost demand model based river recreation benefit estimates with on-site and household surveys: comparative results and a correction procedure. Water Resources Research. 39(4): 1105.
Martinez-Espineira, R. and J. Amoako-Tuffour. 2005. Recreation demand analysis under truncation, overdispersion, and endogenous stratification: an application to Gros Morne National Park. Working Paper 2005-03. Department of Economics, St. Francis Xavier University: Canada.
Mullahy, J. 1986. Specification and testing of some modified count data models. Journal of Econometrics. 33: 341-365.
Ovaskainen, K., J. Mikkola and E. Pouta. 2001. Estimating recreation demand with on-site data: an application of truncated and endogenously stratified count data models. Journal of Forest Economics. 7(2): 125-144.
Scrogin, D., K. Boyle, G. Parsons and A. Plantinga. 2004. Effects of regulations on expected catch, expected harvest, and site choice of recreational anglers. American Journal of Agricultural Economics. 86(4): 963-974.
Shaw, D. 1988. On-site samples' regression: problems of non-negative integers, truncation and endogenous stratification. Journal of Econometrics. 37: 211-223.
Shaw, W. D. and P. Jakus. 1996. Travel cost models of the demand for rock climbing. Agricultural and Resource Economics Review. 25: 133-142.
Shaw, W. D., E. Fadali and F. Lupi. 2003. Comparing consumer's surplus estimates calculated from intercept and general survey data. Proceedings of the W-133 (U.S.D.A.) Regional Economics Group, compiled by J. S. Shonkwiler. Las Vegas, Nevada, February.
Shonkwiler, J. S. and W. D. Shaw. 1996. Hurdle count-data models in recreation demand analysis. Journal of Agricultural and Resource Economics. 21: 210-219.
Vuong, Q. 1989. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica. 57: 307-334.
Yen, S. T. and W. L. Adamowicz. 1993. Statistical properties of welfare measures from count-data models of recreation demand. Review of Agricultural Economics. 15: 203-215.

Appendix 1: Descriptive statistics for the household (HH) and tourist (Tourist) surveys

Variable                               Mean                Standard deviation              Minimum                    Maximum
                               HH w/    HH w/  Tourist    HH w/    HH w/  Tourist    HH w/    HH w/  Tourist    HH w/    HH w/  Tourist
                             Trips>0  Trips≥0  Trips≥1  Trips>0  Trips≥0  Trips≥1  Trips>0  Trips≥0  Trips≥1  Trips>0  Trips≥0  Trips≥1
Visits (person-day-trips)       3.24     0.81     3.17     7.36     3.95     5.75        1        0        1      100      100       50
Travel costs ($USD)             9.42     9.00    10.23    10.28     5.15     7.58     0.06     0.06      0.1      147      147       41
Income ($USD)                  1,861    1,383    2,933    1,623    1,246    2,052      150      120      480   14,976   14,976   15,120
Age (years)                       39       44       36       12       14       13       18       18       18       76       81       71
Household size                     5        4        5        2        2        1        1        1        2       12       13        8
Education (years)                 11       10       10        2        2        2        0        0        5       14       14       14
Past visitation (1=yes)          1.0     0.95     0.94        0     0.22     0.24        1        0        0        1        1        1
Yerevan city (1=yes)            0.80     0.82        -     0.40     0.38        -        0        0        -        1        1        -
Lake Sevan (1=yes)              0.12     0.06     1.00     0.33     0.24     0.00        0        0        1        1        1        1
Observations                     842    3,358      389