WPS3932

                              WELFARE MEASUREMENT BIAS IN
                          HOUSEHOLD AND ON-SITE SURVEYING
                               OF WATER-BASED RECREATION:


                      AN APPLICATION TO LAKE SEVAN, ARMENIA




                                               Craig Meisner
                                                     and
                                                 Hua Wang
                                     Development Research Group
                                              The World Bank

                                                     and

                                              Benoît Laplante
                            Independent Consultant, Montreal, Canada




Keywords: On- and off-site sampling, recreation demand, zero-inflated models, truncated count data
models, endogenous stratification, Armenia.



World Bank Policy Research Working Paper 3932, June 2006

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the
exchange of ideas about development issues. An objective of the series is to get the findings out quickly,
even if the presentations are less than fully polished. The papers carry the names of the authors and should
be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely
those of the authors. They do not necessarily represent the view of the World Bank, its Executive Directors,
or the countries they represent. Policy Research Working Papers are available online at
http://econ.worldbank.org.

Correspondence should be addressed to: Craig Meisner, MC2-205, World Bank, 1818 H Street, NW,
Washington, DC 20433, cmeisner@worldbank.org.

I.      Introduction


        Several recent travel cost studies have aimed to compare recreational benefits

derived from household and on-site surveys (e.g. Loomis, 2003; Shaw, 2003). If it can be

shown that welfare estimates derived from cost-effective on-site surveying techniques are

similar to household survey results, this may justify using on-site surveys in lieu of large

and costly population-based surveys. However, a robust comparison of estimates

obtained from each sample requires addressing a number of important statistical issues.

In particular, household survey demand is typically censored due to the possibility of

observing a large number of zeros (or non-users of the site). Simply treating all zeros in

the sample as users of the site introduces an upward bias of the demand and welfare

measures. On the other hand, on-site sample demand is truncated at one since it surveys

only users at the site. In this case, estimates are prone to higher standard errors and an

upward bias from over-sampling individuals whose characteristics may be correlated with

higher trip frequencies (endogenous stratification - ES).


        In the case of household surveys, it is possible to resolve the issue by separating

the recreation `participation' decision from the trip `quantity' decision, thus reducing the

bias introduced by non-users of the site. In the case of on-site surveys, it is possible to

correct for the potential bias by providing adjustments to the distribution function (Shaw,

1988; Englin and Shonkwiler, 1995). To our knowledge, none of the existing travel cost

studies have attempted to correct for both biases when conducting comparative analyses
of estimates obtained from household and on-site surveys.1


        In this paper, we test whether household and on-site demand estimation yield

similar welfare measures after accounting for both biases

discussed above. For this purpose, we use a household and on-site survey conducted at

Lake Sevan, Armenia. This single-site comparison has two advantages. First, as the site

is unique, we avoid problems of having to incorporate substitute sites into the decision to



1Loomis (2003) does not discuss the prevalence of zeros in his comparative household sample, and does
not consider their relative influence on expected trip demand or welfare.



recreate. Second, since we are not valuing a change in the quality of the lake, we also

avoid any quality change impacts on expected trip demand.


        The household survey consisted of 3,358 households across Armenia, and the on-

site survey of 389 tourists recreating at Lake Sevan. Travel cost models were constructed

and estimated using travel expenditure and socio-demographic information contained in

each survey. Because trip counts in the household survey contained a large share of

zeros and exhibited over-dispersion, a zero-inflated negative

binomial (ZINB) model was estimated. For the on-site survey, two truncated negative

binomial models were estimated with and without an adjustment for endogenous

stratification (ES).


        Likelihood ratio tests rejected the null hypothesis of no over-dispersion, favoring the negative

binomial specification in both the household and on-site models. Results from the

household model also reveal that the participation decision is indeed relevant to the

household's recreation decision. However, in the case of the on-site sample, estimated

coefficients for the ES and non-ES models were not significantly different. This may

suggest that characteristics from the on-site sample are representative of the household

sample. Other studies have found similar results where accounting for ES did not yield

any significant differences in trip demand or welfare (Ovaskainen et al., 2001; Englin et

al., 2003). Per trip consumers surplus was estimated to be $8.82 for the household

sample, $8.73 for the on-site model without ES adjustment, and $8.21 with ES.


        The remainder of this paper is structured as follows. The next section provides a

description of travel cost and count data models utilized in this study along with

recommendations of how to remedy several dependent variable issues typically

encountered with household and on-site recreational surveys. In Section III, the two

surveys are described in more detail. In Section IV, the results of estimation are

presented, along with a comparison in expected trip demand and estimated welfare

measures. Section V provides a brief summary and discussion of the findings.





II.     Travel Cost Modeling


        In travel cost modeling, the decision to recreate is typically modeled as a latent

demand, $y_i^*$, representing the number of trips taken in one year as a function of travel cost

(P), site quality attributes (Z) and individual demographic characteristics (X):


        $Trips_i = y_i^* = f(P_i, X_i, Z_i) + \varepsilon_i, \qquad i = 1, 2, \ldots, N$        (1)


Travel cost modeling (TCM) can be implemented through household or on-site surveys.

However, each sampling method involves a number of different statistical issues.


(i)     Household survey


        An important modeling issue when applying TCM pertains to the treatment of

non-negative integers observed in individual recreational data, as one may encounter a

large proportion of zeros in a general household survey (Shaw, 1988; Grogger and

Carson, 1991; Hellerstein, 1991). Observing a zero implies that the services from the site

do not enter into the utility function of the individual. In the utility maximization

framework, it implies that the individual is currently at some choke price where he is

consuming zero trips, and that if the current "market" price were to fall below the choke

price, the individual would demand a positive number of trips. However, one may also

observe a zero if for some reason (such as age, health-related reasons, etc.) services from

the site would never enter an individual's utility function (Haab and McConnell, 1996).

Thus, there is an important distinction between observing zeros for those who are

participants and for those who are non-participants (`true zeros'). Standard count data

models such as the Poisson or negative binomial assume that all individuals surveyed are

potential users of the good in question, and that the same variables influence all potential

users similarly. In the presence of a large number of zeros, and where the participation

question is relevant, this assumption may not be valid and should be tested for its

significance.





        To account for the participation issue, we consider two augmented count data

models which account for the presence of a large number of zeros - the zero-inflated

Poisson (ZIP) and zero-inflated negative binomial (ZINB) (Mullahy, 1986; Lambert,

1992; Greene, 1994; Haab and McConnell, 1996). By distinguishing between

participants and non-participants, the zero observations may contain valuable

information, and a gain in efficiency will be achieved by including all of the observations
(Haab and McConnell, 1996, p. 90).2 Empirically, zero-inflated count models change the

mean structure to allow zeros to be generated by two distinct processes, one for the

participation decision (logit or probit) and one for the mean number of trips (count
model).3 By expanding the standard count model to allow for individual-specific

characteristics which may keep an individual from entering the recreation market, one

can separate factors which influence the participation issue from those that influence the

quantity of trips taken to a recreation site (Haab and McConnell, 1996). In estimation, the

ZIP model allows for over-dispersion in the Poisson data generating process by allowing

a mass of zero observations independent of the true Poisson process.


        The distribution function for the ZIP model is:


        $\Pr(y_i \mid x_i) = \begin{cases} P_i + (1 - P_i)\,e^{-\lambda_i} & \text{if } y_i = 0, \\ (1 - P_i)\,\dfrac{e^{-\lambda_i}\lambda_i^{y_i}}{y_i!} & \text{otherwise,} \end{cases}$        (2)

where $E(y_i) = (1 - P_i)\lambda_i$, $Var(y_i) = (1 - P_i)(1 + P_i\lambda_i)\lambda_i$, and $P_i$ is the probability of

non-participation (a `true zero'), with mean $\lambda_i = \exp(x_i\beta)$. Note that in this formulation, zeros can arise from

either the binomial (participation) process or the Poisson (count) process, since

$\exp(-\lambda_i)\lambda_i^0/0! = \exp(-\lambda_i)$. Again, $\lambda_i$ can be modeled as $\exp(x_i\beta)$, and $P_i$ as $g(z_i\gamma)$, where $\gamma$ is a

vector of participation-decision parameters and $z_i$ is a vector of explanatory variables that

may or may not be the same as those for the quantity decision, $x_i$. The function $g(\cdot)$ can

be modeled using either a logit or a probit (cumulative standard normal) function, as they


2In the past, one crude option was simply to drop the zeros from the sample.
3The zero-inflated models differ from the Heckman continuous two-stage model as they allow for zero
observations in the second stage of the decision process (in the mean model).



both give similar results. In the presence of over-dispersion4 (variance > mean), the

participation decision can be similarly decomposed in a zero-inflated negative binomial

model as:


        $\Pr(y_i \mid x_i) = \begin{cases} P_i + (1 - P_i)\left(\dfrac{1}{1 + \alpha\lambda_i}\right)^{1/\alpha} & \text{if } y_i = 0, \\ (1 - P_i)\,\dfrac{\Gamma(y_i + 1/\alpha)}{\Gamma(y_i + 1)\,\Gamma(1/\alpha)}\left(\dfrac{1}{1 + \alpha\lambda_i}\right)^{1/\alpha}\left(\dfrac{\alpha\lambda_i}{1 + \alpha\lambda_i}\right)^{y_i} & \text{otherwise,} \end{cases}$        (3)

where $E(y_i) = (1 - P_i)\lambda_i$ and $Var(y_i) = (1 - P_i)[1 + \lambda_i(\alpha + P_i)]\lambda_i$. The presence of the $\alpha$

parameter in the calculation of the conditional variance of y (if greater than 0) guarantees

that the variance is greater than the mean. As $\alpha \to 0$, the moments of the distribution

converge to those of a Poisson distribution, and so testing for $\alpha = 0$ provides a case for selecting the

negative binomial over the Poisson, and indirectly for the presence of over-dispersion.
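
As an illustration of equations (2) and (3), the following Python sketch evaluates the ZIP and ZINB probabilities and checks the stated moments numerically. It is not the estimation code used in this study: the function names and the values chosen for the mean (lam), the dispersion parameter (alpha) and the non-participation probability (p_zero) are assumptions made only for the example.

```python
import math

def zip_pmf(y, lam, p_zero):
    """ZIP probability of y trips: extra zero mass p_zero plus a Poisson(lam) count."""
    poisson = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    if y == 0:
        return p_zero + (1 - p_zero) * poisson
    return (1 - p_zero) * poisson

def zinb_pmf(y, lam, alpha, p_zero):
    """ZINB probability of y trips with dispersion alpha > 0, as in equation (3)."""
    inv = 1.0 / alpha
    log_nb = (math.lgamma(y + inv) - math.lgamma(y + 1) - math.lgamma(inv)
              - inv * math.log1p(alpha * lam)
              + y * (math.log(alpha * lam) - math.log1p(alpha * lam)))
    nb = math.exp(log_nb)
    if y == 0:
        return p_zero + (1 - p_zero) * nb
    return (1 - p_zero) * nb

def moments(pmf, upper=200):
    """Total probability, mean and variance, summing the pmf over 0..upper-1."""
    probs = [pmf(y) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return sum(probs), mean, var

# Illustrative values only (not estimates from the paper).
lam, alpha, p_zero = 2.0, 0.5, 0.6

zip_total, zip_mean, zip_var = moments(lambda y: zip_pmf(y, lam, p_zero))
zinb_total, zinb_mean, zinb_var = moments(lambda y: zinb_pmf(y, lam, alpha, p_zero))

# Closed-form moments stated below equations (2) and (3).
zip_var_formula = (1 - p_zero) * (1 + p_zero * lam) * lam
zinb_var_formula = (1 - p_zero) * (1 + lam * (alpha + p_zero)) * lam
print(zip_mean, zip_var, zip_var_formula)
print(zinb_mean, zinb_var, zinb_var_formula)
```

With these values both models share the same mean, (1 - p_zero) * lam, while the ZINB variance exceeds the ZIP variance because of the additional dispersion term.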


         The flexibility of modeling the participation decision in this manner has led to a

number of interesting applications in recreational demand analysis, including beach trips

(Shonkwiler and Shaw, 1996; Haab and McConnell, 1996), rock climbing (Shaw and

Jakus, 1996), lake recreation (Gurmu and Trivedi, 1996), water-based recreation (Curtis,

2003), and angling site choice (Scrogin et al., 2004).


(ii)     On-site sampling


         Interview surveys conducted on-site obviously avoid the non-participation issue,

but as the dependent variable yi is strictly non-zero, the truncated demand relationship


4An undesirable feature of Poisson count models is the assumption that the conditional mean and variance
are equal (Yen and Adamowicz, 1993). This is especially problematic in empirical research because
conditional variances are typically greater than conditional means in socio-economic data (also known as
over-dispersion, a form of heteroskedasticity). The presence of over-dispersion still allows for consistently
estimated means of parameter estimates (Gourieroux et al. 1984), but causes the standard errors of these
estimates to be biased downward, resulting in erroneous tests of their statistical significance (Cameron and
Trivedi, 1986). The equality of the mean and the variance property of Poisson count models led to the
development of negative binomial models (Hausman et al., 1984). This model allows for over-dispersion
by combining the Poisson distribution with a gamma distribution and hence allowing for heterogeneity to
be gamma distributed.



measures only those with smaller error terms. In addition, because the sample is on-site,

there is a higher likelihood of intercepting a person whose characteristics are correlated

with higher trip frequencies, or what is known as `endogenous stratification' in sampling.

The implication is that the sample is not representative of the population at large, and in

measuring welfare effects, consumers surplus estimates will be biased upwards as it is

only capturing the effect of avid recreationists.


        Truncation and endogenous stratification were first explored by Shaw (1988) in the

case of the Poisson distribution and extended by Englin and Shonkwiler (1995) to the

negative binomial distribution. The basic implication is to weight individual observations
by the inverse of the expected value of trips. Assuming that the density function of the ith

person in the population is $f(y_i^* \mid x_i)$, Shaw (1988) shows that the density function of the

same person in the on-site population is:


        $\Pr(y_i \mid x_i) = \dfrac{y_i\, f(y_i \mid x_i)}{\sum_{t=1}^{\infty} t\, f(t \mid x_i)}$        (4)


If the conditional density $f(y_i^* \mid x_i)$ is chosen to be Poisson with the location parameter $\lambda_i$,

then the on-site sample's density function is:


        $\Pr(y_i \mid x_i) = \dfrac{e^{-\lambda_i}\lambda_i^{y_i - 1}}{(y_i - 1)!}$        (5)

where $E(y_i \mid x_i) = \lambda_i + 1$ and $Var(y_i \mid x_i) = \lambda_i$. Defining $w_i = y_i - 1$, equation (5) is the standard Poisson

density in $w_i$, so the model can be estimated as a standard Poisson with $w_i$ in place of $y_i$.
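
The equivalence between the size-biased density in (4) and the shifted Poisson in (5) can be checked numerically. The Python sketch below is only an illustration under assumed values: the function names and the value of the location parameter (lam) are hypothetical, and the infinite sum in (4) is truncated at a large upper bound.

```python
import math

def poisson_pmf(y, lam):
    """Population (household) Poisson density f(y | x) with mean lam."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def onsite_pmf(y, lam, upper=200):
    """Equation (4): weight f(y | x) by y and renormalise over t = 1, 2, ..."""
    norm = sum(t * poisson_pmf(t, lam) for t in range(1, upper))
    return y * poisson_pmf(y, lam) / norm

lam = 1.7  # illustrative value only

# Largest gap between equation (4) and the shifted Poisson of equation (5).
max_gap = max(abs(onsite_pmf(y, lam) - poisson_pmf(y - 1, lam))
              for y in range(1, 30))

# Moments of the on-site density: mean lam + 1 and variance lam.
ys = range(1, 100)
onsite_mean = sum(y * onsite_pmf(y, lam) for y in ys)
onsite_var = sum((y - onsite_mean) ** 2 * onsite_pmf(y, lam) for y in ys)
print(max_gap, onsite_mean, onsite_var)
```

The on-site mean exceeds the population mean by exactly one trip, which is the upward shift that the endogenous stratification correction removes.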


        In the presence of over-dispersion, the equality of the mean and variance is

violated and thus the negative binomial model is preferred with the following density

function (Englin and Shonkwiler, 1995):

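        $\Pr(y_i \mid x_i) = \dfrac{y_i\,\Gamma(y_i + 1/\alpha_i)\,\alpha_i^{y_i}\,\lambda_i^{y_i - 1}\,(1 + \alpha_i\lambda_i)^{-(y_i + 1/\alpha_i)}}{\Gamma(y_i + 1)\,\Gamma(1/\alpha_i)}$        (6)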





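where $E(y_i \mid x_i) = \lambda_i + 1 + \alpha_i\lambda_i$ and $Var(y_i \mid x_i) = \lambda_i(1 + \alpha_i + \alpha_i\lambda_i + \alpha_i^2\lambda_i)$. As the

specification in (6) cannot be transformed into any simpler form as in the case of the

truncated Poisson, the likelihood function must be programmed directly into a likelihood
maximization routine. The log likelihood function used in this context, with a common dispersion parameter $\alpha_i = \alpha$, is:5


        $\ln L = \sum_{i=1}^{N} \left[ \ln y_i + \ln\Gamma(y_i + 1/\alpha) - \ln\Gamma(y_i + 1) - \ln\Gamma(1/\alpha) + y_i\ln\alpha + (y_i - 1)\ln\lambda_i - (y_i + 1/\alpha)\ln(1 + \alpha\lambda_i) \right]$        (7)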

Defining $\lambda_i$ as the expected number of person-day-trips6 individual i takes to the site in a

year, the empirical demand relationship can be defined as:


         $\lambda_i = \exp(X_i\beta + \varepsilon_i) = \exp(\beta_p p_i + x_i\beta_x + \varepsilon_i), \qquad i = 1, \ldots, n$        (8)

where $\beta$ is a K x 1 vector of parameters, $X_i$ is a 1 x K vector of explanatory variables for

individual i, $p_i$ is the travel cost for individual i to the site, $x_i$ is the 1 x (K - 1) vector of

explanatory variables after $p_i$ is removed from $X_i$, $\beta_p$ is the parameter on travel cost, and

$\beta_x$ is the remaining vector of parameters corresponding to $x_i$.
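
To make the programming step concrete, the log likelihood in (7) and the demand relationship in (8) can be written as two short Python functions. This is a minimal sketch, not the routine used for the estimates reported here: the function names, the argument layout and the numerical values are assumptions, the error term in (8) is omitted, and a single dispersion parameter (alpha) is assumed for all observations.

```python
import math

def expected_trips(travel_cost, covariates, beta_p, beta_x):
    """Equation (8) without the error term: exp(beta_p * p_i + x_i * beta_x)."""
    linear = beta_p * travel_cost + sum(b * x for b, x in zip(beta_x, covariates))
    return math.exp(linear)

def trnbes_loglik(trips, lams, alpha):
    """Equation (7): log likelihood of the truncated negative binomial
    with endogenous stratification, summed over on-site respondents."""
    inv = 1.0 / alpha
    total = 0.0
    for y, lam in zip(trips, lams):
        total += (math.log(y)
                  + math.lgamma(y + inv) - math.lgamma(y + 1) - math.lgamma(inv)
                  + y * math.log(alpha)
                  + (y - 1) * math.log(lam)
                  - (y + inv) * math.log1p(alpha * lam))
    return total

# Hypothetical respondent: travel cost of 10 and one covariate equal to 1.
lam_example = expected_trips(10.0, [1.0], beta_p=-0.05, beta_x=[0.5])

# Sanity check: the implied density sums to one over y = 1, 2, ...
density_sum = sum(math.exp(trnbes_loglik([y], [2.0], 0.5)) for y in range(1, 400))

# Log likelihood of a small made-up sample of on-site trip counts.
sample_ll = trnbes_loglik([1, 2, 5], [1.5, 2.0, 3.0], alpha=0.5)
print(lam_example, density_sum, sample_ll)
```

In practice a function of this kind would be passed to a numerical optimizer that searches over the coefficient vector and the dispersion parameter; that step is left out here.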



(iii)    Welfare measures


         The benefit (consumer surplus) of access to the site is defined as the area under

the estimated Marshallian demand curve specified in (8) and above the current price

level. By integrating the demand function over travel costs (prices) faced by individuals,

we calculate expected consumers surplus as:


                  $E(CS_i) = \int_{p_i}^{\infty} \lambda_i \, dP = -\lambda_i / \beta_p$        (9)

where $\lambda_i$ is as defined in (8) and $\beta_p$ is the estimated parameter on travel cost. Summed

across all i, the area measures the total per trip willingness-to-pay by all individuals to

recreate at the site. In the case of the ZINB model, expected consumers surplus must be

weighted by the probability of participation $(1 - P_i)$, where $P_i$ is a function of variables

5 The likelihood function in (7) was entered into a modified zero-truncated negative binomial maximum
likelihood routine provided by Hilbe (1999).
6 Person-day-trips were defined as the number of trips taken by the respondent in one year. All cost
information was then divided by the number of days to form per-day trip costs.



that affect the participation decision. Compensating and equivalent variation measures

can also be calculated from the expenditure function implied by the Marshallian demand

relationship specified above. From a welfare perspective, CV and EV may be of interest

as measures of potential compensation from those who degrade the resource. Table 1

summarizes the welfare measures used in the analysis.


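                                 Table 1: Welfare measures


Model                              Consumers surplus                                 Compensating variation                                                           Equivalent variation
Household sample:
 Negative binomial                 $-\lambda/\beta_p = -e^{\bar{X}\beta}/\beta_p$    $\frac{1}{\beta_i}\ln\left(1 + \frac{\lambda\beta_i}{\beta_p}\right)$            $-\frac{1}{\beta_i}\ln\left(1 - \frac{\lambda\beta_i}{\beta_p}\right)$
 Zero-inflated negative binomial   $-(1 - P)\,e^{\bar{X}\beta}/\beta_p$              $(1 - P)\frac{1}{\beta_i}\ln\left(1 - \frac{\lambda\beta_i}{\beta_p}\right)$     $-(1 - P)\frac{1}{\beta_i}\ln\left(1 + \frac{\lambda\beta_i}{\beta_p}\right)$
On-site sample:
 Trunc. negative binomial /        $-\lambda/\beta_p = -e^{\bar{X}\beta}/\beta_p$    $\frac{1}{\beta_i}\ln\left(1 + \frac{\lambda\beta_i}{\beta_p}\right)$            $-\frac{1}{\beta_i}\ln\left(1 - \frac{\lambda\beta_i}{\beta_p}\right)$
 Trunc. negative binomial
   w/endogenous stratification

Note: $\lambda = \exp(\bar{X}\beta)$ from equation (8), where $\bar{X}$ represents the sample means; $\beta_i$ is the coefficient on
income.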
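
The welfare formulas can be evaluated directly once the coefficients are available. The Python sketch below follows the consumers surplus expression in (9) and the negative binomial row of Table 1; it is an illustration only. The function names are hypothetical and the coefficient values are made up for the example, not taken from the estimation results.

```python
import math

def consumer_surplus(lam, beta_p, p_zero=0.0):
    """Expected consumers surplus, -lam / beta_p, weighted by the
    participation probability (1 - p_zero) for a zero-inflated model."""
    return -(1.0 - p_zero) * lam / beta_p

def per_trip_surplus(beta_p):
    """Consumers surplus per trip implied by the semi-log demand curve."""
    return -1.0 / beta_p

def compensating_variation(lam, beta_p, beta_inc):
    """Negative binomial row of Table 1: (1/b_inc) * ln(1 + lam * b_inc / b_p)."""
    return math.log(1.0 + lam * beta_inc / beta_p) / beta_inc

def equivalent_variation(lam, beta_p, beta_inc):
    """Negative binomial row of Table 1: -(1/b_inc) * ln(1 - lam * b_inc / b_p)."""
    return -math.log(1.0 - lam * beta_inc / beta_p) / beta_inc

# Made-up coefficients: negative price effect, small positive income effect.
lam, beta_p, beta_inc = 2.0, -0.05, 0.0004

cs = consumer_surplus(lam, beta_p)
cs_zinb = consumer_surplus(lam, beta_p, p_zero=0.6)
cv = compensating_variation(lam, beta_p, beta_inc)
ev = equivalent_variation(lam, beta_p, beta_inc)
print(cs, cs_zinb, per_trip_surplus(beta_p), cv, ev)
```

With a small income coefficient the three measures are close in absolute value, and the consumers surplus lies between the equivalent and compensating variation.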



III.     Application to Lake Sevan, Armenia

         Lake Sevan is the largest high altitude reservoir of freshwater in the

Transcaucasus, and is one of the highest lakes in the world. However, over the course of

the last 50 years, the level of the lake has dropped by 18 m, its surface area has decreased by

15%, and the volume of water in the lake has fallen by more than 40% (from 58.5 to 34.6
km³). These changes have had significant adverse impacts on Lake Sevan's ecology.

As it is located only 70 km away from the capital city Yerevan, Lake Sevan is the

preferred and most accessible recreational site of most Armenians.


         The Government of Armenia has been working on a Lake Sevan protection action

plan. The objectives under consideration by the Government of Armenia include

preventing a further lowering of the level of Lake Sevan, and raising the level of the lake

by at least 3 meters as quickly as possible. However, to date there has not been a



thorough measurement of the current recreational benefits to include in benefit-cost

analysis. Welfare measurement would be useful to policymakers tasked with weighing

the alternative options of restoring Lake Sevan. Our model and welfare comparison is

also useful in this context as Lake Sevan is a single site, with no substitutes, so

comparing the two samples is not confounded by alternative sites that may enter into an

individual's water-based recreation decision. Also, since we are measuring current

recreational benefits, we avoid having to predict the impact that improvements would

have on expected trip demand.


        To estimate benefits by the general population and users of the site, two surveys

were conducted: one comprising 3,358 households across Armenia and the other an
intercept survey of 389 on-site tourists recreating at Lake Sevan.7 Both were conducted

in the year 2000, with the tourist survey during the summer to better capture the high

season of annual recreational use at the lake. The household sample was selected and

stratified by the 1996 Population Census of Armenia, while the on-site survey relied on

tourist interception at the lake.


        Annual visitation to Lake Sevan by these two groups is reported in Table 2.

Household survey responses indicate that nearly 75% did not visit the lake in the past

year, with a sample mean of 0.81 day-trips. The tourist survey, obviously truncated at one

as interviews took place at the lake, averaged 3.17 day-trips per year. The average person

from the household survey was 44 years old, earned the equivalent of 1,383 USD per

annum, had 10 years of formal education, and a household size of 4. The average person

from the on-site survey was 36 years old, earned 2,933 USD per annum, had 10 years of

education and a household size of 5 (see Appendix I for details).


In Table 2 we also note that the standard deviation of visitation in each sample exceeds

its mean, thus we suspect the presence of over-dispersion, and therefore formally test the


7 The detailed questionnaires included six major parts: (1) environmental attitudes and perceptions; (2) a
Lake Sevan action plan for restoration; (3) contingent valuation questions; (4) socio-economic
characteristics; (5) recreational use of Lake Sevan; and (6) interview debriefing questions.     For the
purposes of this paper, only sections (4) and (5) are used.



negative binomial counterpart of the Poisson distribution. In addition, given the large

number of zeros in the household survey, this leads us to formally test the use of the zero-

inflated negative binomial model for the household survey.


                               Table 2: Frequency of visitation

                                        Household            Tourist
                     Person-day-trips   frequency  Percent   frequency  Percent

                            0                2516    74.93           0     0.00
                            1                 455    13.55         185    47.56
                            2                 152     4.53          94    24.16
                            3                  84     2.50          41    10.54
                            4                  30     0.89          25     6.43
                            5                  37     1.10          14     3.60
                            6                  12     0.36           5     1.29
                            7                   7     0.21           0     0.00
                            8                   5     0.15           0     0.00
                            9                   0     0.00           0     0.00
                           10                  26     0.77           5     1.29
                         10 to 15              12     0.36           6     1.54
                         15 to 20              10     0.30           6     1.54
                         20 to 30               3     0.09           4     1.03
                         30 to 40               3     0.09           2     0.51
                         40 to 50               1     0.03           2     0.51
                         50 to 100              5     0.15           0     0.00
                          Total              3358   100.00         389   100.00
                     Mean                    0.81                 3.17
                     Standard deviation      3.95                 5.75
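
The summary statistics in Table 2 already hint at both modeling issues. The short Python calculation below is a worked illustration using only the figures reported in the table; the variable names are ours. It compares the observed share of zeros in the household sample with the share a simple Poisson with the same mean would predict, and computes the variance-to-mean ratio in each sample.

```python
import math

# Figures taken from Table 2.
hh_n, hh_zeros = 3358, 2516
hh_mean, hh_sd = 0.81, 3.95
site_mean, site_sd = 3.17, 5.75

# Observed share of zeros versus the share implied by a Poisson
# distribution with the same mean, exp(-mean).
observed_zero_share = hh_zeros / hh_n
poisson_zero_share = math.exp(-hh_mean)

# Variance-to-mean ratios; a Poisson process would give a ratio of one.
hh_ratio = hh_sd ** 2 / hh_mean
site_ratio = site_sd ** 2 / site_mean

print(round(observed_zero_share, 4), round(poisson_zero_share, 4))
print(round(hh_ratio, 2), round(site_ratio, 2))
```

The observed zero share (about 0.75) is well above the Poisson-implied share (about 0.44), and both variance-to-mean ratios are far above one, which is consistent with testing zero-inflated and negative binomial specifications.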




IV.    Estimation Results


(i)    Determinants of visitation


       The household sample was initially modeled using the Poisson, negative binomial

(NB), zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB). The on-

site sample was modeled using the truncated Poisson, truncated negative binomial

(TRNB) and the truncated negative binomial with endogenous stratification (TRNBES).

Comparative tests between each model were performed and are reported below. For

brevity, only the estimation results for the household (NB and ZINB) and on-site models





(TRNB and TRNBES) are reported in Table 3 with marginal effects for the ZINB and

TRNBES models listed in Table 4.


        From the empirical demand relationship in equation (8), we model the

participation and trip quantity decisions using travel cost and several individual-specific

variables that may co-vary with each decision - income, age, household size, education,
and a Yerevan city dummy.8 Travel costs included: (1) transport costs; (2) on-site costs

(per day); and (3) the value of time traveling to and at Lake Sevan. The value of time

was elicited from the respondent by asking them how much they would have earned had

they not traveled to Lake Sevan. This amount was then divided by the number of days

they were at the lake to arrive at a trip-per-day cost. Note that for the household model,

each equation (logit and mean) contains the same explanatory variables, as they may

contribute to either of the participation or quantity decisions.


        Beginning with the household survey results in the second and third columns of

Table 3, we note that the likelihood ratio (LR) test rejects the null hypothesis $\alpha = 0$, indicating the

significance of over-dispersion and thus the selection of the negative binomial

specification over the Poisson. A further formal specification test between the NB and

ZINB is possible (Vuong, 1989). The test statistic is directional and distributed standard

normal: values of V > 1.96 support the zero-inflated version, while values of V < -1.96 support the standard NB. With a value of

4.86, the ZINB specification is favored over the NB.
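
For reference, the Vuong statistic can be computed from the per-observation log likelihood contributions of the two fitted models. The Python sketch below uses a common form of the statistic (square root of N times the mean log likelihood difference, divided by its standard deviation). The function name and the log likelihood values are hypothetical and serve only to show the calculation and its directional interpretation.

```python
import math

def vuong_statistic(loglik_a, loglik_b):
    """Vuong statistic for model A against model B from per-observation
    log likelihoods; large positive values favor A, large negative favor B."""
    diffs = [a - b for a, b in zip(loglik_a, loglik_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    return math.sqrt(n) * mean / sd

# Made-up per-observation log likelihoods for four observations.
loglik_zinb = [-1.0, -2.0, -0.5, -1.5]
loglik_nb = [-1.2, -2.1, -0.8, -1.7]

v = vuong_statistic(loglik_zinb, loglik_nb)
v_reversed = vuong_statistic(loglik_nb, loglik_zinb)
print(v, v_reversed)
```

Swapping the two models flips the sign of the statistic, which is why the test is read directionally rather than through its absolute value.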


        Parameter estimates of the household ZINB model reveal that income, age and

education, along with respondents who reside in Yerevan significantly determine the

household participation decision to recreate at Lake Sevan (see logit inflation model).

The coefficients are interpreted relative to observing a zero count, thus the positive

coefficient on age implies that older respondents are more likely to record zero

participation, whereas individuals with higher income or education are less likely to

report zero trips to Lake Sevan. Those who reside in Yerevan city are also more likely to


8A dummy variable to capture previous visitation to the lake was also initially considered for each model,
however, over 94% of respondents in the household survey and over 95% in the tourist survey visited Lake
Sevan at least once in the past three years (and thus insufficient statistical variation).



report zero visitation in the past year. Among those who do choose to participate (see

mean model), increases in income and household size increase trip demand, while

increases in travel costs and education decrease trip demand.


        For the on-site survey, an LR test first rejected the truncated Poisson in favor of the truncated

negative binomial (TRNB), indicating that over-dispersion in visitation is

significant and leading us to favor the TRNB specification. Second, the TRNBES model

was estimated to see whether higher trip frequencies have any systematic association with

an individual's characteristics. Estimation results for both TRNB and TRNBES show

that increases in travel costs, age and education decrease visitation, whereas increases in

household size increase trip demand. In the TRNB model, estimated coefficients and

standard errors are higher leading to a lower significance across each explanatory

variable. By correcting for ES, the magnitude of estimated coefficients falls, and standard

errors fall by a greater extent such that significance rises among the major determinants

of visitation. In the next section, we explore the consequences of these differences on

expected trip demand as well as the implications on welfare estimates.





      Table 3: Household and on-site model estimates of visitation to Lake Sevan

 Variable                        HH: NB             HH: ZINB           On-site: TRNB       On-site: TRNBES

 Mean model
   Travel costs                  -0.0256***          -0.0153***          -0.0521***          -0.0519***
                                   (-5.41)             (-3.46)             (-3.37)              (-4.79)
   Income                         0.00035***          0.00015***          0.000040            0.000013
                                   (7.54)              (3.63)               (0.60)              (0.32)
   Age                           -0.0233***           0.0035             -0.0313***          -0.0263***
                                   (-6.36)             (0.78)               (-3.45)             (-4.58)
   Household size                 0.1219***           0.0974***           0.2969***           0.2711***
                                   (4.02)              (2.64)               (3.57)              (5.26)
   Education                     -0.0094             -0.0686***          -0.0912*            -0.0926***
                                   (-0.43)             (-2.66)             (-1.66)              (-2.79)
   Constant                      -0.0392              0.2174             -10.7080            -15.4955
                                   (-0.11)             (0.56)               (-0.33)             (-0.12)

 Logit inflation model
   Travel costs                                       0.0109
                                                       (0.91)
   Income                                            -0.0012***
                                                       (-4.77)
   Age                                                0.0903***
                                                       (8.47)
   Household size                                     0.0313
                                                       (0.43)
   Education                                         -0.2768***
                                                       (-4.80)
   Yerevan city                                       0.8631***
                                                       (2.68)
   Constant                                           -1.5611*
                                                       (-1.83)

 Over-dispersion (\(\alpha\))     5.8005              3.7079              13.2317             17.0166

 Log-likelihood                   -3,334.71           -3,249.60          -656.48             -679.79
 LR test (\(\alpha=0\)) ~ \(\chi^2\) (d.f.)  6,469.23 (1)       3,271.69 (1)        846.11 (1)          799.49 (1)
 Vuong test ~ N (0,1)                  -                   4.86               -                   -
 Number of observations            3,358               3,358               389                  389
 Non-zero observations               842                 842               389                  389
 Zero observations                 2,516               2,516
    t-statistics in parentheses; * significant at the 10% level; ** significant at the 5% level; *** significant
    at the 1% level.
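
The test statistics in Table 3 can be read against their reference distributions with a few lines of code. The sketch below is illustrative only: it assumes the conventional chi-square(1) reference for the LR test of no over-dispersion (a boundary-adjusted test would be, if anything, less conservative) and a one-sided standard normal reference for the Vuong statistic. The function names are hypothetical helpers, not part of the original estimation.

```python
import math

def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 1 d.f."""
    return math.erfc(math.sqrt(x / 2.0))

def normal_sf(z: float) -> float:
    """Upper-tail probability P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# LR statistics for the restriction alpha = 0, copied from Table 3
lr_stats = {
    "HH: NB": 6469.23,
    "HH: ZINB": 3271.69,
    "On-site: TRNB": 846.11,
    "On-site: TRNBES": 799.49,
}
CHI2_1DF_CRIT_1PCT = 6.635  # standard 1% critical value, chi-square(1)

for model, lr in lr_stats.items():
    reject = lr > CHI2_1DF_CRIT_1PCT
    print(f"{model}: LR = {lr:,.2f}, reject alpha = 0 at 1%: {reject}")

# Vuong statistic (ZINB against NB) from Table 3
vuong = 4.86
print(f"Vuong z = {vuong}, one-sided p-value = {normal_sf(vuong):.2e}")
```

Every LR statistic lies far above the 1% critical value, and the Vuong statistic exceeds the usual 1.96 threshold, which is consistent with the preference for the negative binomial and zero-inflated specifications.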


(ii)     Visitation sensitivity


         The sensitivity of trip demand for the household ZINB and tourist TRNBES

models to changes in the parameter values is summarized in Table 4. Beginning with the

household survey and under the binary participation equation, estimated coefficients




from the regression are interpreted as increasing or decreasing the odds of non-

participation (or observing a zero). As this may be counter-intuitive, we reverse the

signs on the estimated coefficients and re-interpret the results in terms of the odds of

participation in Table 4.


        A one-unit increase in the respondent's age or household size leads to a decrease

in the odds of participation of 9.5% and 3.2%, respectively, whereas an increase of one

year of education increases the odds of participation by 75%. Income only marginally

impacts trip demand, with an increase of $1 USD leading to an increase in participation of

0.12%. This relative insensitivity to income changes is a common finding among

recreational demand studies. If the respondent lives in Yerevan, the likelihood of

participation is decreased by an overwhelming 137%. This may be owing to the fact that

in the household sample, over 80% of the sampled households are from Yerevan, the

capital city. For the trip count equation, a one unit increase in travel costs or education

decreases the number of trips by 1.5% and 6.6%, respectively. Thus, although travel

costs are not a significant determinant in the decision to recreate, they do impact the

number of trips a person decides to take. Also, a person's education appears to be important

in both decisions, but in opposite directions. Those with higher education tend to participate

more often, but as one frequents the site more often this effect diminishes. Greater

household size also works in opposite directions for the participation and quantity

decisions. A one unit change in household size decreases participation by 3.2% but for

those who do go, it increases the number of trips by 10.2%. Upon closer inspection of

the data, it was found that households with more children were associated with higher trip

frequencies. The impact of income on trip frequency was found to be negligible.
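
The conversion from a logit coefficient to a percentage change in odds can be sketched as follows. This is a minimal illustration, assuming the usual transformation 100 * (exp(b) - 1); the text does not state the formula used. The coefficients are the logit inflation estimates from Table 3, which describe the odds of non-participation; reversing the sign gives the effect on the odds of participation. The magnitudes reported for travel costs, income, age, household size and Yerevan match the non-participation column of this sketch. The education entry in Table 4 does not follow this pattern, so it is left out of the comparison.

```python
import math

def pct_change_odds(coef: float) -> float:
    """Percentage change in odds for a one-unit increase in the regressor."""
    return 100.0 * (math.exp(coef) - 1.0)

# Logit inflation coefficients from Table 3 (effect on odds of a zero)
inflation_coefs = {
    "Travel costs": 0.0109,
    "Income": -0.0012,
    "Age": 0.0903,
    "Household size": 0.0313,
    "Yerevan city": 0.8631,
}

for name, b in inflation_coefs.items():
    non_participation = pct_change_odds(b)   # odds of observing a zero
    participation = pct_change_odds(-b)      # sign reversed: odds of participating
    print(f"{name}: non-participation {non_participation:+.2f}%, "
          f"participation {participation:+.2f}%")
```

Because the odds ratio is multiplicative, a given percentage rise in the odds of non-participation corresponds to a smaller percentage fall in the odds of participation, and the latter can never exceed 100%.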





                                   Table 4: Marginal effects on trip demand

                                                           HOUSEHOLD: ZINB                          ON-SITE: TRNBES
Visits                                          Coefficient                   % \(\Delta\) trips     Coefficient  % \(\Delta\) trips
Count participation equation
 Travel costs ($USD)                            -0.0153***                           -1.52        -0.0519***     -5.06
 Income ($USD)                                    0.00015***                          0.00        0.000013        0.00
 Age (years)                                      0.0035                              0.35        -0.0263***     -2.59
 Household size (number)                          0.0974***                         10.23         0.2711***     31.13
 Education (years)                              -0.0686***                           -6.63        -0.0926***     -8.85

Participation                                                         % \(\Delta\) Pr(participation)
Binary participation equation
 Travel costs ($USD)                            -0.0109                              -1.10
 Income ($USD)                                    0.0012***                           0.12
 Age (years)                                    -0.0903***                           -9.45
 Household size (number)                        -0.0313                              -3.18
 Education (years)                                0.2768***                         75.82
 Yerevan (1=lives in Yerevan)                   -0.8631***                       -137.06
* significant at the 10% level; ** significant at the 5% level; *** significant at the 1% level


           For on-site trip demand, unitary increases in travel costs, age and education

decrease the number of trips by 5.1%, 2.6% and 8.9%, respectively, and an increase in

household size significantly increases trip frequency by 31%. With the exception of age,

each impact has a similar interpretation as in the household model, but the effects are

much larger. In the case of age, the on-site model shows a significant negative

association between age and visitation.
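
The percentage effects on trip counts in Table 4 can be approximated from the count-equation coefficients. The sketch below assumes the standard semi-elasticity for a log-linear count model, 100 * (exp(b) - 1), which is inferred from the reported values rather than stated in the text. Small gaps of about 0.01 percentage points against the table are expected because the printed coefficients are rounded.

```python
import math

def pct_change_trips(coef: float) -> float:
    """Percentage change in expected trips for a one-unit increase in a regressor."""
    return 100.0 * (math.exp(coef) - 1.0)

# Count-equation coefficients from Table 4
zinb = {"Travel costs": -0.0153, "Age": 0.0035,
        "Household size": 0.0974, "Education": -0.0686}
trnbes = {"Travel costs": -0.0519, "Age": -0.0263,
          "Household size": 0.2711, "Education": -0.0926}

for label, coefs in (("Household ZINB", zinb), ("On-site TRNBES", trnbes)):
    for name, b in coefs.items():
        print(f"{label} | {name}: {pct_change_trips(b):+.2f}% trips")
```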


(iii)      Estimated trip demand and welfare measures


           Using the parameter estimates from the four models in Table 3, the expected

number of trips, \(E(y_i \mid \bar{X})\), and consumers surplus (CS) measures were calculated (Table

5).9 The expected number of trips was estimated for each model using sample means of

the independent variables. Comparing the NB with the ZINB, note that the expected

number of trips falls once we account for the inflation of zeros (participation). Indeed,

since the NB model is treating every zero as being a part of the quantity decision, this




9 Although the CV and EV measures are not formally reported above, as the estimated coefficient on
income, \(\beta_i\), in both the ZINB and TRNBES models is small, CS is tightly bounded by CV and EV; for the
ZINB model CV=$8.7984, EV=$8.8478 and for the TRNBES model CV=$8.2137, EV=$8.2123.



biases the estimates upwards, whereas the ZINB recognizes that the zeros may come

from different stochastic processes (participation or quantity).


        For the on-site model, TRNB, the expected number of trips far exceeds the

demand estimated by the household survey.           This seems reasonable since we are

comparing casual versus avid users of the site. However, the expected number of trips is

even higher after accounting for ES (TRNBES). At first glance this may seem counter-intuitive,

but recall that expected trip demand is calculated as \(E(y_i \mid x_i) = \lambda_i + 1 + \alpha_i \lambda_i\), and

note that the only substantial difference between the estimated parameters of TRNB and

TRNBES is the value of the over-dispersion parameter, \(\alpha\) (see Table 3). Thus it is the

overdispersion that is driving this result. This finding is similar to that found by Englin

and Shonkwiler (1995), where expected trip demand is 1% higher for their sample-based

`restricted negative binomial model' (analogous to our TRNBES model) and 63% higher

for their population-based trip demand. Martinez-Espineira and Amoako-Tuffour (2005)

also find an 18% higher expected trip demand in their ES model.
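
The role of over-dispersion in this result can be illustrated with a short sketch. It assumes the expected-demand expression reads E(y | x) = lambda + 1 + alpha * lambda, the endogenously stratified form associated with Englin and Shonkwiler (1995). The values of lambda and alpha below are hypothetical and chosen only for illustration; they are not estimates from this paper.

```python
def expected_trips_es(lam: float, alpha: float) -> float:
    """Expected on-site trips under endogenous stratification (assumed form)."""
    return lam + 1.0 + alpha * lam

# Hypothetical values: hold the mean function fixed, vary over-dispersion
lam = 2.0
for alpha in (0.5, 1.0, 1.5):
    print(f"lambda = {lam}, alpha = {alpha}: "
          f"E(y|x) = {expected_trips_es(lam, alpha):.2f}")
```

With the mean function held fixed, each unit increase in the over-dispersion parameter raises expected trips by lambda, so a larger estimated over-dispersion parameter alone is enough to raise expected demand.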


        Estimated household consumers surplus was $8.82 per trip whereas for the on-site

sample CS was calculated as $8.73 without compensating for ES and $8.21 per trip with

ES. Although all three results are close, it is rather surprising to find the closest estimate

to be between the TRNB and ZINB models. One would initially expect the TRNBES to

be the closest if ES were present in the on-site sample. The most plausible explanation is

rooted in the very reason why one argues for ES adjustment; if adjustments for ES yield

only small differences in expected demand or consumer surplus, this suggests that those

surveyed at Lake Sevan possess characteristics similar to those in the household sample.

This implies that either the TRNB or TRNBES model is sufficient for estimation. This

can be more clearly seen if one views the mean function, \(\lambda_i\), and the similarity of estimated

coefficients between the TRNB and TRNBES models (especially the similarity

between the estimated coefficients on travel cost, \(\beta_p\), which is the denominator in the CS

calculation, \(-\lambda_i / \beta_p\)). Ovaskainen et al. (2001) and Englin et al. (2003) also find similar

results where the ES adjustment had little effect on coefficient and benefit estimates.





                        Table 5: Expected visitation and benefit estimates


  Measure                            Household:             Household:         On-site:    On-site:
                                          NB                    ZINB           TRNB        TRNBES
   \(E(y_i \mid \bar{X})\)                0.8926                0.5787           5.8822      6.9664

  CS ($USD per day-trip)                   8.16                  8.82            8.73        8.21

  Total WTP1 ($USD)                 6,362,295              6,875,160         6,802,126   6,399,840

        Note: \(\bar{X}\) is evaluated at the sample mean.
        1 - Calculated for households as: CS * 779,230 households in 2001.
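
The aggregation in the note to Table 5 can be checked with a short calculation. This is a sketch using the consumer surplus values as printed (rounded to cents), so the products differ slightly from the reported totals, which were presumably computed from unrounded values. The variable and function names are illustrative.

```python
HOUSEHOLDS_2001 = 779_230  # household count from the note to Table 5

# Consumer surplus (USD) as printed in Table 5
cs_printed = {
    "HH: NB": 8.16,
    "HH: ZINB": 8.82,
    "On-site: TRNB": 8.73,
    "On-site: TRNBES": 8.21,
}
# Total WTP (USD) as printed in Table 5
reported_total = {
    "HH: NB": 6_362_295,
    "HH: ZINB": 6_875_160,
    "On-site: TRNB": 6_802_126,
    "On-site: TRNBES": 6_399_840,
}

def total_wtp(cs: float, households: int = HOUSEHOLDS_2001) -> float:
    """Aggregate willingness to pay: CS multiplied by the number of households."""
    return cs * households

for model, cs in cs_printed.items():
    estimate = total_wtp(cs)
    gap = (estimate - reported_total[model]) / reported_total[model]
    print(f"{model}: {estimate:,.0f} USD (relative gap to table: {gap:+.3%})")
```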




V.      Conclusion


        In this paper, a population-based household sample and an on-site sample are

modeled in a travel cost framework to compare estimated consumers surplus for the value

of site access. If each model is corrected for several dependent variable issues, we expect

the models to produce similar welfare estimates. In the household model, we account for

the potential for over-dispersion (variance>mean) by the use of a negative binomial

distribution function, and for the possibility of observing a large number of zero visits (a

recreation participation decision) by splitting the participation and quantity decisions

directly in one censored model, the zero-inflated negative binomial (ZINB). For the on-

site survey, there is a possibility of over-sampling those who recreate quite often, thus the

truncated distribution function is augmented for endogenous stratification (i.e. the

likelihood of surveying respondents whose characteristics are associated with higher trip

frequencies). To compare the effect of ES, we model the on-site sample as a truncated

negative binomial with and without endogenous stratification (TRNB and TRNBES,

respectively).


        Each of these models is then applied to a unique water-based recreational site in

Armenia, Lake Sevan. The site has few, if any, alternatives, facilitating a comparative

welfare exercise. In addition, as the surveys measured only current revealed preference

behavior, no quality changes are present to confound the measurement of expected trips

outside the current experience.





       Results from the zero-inflated negative binomial model (ZINB) for households

suggest that separating the participation and quantity decisions is significant in modeling

household behavior. In this application, explanatory variables such as age, education and

income were found to be significant factors in the binary decision to recreate at Lake

Sevan. The quantity of trips was determined by travel costs, income, household size and

education. Expected trip demand was found to be 0.58 trips per individual per annum,

and the welfare measure calculated from the underlying demand function reveals a per-trip

consumers surplus of $8.82. From the on-site sample the TRNB and TRNBES models

yielded expected trip demands of 5.9 and 7.0 trips per person per year with consumers

surplus values of $8.73 and $8.21 per trip, respectively. Expected trip

demand from the on-site models is higher than the household sample due to the

difference in sampling casual versus more avid users of the site. However, an even

higher trip demand is found in the TRNBES model due to a higher estimated

overdispersion parameter, \(\alpha\), used in the calculation of expected trip demand.


       All three models yield similar welfare measures, but

accounting for endogenous stratification in the TRNBES model did not yield a

significantly different estimate from the TRNB model. In fact, consumers surplus from

the TRNB model is slightly closer to the household result than the TRNBES model. One

possible explanation is that individual characteristics of the on-site sample are not

correlated with higher trip frequencies (arguing against the precise reason we factor in

ES). This does not imply that ES is not an important consideration in modeling on-site

behavior; rather, the results found here suggest that the on-site sample was merely

representative of the population-based household survey. This finding is quite contrary

to other studies where the ES bias in welfare measurement has been found to be quite

significant (Shaw, 1988; Englin and Shonkwiler, 1995; Loomis, 2003; Martinez-

Espineira and Amoako-Tuffour, 2005).


       Although we did not find any significant difference in accounting for ES, this

does not negate the main result that when comparing household and on-site samples,

either can be used to derive a consistent welfare measure of access to the site after




accounting for each dependent variable problem. As was previously mentioned, quite

often the method of surveying is a constrained choice, usually by cost or time. It is

therefore reassuring that, if one is truly constrained in some sense, implementing

the proper technique means the quality of the measure need not be in question.





References

Cameron, A. C. and P. K. Trivedi. 1986. Econometric models based on count data:
   comparisons and application of some estimators and tests. Journal of Applied
   Econometrics. 1: 29-53.

Curtis, J. 2003. Demand for water-based leisure activity. Journal of Environmental
   Planning and Management. 46(1): 65-77.

Englin, J. and J. S. Shonkwiler. 1995. Estimating social welfare using count data models:
   an application to long-run recreational demand under conditions of endogenous
   stratification and truncation. The Review of Economics and Statistics. 77(1): 104-
   112.

Englin, J., T. Holmes and E. Sills. 2003. Estimating forest recreation demand using count
   data models. In E. Sills (Ed.), Forests in a Market Economy, Chapter 19, pp. 341-
   359. Dordrecht, The Netherlands: Kluwer Academic Publishers.

Gourieroux, C. A., A. Monfort and A. Trognon. 1984. Pseudo maximum likelihood methods:
   Applications. Econometrica. 52: 701-720.

Greene, W. 1994. Accounting for excess zeros and sample selection in Poisson and
   negative binomial regression models. Working Paper EC-94-10, Department of
   Economics, Stern School of Business, New York University, New York, N.Y.

Grogger, J. and R. Carson. 1991. Models for truncated counts. Journal of Applied
   Econometrics. 6: 225-238.

Gurmu, S. and P. K. Trivedi. 1996. Excess zeros in count models for recreational trips.
   Journal of Business and Economic Statistics. 14: 469-477.

Haab, T. C. and K. E. McConnell. 1996. Count data models and the problem of zeros in
   recreation demand analysis. American Journal of Agricultural Economics. 78: 89-102.

Hausman, J., B. Hall and Z. Griliches. 1984. Econometric models for count data with an
   application to the patents-R&D relationship. Econometrica. 52: 909-938.

Hellerstein, D. M. 1991. Using count data models in travel cost analysis with aggregate
   data. American Journal of Agricultural Economics. 73: 860-866.

Hilbe, J. 1999. sg102: Zero-truncated Poisson and negative binomial regression. STATA
   Technical Bulletin No. 47.

Lambert, D. 1992. Zero-Inflated Poisson regression, with an application to defects in
   manufacturing. Technometrics. 34: 1-14.




Loomis, J. 2003. Travel cost demand model based river recreation benefit estimates with
   on-site and household surveys: comparative results and a correction procedure. Water
   Resources Research. 39(4): 1105.

Martinez-Espineira, R. and J. Amoako-Tuffour. 2005. Recreation demand analysis under
   truncation, overdispersion, and endogenous stratification: an application to Gros
   Morne National Park. Working Paper 2005-03. Department of Economics, St.
   Francis Xavier University: Canada.

Mullahy, J. 1986. Specification and testing of some modified count data models. Journal
   of Econometrics. 33: 341-365.

Ovaskainen, K., J. Mikkola and E. Pouta. 2001. Estimating recreation demand with on-
   site data: an application of truncated and endogenously stratified count data models.
   Journal of Forest Economics. 7(2): 125-144.

Scrogin, D., K. Boyle, G. Parsons and A. Plantinga. 2004. Effects of regulations on
   expected catch, expected harvest, and site choice of recreational anglers. American
   Journal of Agricultural Economics. 86(4): 963-974.

Shaw, D. 1988. On-site samples' regression: problems of non-negative integers,
   truncation and endogenous stratification. Journal of Econometrics. 37: 211-223.

Shaw, W. D. and P. Jakus. 1996. Travel cost models of the demand for rock climbing.
   Agricultural and Resource Economics Review. 25: 133-142.

Shaw, W. D., E. Fadali, and F. Lupi. 2003. Comparing consumer's surplus estimates
   calculated from intercept and general survey data. Proceedings of the W-133
   (U.S.D.A.) Regional Economics Group, compiled by J. S. Shonkwiler. Las Vegas,
   Nevada, February.

Shonkwiler, J. S. and W. D. Shaw. 1996. Hurdle count-data models in recreation demand
   analysis. Journal of Agricultural and Resource Economics. 21: 210-219.

Vuong, Q. 1989. Likelihood ratio tests for model selection and non-nested hypotheses.
   Econometrica. 57: 307-334.

Yen, S. T. and W. L. Adamowicz. 1993. Statistical properties of welfare measures from
   count-data models of recreation demand. Review of Agricultural Economics. 15: 203-
   215.





                            Appendix 1: Descriptive statistics for the Household (HH) and Tourist survey (Tourist)

         Variable                    Mean                       Standard deviation                    Minimum                       Maximum


                           HH w/     HH w/    Tourist    HH w/       HH w/        Tourist   HH w/     HH w/     Tourist    HH w/     HH w/     Tourist
                          Trips > 0 Trips >= 0 Trips >= 1 Trips > 0  Trips >= 0   Trips >= 1 Trips > 0 Trips >= 0 Trips >= 1 Trips > 0 Trips >= 0 Trips >= 1
Visits (person-day-trips)      3.24      0.81      3.17      7.36        3.95         5.75          1         0         1       100        100       50
Travel costs ($USD)            9.42      9.00    10.23     10.28         5.15         7.58       0.06      0.06       0.1       147        147       41
Income ($USD)                1,861     1,383     2,933     1,623        1,246       2,052        150        120       480    14,976    14,976    15,120
Age (years)                      39       44         36        12          14          13         18         18        18         76       81        71
Household size                    5         4         5         2           2            1          1         1         2         12        13        8
Education (years)                11       10         10         2           2            2          0         0         5         14        14       14
Past visitation (1=yes)         1.0      0.95      0.94         0        0.22         0.24          1         0         0          1         1        1
Yerevan city (1=yes)           0.80      0.82         -      0.40        0.38            -          0         0         -          1         1        -
Lake Sevan (1=yes)             0.12      0.06      1.00      0.33        0.24         0.00          0         0         1          1         1        1


Observations                    842     3358        389



