Estimation and Inference For Threshold Effects in Panel Data Stochastic Frontier Models
Clément Yélou
Center for Research on the Economics of Agrifood (CREA).
Mailing address: 4424 Pavillon Paul-Comtois,
FSAA, Université Laval, Québec, Qc, Canada, G1K 7P4.
TEL: (418) 656-2131 ext. 7241; FAX: (418) 656-7821
E-mail: [email protected]
Bruno Larue
Holder of the Canada Research Chair in International Agri-Food Trade,
Director of the Center for Research on the Economics of Agrifood (CREA),
Université Laval. Mailing address: 4417 Pavillon Paul-Comtois, FSAA,
Université Laval, Québec, Québec, Canada G1K 7P4.
TEL: (418) 656 2131 ext. 5098. FAX: (418) 656 7821.
Email: [email protected]
Kien C. Tran
Department of Economics, University of Lethbridge.
Mailing address: 4401 University Drive, Lethbridge,
Alberta, T1K 3M4 Canada; E-mail: [email protected]
Copyright 2007 by [Yélou, C.; Larue, B. and Tran, K.]. All rights reserved. Readers may make
verbatim copies of this document for non-commercial purposes by any means, provided that this
copyright notice appears on all such copies.
Abstract
One of the most enduring problems in cross-section or panel data models is heterogeneity
among individual observations. Different approaches have been proposed to deal with this issue,
but threshold regression models offer intuitively appealing econometric methods to account for
heterogeneity. We propose three different estimators that can accommodate multiple thresholds.
The first two, allowing respectively for fixed and random effects, assume that the firm-specific
inefficiency scores are time-invariant while the third one allows for time-varying inefficiency scores.
We rely on a likelihood ratio test of m − 1 regimes under the null against m regimes. Testing
for threshold effects is problematic because of the presence of a nuisance parameter which is not
identified under the null hypothesis. This is known as Davies' problem. We apply procedures
pioneered by Hansen (1999) to test for the presence of threshold effects and to obtain a confidence
set for the threshold parameter. These procedures specifically account for Davies' problem and are
based on non-standard asymptotic theory. Finally, we perform an empirical application of the fixed
effects model on a panel of Quebec dairy farms. The specifications involving a trend and the Cobb-
Douglas and Translog functional forms support three thresholds or four regimes based on farm size.
The efficiency scores vary between 0.95 and 1 in models with and without thresholds. Therefore,
productivity differences across farm sizes are most likely due to technological heterogeneity.
Key words: Stochastic frontier models; threshold regression; technical efficiency; bootstrap;
dairy production.
Contents
1 Introduction
2 Framework
3 Estimation methods
3.1 Time-invariant fixed effects model
3.2 Time-invariant random effects model
3.3 Independent time-varying technical inefficiency model
6 Empirical application
6.1 Data sources and descriptive statistics
6.2 A stochastic production frontier with a homogenous technology
6.3 A stochastic production frontier with threshold(s)
7 Conclusion
List of Tables
1 Summary statistics for dairy production variables
2 Summary statistics for estimated technical efficiency scores: production frontier without any threshold effects under fixed-effects inefficiency
3 Tests of m − 1 thresholds against m under fixed-effects inefficiency: bootstrap p-values
4 Point estimates and 95% level confidence set for threshold parameters in an m-threshold model under fixed-effects inefficiency
5 Regression estimates: triple threshold model for Cobb-Douglas technology with a trend under fixed-effects inefficiency
6 Summary statistics for estimated technical efficiency scores: production frontier with threshold effects under fixed-effects inefficiency
1 Introduction
Structural change and threshold effects are two related issues that have motivated considerable
empirical and theoretical research in time series econometrics (e.g. Tsay (1989, 1998), Enders and
Granger (1998), Hansen (2000b, 2000a)). This paper considers statistical inference methods for
threshold effects in panel data stochastic frontier models. One of the most enduring problems in
cross-section or panel data models is heterogeneity among individual observations. One approach
to address the heterogeneity issue is to compare a regression function that is identical across all
observations in a sample to a set of regression functions that allow for observations to fall into
discrete classes as in Hansen (1999).
Threshold regression models offer intuitively appealing econometric methods to account for het-
erogeneity. In the context of stochastic production frontier models, the question may be whether
large firms use a production technology that differs from that of small firms. This would allow
researchers to determine whether the higher productivity of large firms stems from the use of a
different technology or simply a more efficient use of inputs given the constraints imposed by the
common technology as measured by technical efficiency scores (see Tran and Tsionas (2006)). Re-
lated methods that allow for heterogeneity in stochastic frontier models include latent class models
(Greene (2002, 2005); Orea and Kumbhakar (2004)), random coefficients models (Tsionas (2002);
Greene (2002, 2005)) and Markov switching frontier models (Tsionas and Kumbhakar (2004)). The
distinguishing feature of threshold models is that they assume that heterogeneity is induced by an
observable exogenous variable, e.g. firm size, while in the other methods cited above heterogeneity
is introduced in the models through exogenous variables or unobservable random terms.
Recently, Tsionas and Tran (2006) have proposed various models to allow for heterogeneity in
technology and in the distribution of technical inefficiency. Bayesian inference methods are proposed
for the estimation of these models and for model comparisons. Bayesian tools such as the posterior
odds ratio and the Bayes factor are proposed for model selection, including the comparison of a
threshold model against a model without threshold effects. These statistics are used as evidence
pertaining to the presence of threshold effects in the data. However, from a classical inference
approach, such evidence needs to be based on a test of the null hypothesis of no threshold effect.
Testing for threshold effects is problematic and requires non-standard tools because of the presence
of a nuisance parameter which is not identified under the null hypothesis. This is known as Davies'
problem, and appropriate techniques have been proposed in Davies (1987), Andrews (1993) and
Hansen (1996, 1999, 2000a). For our specific threshold effects problem, the nuisance parameter
is the value of the threshold. In this paper, we consider one of the threshold models analyzed
in Tsionas and Tran (2006), the simple threshold stochastic frontier model and provide a testing
strategy for the presence of threshold effects in a parametric stochastic frontier model with panel
data.
Our methodology is anchored on three formulations of the panel data stochastic frontier model,
which differ by the time dependence of the inefficiency term as follows: (i) a fixed effect time
invariant inefficiency term, (ii) a random effect time invariant inefficiency term, and (iii) a random
time varying inefficiency term. For specifications (i)-(ii), we assume that the technical inefficiency
term is a firm-specific constant, so we obtain a fixed effects or random effects panel data model as in
Schmidt and Sickles (1984), Horrace and Schmidt (1996) and Greene (1997). These specifications
of the panel data stochastic frontier model have the advantage of not requiring any distributional
assumption for technical inefficiency. Therefore, for the fixed effects case we apply procedures
pioneered by Hansen (1999) to test for the presence of threshold effects and to obtain a confidence
set for the threshold parameter. These procedures are based on non-standard asymptotic theory
and specifically account for Davies' problem. We then examine the extension of these procedures to
the random effects case. However, these time invariant specifications for the inefficiency term may
not be adequate for panel data with a number of time periods large enough to jeopardize the validity
of the assumption of constant technical inefficiency. For long panels, our alternative specification
(iii) is more appropriate. With this specification, we assume a half-normal distribution for the
inefficiency term and a normal distribution for the two-sided error term of the model. We consider
sup-type tests initially proposed by Davies (1987) and extended by Andrews (1993) and Hansen
(1996). Given a known specific value for the threshold parameter, the model is estimated by the
maximum likelihood method without threshold effects (the model under the null hypothesis) and
with threshold effects (the model under the alternative hypothesis). For both models, we measure
technical inefficiency using the Jondrow, Lovell, Materov and Schmidt (1982) estimator. As in
Hansen (1999, 2000a), our test statistic is an LR-type statistic defined from the residual sums of
squares under the null and the alternative hypotheses respectively. Since the value of the threshold
is unknown, we consider a supremum of the test statistic over a relevant subset of values of the
threshold parameter. The problem under consideration is more complex than the one considered
in Hansen (1999, 2000a) because we address Davies' problem for a highly nonlinear model. As
a result, the asymptotic theory for inference on the threshold parameter is non-standard and we
propose a bootstrap strategy to obtain an asymptotic p-value and to construct a confidence set. Our
bootstrap method involves a combination of bootstrap techniques used for the stochastic frontier
model (Hall, Härdle and Simar (1995), Simar and Wilson (2000), Kim, Kim and Schmidt (2006))
and the bootstrap procedure proposed in Hansen (2000a). The test procedures discussed in this
paper have wide-ranging empirical applications. To illustrate the applicability of the proposed tests,
we report results from one empirical application involving a panel of 302 dairy farms located in the
province of Quebec and observed during 11 years, over the period 1993-2003. For this application,
the threshold variable is the number of dairy cows, a proxy for farm size.
The rest of the paper is organized as follows. Section 2 describes the basic framework under
which our estimators and testing procedures are developed. The three different estimators are
presented in Section 3 while Section 4 describes the test statistic about a single regime/technology.
Section 5 focuses on inference issues pertaining to the threshold parameter and methods to address
them. Section 6 presents results from an application involving Quebec dairy farms. This section
showcases our fixed effects estimator and our testing procedure to identify the presence of one
or more thresholds. The concluding section summarizes our contribution to the literature and
discusses future research avenues.
2 Framework
We consider the following threshold effects panel data stochastic frontier model
$$y_{it} = \alpha + \beta_1' x_{it} I(q_{it} \le \gamma) + \beta_2' x_{it} I(q_{it} > \gamma) - u_{it} + v_{it}, \quad u_{it} \ge 0, \tag{2.1}$$
where for firm $i$ at time period $t$, $i = 1, ..., N$, $t = 1, ..., T$, $y_{it}$ is the logarithm of output, $x_{it} \in \mathbb{R}^k$ is a
vector of logarithms of inputs, $I(\cdot)$ is the indicator function, and $\beta_1$ and $\beta_2$ are two vectors of parameters
associated with two different technologies 1 and 2. $q_{it}$ is an exogenous and observable threshold
variable that governs the technology regime of firms. $\gamma$ is the threshold value such that if $q_{it} \le \gamma$
then firm $i$ adopts technology 1 at time period $t$; otherwise firm $i$ adopts technology 2. $v_{it}$ is a
statistical error term, and $u_{it} \ge 0$ represents technical inefficiency. We assume throughout that the
error term $v_{it}$ is independent and identically distributed with mean zero and finite variance $\sigma_v^2$. For
$\beta_1 = \beta_2$, we get the basic panel data stochastic frontier model (see Pitt and Lee (1981), Schmidt
and Sickles (1984), Cornwell and Schmidt (1995), Greene (1997)). As in Hansen (1999), this model
can be written in a more compact form as follows. Let
$$x_{it}(\gamma) = \begin{pmatrix} x_{it} I(q_{it} \le \gamma) \\ x_{it} I(q_{it} > \gamma) \end{pmatrix},$$
and $\beta = (\beta_1', \beta_2')'$. With this notation, equation (2.1) can be written as
$$y_{it} = \alpha + \beta' x_{it}(\gamma) - u_{it} + v_{it}. \tag{2.2}$$
Statistical procedures to test for threshold effects in this model will strongly depend on distribu-
tional and time dependence assumptions made on the inefficiency term uit . Our analysis considers
in turn the following cases:
Case 1 $u_{it}$ is a fixed time-invariant effect, $u_{it} \equiv u_i$, for all $t = 1, ..., T$.
Case 2 $u_{it}$ is a time-invariant random variable $u_i$.
Case 3 $u_{it}$ is a time-varying random variable.
Under Case 1, model (2.2) can be written as a fixed effects panel data model. Let $\alpha_i = \alpha - u_i$;
then $\alpha_i \le \alpha$ for all $i$ and $\alpha_i$ may take positive or negative values. Therefore, we can re-write model
(2.2) as the following non-dynamic panel model with firm-specific fixed effects:
$$y_{it} = \alpha_i + \beta' x_{it}(\gamma) + v_{it}. \tag{2.3}$$
Model (2.3) assumes absence of any unmeasured time invariant heterogeneity across firms (for
further details see Greene (2005, p. 277)).¹ The time invariance assumption for technical inefficiency
may be an unreasonable one in long panels. Kumbhakar (1990) argued that this assumption is
inadequate because firms aware of their relative inefficiency would take steps to catch up over time.
However, this fixed effects formulation is standard in the panel data stochastic frontier literature
and has the obvious advantage that no distributional or independence assumption on inefficiency
terms is needed (Schmidt and Sickles (1984), Greene (1997), Horrace and Schmidt (1996), Kim
et al. (2006)). For least squares estimation and asymptotic inference on threshold effects in this
model, we rely on Hansen (1999).
Under Case 2, we get the random effects stochastic frontier model (see Pitt and Lee (1981),
Schmidt and Sickles (1984))
$$y_{it} = \alpha + \beta' x_{it}(\gamma) - u_i + v_{it}. \tag{2.4}$$
One further assumes that inefficiencies $u_i$ are uncorrelated with the regressors, which implies that
any unmeasured heterogeneity across firms must be independent of the input variables.
Finally, Case 3 represents a more flexible and realistic model by having inefficiencies vary over
time for each firm. This is an obvious advantage when dealing with long panels. For simplicity, we
assume in addition that $u_{it}$ and $v_{it}$ are independent over time and across individuals, so no specific
panel data treatment is needed (Greene (1997)). For various formulations and specifications of the
time dependence of technical inefficiency, see Cornwell, Schmidt and Sickles (1990), Kumbhakar
(1990), Lee and Schmidt (1993) and Battese and Coelli (1992, 1995) among others; we defer the
extension of our test methods to accommodate these models to future research.
¹ This model is different from the true fixed effects stochastic frontier model, which is subject (i) to practical
estimation problems when the number of firms in the sample is very large, and (ii) to the incidental parameters problem
(Greene (2005, p. 277)).
3 Estimation methods
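3.1 Time-invariant fixed effects model
Under Case 1, the stochastic frontier model, written in the form (2.3), is the standard threshold
regression for non-dynamic panels with individual-specific fixed effects discussed by Hansen (1999).
Estimates of the threshold and slope parameters can be obtained by least squares.
Specifically, the estimation proceeds as follows. Assume that $\gamma$ is known and let
$$\bar y_i = T^{-1} \sum_{t=1}^{T} y_{it}, \quad \bar x_i(\gamma) = T^{-1} \sum_{t=1}^{T} x_{it}(\gamma), \quad \bar v_i = T^{-1} \sum_{t=1}^{T} v_{it}; \quad i = 1, ..., N.$$
Taking the difference between (2.3) and its average over time gives
$$y^*_{it} = \beta' x^*_{it}(\gamma) + v^*_{it}, \tag{3.5}$$
where
$$y^*_{it} = y_{it} - \bar y_i, \quad x^*_{it}(\gamma) = x_{it}(\gamma) - \bar x_i(\gamma), \quad v^*_{it} = v_{it} - \bar v_i; \quad i = 1, ..., N, \; t = 1, ..., T.$$
In matrix form,
$$Y^* = X^*(\gamma) \beta + v^*, \tag{3.6}$$
where $Y^*$, $X^*(\gamma)$ and $v^*$ are the data stacked over all $N$ firms and over $T$ time periods as follows:
for $Y^*$, form $Y^* = (y_1^{*\prime}, ..., y_N^{*\prime})'$ where $y^*_i = (y^*_{i1}, y^*_{i2}, ..., y^*_{iT})'$; proceed similarly to obtain $X^*(\gamma)$ and
$v^*$. From (3.6), the ordinary least squares estimator of $\beta$ as a function of $\gamma$ is given by
$$\hat\beta_F(\gamma) = \left( X^*(\gamma)' X^*(\gamma) \right)^{-1} X^*(\gamma)' Y^*,$$
with residual sum of squares $S_F(\gamma) = (Y^* - X^*(\gamma) \hat\beta_F(\gamma))' (Y^* - X^*(\gamma) \hat\beta_F(\gamma))$.
Since $\gamma$ is unknown, it must be estimated from the data set. Least squares estimation of $\gamma$ can be
done by minimization of the residual sum of squares as
$$\hat\gamma_F = \arg\min_{\gamma} S_F(\gamma). \tag{3.8}$$
The minimization in (3.8) can be restricted to a specific subset $\Gamma^* \subset \Gamma$, where $\Gamma$ is the set of all
possible values of $\gamma$, if we want a minimal percentage of the observations to lie in each of the two
technology regimes defined by the threshold. A grid search over values in $\Gamma^*$ is used in practice to
solve this problem; see Hansen (1999, pp. 349-350) for details. The final estimate of the regression
coefficients is $\hat\beta_F = \hat\beta_F(\hat\gamma_F)$; the vector of residuals is $\hat v^*_F = Y^* - X^*(\hat\gamma_F) \hat\beta_F(\hat\gamma_F)$ and the error
variance is estimated by $\hat\sigma^2_{vF} = \frac{1}{NT} S_F(\hat\gamma_F)$.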
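As a rough illustration of the least squares procedure described above, the following Python sketch applies the within transformation to the regime-split regressors and then searches a grid of candidate thresholds for the smallest residual sum of squares. The function names, the simulated panel and the 5% trimming are illustrative assumptions, not the authors' code.

```python
import numpy as np

def within(a, firm):
    """Subtract each firm's time average (the within transformation)."""
    a = np.asarray(a, dtype=float)
    out = a.copy()
    for f in np.unique(firm):
        rows = firm == f
        out[rows] -= a[rows].mean(axis=0)
    return out

def fit_given_threshold(y, x, q, firm, gamma):
    """Least squares slopes and residual sum of squares for a fixed threshold."""
    low = (q <= gamma)[:, None]
    x_gamma = np.hstack([x * low, x * ~low])   # [x I(q <= gamma), x I(q > gamma)]
    y_star, x_star = within(y, firm), within(x_gamma, firm)
    beta, *_ = np.linalg.lstsq(x_star, y_star, rcond=None)
    resid = y_star - x_star @ beta
    return beta, float(resid @ resid)

def estimate_threshold(y, x, q, firm, grid):
    """Return (gamma_hat, beta_hat, rss) for the grid value with the smallest RSS."""
    fits = [(g,) + fit_given_threshold(y, x, q, firm, g) for g in grid]
    return min(fits, key=lambda fit: fit[2])

# Hypothetical simulated panel: 50 firms, 8 periods, one input, true threshold 40.
rng = np.random.default_rng(0)
n_firms, n_periods = 50, 8
firm = np.repeat(np.arange(n_firms), n_periods)
q = rng.uniform(20.0, 60.0, firm.size)
x = rng.normal(size=(firm.size, 1))
slope = np.where(q <= 40.0, 0.5, 1.5)
y = rng.normal(size=n_firms)[firm] + slope * x[:, 0] + 0.01 * rng.normal(size=firm.size)

grid = np.quantile(q, np.linspace(0.05, 0.95, 91))   # trimmed candidate thresholds
gamma_hat, beta_hat, rss = estimate_threshold(y, x, q, firm, grid)
```

Note that the regime indicators are applied before demeaning, because the split regressors vary over time within a firm whenever the threshold variable crosses the candidate value.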
3.2 Time-invariant random effects model
We now consider the stochastic frontier model defined by (2.4). For any given $\gamma$, the inefficiency
terms $u_i$ are assumed to be uncorrelated with the input variables $x_{it}(\gamma)$. In addition, we assume
that the $u_i$ are i.i.d. with $E(u_i) = \mu$ and $Var(u_i) = \sigma_u^2$ and that the $u_i$ are independent of the $v_{it}$. It
is convenient to rewrite the model as follows. Let $\alpha^* = \alpha - \mu$ and $u_i^* = u_i - \mu$. Then, (2.4) is
equivalent to
$$y_{it} = \alpha^* + \beta' x_{it}(\gamma) - u_i^* + v_{it}; \quad i = 1, ..., N, \; t = 1, ..., T.$$
Given estimates $\hat\alpha^*_R(\gamma)$, $\hat\beta_R(\gamma)$ and $\hat u^*_i(\gamma)$ of the random effects model, we form the residual and the residual sum of squares as
$$\hat v_{itR}(\gamma) = y_{it} - \hat\alpha^*_R(\gamma) - \hat\beta_R(\gamma)' x_{it}(\gamma) + \hat u^*_i(\gamma), \qquad S_R(\gamma) = \sum_{t=1}^{T} \sum_{i=1}^{N} \hat v_{itR}(\gamma)^2.$$
As in the case of the fixed effects model, $\gamma$ needs to be estimated from the data set, and we again
rely on least squares estimation. Thus, $\hat\gamma_R$ is defined by
$$\hat\gamma_R = \arg\min_{\gamma} S_R(\gamma).$$
3.3 Independent time-varying technical inefficiency model
Under Case 3 and under the assumption that the inefficiency terms $u_{it}$ are serially and contemporaneously
uncorrelated we get, for any given $\gamma$, the panel data version of the standard stochastic
frontier model. These assumptions correspond to those maintained in the various threshold stochastic
frontier models discussed in Tsionas and Tran (2006) and imply that, despite its variation over
time, there is no persistence effect in technical inefficiency. Estimation proceeds as set out in Aigner,
Lovell and Schmidt (1977) and Jondrow et al. (1982) for the case of cross-sectional data.
Assuming that $\gamma$ is known, let $\varepsilon_{it} = v_{it} - u_{it}$, where $v_{it} \sim$ i.i.d. $N(0, \sigma_v^2)$, and $u_{it} = |U_{it}|$, $U_{it} \sim
N(0, \sigma_u^2)$, $i = 1, ..., N$, $t = 1, ..., T$. Under these distributional assumptions, the parameters of the
models with and without threshold effects
can be estimated using the maximum likelihood (ML) method.
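Let $\hat\alpha_I(\gamma)$, $\hat\beta_I(\gamma)$, $\hat\sigma^2_{uI}(\gamma)$, $\hat\sigma^2_{vI}(\gamma)$ denote the ML estimates of $\alpha$, $\beta$, $\sigma_u^2$, $\sigma_v^2$, given a specified
value $\gamma$. The technical inefficiency term can then be estimated by the ML estimate of the conditional
expectation $E(u_{it} | \varepsilon_{it} = e_{it})$, where $E(\cdot | \varepsilon_{it} = e_{it})$ is the conditional expectation operator conditioned
on $\varepsilon_{it} = e_{it}$. The result is as follows:
$$\hat u_{it}(\gamma) = E(u_{it} | \varepsilon_{it} = e_{it}(\gamma)) = \hat\sigma_*(\gamma) \left[ \frac{\phi\left( e_{it}(\gamma) \hat\lambda(\gamma) / \hat\sigma(\gamma) \right)}{1 - \Phi\left( e_{it}(\gamma) \hat\lambda(\gamma) / \hat\sigma(\gamma) \right)} - \frac{e_{it}(\gamma) \hat\lambda(\gamma)}{\hat\sigma(\gamma)} \right],$$
where $\phi$ and $\Phi$ denote the standard normal density and cumulative distribution function and, as in
Jondrow et al. (1982), $\hat\sigma^2 = \hat\sigma^2_{uI} + \hat\sigma^2_{vI}$, $\hat\lambda = \hat\sigma_{uI} / \hat\sigma_{vI}$ and $\hat\sigma_*^2 = \hat\sigma^2_{uI} \hat\sigma^2_{vI} / \hat\sigma^2$, all evaluated at $\gamma$.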
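The conditional expectation used to measure technical inefficiency can be sketched in a few lines of Python. This assumes the half-normal/normal specification of Jondrow et al. (1982) with a composed error equal to noise minus inefficiency; the function name and the numerical inputs are hypothetical.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def jlms_inefficiency(e, sigma_u, sigma_v):
    """E(u | eps = e) when eps = v - u, v ~ N(0, sigma_v^2) and u = |U|, U ~ N(0, sigma_u^2)."""
    sigma = math.sqrt(sigma_u ** 2 + sigma_v ** 2)
    sigma_star = sigma_u * sigma_v / sigma
    z = e * (sigma_u / sigma_v) / sigma          # e * lambda / sigma
    # For very large positive z the denominator underflows; a production
    # implementation would use a numerically stable hazard function instead.
    return sigma_star * (norm_pdf(z) / (1.0 - norm_cdf(z)) - z)

# Hypothetical residuals: a more negative residual implies more inefficiency.
u_low = jlms_inefficiency(1.0, 1.0, 1.0)
u_mid = jlms_inefficiency(0.0, 1.0, 1.0)
u_high = jlms_inefficiency(-1.0, 1.0, 1.0)
efficiency_mid = math.exp(-u_mid)                # technical efficiency score
```

With equal variances and a zero residual, the expression reduces to 1/sqrt(pi), which gives a quick sanity check on an implementation.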
The null hypothesis of no threshold effect in model (2.1) is
$$H_0 : \beta_1 = \beta_2. \tag{4.11}$$
Clearly, under $H_0$ the model (2.1) takes the form
$$y_{it} = \alpha + \beta_1' x_{it} - u_{it} + v_{it}, \tag{4.12}$$
which does not involve the threshold parameter $\gamma$. So for the problem at hand, the parameter $\gamma$ is
not identified under the null hypothesis and the usual test statistics have non-standard distributions.
This is the so-called Davies' problem (Davies (1977, 1987)). For this problem, Hansen (1999)
suggested simulating the non-standard asymptotic distribution of the likelihood ratio (LR) test
using a bootstrap method. The test procedure proposed in Hansen (1999) works as follows.
For Case 1 (the procedure is similar for Cases 2 and 3), we estimate the fixed-effects panel data stochastic
frontier model associated with model (4.12) using the fixed-effects transformation
described in Section 3.1. Let us write the model after the within transformation as
$$y^*_{it} = \beta_1' x^*_{it} + v^*_{it}, \tag{4.13}$$
where $y^*_{it}$, $x^*_{it}$ and $v^*_{it}$ are the within-transformed versions of $y_{it}$, $x_{it}$ and $v_{it}$ respectively (see
Section 3.1). For further reference, let $\tilde\beta_{1F}$ denote the within estimator of $\beta_1$. Let $\tilde v^*_F$ denote the
vector of residuals and $S_{0F} = \tilde v^{*\prime}_F \tilde v^*_F$ be the residual sum of squares under $H_0$. The LR test
statistic may be defined as
$$LR_F = \left( S_{0F} - S_F(\hat\gamma_F) \right) / \hat\sigma^2_{vF}. \tag{4.14}$$
The statistic $LR_F$ has a non-standard asymptotic distribution whose characteristics may be affected
by the asymmetric distribution of the technical efficiency terms. This is likely to be problematic in
the case of the random-effects and time-varying technical inefficiency models. We rely on the bootstrap
procedure proposed by Hansen (1999) for the standard fixed-effects panel model, even though
its validity has not yet been established for the latter two cases. The resampling is based on the
sample of firms, and once a firm is selected all its observations over the $T$ periods are included in the
bootstrap sample. We resample residuals as follows. Let $\hat v^*_{F,i} = (\hat v^*_{F,i1}, \hat v^*_{F,i2}, ..., \hat v^*_{F,iT})'$, $i = 1, ..., N$,
denote the $T \times 1$ vector of residuals computed for firm $i$ from the model assuming threshold effects.
Then form the sample $\{\hat v^*_{F,1}, \hat v^*_{F,2}, ..., \hat v^*_{F,N}\}$. The empirical distribution of this sample is
used for bootstrap resampling, i.e. we draw randomly with replacement a sample of size $N$ from
$\{\hat v^*_{F,1}, \hat v^*_{F,2}, ..., \hat v^*_{F,N}\}$. These draws are treated as errors to be used to create a bootstrap sample
under $H_0$. For each bootstrap replication $b = 1, ..., B$, let $\{v_1^{(b)}, ..., v_i^{(b)}, ..., v_N^{(b)}\}$ represent the
bootstrap draw. We generate the output variable using
$$y^{(b)}_{it} = \hat y_{it} + v^{(b)}_{it},$$
where $\hat y_{it}$ is the predicted value of $y_{it}$ under $H_0$. In the case of the fixed-effects model, we consider
$\hat y_{it} \equiv \hat y^*_{it} = \tilde\beta_{1F}' x^*_{it}$, while for the random-effects and time-varying technical inefficiency models,
the prediction of $y_{it}$ under $H_0$ should explicitly account for the estimated value of the inefficiency term
$\hat u_i$ or $\hat u_{it}$. Using the bootstrap sample data $\{y^{(b)}_{it}, x_{it}\}$, we estimate in turn the model under $H_0$
and without imposing $H_0$. For the fixed-effects model, these correspond to models (4.13) and (3.5)
respectively. We compute the bootstrap value $LR_F^{(b)}$ of the LR test statistic using (4.14). If we let
$LR^0_F$ denote the value of the test statistic calculated from the observed data, we can define the
approximate bootstrap p-value $\hat p_B(LR^0_F)$ as
$$\hat p_B(LR^0_F) = \frac{B \, \hat G_B(LR^0_F) + 1}{B + 1}, \tag{4.15}$$
where $B \, \hat G_B(LR^0_F)$ is the number of bootstrap statistics $LR_F^{(b)}$ greater than or equal to $LR^0_F$. A
test of level $\alpha$, $0 < \alpha < 1$, is defined by the critical region $\hat p_B(LR^0_F) \le \alpha$; that is, we reject the null
hypothesis at level $\alpha$ if $\hat p_B(LR^0_F) \le \alpha$.
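The building blocks of this bootstrap test can be sketched as follows. The sketch covers only the LR-type statistic, the firm-level resampling of residual vectors and the p-value formula; re-estimating the model on each bootstrap sample is left out. All function names and numbers are hypothetical.

```python
import numpy as np

def lr_statistic(rss_null, rss_threshold, sigma2):
    """LR-type statistic: drop in residual sum of squares scaled by the error variance."""
    return (rss_null - rss_threshold) / sigma2

def resample_firms(resid, rng):
    """Draw N firms with replacement, keeping each firm's T residuals together.

    resid is an (N, T) array holding one row of residuals per firm.
    """
    n_firms = resid.shape[0]
    drawn = rng.integers(0, n_firms, size=n_firms)
    return resid[drawn]

def bootstrap_p_value(lr_observed, lr_bootstrap):
    """(number of bootstrap statistics >= observed + 1) / (B + 1)."""
    lr_bootstrap = np.asarray(lr_bootstrap, dtype=float)
    exceed = int(np.sum(lr_bootstrap >= lr_observed))
    return (exceed + 1) / (lr_bootstrap.size + 1)

# Hypothetical values: two of four bootstrap statistics exceed the observed one.
p_value = bootstrap_p_value(5.0, [1.0, 2.0, 6.0, 7.0])   # (2 + 1) / (4 + 1)
```

Adding one to the numerator and denominator keeps the p-value strictly positive, so a finite number of replications can never report an exact zero.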
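For the null hypothesis $H_0(\gamma_0) : \gamma = \gamma_0$, the LR statistic is
$$LR_m(\gamma_0) = \frac{S_m(\gamma_0) - S_m(\hat\gamma_m)}{\hat\sigma^2_{vm}},$$
where we index on $m$ to emphasize that the test statistic is defined for any of the three model
formulations and corresponding estimation methods. Hansen (1999, 2000a) shows that the asymptotic
distribution of $LR_m(\gamma_0)$ under $H_0(\gamma_0)$ is non-standard and free of nuisance parameters.
Under regularity conditions, $LR_m(\gamma_0) \xrightarrow{asy} \xi$, where $\xi$ is a random variable with distribution
function $P(\xi \le x) = (1 - \exp(-x/2))^2$. The critical value of the latter distribution at level $\alpha$, $0 <
\alpha < 1$, is $c(\alpha) = -2 \ln\left(1 - \sqrt{1 - \alpha}\right)$. An asymptotic test of $H_0(\gamma_0)$ rejects at level $\alpha$ if $LR_m(\gamma_0) >
c(\alpha)$. A $(1 - \alpha)$-level confidence set for $\gamma$ can be defined by the no-rejection region of the LR test
as
$$CS(\gamma; \alpha) = \{ \gamma_0 : LR_m(\gamma_0) \le c(\alpha) \}. \tag{5.17}$$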
The asymptotic validity of this confidence set requires, among other conditions (Hansen (2000a, p.
579)), that the difference in the slope parameters between the two regimes be small and tend to
zero as the sample size increases. This confidence set is rather conservative asymptotically if the
error terms $v_{it}$ are i.i.d. $N(0, \sigma_v^2)$ and strictly independent of the regressors and of the threshold
variable (see Hansen (2000a, Theorem 3)). Even though the Gaussian errors assumption is not unusual
in the literature on parametric stochastic frontier models, we also consider an alternative bootstrap
approach to confidence set estimation of the threshold parameter.
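The asymptotic critical value follows from inverting the limiting distribution function: setting (1 - exp(-c/2))^2 equal to one minus the test level and solving for c. A minimal Python sketch of the critical value and of the no-rejection confidence set is given below; the grid of candidate thresholds and the LR values are made-up numbers used only for illustration.

```python
import math

def critical_value(alpha):
    """Level-alpha critical value of the distribution P(xi <= x) = (1 - exp(-x/2))^2."""
    return -2.0 * math.log(1.0 - math.sqrt(1.0 - alpha))

def confidence_set(grid, lr_values, alpha=0.05):
    """Candidate thresholds whose LR statistic does not exceed the critical value."""
    c = critical_value(alpha)
    return [g for g, lr in zip(grid, lr_values) if lr <= c]

c_95 = critical_value(0.05)

# Hypothetical LR profile over four candidate thresholds.
kept = confidence_set([30, 34, 38, 42], [12.0, 0.0, 5.1, 9.3], alpha=0.05)
```

Because the LR profile need not be monotone on either side of the point estimate, the resulting set can be a union of disjoint intervals; reporting its smallest and largest elements gives a conservative interval.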
Then, using the bootstrap data set $\{Z^{(b)}_{it} : i = 1, ..., N; \, t = 1, ..., T\}$, estimate the stochastic
frontier model (2.1) using any of the three formulations and corresponding estimation techniques; let
$\hat\gamma^{(b)}$ denote the bootstrap estimate of $\gamma$. The key result of the bootstrap is that, conditionally
on the observed data $\{Z_{it} : i = 1, ..., N; \, t = 1, ..., T\}$, the asymptotic distribution of $N^{1/2}(\hat\gamma^{(b)} - \hat\gamma)$
approximates the asymptotic sampling distribution of $N^{1/2}(\hat\gamma - \gamma)$ for any $b = 1, ..., B$. The
conditional distribution of the bootstrap estimator $N^{1/2}(\hat\gamma^{(b)} - \hat\gamma)$ can be approximated by Monte
Carlo replication of the resampling procedure. The collection $\{\hat\gamma^{(b)} : b = 1, 2, ..., B\}$ can therefore be
treated as a random sample from the asymptotic distribution of $\hat\gamma$, and this sample can be used
to construct a confidence interval for $\gamma$.
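To obtain a confidence interval based on the percentile method, we need to compute the quantiles
$q(\alpha)$ of the empirical distribution of $\{\hat\gamma^{(b)} : b = 1, 2, ..., B\}$ as $q(\alpha) = G^{-1}_{\hat\gamma,B}(\alpha)$, $0 \le \alpha \le 1$, where
$G_{\hat\gamma,B}(x) = B^{-1} \sum_{b=1}^{B} I(\hat\gamma^{(b)} \le x)$ is the empirical distribution function of the bootstrap estimates.
Moreover, due to bias in the sample estimate $\hat\gamma$, there is some bias in the position of the
bootstrap estimates $\hat\gamma^{(b)}$ relative to $\hat\gamma$. Therefore, it does not generally hold that $G_{\hat\gamma,B}(\hat\gamma) = 1/2$,
which means that the bootstrap sample $\{\hat\gamma^{(b)} : b = 1, 2, ..., B\}$ is not centered around the sample
estimate $\hat\gamma$. We can construct a bias-corrected confidence interval for $\gamma$ as follows. Let $\Phi$ be the
standard normal cumulative distribution function and $z_\alpha$ denote the standard normal cut-off point
of level $\alpha$, $0 \le \alpha \le 1$; then, $q(\alpha) = G^{-1}_{\hat\gamma,B}(\Phi(z_\alpha))$. Define
$$q^{bc}(\alpha) = G^{-1}_{\hat\gamma,B}\left[ \Phi(m + (m + z_\alpha)) \right] = G^{-1}_{\hat\gamma,B}\left[ \Phi(2m + z_\alpha) \right], \quad 0 < \alpha < 1, \tag{5.19}$$
where $m = \Phi^{-1}(G_{\hat\gamma,B}(\hat\gamma))$ is a bias-correction term. Then, the lower and upper confidence limits
of a bias-corrected confidence interval for $\gamma$ with asymptotic confidence level $(1 - \alpha)$, $0 \le \alpha \le 1$,
are respectively given by
$$\hat\gamma^{bc}_{L,\alpha} = q^{bc}(\alpha/2), \qquad \hat\gamma^{bc}_{U,\alpha} = q^{bc}(1 - \alpha/2). \tag{5.20}$$
The accuracy of these confidence intervals in terms of coverage rate relies strongly on the quality of
the bootstrap approximation.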
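A bias-corrected percentile interval of this kind can be sketched in Python as follows. Two choices here are assumptions of the sketch rather than part of the method as stated: the share of bootstrap estimates at or below the sample estimate is clipped away from 0 and 1 so that the normal quantile stays finite, and quantiles are computed with NumPy's default linear interpolation instead of an exact inverse of the empirical distribution function.

```python
import numpy as np
from statistics import NormalDist

def bias_corrected_interval(gamma_boot, gamma_hat, alpha=0.05):
    """Bias-corrected percentile interval from bootstrap threshold estimates."""
    gamma_boot = np.asarray(gamma_boot, dtype=float)
    normal = NormalDist()
    n_boot = gamma_boot.size
    # Share of bootstrap estimates at or below the sample estimate,
    # clipped (an assumption of this sketch) to keep the quantile finite.
    share = float(np.clip(np.mean(gamma_boot <= gamma_hat),
                          0.5 / n_boot, 1.0 - 0.5 / n_boot))
    m = normal.inv_cdf(share)                       # bias-correction term
    p_low = normal.cdf(2.0 * m + normal.inv_cdf(alpha / 2.0))
    p_high = normal.cdf(2.0 * m + normal.inv_cdf(1.0 - alpha / 2.0))
    return (float(np.quantile(gamma_boot, p_low)),
            float(np.quantile(gamma_boot, p_high)))

# Hypothetical bootstrap estimates 1, 2, ..., 100.
boot = np.arange(1.0, 101.0)
centered = bias_corrected_interval(boot, 50.0)      # no correction needed
shifted = bias_corrected_interval(boot, 70.0)       # estimate sits above the median
```

When exactly half of the bootstrap estimates lie at or below the sample estimate, the correction term is zero and the interval coincides with the plain percentile interval.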
We next report results from an empirical application of one of the methods discussed previously
to an empirical data set featuring a panel of dairy farms located in the province of Quebec.
Table 1. Summary statistics for dairy production variables
6 Empirical application
6.1 Data sources and descriptive statistics
We consider a balanced panel covering 11 annual observations for 302 dairy farms that were in
business between 1993 and 2003. Thus, our data set has a total of 3322 observations. This so-
called Agritel database was collected by the Federation of Management Clubs in the province of
Quebec. Summary statistics on the different variables used in our stochastic frontier production
models and the threshold variable are presented in Table 1.
Canada's dairy production is governed by a supply management policy featuring tight import
controls and domestic production quotas to ensure a fair return for dairy producers. Basically,
supply is constrained to achieve a domestic price target (Larue, Gervais and Pouliot (2007)).
Individual production licences or quotas are traded between producers within the province of Quebec
through a double auction. The value of these individual quotas has steadily increased over time
and represents a significant financial barrier deterring entry and expansion. This explains why
the average number of cows is low compared to U.S. standards and why there are so few large
dairy farms in Quebec.² The inputs selected as arguments of the production function are the most
important ones in terms of cost shares. The standard deviations are much smaller than the means
because there is a significant proportion of farms that are quite similar size-wise. We begin our
investigation with a fixed effects stochastic frontier model without threshold(s).
² According to http://www.dairyfarmingtoday.org/DairyFarmingToday/Learn-More/Facts-And-Figures/,
consulted on May 30, 2007, the average herd size in the U.S. is 135 cows. See also Romain and Sumner (2001) on
comparisons between the Canadian and U.S. dairy industries.
Table 2. Summary statistics for estimated technical efficiency scores derived from a fixed-effects
production frontier without threshold(s)
6.2 A stochastic production frontier with a homogenous technology
The fixed effects stochastic frontier model without threshold can be considered as our benchmark.
We estimated four different versions to assess the robustness of the results. We consider two
different functional forms for the production technology which could be specified with or without
a trend. The most popular functional forms used in the applied literature are the Cobb-Douglas
and the Translog. The latter is more flexible than the former, but it involves the estimation of
more parameters which increases the risk of convergence problems. The presence of a trend allows
for dynamic effects or structural change. The summary statistics for estimated technical efficiency
scores derived from the four competing specifications are presented in Table 2. Our results suggest
that the choice of the functional form does not have much influence on the central tendency and
dispersion statistics of the (time-invariant) efficiency scores. The mean and median are very close
to 96% in all cases. The standard deviations are very small, which is not surprising given that the
minima vary between 94% and 95%. Such high efficiency scores for Quebec dairy farms are to be
expected because the supply management policy has been in place for a long time and, despite all
of its flaws, it cannot be denied that it has contributed to creating a stable environment for dairy
farmers. Technical efficiency is a relative concept since the frontier is defined by the firms included
in the sample. The Quebec dairy industry is subject to far less volatility than the U.S. dairy
industry and this should make management easier.
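The relative nature of these scores can be made concrete with a short Python sketch. It assumes the time-invariant scores are obtained in the manner of Schmidt and Sickles (1984), which the paper cites for the fixed effects specification: each firm's intercept is compared with the largest intercept in the sample, so the best firm scores exactly one. The function name and the numbers are hypothetical.

```python
import numpy as np

def efficiency_scores(y, x, firm, beta):
    """Time-invariant technical efficiency relative to the best firm in the sample."""
    firms = np.unique(firm)
    resid = y - x @ beta                              # output net of the fitted frontier slopes
    alpha = np.array([resid[firm == f].mean() for f in firms])   # firm intercepts
    u = alpha.max() - alpha                           # inefficiency relative to the best firm
    return firms, np.exp(-u)

# Hypothetical panel: three firms observed twice, known slope 0.5.
firm = np.array([0, 0, 1, 1, 2, 2])
x = np.array([[1.0], [2.0], [1.5], [0.5], [3.0], [1.0]])
beta = np.array([0.5])
true_alpha = np.array([1.00, 0.98, 0.95])
y = true_alpha[firm] + x @ beta

firms, scores = efficiency_scores(y, x, firm, beta)
```

Adding or removing firms from the sample changes the benchmark intercept and therefore every score, which is why the scores are comparable only within a given sample.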
Table 3. Tests of m − 1 thresholds against m in a fixed-effects production frontier: bootstrap
p-values
6.3 A stochastic production frontier with threshold(s)
Even though Quebec has a high proportion of small dairy farms, not all of the farms use the same
milking system. Some farms are large enough to mix their feed on the farm. Some have little land or
are located in areas where it is difficult to produce corn. Hence, it is not inappropriate to entertain
the possibility that farms need not have the exact same technology. In this section, we posit that
technological jumps occur at various farm sizes. The methodology presented previously focused on a
single threshold parameter allowing for two regimes or production technologies. However, it is easy
to accommodate multiple thresholds and to use the LR statistic to find the appropriate thresholds
consistent with the data (see Hansen (1999, Section 5)). We find numerically the least squares
estimates of the threshold parameters through a grid search over 500 quantiles of the empirical
distribution of the threshold variable; we trimmed out the top and bottom 1% or 5%. We used 500
replications for the bootstrap tests, which implies that 250,000 regressions were needed to run a
test.
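The candidate grid and the resulting computational cost can be sketched as follows. The herd sizes below are randomly generated placeholders, not the Agritel data, and the function name is hypothetical.

```python
import numpy as np

def threshold_grid(q, n_quantiles=500, trim=0.01):
    """Distinct candidate thresholds taken at evenly spaced quantiles of q after trimming."""
    probs = np.linspace(trim, 1.0 - trim, n_quantiles)
    return np.unique(np.quantile(q, probs))

# Placeholder herd sizes for 3322 farm-year observations.
rng = np.random.default_rng(0)
herd_size = rng.integers(20, 121, size=3322)

grid = threshold_grid(herd_size, n_quantiles=500, trim=0.01)

# One regression per candidate threshold and per bootstrap replication.
n_regressions = 500 * 500
```

Since the threshold variable is integer-valued in this placeholder, many quantiles coincide and the number of distinct candidates can be far below 500, which reduces the actual number of regressions.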
In our application, we allowed for up to three thresholds supporting four different regimes.
Table 3 reports test results pertaining to the number of thresholds. Under the null hypothesis,
the model has m - 1 thresholds while the alternative has m thresholds. The presence of a trend
in the specification makes a large difference: with a trend, there is empirical evidence for three
thresholds in both the Cobb-Douglas and Translog cases. For the Translog without trend, there is
apparently only one threshold (interpreting a p-value of 0.08 as rejection at the 10% level). For
the Cobb-Douglas case without trend, the test results provide no evidence of any threshold in
the model.
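The way a sequence of such tests leads to a number of thresholds can be sketched as below. The stopping rule (keep adding thresholds until the first non-rejection) follows the sequential logic of Hansen (1999) and is an assumption about how Table 3 is read. The function name is hypothetical, and only the first p-value (0.08, the Translog case without trend) comes from the text; the other two are made up for illustration.

```python
def select_number_of_thresholds(p_values, level=0.10):
    """Count thresholds by testing m - 1 against m for m = 1, 2, ...

    p_values[m - 1] is the bootstrap p-value of the test whose null is
    m - 1 thresholds and whose alternative is m thresholds. Testing stops
    at the first null that is not rejected.
    """
    m = 0
    for p in p_values:
        if p >= level:
            break
        m += 1
    return m

# First entry is the reported 0.08; the remaining entries are hypothetical.
p_values = [0.08, 0.40, 0.55]
m_at_10 = select_number_of_thresholds(p_values, level=0.10)
m_at_05 = select_number_of_thresholds(p_values, level=0.05)
print(m_at_10, m_at_05)
```

The example shows how the conclusion depends on the chosen level: one threshold is retained at the 10% level and none at the 5% level.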
Table 4. Point estimates and 95% confidence sets for threshold parameters in an m-threshold
fixed-effects production frontier
Table 5. Regression estimates: triple threshold model for Cobb-Douglas technology with a trend
under fixed-effects inefficiency
Table 6. Summary statistics for estimated technical efficiency scores derived from a threshold effects
stochastic production frontier with fixed-effects inefficiency
The point estimates for the threshold parameters are presented in Table 4 along with lower
and upper bounds of the corresponding 95% confidence sets for the Cobb-Douglas and Translog
forms with and without a trend. Adding thresholds to the Cobb-Douglas model without a trend
did not significantly improve on the model without thresholds, which is why no thresholds are
reported for it. In contrast, the Cobb-Douglas frontier with trend has three thresholds whose
point estimates are 34, 45 and 66. The second and third thresholds have narrow confidence sets,
but the first threshold has a high upper bound. The point estimates obtained from the Translog
with a trend are nearly identical, but the confidence sets differ. In this instance, the confidence set
for the first threshold is very narrow while the second and third thresholds have low lower bounds.
The Translog frontier without a trend supports a single threshold, with a point estimate of
48, a lower bound of 46 and an upper bound of 49. Some of our confidence sets are skewed,
in that either the lower bound or the upper bound of the bootstrap confidence set is very close
to the reported point estimate. This is also apparent in Hansen (1999), but to a lesser degree. The
implication is that the probability that the true threshold lies far from the point estimate is
quite low. This is why, for instance, the null of two thresholds is soundly rejected (p-value of
0.006) even though the confidence set of the first threshold spans the confidence set of the second
threshold.
Table 5 reports estimates of the coefficients characterizing the production technologies of the
four regimes associated with the Cobb-Douglas with trend frontier. The concentrate coefficients
vary between 0.095 and 0.161 across regimes while the range for the forage coefficients is
0.031-0.059. The coefficients on capital are small and not significantly different from zero for
the three smallest categories of farms. In contrast, labour is most important for the smallest farm
group. The labour coefficient for the smallest farms is roughly 50% larger than that for the largest
farms. The trend coefficients are very similar across regimes.
Results on the efficiency scores associated with the threshold models are presented in Table
6. The mean efficiency level is close to 96% in all cases, which is also what we obtained when
estimating a stochastic frontier without thresholds. This suggests that the productivity advantage
of larger dairy farms over smaller farms is due to technological advantages and not to technical
efficiency.
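As a reminder of how efficiency scores of this kind are typically obtained in a fixed-effects frontier, the sketch below applies the normalization of Schmidt and Sickles (1984), in which the firm with the largest estimated effect defines the frontier. Whether the scores in Table 6 were computed exactly this way is an assumption, since this section does not spell it out. The farm labels and effect values are hypothetical.

```python
import math

def technical_efficiency(alpha_hat):
    """Map estimated firm effects to efficiency scores in (0, 1].

    Inefficiency is measured relative to the best firm in the sample:
    u_i = max_j(alpha_j) - alpha_i, and TE_i = exp(-u_i).
    """
    best = max(alpha_hat.values())
    return {firm: math.exp(-(best - a)) for firm, a in alpha_hat.items()}

# Hypothetical estimated fixed effects in log units (not from the paper).
alpha_hat = {"farm_A": 2.31, "farm_B": 2.27, "farm_C": 2.35, "farm_D": 2.30}

scores = technical_efficiency(alpha_hat)
mean_score = sum(scores.values()) / len(scores)
print(scores, mean_score)
```

Because the best firm in the sample always receives a score of one, the scores are relative to the sampled firms, which is the sense in which technical efficiency is a relative concept.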
7 Conclusion
Heterogeneity among individual observations in cross-section or panel data models is an issue
that has motivated a rapidly growing literature. Applied econometricians estimating panel data
stochastic frontier models are routinely confronted with this problem. In this paper, we propose
three different estimators allowing for multiple thresholds to address the heterogeneity issue.
Inference is problematic in threshold models because of nuisance parameters that are not identified
under the null hypothesis. We build on procedures developed by Hansen (1999) to construct a
likelihood ratio test of m - 1 regimes under the null against m regimes. We also develop a
bootstrap procedure to conduct statistical inference about the threshold parameters.
Our empirical application features the estimation of a fixed effects stochastic frontier model on
a panel of Quebec dairy farms. We found evidence of threshold effects, but the latter depend on the
presence or absence of a trend and the choice of functional form. The efficiency scores are highly
concentrated at the top for models with and without thresholds. We conclude that productivity
differences across farm sizes are most likely due to technological heterogeneity.
Future versions of this paper will showcase applications of the other proposed estimators and
analyse the distributions of efficiency scores within and between regimes.
References
Aigner, D. J., Lovell, C. A. K. and Schmidt, P. (1977), Formulation and estimation of stochastic
frontier production functions, Journal of Econometrics 6, 21–37.
Andrews, D. W. K. (1993), Tests for parameter instability and structural change with unknown
change point, Econometrica 61, 821–856.
Bai, J., Lumsdaine, R. L. and Stock, J. H. (1998), Testing and dating common breaks in
multivariate time series, The Review of Economic Studies 65(3), 395–432.
Battese, G. E. and Coelli, T. J. (1992), Frontier production functions, technical efficiency and panel
data with application to paddy farmers in India, Journal of Productivity Analysis 3, 153–169.
Battese, G. E. and Coelli, T. J. (1995), A model for technical inefficiency effects in a stochastic
frontier production function for panel data, Empirical Economics 20, 325–332.
Cornwell, C. and Schmidt, P. (1995), Production frontiers and efficiency measurement, in L. Matyas
and P. Sevestre, eds, Econometrics of Panel Data: Handbook of Theory and Applications,
2nd Edition, Kluwer Academic Publishers, Boston.
Cornwell, C., Schmidt, P. and Sickles, R. C. (1990), Production frontiers with cross-sectional and
time-series variation in efficiency levels, Journal of Econometrics 46(1-2), 185–200.
Davies, R. B. (1977), Hypothesis testing when a nuisance parameter is present only under the
alternative, Biometrika 64, 247–254.
Davies, R. B. (1987), Hypothesis testing when a nuisance parameter is present only under the
alternative, Biometrika 74, 33–43.
Enders, W. and Granger, C. W. J. (1998), Unit-root tests and asymmetric adjustment with an
example using the term structure of interest rates, Journal of Business & Economic Statistics
16(3), 304–11.
Greene, W. H. (2002), Alternative panel data estimators for stochastic frontier models, Working
papers, Department of Economics, Stern School of Business, NYU.
Hall, P., Härdle, W. and Simar, L. (1995), Iterated bootstrap with applications to frontier models,
Journal of Productivity Analysis 6, 63–76.
Hansen, B. E. (1996), Inference when a nuisance parameter is not identified under the null
hypothesis, Econometrica 64, 413–430.
Hansen, B. E. (1999), Threshold effects in non-dynamic panels: Estimation, testing and inference,
Journal of Econometrics 93, 345–368.
Hansen, B. E. (2000a), Sample splitting and threshold estimation, Econometrica 68, 575–603.
Hansen, B. E. (2000b), Testing for structural change in conditional models, Journal of
Econometrics 97, 93–115.
Horrace, W. C. and Schmidt, P. (1996), Confidence statements for efficiency estimates from
stochastic frontier models, Journal of Productivity Analysis 7, 257–282.
Jondrow, J., Lovell, C. A. K., Materov, I. S. and Schmidt, P. (1982), On the estimation of technical
inefficiency in the stochastic frontier production function model, Journal of Econometrics
19, 233–38.
Kim, M., Kim, Y. and Schmidt, P. (2006), On the accuracy of bootstrap confidence intervals for
efficiency levels in stochastic frontier models with panel data, Technical Report October 2006,
Michigan State University, USA.
Kumbhakar, S. C. (1990), Production frontiers, panel data, and time varying technical inefficiency,
Journal of Econometrics 46, 201–211.
Larue, B., Gervais, J. and Pouliot, S. (2007), Should tariff-rate quotas mimic quotas? Implications
for liberalization under a supply management policy, North American Journal of Economics
and Finance, forthcoming.
Lee, Y. and Schmidt, P. (1993), A production frontier model with flexible temporal variation in
technical efficiency, in H. K. Fried, K. Lovell and S. Schmidt, eds, The Measurement of
Productive Efficiency, Oxford University Press, New York.
Orea, L. and Kumbhakar, S. C. (2004), Efficiency measurement using a stochastic frontier latent
class model, Empirical Economics 29, 69–83.
Pitt, M. M. and Lee, L.-F. (1981), The measurement and sources of technical inefficiency in the
Indonesian weaving industry, Journal of Development Economics 9, 43–64.
Romain, R. and Sumner, D. (2001), Dairy economic and policy issues between Canada and the
United States, Canadian Journal of Agricultural Economics 49, 479–492.
Schmidt, P. and Sickles, R. C. (1984), Production frontiers and panel data, Journal of Business
and Economic Statistics 2, 367–374.
Seo, M. H. and Linton, O. (2007), A smoothed least squares estimator for threshold regression
models, Journal of Econometrics, forthcoming.
Simar, L. and Wilson, P. W. (2000), A general methodology for bootstrapping in non-parametric
frontier models, Journal of Applied Statistics 27(6), 779–802.
Tran, K. C. and Tsionas, E. G. (2006), Fixed effect threshold stochastic frontier model with an
application, Technical report, Department of Economics, Athens University of Economics and
Business, Athens, Greece.
Tsay, R. S. (1989), Testing and modeling threshold autoregressive processes, Journal of the
American Statistical Association 84, 231–240.
Tsay, R. S. (1998), Testing and modeling multivariate threshold models, Journal of the American
Statistical Association 93(443), 1188–1202.
Tsionas, E. G. (2002), Stochastic frontier models with random coefficients, Journal of Applied
Econometrics 17, 127–147.
Tsionas, E. G. and Kumbhakar, S. C. (2004), Markov switching stochastic frontier model, The
Econometrics Journal 7, 1–28.
Tsionas, E. G. and Tran, K. C. (2006), Bayesian inference in threshold stochastic frontier models,
Technical report, Department of Economics, Athens University of Economics and Business,
Athens, Greece.