Estimation and Inference for Threshold Effects in Panel Data Stochastic Frontier Models

Clément Yélou
Center for Research on the Economics of Agrifood (CREA).
Mailing address: 4424 Pavillon Paul-Comtois,
FSAA, Université Laval, Québec, Qc, Canada, G1K 7P4.
TEL: (418) 656-2131 ext. 7241; FAX: (418) 656-7821
E-mail: [email protected]

Bruno Larue
Holder of the Canada Research Chair in International Agri-Food Trade,
Director of the Center for Research on the Economics of Agrifood (CREA),
Université Laval. Mailing address: 4417 Pavillon Paul-Comtois, FSAA,
Université Laval, Québec, Québec, Canada G1K 7P4.
TEL: (418) 656 2131 ext. 5098. FAX: (418) 656 7821.
Email: [email protected]

Kien C. Tran
Department of Economics, University of Lethbridge.
Mailing address: 4401 University Drive, Lethbridge,
Alberta, T1K 3M4 Canada; E-mail: [email protected]

Selected Paper prepared for presentation at the American Agricultural Economics
Association Annual Meeting, Portland, OR, July 29-August 1, 2007

Copyright 2007 by [Yélou, C.; Larue, B. and Tran, K.]. All rights reserved. Readers may make
verbatim copies of this document for non-commercial purposes by any means, provided that this
copyright notice appears on all such copies.

Abstract
One of the most enduring problems in cross-section or panel data models is heterogeneity
among individual observations. Different approaches have been proposed to deal with this issue,
but threshold regression models offer intuitively appealing econometric methods to account for
heterogeneity. We propose three different estimators that can accommodate multiple thresholds.
The first two, allowing respectively for fixed and random effects, assume that the firm-specific
inefficiency scores are time-invariant while the third one allows for time-varying inefficiency scores.
We rely on a likelihood ratio test with m - 1 regimes under the null against m regimes. Testing
for threshold effects is problematic because of the presence of a nuisance parameter which is not
identified under the null hypothesis. This is known as Davies' problem. We apply procedures
pioneered by Hansen (1999) to test for the presence of threshold effects and to obtain a confidence
set for the threshold parameter. These procedures specifically account for Davies' problem and are
based on non-standard asymptotic theory. Finally, we perform an empirical application of the fixed
effects model on a panel of Quebec dairy farms. The specifications involving a trend and the Cobb-
Douglas and Translog functional forms support three thresholds or four regimes based on farm size.
The efficiency scores vary between 0.95 and 1 in models with and without thresholds. Therefore,
productivity differences across farm sizes are most likely due to technological heterogeneity.

Key words: Stochastic frontier models; threshold regression; technical efficiency; bootstrap;
dairy production.

Journal of Economic Literature classification: C12, C13, C23, C52.

Contents
1 Introduction
2 Framework
3 Estimation methods
3.1 Time-invariant fixed effects model
3.2 Time-invariant random effects model
3.3 Independent time-varying technical inefficiency model
4 Testing for a threshold
5 Inference about the threshold parameter
5.1 Inverting a likelihood ratio test
5.2 Bootstrap confidence set
6 Empirical application
6.1 Data sources and descriptive statistics
6.2 A stochastic production frontier with a homogenous technology
6.3 A stochastic production frontier with threshold(s)
7 Conclusion

List of Tables
1 Summary statistics for dairy production variables
2 Summary statistics for estimated technical efficiency scores: production frontier without any threshold effects under fixed-effects inefficiency
3 Tests of m-1 thresholds against m under fixed-effects inefficiency: bootstrap p-values
4 Point estimates and 95% level confidence set for threshold parameters in an m-thresholds model under fixed-effects inefficiency
5 Empirical coverage rates for the delta method and the Fieller method based confidence intervals for a parameter ratio in a multinomial probit model with a logit kernel
6 Summary statistics for estimated technical efficiency scores: production frontier with threshold effects under fixed-effects inefficiency
1 Introduction
Structural change and threshold effects are two related issues that have motivated considerable
empirical and theoretical research in time series econometrics (e.g. Tsay (1989, 1998), Enders and
Granger (1998), Hansen (2000b, 2000a)). This paper considers statistical inference methods for
threshold effects in panel data stochastic frontier models. One of the most enduring problems in
cross-section or panel data models is heterogeneity among individual observations. One approach
to address the heterogeneity issue is to compare a regression function that is identical across all
observations in a sample to a set of regression functions that allow for observations to fall into
discrete classes as in Hansen (1999).
Threshold regression models offer intuitively appealing econometric methods to account for het-
erogeneity. In the context of stochastic production frontier models, the question may be whether
large firms use a production technology that differs from that of small firms. This would allow
researchers to determine whether the higher productivity of large firms stems from the use of a
different technology or simply a more efficient use of inputs given the constraints imposed by the
common technology as measured by technical efficiency scores (see Tran and Tsionas (2006)). Re-
lated methods that allow for heterogeneity in stochastic frontier models include latent class models
(Greene (2002, 2005); Orea and Kumbhakar (2004)), random coefficients models (Tsionas (2002);
Greene (2002, 2005)) and Markov switching frontier models (Tsionas and Kumbhakar (2004)). The
distinguishing feature of threshold models is that they assume that heterogeneity is induced by an
observable exogenous variable, e.g. firm size, while in the other methods cited above heterogeneity
is introduced in the models through exogenous variables or unobservable random terms.
Recently, Tsionas and Tran (2006) have proposed various models to allow for heterogeneity in
technology and in the distribution of technical inefficiency. Bayesian inference methods are proposed
for the estimation of these models and for model comparisons. Bayesian tools such as the posterior
odds ratio and the Bayes factor are proposed for model selection, including the comparison of a
threshold model against a model without threshold effects. These statistics are used as evidence
pertaining to the presence of threshold effects in the data. However, from a classical inference
approach, such evidence needs to be based on a test of the null hypothesis of no threshold effect.
Testing for threshold effects is problematic and requires non-standard tools because of the presence
of a nuisance parameter which is not identified under the null hypothesis. This is known as Davies'
problem and appropriate techniques have been proposed in Davies (1987), Andrews (1993) and
Hansen (1996, 1999, 2000a). For our specific threshold effects problem, the nuisance parameter
is the value of the threshold. In this paper, we consider one of the threshold models analyzed
in Tsionas and Tran (2006), the simple threshold stochastic frontier model, and provide a testing
strategy for the presence of threshold effects in a parametric stochastic frontier model with panel
data.
Our methodology is anchored on three formulations of the panel data stochastic frontier model,
which differ by the time dependence of the inefficiency term as follows: (i) a fixed effect time
invariant inefficiency term, (ii) a random effect time invariant inefficiency term, and (iii) a random
time varying inefficiency term. For specifications (i)-(ii), we assume that the technical inefficiency
term is a firm-specific constant, so we obtain a fixed effects or random effects panel data model as in
Schmidt and Sickles (1984), Horrace and Schmidt (1996) and Greene (1997). These specifications
of the panel data stochastic frontier model have the advantage of not requiring any distributional
assumption for technical inefficiency. Therefore, for the fixed effects case we apply procedures
pioneered by Hansen (1999) to test for the presence of threshold effects and to obtain a confidence
set for the threshold parameter. These procedures are based on non-standard asymptotic theory
and specifically account for Davies' problem. We then examine the extension of these procedures to
the random effects case. However, these time invariant specifications for the inefficiency term may
not be adequate for panel data with a number of time periods large enough to jeopardize the validity
of the assumption of constant technical inefficiency. For long panels, our alternative specification
(iii) is more appropriate. With this specification, we assume a half-normal distribution for the
inefficiency term and a normal distribution for the two-sided error term of the model. We consider
sup-type tests initially proposed by Davies (1987) and extended by Andrews (1993) and Hansen
(1996). Given a known specific value for the threshold parameter, the model is estimated by the
maximum likelihood method without threshold effects (the model under the null hypothesis) and
with threshold effects (the model under the alternative hypothesis). For both models, we measure
technical inefficiency using the Jondrow, Lovell, Materov and Schmidt (1982) estimator. As in
Hansen (1999, 2000a), our test statistic is an LR-type statistic defined from the residual sums of
squares under the null and the alternative hypotheses respectively. Since the value of the threshold
is unknown, we consider a supremum of the test statistic over a relevant subset of values of the
threshold parameter. The problem under consideration is more complex than the one considered
in Hansen (1999, 2000a) because we address Davies' problem for a highly nonlinear model. As
a result, the asymptotic theory for inference on the threshold parameter is non-standard and we
propose a bootstrap strategy to obtain an asymptotic p-value and to construct a confidence set. Our
bootstrap method involves a combination of bootstrap techniques used for the stochastic frontier
model (Hall, Härdle and Simar (1995), Simar and Wilson (2000), Kim, Kim and Schmidt (2006))
and the bootstrap procedure proposed in Hansen (2000a). The test procedures discussed in this
paper have wide-ranging empirical applications. To illustrate the applicability of the proposed tests,
we report results from one empirical application involving a panel of 302 dairy farms located in the
province of Quebec and observed over 11 years, from 1993 to 2003. For this application,
the threshold variable is the number of dairy cows, a proxy for farm size.
The rest of the paper is organized as follows. Section 2 describes the basic framework under
which our estimators and testing procedures are developed. The three different estimators are
presented in Section 3, while Section 4 describes the test of the null hypothesis of a single regime/technology.
Section 5 focuses on inference issues pertaining to the threshold parameter and methods to address
them. Section 6 presents results from an application involving Quebec dairy farms. This section
showcases our fixed effects estimator and our testing procedure to identify the presence of one
or more thresholds. The concluding section summarizes our contribution to the literature and
discusses future research avenues.

2 Framework
We consider the following threshold effects panel data stochastic frontier model

$$y_{it} = \alpha + \beta_1' x_{it}\, I(q_{it} \le \gamma) + \beta_2' x_{it}\, I(q_{it} > \gamma) - u_{it} + v_{it}, \quad u_{it} \ge 0, \tag{2.1}$$

where for firm $i$ at time period $t$, $i = 1, ..., N$, $t = 1, ..., T$, $y_{it}$ is the logarithm of output, $x_{it} \in \mathbb{R}^k$ is a
vector of logarithms of inputs, $I(\cdot)$ is the indicator function, and $\beta_1$ and $\beta_2$ are two vectors of parameters
associated with two different technologies, 1 and 2. $q_{it}$ is an exogenous and observable threshold
variable that governs the technology regime of firms. $\gamma$ is the threshold value such that if $q_{it} \le \gamma$
then firm $i$ adopts technology 1 at time period $t$; otherwise firm $i$ adopts technology 2. $v_{it}$ is a
statistical error term, and $u_{it} \ge 0$ represents technical inefficiency. We assume throughout that the
error term $v_{it}$ is independent and identically distributed with mean zero and finite variance $\sigma_v^2$. For
$\beta_1 = \beta_2$, we get the basic panel data stochastic frontier model (see Pitt and Lee (1981), Schmidt
and Sickles (1984), Cornwell and Schmidt (1995), Greene (1997)). As in Hansen (1999), this model
can be written in a more compact form as follows. Let
 
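$$x_{it}(\gamma) = \begin{pmatrix} x_{it}\, I(q_{it} \le \gamma) \\ x_{it}\, I(q_{it} > \gamma) \end{pmatrix},$$

and $\beta = (\beta_1', \beta_2')'$. With this notation, equation (2.1) can be written as

$$y_{it} = \alpha + \beta' x_{it}(\gamma) - u_{it} + v_{it}, \quad u_{it} \ge 0. \tag{2.2}$$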
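To make the compact notation concrete, the stacked regressor $x_{it}(\gamma)$ can be built as in the following minimal Python sketch. The function name `threshold_regressors`, the toy numbers, and the use of NumPy are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np

def threshold_regressors(x, q, gamma):
    """Stack x * I(q <= gamma) and x * I(q > gamma) column-wise.

    x: (n, k) array of log inputs; q: (n,) threshold variable.
    Returns an (n, 2k) array playing the role of x_it(gamma) in (2.2).
    """
    x = np.asarray(x, dtype=float)
    low = (np.asarray(q) <= gamma)[:, None]  # regime-1 indicator, shape (n, 1)
    return np.hstack([x * low, x * ~low])

# Toy data (hypothetical): 4 observations, 2 inputs, herd size as q.
x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
q = np.array([30, 45, 60, 90])
xg = threshold_regressors(x, q, gamma=50)
```

Each row of `xg` has its inputs in the first $k$ columns when the observation falls in regime 1 and in the last $k$ columns otherwise, so a single coefficient vector $\beta = (\beta_1', \beta_2')'$ covers both regimes.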

Statistical procedures to test for threshold effects in this model will strongly depend on the distributional
and time dependence assumptions made on the inefficiency term $u_{it}$. Our analysis considers
in turn the following cases:

Case 1 $u_{it}$ is a fixed time-invariant effect, $u_{it} \equiv u_i$, for all $t = 1, ..., T$.
Case 2 $u_{it}$ is a time-invariant random variable $u_i$.
Case 3 $u_{it}$ is a time-varying random variable.
Under Case 1, model (2.2) can be written as a fixed effects panel data model. Let $\alpha_i = \alpha - u_i$;
then $\alpha_i \le \alpha$ for all $i$ and $\alpha_i$ may take positive or negative values. Therefore, we can re-write model
(2.2) as the following non-dynamic panel model with firm-specific fixed effects:

$$y_{it} = \alpha_i + \beta' x_{it}(\gamma) + v_{it}; \quad i = 1, ..., N, \; t = 1, ..., T. \tag{2.3}$$

Model (2.3) assumes absence of any unmeasured time invariant heterogeneity across firms (for
further details see Greene (2005, p. 277)) [1]. The time invariance assumption for technical inefficiency
may be an unreasonable one in long panels. Kumbhakar (1990) argued that this assumption is
inadequate because firms aware of their relative inefficiency would take steps to catch up over time.
However, this fixed effects formulation is standard in the panel data stochastic frontier literature
and has the obvious advantage that no distributional or independence assumption on inefficiency
terms is needed (Schmidt and Sickles (1984), Greene (1997), Horrace and Schmidt (1996), Kim
et al. (2006)). For least squares estimation and asymptotic inference on threshold effects in this
model, we rely on Hansen (1999).
Under Case 2, we get the random effects stochastic frontier model (see Pitt and Lee (1981),
Schmidt and Sickles (1984))

$$y_{it} = \alpha + \beta' x_{it}(\gamma) - u_i + v_{it}, \quad u_i \ge 0; \quad i = 1, ..., N, \; t = 1, ..., T. \tag{2.4}$$

One further assumes that inefficiencies $u_i$ are uncorrelated with the regressors, which implies that
any unmeasured heterogeneity across firms must be independent of the input variables.
Finally, Case 3 represents a more flexible and realistic model by having inefficiencies vary over
time for each firm. This is an obvious advantage when dealing with long panels. For simplicity, we
assume in addition that uit and vit are independent over time and across individuals, so no specific
panel data treatment is needed (Greene (1997)). For various formulations and specifications for the
time dependence of technical inefficiency, see Cornwell, Schmidt and Sickles (1990), Kumbhakar
(1990), Lee and Schmidt (1993) and Battese and Coelli (1992, 1995) among others; we defer the
extension of our test methods to accommodate these models to future research.
[1] This model is different from the true fixed effects stochastic frontier model, which is subject (i) to practical
estimation problems when the number of firms in the sample is very large, and (ii) to the incidental parameters problem
(Greene (2005, p. 277)).
3 Estimation methods

3.1 Time-invariant fixed effects model

Under Case 1, the stochastic frontier model, written in the form (2.3), is the standard threshold
regression for non-dynamic panel with individual-specific fixed effects discussed by Hansen (1999).
Estimates of the threshold and slope parameters can be obtained by least squares.
Specifically, the estimation proceeds as follows. Assume that $\gamma$ is known and let

$$\bar{y}_i = T^{-1} \sum_{t=1}^{T} y_{it}, \quad \bar{x}_i(\gamma) = T^{-1} \sum_{t=1}^{T} x_{it}(\gamma), \quad \bar{v}_i = T^{-1} \sum_{t=1}^{T} v_{it}; \quad i = 1, ..., N.$$

If we apply a fixed-effect transformation to (2.3) in order to remove firm-specific means, we get

$$y_{it}^* = \beta' x_{it}^*(\gamma) + v_{it}^*, \tag{3.5}$$

where

$$y_{it}^* = y_{it} - \bar{y}_i, \quad x_{it}^*(\gamma) = x_{it}(\gamma) - \bar{x}_i(\gamma), \quad v_{it}^* = v_{it} - \bar{v}_i; \quad i = 1, ..., N, \; t = 1, ..., T.$$

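Model (3.5) can be written in matrix form as

$$Y^* = X^*(\gamma)\, \beta + v^*, \tag{3.6}$$

where $Y^*$, $X^*(\gamma)$ and $v^*$ are the data stacked over all $N$ firms and over $T$ time periods as follows:
for $Y^*$, form $Y^* = (y_1^{*\prime}, ..., y_N^{*\prime})'$ where $y_i^* = (y_{i1}^*, y_{i2}^*, ..., y_{iT}^*)'$; proceed similarly to obtain $X^*(\gamma)$ and
$v^*$. From (3.6), the ordinary least squares estimator of $\beta$ as a function of $\gamma$ is given by

$$\hat{\beta}_F(\gamma) = \left( X^*(\gamma)' X^*(\gamma) \right)^{-1} X^*(\gamma)' Y^*,$$

and the residual sum of squares is

$$S_F(\gamma) = \left( Y^* - X^*(\gamma) \hat{\beta}_F(\gamma) \right)' \left( Y^* - X^*(\gamma) \hat{\beta}_F(\gamma) \right) = Y^{*\prime} \left( I - X^*(\gamma) \left( X^*(\gamma)' X^*(\gamma) \right)^{-1} X^*(\gamma)' \right) Y^*. \tag{3.7}$$

Since $\gamma$ is unknown, it must be estimated from the data set. Least squares estimation of $\gamma$ can be
done by minimization of the residual sum of squares as

$$\hat{\gamma}_F = \arg\min_{\gamma} S_F(\gamma). \tag{3.8}$$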


The minimization in (3.8) can be restricted to a specific subset $\Gamma_0 \subset \Gamma$, where $\Gamma$ is the set of all
possible values of $\gamma$, if we want a minimal percentage of the observations to lie in each of the two
technology regimes defined by the threshold. A grid search over values in $\Gamma_0$ is used in practice to
solve this problem; see Hansen (1999, pp. 349-350) for details. The final estimate of the regression
coefficients is $\hat{\beta}_F = \hat{\beta}_F(\hat{\gamma}_F)$; the vector of residuals is $\hat{v}_F^* = Y^* - X^*(\hat{\gamma}_F) \hat{\beta}_F(\hat{\gamma}_F)$ and the error
variance is estimated by $\hat{\sigma}_{vF}^2 = (1/NT)\, S_F(\hat{\gamma}_F)$.
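The steps of this subsection (within transformation, least squares for a given threshold, grid search over a trimmed set of candidate thresholds) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the trimming fraction, the grid size and the simulated data are all hypothetical, and the candidate set is taken to be quantiles of the threshold variable.

```python
import numpy as np

def within(a, firm):
    """Subtract firm-specific time means (the fixed-effect transformation)."""
    out = np.asarray(a, dtype=float).copy()
    for f in np.unique(firm):
        rows = firm == f
        out[rows] -= out[rows].mean(axis=0)
    return out

def fe_fit(y, x, q, firm, gamma):
    """Return (beta_hat(gamma), S_F(gamma)) from the within regression (3.5)."""
    low = (q <= gamma)[:, None]
    xs = within(np.hstack([x * low, x * ~low]), firm)
    ys = within(y, firm)
    beta, *_ = np.linalg.lstsq(xs, ys, rcond=None)
    resid = ys - xs @ beta
    return beta, float(resid @ resid)

def fe_threshold(y, x, q, firm, trim=0.10, n_grid=60):
    """Grid search for (3.8) over quantiles of q, trimming both tails."""
    grid = np.quantile(q, np.linspace(trim, 1.0 - trim, n_grid))
    ssr = np.array([fe_fit(y, x, q, firm, g)[1] for g in grid])
    gamma_hat = float(grid[ssr.argmin()])
    beta_hat, s_min = fe_fit(y, x, q, firm, gamma_hat)
    return gamma_hat, beta_hat, s_min / y.size  # error variance as (1/NT) * S_F

# Simulated panel (hypothetical values): 60 firms, 8 periods, true threshold 60.
rng = np.random.default_rng(0)
n_firms, n_periods = 60, 8
firm = np.repeat(np.arange(n_firms), n_periods)
x = rng.normal(size=(firm.size, 1))
q = rng.uniform(20.0, 100.0, size=firm.size)
alpha_i = rng.normal(size=n_firms)[firm]
slope = np.where(q <= 60.0, 0.5, 1.5)
y = alpha_i + slope * x[:, 0] + 0.1 * rng.normal(size=firm.size)

gamma_hat, beta_hat, sigma2_hat = fe_threshold(y, x, q, firm)
```

Trimming the grid keeps a minimal share of observations in each regime, which is the role played by the restricted search set in the text.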

3.2 Time-invariant random effects model

We now consider the stochastic frontier model defined by (2.4). For any given $\gamma$, the inefficiency
terms $u_i$ are assumed to be uncorrelated with the input variables $x_{it}(\gamma)$. In addition, we assume
that the $u_i$ are i.i.d. with $E(u_i) = \mu$ and $Var(u_i) = \sigma_u^2$ and that the $u_i$ are independent of the $v_{it}$. It
is convenient to rewrite the model as follows. Let $\alpha^* = \alpha - \mu$, and $u_i^* = u_i - \mu$. Then, (2.4) is
equivalent to

$$y_{it} = \alpha^* + \beta' x_{it}(\gamma) - u_i^* + v_{it}; \quad i = 1, ..., N, \; t = 1, ..., T,$$

where the error terms $u_i^*$ and $v_{it}$ have zero mean.


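Assuming that $N$ is large, we can obtain a consistent estimate $\hat{\sigma}_u^2(\gamma)$ of $\sigma_u^2$, and we also assume
that a consistent estimator $\hat{\sigma}_v^2(\gamma)$ of $\sigma_v^2$ is available. Then, the regression coefficients $\beta$ can be
estimated by $\hat{\beta}_R(\gamma)$ using feasible generalized least squares. Provided $T \to \infty$, for firm $i$, $\alpha_i^* = \alpha^* - u_i^*$
can be consistently estimated by

$$\hat{\alpha}_i(\gamma) = \frac{1}{T} \sum_{t=1}^{T} \left( y_{it} - \hat{\beta}_R(\gamma)' x_{it}(\gamma) \right); \quad i = 1, ..., N.$$

Then, we form the residual and the residual sum of squares of the random effects model as

$$\hat{v}_{itR}(\gamma) = y_{it} - \hat{\beta}_R(\gamma)' x_{it}(\gamma) - \hat{\alpha}_i(\gamma), \quad S_R(\gamma) = \sum_{t=1}^{T} \sum_{i=1}^{N} \hat{v}_{itR}^2(\gamma).$$

As in the case of the fixed effects model, $\gamma$ needs to be estimated from the data set, and we also
rely on the least squares estimation method. Thus, $\hat{\gamma}_R$ is defined by

$$\hat{\gamma}_R = \arg\min_{\gamma} S_R(\gamma). \tag{3.9}$$

The final estimator of the regression coefficients is $\hat{\beta}_R = \hat{\beta}_R(\hat{\gamma}_R)$; the error variance $\sigma_v^2$ is
estimated by $\hat{\sigma}_{vR}^2 = (1/NT)\, S_R(\hat{\gamma}_R)$.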
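As a small illustration of the last two steps, the sketch below computes the firm-level intercept estimates and the residual sum of squares $S_R(\gamma)$ for a given coefficient vector. It assumes the feasible GLS step has already produced that vector; the function name, the integer coding of firms as 0, ..., N-1, and the toy data are assumptions for illustration.

```python
import numpy as np

def re_intercepts_and_ssr(y, xg, firm, beta):
    """Given x_it(gamma) in `xg` and a coefficient vector `beta`,
    return the firm-level mean residuals and the residual sum of squares."""
    partial = y - xg @ beta                       # y_it - beta' x_it(gamma)
    counts = np.bincount(firm)                    # number of periods per firm
    alpha_hat = np.bincount(firm, weights=partial) / counts
    resid = partial - alpha_hat[firm]             # residual of the random effects model
    return alpha_hat, float(resid @ resid)

# Toy check (hypothetical, noise-free data): 3 firms observed over 4 periods.
firm = np.repeat(np.arange(3), 4)
xg = np.arange(24, dtype=float).reshape(12, 2)
beta = np.array([0.2, -0.1])
alpha = np.array([1.0, 2.0, 3.0])
y = alpha[firm] + xg @ beta
alpha_hat, s_r = re_intercepts_and_ssr(y, xg, firm, beta)
```

With noise-free data the firm intercepts are recovered exactly and the sum of squares is zero, which makes the sketch easy to check by hand.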

3.3 Independent time-varying technical inefficiency model

Under Case 3 and under the assumption that the inefficiency terms $u_{it}$ are serially and contemporaneously
uncorrelated we get, for any given $\gamma$, the panel data version of the standard stochastic
frontier model. These assumptions correspond to those maintained in the various threshold stochastic
frontier models discussed in Tsionas and Tran (2006) and imply that, despite its variation over
time, there is no persistence effect in technical inefficiency. Estimation proceeds as set out in Aigner,
Lovell and Schmidt (1977) and Jondrow et al. (1982) for the case of cross-sectional data.

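Assuming that $\gamma$ is known, let $\varepsilon_{it} = v_{it} - u_{it}$, where $v_{it} \overset{i.i.d.}{\sim} N(0, \sigma_v^2)$, and $u_{it} = |U_{it}|$, $U_{it} \overset{i.i.d.}{\sim} N(0, \sigma_u^2)$,
$i = 1, ..., N$, $t = 1, ..., T$. Under these distributional assumptions, the parameters of the
model can be estimated using the maximum likelihood (ML) method.
Let $\left( \hat{\alpha}_I(\gamma), \hat{\beta}_I(\gamma), \hat{\sigma}_{uI}^2(\gamma), \hat{\sigma}_{vI}^2(\gamma) \right)$ denote the ML estimates of $\left( \alpha, \beta, \sigma_u^2, \sigma_v^2 \right)$, given a specified
value $\gamma$. The technical inefficiency term can then be estimated by the ML estimate of the conditional
expectation $E(u_{it} | \varepsilon_{it} = e_{it})$, where $E(\cdot | \varepsilon_{it} = e_{it})$ is the conditional expectation operator conditioned
on $\varepsilon_{it} = e_{it}$. The result is as follows: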


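$$\hat{u}_{it}(\gamma) = E\left( u_{it} | \varepsilon_{it} = e_{it}(\gamma) \right) = \sigma_*(\gamma) \left[ \frac{\phi\left( e_{it}(\gamma) \lambda(\gamma) / \sigma(\gamma) \right)}{1 - \Phi\left( e_{it}(\gamma) \lambda(\gamma) / \sigma(\gamma) \right)} - \frac{e_{it}(\gamma) \lambda(\gamma)}{\sigma(\gamma)} \right],$$

where $\phi$ and $\Phi$ denote the standard normal density and cumulative distribution function and

$$\lambda(\gamma) = \hat{\sigma}_{uI}(\gamma) / \hat{\sigma}_{vI}(\gamma), \quad \sigma^2(\gamma) = \hat{\sigma}_{uI}^2(\gamma) + \hat{\sigma}_{vI}^2(\gamma),$$

$$\sigma_*^2(\gamma) = \frac{\hat{\sigma}_{vI}^2(\gamma)\, \hat{\sigma}_{uI}^2(\gamma)}{\hat{\sigma}_{vI}^2(\gamma) + \hat{\sigma}_{uI}^2(\gamma)}, \quad e_{it}(\gamma) = y_{it} - \left( \hat{\alpha}_I(\gamma) + \hat{\beta}_I(\gamma)' x_{it}(\gamma) \right).$$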
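We define the residual and the residual sum of squares as

$$\hat{v}_{itI}(\gamma) = y_{it} - \left( \hat{\alpha}_I(\gamma) + \hat{\beta}_I(\gamma)' x_{it}(\gamma) \right) + \hat{u}_{it}(\gamma), \quad S_I(\gamma) = \sum_{t=1}^{T} \sum_{i=1}^{N} \hat{v}_{itI}^2(\gamma).$$

The least squares estimator $\hat{\gamma}_I$ of $\gamma$ is defined by

$$\hat{\gamma}_I = \arg\min_{\gamma} S_I(\gamma). \tag{3.10}$$

The final estimators of the model parameters are obtained as

$$\left( \hat{\alpha}_I, \hat{\beta}_I, \hat{\sigma}_{uI}^2, \hat{\sigma}_{vI}^2 \right) = \left( \hat{\alpha}_I(\hat{\gamma}_I), \hat{\beta}_I(\hat{\gamma}_I), \hat{\sigma}_{uI}^2(\hat{\gamma}_I), \hat{\sigma}_{vI}^2(\hat{\gamma}_I) \right).$$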
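For readers who want to check the conditional expectation numerically, the following minimal sketch evaluates the Jondrow et al. (1982) formula for the normal/half-normal production frontier, given a composed residual and the two standard deviations. The function name `jlms` and the use of the Python standard library only are choices made for illustration.

```python
import math

def jlms(e, sigma_u, sigma_v):
    """E(u | eps = e) for eps = v - u, v ~ N(0, sigma_v^2), u half-normal."""
    sigma = math.sqrt(sigma_u ** 2 + sigma_v ** 2)
    lam = sigma_u / sigma_v
    sigma_star = sigma_u * sigma_v / sigma
    z = e * lam / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal cdf
    return sigma_star * (pdf / (1.0 - cdf) - z)

u_neg = jlms(-1.0, 1.0, 1.0)   # large negative residual: more estimated inefficiency
u_zero = jlms(0.0, 1.0, 1.0)
u_pos = jlms(1.0, 1.0, 1.0)
```

The estimate is decreasing in the residual: a firm far below the fitted frontier is assigned more inefficiency than one above it, and the estimate stays positive throughout.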

4 Testing for a threshold


The model formulation (2.1) and the estimation methods discussed in the previous section assumed
that there exists some threshold effect in the data. However, since this formulation introduces an
extra (threshold) parameter in the model, estimation problems may arise due to specification error
when there is actually no threshold effect in the data. Therefore, it is important to assess the
presence of a threshold using a formal statistical test. We rely on the likelihood ratio test proposed
in Hansen (1999).
The null hypothesis of no threshold effect in the model (2.1) can be written as

$$H_0 : \beta_1 = \beta_2. \tag{4.11}$$

Clearly, under $H_0$ the model (2.1) takes the form

$$y_{it} = \alpha + \beta_1' x_{it} - u_{it} + v_{it}, \quad u_{it} \ge 0, \quad i = 1, ..., N, \; t = 1, ..., T, \tag{4.12}$$

which does not involve the threshold parameter $\gamma$. So for the problem at hand, the parameter $\gamma$ is
not identified under the null hypothesis and usual test statistics have non-standard distributions.
This is the so-called Davies' problem (Davies (1977, 1987)). For this problem, Hansen (1999)
suggested simulating the non-standard asymptotic distribution of the likelihood ratio (LR) test
using a bootstrap method. The test procedure proposed in Hansen (1999) works as follows.
For Case 1 (it is similar for Cases 2 and 3), we estimate the fixed-effects panel data stochastic
frontier model associated with model (4.12) under Case 1 using the fixed-effect transformation as
described in Section 3.1. Let us write the model after the within transformation as

$$y_{it}^* = \beta_1' x_{it}^* + v_{it}^*, \tag{4.13}$$

where $y_{it}^*$, $x_{it}^*$, and $v_{it}^*$ are the within-transformed versions of $y_{it}$, $x_{it}$, and $v_{it}$ respectively (see
Section 3.1). For further reference, let $\tilde{\beta}_{1F}$ denote the within estimator of $\beta_1$. Let $\tilde{v}_F^*$ denote the
vector of residuals and $S_{0F} = (\tilde{v}_F^*)' (\tilde{v}_F^*)$ be the residual sum of squares under $H_0$. The LR test
statistic may be defined as

$$LR_F = \left( S_{0F} - S_F(\hat{\gamma}_F) \right) / \hat{\sigma}_{vF}^2. \tag{4.14}$$

The statistic $LR_F$ has a non-standard asymptotic distribution whose characteristics may be affected
by the asymmetric distribution of the technical inefficiency terms. This is likely to be problematic in
the case of random-effects and time varying technical inefficiency models. We rely on the bootstrap
procedure proposed by Hansen (1999) for the standard fixed-effects panel model, even though
its validity has not been established yet for the latter two cases. The resampling is based on the
sample of firms, and once a firm is selected all its observations over the $T$ periods are included in the
bootstrap sample. We resample residuals as follows. Let $\hat{v}_{F,i}^* = \left( \hat{v}_{F,i1}^*, \hat{v}_{F,i2}^*, ..., \hat{v}_{F,iT}^* \right)'$, $i = 1, ..., N$,
denote the $T \times 1$ vector of residuals computed for firm $i$ from the model assuming threshold effects.
Then form the sample $\left\{ \hat{v}_{F,1}^*, \hat{v}_{F,2}^*, ..., \hat{v}_{F,N}^* \right\}$. The empirical distribution of this sample is
used for bootstrap resampling, i.e. we draw randomly with replacement a sample of size $N$ from
$\left\{ \hat{v}_{F,1}^*, \hat{v}_{F,2}^*, ..., \hat{v}_{F,N}^* \right\}$. These draws are treated as errors to be used to create a bootstrap sample
under $H_0$. For each bootstrap replication $b = 1, ..., B$, let $\left( v_1^{(b)}, ..., v_i^{(b)}, ..., v_N^{(b)} \right)$ represent the
bootstrap draw. We generate the output variable using

$$y_{it}^{(b)} = \hat{y}_{it} + v_{it}^{(b)},$$
where $\hat{y}_{it}$ is the predicted value of $y_{it}$ under $H_0$. In the case of the fixed-effects model, we consider
$\hat{y}_{it}^* = \tilde{\beta}_{1F}' x_{it}^*$, while for the random-effects and time varying technical inefficiency models,
prediction of $y_{it}$ under $H_0$ should explicitly account for the estimated value of the inefficiency term
$\hat{u}_i$ or $\hat{u}_{it}$. Using the bootstrap sample data $\left\{ y_{it}^{(b)}, x_{it} \right\}$, we estimate in turn the model under $H_0$
and without imposing $H_0$. For the fixed-effects model, these correspond to models (4.13) and (3.5)
respectively. We compute the bootstrap value $LR_F^{(b)}$ of the LR test statistic using (4.14). If we let
$LR_F^0$ denote the value of the test statistic calculated from the observed data, we can define the
approximate bootstrap p-value $\hat{p}_B\left( LR_F^0 \right)$ as

$$\hat{p}_B\left( LR_F^0 \right) = \frac{B\, \hat{G}_B\left( LR_F^0 \right) + 1}{B + 1}, \tag{4.15}$$

where $B\, \hat{G}_B\left( LR_F^0 \right)$ is the number of bootstrap statistics $LR_F^{(b)}$ greater than or equal to $LR_F^0$. A
test of level $\alpha$, $0 < \alpha < 1$, is defined by the critical region $\hat{p}_B\left( LR_F^0 \right) \le \alpha$; that is, we reject the null
hypothesis at level $\alpha$ if $\hat{p}_B\left( LR_F^0 \right) \le \alpha$.
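The p-value in (4.15) is simple to compute once the bootstrap statistics are available. The sketch below assumes the observed statistic and the $B$ bootstrap statistics have already been obtained; the function name and the numbers in the example are hypothetical.

```python
def bootstrap_p_value(lr_obs, lr_boot):
    """(number of bootstrap LR statistics >= observed LR, plus 1) / (B + 1)."""
    b = len(lr_boot)
    exceed = sum(1 for lr in lr_boot if lr >= lr_obs)
    return (exceed + 1) / (b + 1)

# Hypothetical values: two of the four bootstrap statistics exceed the observed one.
p_value = bootstrap_p_value(5.0, [1.0, 2.0, 6.0, 7.0])
reject_at_5_percent = p_value <= 0.05
```

Adding one to the numerator and the denominator keeps the p-value strictly positive, so the smallest attainable value with $B$ replications is $1/(B+1)$.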

5 Inference about the threshold parameter


In the presence of threshold effects, it would be useful to make statistical inference about the
threshold parameter $\gamma$ in addition to having the point estimate discussed in Section 3. Indeed, in
the related time series structural change literature a confidence set for the break date can be
constructed using the asymptotic distribution of the estimator of the break point parameter (see
Bai, Lumsdaine and Stock (1998)). In Hansen (2000a), it is shown that the asymptotic distribution
of the threshold estimator $\hat{\gamma} \in \{ \hat{\gamma}_F, \hat{\gamma}_R, \hat{\gamma}_I \}$ is highly non-standard and this distribution depends
on unknown parameters. In such contexts, a confidence set based on the inversion of Wald or t
statistics may behave very poorly in finite samples.

5.1 Inverting a likelihood ratio test

The asymptotic distribution of $\hat{\gamma} \in \{ \hat{\gamma}_F, \hat{\gamma}_R, \hat{\gamma}_I \}$ is highly non-standard and Wald or t statistics-based
confidence sets may not be reliable, particularly in finite samples; Hansen (2000a) recommended
confidence set estimation based on inverting likelihood ratio tests on $\gamma$. Inverting a test with
respect to a parameter means that we collect all the values of this parameter for which the test is
not significant. So we consider the test of the hypothesis $H_0(\gamma_0) : \gamma = \gamma_0$, where $\gamma_0$ is any specified
value for $\gamma$. The LR statistic to test $H_0(\gamma_0)$ is

$$LR_m(\gamma_0) = \left( S_m(\gamma_0) - S_m(\hat{\gamma}_m) \right) / \hat{\sigma}_{vm}^2, \quad m = F, R, I, \tag{5.16}$$
where we index on $m$ to emphasize that the test statistic is defined for any of the three model
formulations and corresponding estimation methods. Hansen (1999, 2000a) shows that the asymptotic
distribution of $LR_m(\gamma_0)$ under $H_0(\gamma_0)$ is non-standard and free of nuisance parameters.
Under regularity conditions, $LR_m(\gamma_0) \overset{asy}{\to} \xi$, where $\xi$ is a random variable with distribution
function $P(\xi \le x) = \left( 1 - \exp(-x/2) \right)^2$. The critical value of the latter distribution at level $\alpha$, $0 < \alpha < 1$,
is $c(\alpha) = -2 \ln\left( 1 - \sqrt{1 - \alpha} \right)$. An asymptotic test of $H_0(\gamma_0)$ rejects at level $\alpha$ if $LR_m(\gamma_0) > c(\alpha)$.
A $(1 - \alpha)$-level confidence set for $\gamma$ can be defined by the no-rejection region of the LR test
as

$$CS(\gamma; \alpha) = \left\{ \gamma_0 : LR_m(\gamma_0) \le c(\alpha) \right\}. \tag{5.17}$$

The asymptotic validity of this confidence set requires, among other conditions (Hansen (2000a, p.
579)), that the difference in the slope parameters between the two regimes be small and tend to
zero as the sample size increases. This confidence set is asymptotically conservative if the
error terms $v_{it}$ are i.i.d. $N(0, \sigma_v^2)$ and strictly independent of the regressors and of the threshold
variable (see Hansen (2000a, Theorem 3)). Even if the Gaussian errors assumption is not unusual
in the literature on parametric stochastic frontier models, we also consider an alternative bootstrap
approach to confidence set estimation of the threshold parameter.
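The critical value $c(\alpha)$ and the no-rejection region in (5.17) can be computed directly from the residual sums of squares evaluated on a grid of candidate thresholds. The sketch below is illustrative only: the function names, the grid and the sums of squares are hypothetical inputs.

```python
import math

def lr_critical_value(alpha):
    """c(alpha) = -2 ln(1 - sqrt(1 - alpha))."""
    return -2.0 * math.log(1.0 - math.sqrt(1.0 - alpha))

def lr_confidence_set(grid, ssr, sigma2_hat, alpha=0.05):
    """Candidate thresholds whose LR statistic does not exceed c(alpha)."""
    s_min = min(ssr)  # residual sum of squares at the estimated threshold
    c = lr_critical_value(alpha)
    return [g for g, s in zip(grid, ssr) if (s - s_min) / sigma2_hat <= c]

# Hypothetical grid and sums of squares; the LR statistics are 20, 5, 0 and 100.
conf_set = lr_confidence_set([40, 50, 60, 70], [12.0, 10.5, 10.0, 20.0], sigma2_hat=0.1)
c_05 = lr_critical_value(0.05)
```

At the 5% level the critical value is roughly 7.35, so only the candidates with statistics 5 and 0 remain in the set. The resulting set need not be an interval when the sum of squares is not monotone on either side of its minimum.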

5.2 Bootstrap confidence set

In spite of the presence of unknown parameters in the asymptotic distribution of $\hat{\gamma}$, we suggest
the use of a bootstrap method to obtain an approximation to the sampling distribution of $\hat{\gamma}$. The
validity of the bootstrap in this context can be justified using the same arguments as for the case
of bootstrapping the asymptotic distribution of the statistic LRF defined in (4.14) (see Hansen
(1999, 2000a)). We suggest using an i.i.d. resampling scheme as opposed to resampling regression
residuals. The i.i.d. resampling has been recently used by Seo and Linton (2007) for bootstrap
inference on any scalar function of the parameters of a threshold regression model estimated through
a smoothed least squares estimator; note, however, that in Seo and Linton (2007), the estimator
is shown to be asymptotically normal and thus its asymptotic distribution is free of nuisance
parameters.
Let $\{ Z_{it} : i = 1, ..., N; t = 1, ..., T \}$ denote the data set, with $Z_{it} = (y_{it}, x_{it}')'$. Then, let $Z_i =
(Z_{i1}, Z_{i2}, ..., Z_{iT})$. In order to account for the panel structure, the empirical distribution to be used
for bootstrapping is that of $(Z_1, Z_2, ..., Z_N)$; that is, resampling is based on firms and once a firm is
resampled, all its observations over the $T$ time periods enter the bootstrap sample. For $b = 1, 2, ..., B$,
where $B$ is the number of bootstrap replications used, let $\left( Z_1^{(b)}, Z_2^{(b)}, ..., Z_N^{(b)} \right)$ be a random sample
drawn with replacement from $(Z_1, Z_2, ..., Z_N)$; let $Z_i^{(b)} = \left( Z_{i1}^{(b)}, Z_{i2}^{(b)}, ..., Z_{iT}^{(b)} \right)$ for all $i = 1, 2, ..., N$.
Then, using the bootstrap data set $\left\{ Z_{it}^{(b)} : i = 1, ..., N; t = 1, ..., T \right\}$, estimate the stochastic
frontier model (2.1) using any of the three formulations and corresponding estimation techniques; let
$\hat{\gamma}^{(b)}$ denote the bootstrap estimate of $\gamma$. The key result of the bootstrap is that, conditionally on
the observed data $\{ Z_{it} : i = 1, ..., N; t = 1, ..., T \}$, the asymptotic distribution of $N^{1/2} \left( \hat{\gamma}^{(b)} - \hat{\gamma} \right)$
approximates the asymptotic sampling distribution of $N^{1/2} (\hat{\gamma} - \gamma)$ for any $b = 1, ..., B$. The
conditional distribution of the bootstrap estimator $N^{1/2} \left( \hat{\gamma}^{(b)} - \hat{\gamma} \right)$ can be approximated by Monte
Carlo replication of the resampling procedure. The collection $\left\{ \hat{\gamma}^{(b)} : b = 1, 2, ..., B \right\}$ can therefore be
treated as a random sample from the asymptotic distribution of $\hat{\gamma}$, and this sample can be used
to construct a confidence interval for $\gamma$.
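To obtain a confidence interval based on the percentile method, we need to compute the quantiles
$q(\alpha)$ of the empirical distribution of $\{\hat\gamma^{(b)} - \hat\gamma : b = 1, 2, \ldots, B\}$ as $q(\alpha) = G^{-1}_{\hat\gamma,B}(\alpha)$, $0 \le \alpha \le 1$, where
$G_{\hat\gamma,B}$ denotes the empirical cumulative distribution function of $\{\hat\gamma^{(b)} - \hat\gamma : b = 1, 2, \ldots, B\}$.
For $0 \le \alpha \le 1$, a confidence set of asymptotic level $(1 - \alpha)$ for $\gamma$ is given by

$$[\hat\gamma + q(\alpha/2),\ \hat\gamma + q(1 - \alpha/2)]. \tag{5.18}$$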
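A minimal sketch of the percentile interval in (5.18), reading $q(\alpha)$ as the $\alpha$-quantile of the bootstrap draws centered at the sample estimate (an assumption about the notation). The function name is hypothetical, and `np.quantile` interpolates linearly between order statistics rather than inverting the empirical distribution function exactly.

```python
import numpy as np

def percentile_interval(estimate, boot_estimates, alpha=0.05):
    """Percentile confidence interval of nominal level 1 - alpha."""
    centered = np.asarray(boot_estimates, dtype=float) - estimate
    q_lo, q_hi = np.quantile(centered, [alpha / 2, 1 - alpha / 2])
    # Adding the estimate back gives the quantiles of the raw bootstrap draws
    return estimate + q_lo, estimate + q_hi
```

Because the centering is undone when the estimate is added back, the end points coincide with the $\alpha/2$ and $1 - \alpha/2$ quantiles of the bootstrap estimates themselves.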

Moreover, due to bias in the sample estimate $\hat\gamma$, there is some bias in the position of the
bootstrap estimates $\hat\gamma^{(b)}$ relative to $\hat\gamma$. Therefore, generally it does not hold that $G_{\hat\gamma,B}(\hat\gamma) = 1/2$,
which means that the bootstrap sample $\{\hat\gamma^{(b)} : b = 1, 2, \ldots, B\}$ is not centered around the sample
estimate $\hat\gamma$. We can construct a bias-corrected confidence interval for $\gamma$ as follows. Let $\Phi$ be the
standard normal cumulative distribution function and $z_\alpha$ denote the standard normal cut-off point
of level $\alpha$, $0 \le \alpha \le 1$; then, $q(\alpha) = G^{-1}_{\hat\gamma,B}(\Phi(z_\alpha))$. Define

$$q^{bc}(\alpha) = G^{-1}_{\hat\gamma,B}[\Phi(m + (m + z_\alpha))] = G^{-1}_{\hat\gamma,B}[\Phi(2m + z_\alpha)], \quad 0 < \alpha < 1, \tag{5.19}$$

where $m = \Phi^{-1}(G_{\hat\gamma,B}(\hat\gamma))$ is a bias-correction term. Then, the lower and upper confidence limits
of a bias-corrected confidence interval for $\gamma$ with asymptotic confidence level $(1 - \alpha)$, $0 \le \alpha \le 1$,
are respectively given by

$$\hat\gamma^{bc}_{L,\alpha} = q^{bc}(1 - \alpha/2), \qquad \hat\gamma^{bc}_{U,\alpha} = q^{bc}(\alpha/2). \tag{5.20}$$
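The bias-corrected quantile in (5.19) can be sketched as below. The sketch assumes that $G$ is read as the empirical distribution function of the bootstrap estimates themselves and that $z_\alpha$ satisfies $\Phi(z_\alpha) = \alpha$; the function name is hypothetical and only the Python standard library and NumPy are used.

```python
import numpy as np
from statistics import NormalDist

def bias_corrected_quantile(estimate, boot_estimates, alpha):
    """Bias-corrected bootstrap quantile G^{-1}[Phi(2m + z_alpha)]."""
    boot = np.asarray(boot_estimates, dtype=float)
    std_normal = NormalDist()
    share_below = float(np.mean(boot <= estimate))  # G evaluated at the estimate
    if not 0.0 < share_below < 1.0:
        raise ValueError("estimate lies outside the range of the bootstrap draws")
    m = std_normal.inv_cdf(share_below)      # bias-correction term
    z_alpha = std_normal.inv_cdf(alpha)      # standard normal cut-off point
    adjusted_level = std_normal.cdf(2.0 * m + z_alpha)
    return float(np.quantile(boot, adjusted_level))
```

When exactly half of the bootstrap draws fall at or below the estimate, $m = 0$ and the function returns the ordinary $\alpha$-quantile; evaluating it at $\alpha/2$ and $1 - \alpha/2$ gives the two end points of the bias-corrected interval.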

The accuracy of these confidence intervals in terms of coverage rate relies strongly on the quality of
the bootstrap approximation.
We next report results from an application of one of the methods discussed previously
to a panel of dairy farms located in the province of Quebec.

Table 1. Summary statistics for dairy production variables

| Variable | Mean | Std. dev. | Min. | Max. |
|---|---|---|---|---|
| Production function: | | | | |
| Total volume of milk/cow (litre) | 8304.03 | 1281.12 | 4557.87 | 12253.09 |
| Concentrates (kg) | 2879.73 | 741.77 | 632.30 | 6417.81 |
| Forages (kg) | 5273.25 | 949.78 | 390.44 | 9270.93 |
| Capital ($) | 4801.67 | 2545.28 | 372.84 | 34917.92 |
| Labor (hour) | 57.28 | 13.92 | 23.49 | 120.93 |
| Threshold: | | | | |
| Number of cows | 51.64 | 25.58 | 18.70 | 451.90 |

6 Empirical application

6.1 Data sources and descriptive statistics

We consider a balanced panel covering 11 annual observations for 302 dairy farms that were in
business between 1993 and 2003. Thus, our data set has a total of 3322 observations. This so-
called Agritel database was collected by the Federation of Management Clubs in the province of
Quebec. Summary statistics on the different variables used in our stochastic frontier production
models and the threshold variable are presented in Table 1.
Canada's dairy production is governed by a supply management policy featuring tight import
controls and domestic production quotas to ensure a fair return for dairy producers. Basically,
supply is constrained to achieve a domestic price target (Larue, Gervais and Pouliot (2007)).
Individual production licences or quotas are traded between producers within the province of Quebec
through a double auction. The value of these individual quotas has steadily increased over time
and represents a significant financial barrier deterring entry and expansion. This explains why
the average number of cows is low compared to U.S. standards and why there are so few large
dairy farms in Quebec [2]. The inputs selected as arguments of the production function are the most
important ones in terms of cost shares. The standard deviations are much smaller than the means
because there is a significant proportion of farms that are quite similar size-wise. We begin our
investigation with a fixed effects stochastic frontier model without threshold(s).

[2] According to http://www.dairyfarmingtoday.org/DairyFarmingToday/Learn-More/Facts-And-Figures/,
consulted on May 30, 2007, the average herd size in the U.S. is 135 cows. See also Romain and Sumner (2001) on
comparisons between the Canadian and U.S. dairy industries.

Table 2. Summary statistics for estimated technical efficiency scores derived from a fixed-effects
production frontier without threshold(s)

| Statistic | Cobb-Douglas, no trend | Cobb-Douglas, trend | Translog, no trend | Translog, trend |
|---|---|---|---|---|
| Mean | 96.03 | 96.64 | 95.69 | 96.58 |
| Stand. dev. | .69 | .65 | .72 | .64 |
| Median | 96 | 96.65 | 95.62 | 96.60 |
| Minimum | 94.27 | 95.09 | 94.04 | 95 |

Note. This table reports descriptive statistics for technical efficiency scores (in %) estimated in the framework
of a panel data stochastic production frontier model with fixed-effects inefficiency terms. The estimation
method assumes that there is at least one fully efficient firm in the sample, so the maximum value is 100 for
all model specifications.

6.2 A stochastic production frontier with a homogeneous technology

The fixed effects stochastic frontier model without threshold can be considered as our benchmark.
We estimated four different versions to assess the robustness of the results. We consider two
different functional forms for the production technology which could be specified with or without
a trend. The most popular functional forms used in the applied literature are the Cobb-Douglas
and the Translog. The latter is more flexible than the former, but it involves the estimation of
more parameters which increases the risk of convergence problems. The presence of a trend allows
for dynamic effects or structural change. The summary statistics for estimated technical efficiency
scores derived from the four competing specifications are presented in Table 2. Our results suggest
that the choice of the functional form does not have much influence on the central tendency and
dispersion statistics of the (time-invariant) efficiency scores. The mean and median are very close
to 96% in all cases. The standard deviations are very small, which is not surprising given that the
minima vary between 94% and 95%. Such high efficiency scores for Quebec dairy farms are to be
expected because the supply management policy has been in place for a long time and, despite all
of its flaws, it has undeniably contributed to creating a stable environment for dairy
farmers. Technical efficiency is a relative concept since the frontier is defined by the firms included
in the sample. The Quebec dairy industry is subject to far less volatility than the U.S. dairy
industry and this should make management easier.
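To make the normalization behind these scores concrete, here is a minimal sketch of how time-invariant efficiency scores are commonly recovered from estimated firm effects, in the spirit of Schmidt and Sickles (1984). It assumes output is measured in logs and that the firm with the largest estimated effect is treated as fully efficient; the function name and the input values are purely illustrative.

```python
import numpy as np

def efficiency_scores(firm_effects):
    """Technical efficiency in percent, relative to the best firm in the sample."""
    alpha = np.asarray(firm_effects, dtype=float)
    inefficiency = alpha.max() - alpha   # zero for the firm on the frontier
    return 100.0 * np.exp(-inefficiency)

# Hypothetical firm effects: the first firm defines the frontier
scores = efficiency_scores([0.10, 0.05, 0.00])
```

Under this normalization the maximum score is 100 by construction, which is why Table 2 does not report a maximum.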

Table 3. Tests of $m - 1$ thresholds against $m$ in a fixed-effects production frontier: bootstrap
p-values

| $m$ | Cobb-Douglas, no trend | Cobb-Douglas, trend | Translog, no trend | Translog, trend |
|---|---|---|---|---|
| 1 | .627 | .007 | .076 | .004 |
| 2 | .406 | .001 | .650 | .004 |
| 3 | .771 | .006 | .720 | .018 |

Note. The numbers in this table are bootstrap p-values for the test of the null hypothesis that there exist
$m - 1$ threshold values for the production function against the alternative of $m$, $m = 1, 2, 3$. For a test of
level $\alpha$, the null hypothesis is rejected if the reported p-value is less than or equal to $\alpha$.

6.3 A stochastic production frontier with threshold(s)

Even though Quebec has a high proportion of small dairy farms, not all of the farms use the same
milking system. Some farms are large enough to mix their feed on the farm. Some have little land or
are located in areas where it is difficult to produce corn. Hence, it is not inappropriate to entertain
the possibility that farms need not have the exact same technology. In this section, we posit that
technological jumps occur at various farm sizes. The methodology presented previously focused on a
single threshold parameter allowing for two regimes or production technologies. However, it is easy
to accommodate multiple thresholds and to use the LR statistic to find the appropriate thresholds
consistent with the data (see Hansen (1999, Section 5)). We numerically find the least squares
estimates of the threshold parameters through a grid search over 500 quantiles of the empirical
distribution of the threshold variable, after trimming the top and bottom 1% or 5%. We used 500
replications for the bootstrap tests, which implies that 250,000 regressions were needed to run a
test.
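The grid search described above can be sketched as follows for a single threshold. This is an illustration rather than the estimation code used here: it ignores the firm fixed effects for brevity (in a panel application the regime-interacted regressors would have to be demeaned within firm for each candidate threshold), and the names `threshold_grid_search`, `n_grid` and `trim` are hypothetical.

```python
import numpy as np

def threshold_grid_search(y, X, q, n_grid=500, trim=0.05):
    """Least squares estimate of one threshold in the variable q.

    Candidate thresholds are quantiles of q between `trim` and 1 - `trim`;
    the estimate minimizes the sum of squared residuals (SSR).
    """
    grid = np.unique(np.quantile(q, np.linspace(trim, 1.0 - trim, n_grid)))
    best_gamma, best_ssr = None, np.inf
    for gamma in grid:
        low = (q <= gamma)[:, None]
        # Regime-specific slopes: regressors interacted with the regime dummies
        Z = np.hstack([X * low, X * ~low])
        coef = np.linalg.lstsq(Z, y, rcond=None)[0]
        resid = y - Z @ coef
        ssr = float(resid @ resid)
        if ssr < best_ssr:
            best_gamma, best_ssr = float(gamma), ssr
    return best_gamma, best_ssr
```

One regression is run per grid point, so repeating the search in each of 500 bootstrap replications over a 500-point grid gives the 250,000 regressions per test mentioned in the text.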
In our application, we allowed for up to three thresholds supporting four different regimes.
Table 3 reports test results pertaining to the number of thresholds. Under the null hypothesis,
the model has $m - 1$ thresholds while the alternative has $m$ thresholds. The presence of a trend
in the specification makes a huge difference: in both the Cobb-Douglas and Translog cases with a
trend, there is empirical evidence for three thresholds. For the Translog without trend, there is
apparently only one threshold (interpreting a p-value of 0.08 as rejection at the 10% level). For the
Cobb-Douglas case without trend, the test results suggest that there is no evidence for the presence
of any threshold value in the model.
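The way Table 3 is read can be sketched in code. The bootstrap p-value below is taken to be the share of bootstrap test statistics at least as large as the observed one, and the number of thresholds is chosen by stopping at the first null hypothesis that is not rejected; both the stopping rule and the function names are assumptions made for this illustration.

```python
import numpy as np

def bootstrap_p_value(observed_stat, boot_stats):
    """Share of bootstrap statistics at least as large as the observed one."""
    return float(np.mean(np.asarray(boot_stats, dtype=float) >= observed_stat))

def select_number_of_thresholds(p_values, level=0.05):
    """p_values[m - 1] is the p-value for m - 1 thresholds against m."""
    m = 0
    for p in p_values:
        if p > level:   # null of m thresholds not rejected: stop here
            break
        m += 1
    return m

# p-values from Table 3
cobb_douglas_trend = select_number_of_thresholds([.007, .001, .006])
cobb_douglas_no_trend = select_number_of_thresholds([.627, .406, .771])
translog_no_trend = select_number_of_thresholds([.076, .650, .720], level=0.10)
```

Applied to the reported p-values, this rule selects three thresholds for the Cobb-Douglas frontier with a trend, none without a trend, and one for the Translog without a trend at the 10% level.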
Table 4. Point estimates and 95% level confidence sets for threshold parameters in an $m$-threshold
fixed-effects production frontier

| Parameter | Cobb-Douglas, trend | Translog, no trend | Translog, trend |
|---|---|---|---|
| $\hat\gamma_1$ | 34.4 | 42.6 | 34.1 |
| $\hat\gamma_{1L}$ | 34.0 | 42.5 | 34.9 |
| $\hat\gamma_{1U}$ | 67.3 | 48.4 | 34.6 |
| $\hat\gamma_2$ | 45.1 | - | 44.7 |
| $\hat\gamma_{2L}$ | 44.7 | - | 26.2 |
| $\hat\gamma_{2U}$ | 50.0 | - | 45.5 |
| $\hat\gamma_3$ | 66.3 | - | 66.7 |
| $\hat\gamma_{3L}$ | 65.6 | - | 44.7 |
| $\hat\gamma_{3U}$ | 68.1 | - | 67.7 |

Note. This table reports the point estimates and the lower and upper bounds of 95% level confidence sets
for the threshold parameters constructed by inverting an LR test statistic in a model with fixed-effects
inefficiency terms. The threshold parameters are $\gamma_1, \gamma_2, \gamma_3$; $\hat\gamma_i$, $i = 1, \ldots, 3$, denote the point estimate of $\gamma_i$;
$\hat\gamma_{iL}$ and $\hat\gamma_{iU}$ respectively denote the lower and upper bounds of the confidence set.

Table 5. Regression estimates: triple threshold model for Cobb-Douglas technology with a trend
under fixed-effects inefficiency

| Variable | Regime 1 estimate | Regime 1 t-ratio | Regime 2 estimate | Regime 2 t-ratio | Regime 3 estimate | Regime 3 t-ratio | Regime 4 estimate | Regime 4 t-ratio |
|---|---|---|---|---|---|---|---|---|
| Concent. | .1185 | 7.57 | .1495 | 14.22 | 0.1612 | 15.30 | 0.0950 | 5.14 |
| Forages | .0487 | 3.75 | .0317 | 3.87 | 0.0362 | 3.79 | 0.0588 | 4.72 |
| Capital | -.0034 | -.53 | .0042 | 0.89 | 0.0029 | 0.67 | 0.0152 | 2.45 |
| Labor | .0911 | 5.22 | .0424 | 3.26 | 0.0150 | 1.11 | 0.0582 | 3.29 |
| Trend | .0257 | 22.32 | .0256 | 33.24 | 0.0205 | 25.89 | 0.0252 | 21.67 |

Note. Results for the estimation of a stochastic production frontier with three threshold values with fixed
effects inefficiency; the production function relies on a Cobb-Douglas technology with a trend; t-ratios are
based on White-corrected standard errors.

Table 6. Summary statistics for estimated technical efficiency scores derived from a threshold effects
stochastic production frontier with fixed-effects inefficiency

| Statistic | Cobb-Douglas, trend | Translog, no trend | Translog, trend |
|---|---|---|---|
| Mean | 96.68 | 95.93 | 96.64 |
| Stand. dev. | .62 | .73 | .66 |
| Median | 96.72 | 95.84 | 96.67 |
| Minimum | 95.11 | 94.31 | 95.03 |

Note. This table reports descriptive statistics for technical efficiency scores (in %) estimated in the framework
of a threshold panel data stochastic production frontier model with fixed-effects inefficiency terms. The
estimation method assumes that there is at least one fully efficient firm in the sample, so the maximum value
is 100 for all model specifications.

The point estimates for the threshold parameters are presented in Table 4 along with lower
and upper bounds of the corresponding 95% confidence sets for the Cobb-Douglas and Translog
forms with and without a trend. The presence of thresholds in the Cobb-Douglas model without
a trend did not significantly improve the model without threshold and this is why there are no
thresholds reported. In contrast, the Cobb-Douglas frontier with trend has three thresholds whose
point estimates are 34, 45 and 66. The second and third thresholds have narrow confidence sets,
but the first threshold has a high upper bound. The point estimates obtained from the Translog
with a trend are nearly identical, but the confidence sets differ. In this instance, the confidence set
for the first threshold is very narrow while the second and third thresholds have low lower bounds.
The Translog frontier without a trend supports a single threshold. The latter's point estimate is
48 with a lower bound of 46 and an upper bound of 49. Some of our confidence sets are skewed,
as either the lower bound or the upper bound of the bootstrap confidence set is very close to
the reported point estimate. This is also apparent in Hansen (1999) but to a lesser degree. The
implication is that the probability that the true threshold is far away from the point estimate is
quite low. This is why, for instance, the null of two thresholds is soundly rejected (p-value equals
.006) even though the confidence set of the first threshold spans the confidence set of the second
threshold.
Table 5 reports estimates of the coefficients characterizing the production technologies of the
four regimes associated with the Cobb-Douglas with trend frontier. The concentrate coefficients
vary between 0.095 and 0.161 across regimes while the range for the forage coefficients is 0.031 to
0.059. The coefficients on capital are small and not significantly different from zero for the three
smallest categories of farms. In contrast, labour is most important for the smallest farm group.
The labour coefficient for the smallest farms is roughly 50% larger than that for the largest farms.
The trend coefficients are very similar across regimes.
Results about the efficiency scores associated with the threshold models are presented in Table
6. The mean efficiency level is close to 96% in all cases, which matches what we obtained from the
estimation of a stochastic frontier without thresholds. This suggests that the productivity advantages
of larger dairy farms over smaller farms are due to technological advantages and not to technical efficiency.

7 Conclusion
Heterogeneity among individual observations in cross-section or panel data models is an issue
that has motivated a rapidly-increasing literature. Applied econometricians estimating panel data
stochastic frontier models are routinely confronted with this problem. In this paper, we propose three
different estimators allowing for multiple thresholds to address the heterogeneity issue. Inference
is problematic in threshold models because of nuisance parameters not identified under the null
hypothesis. We build on procedures developed by Hansen (1999) to develop a likelihood ratio
test enabling us to test for $m - 1$ regimes under the null against $m$ regimes. We also develop a
bootstrap procedure to conduct statistical inference about the threshold parameters.
Our empirical application features the estimation of a fixed effects stochastic frontier model on
a panel of Quebec dairy farms. We found evidence of threshold effects, but the latter depend on the
presence or absence of a trend and the choice of functional form. The efficiency scores are highly
concentrated at the top for models with and without thresholds. We conclude that productivity
differences across farm sizes are most likely due to technological heterogeneity.
A future version of this paper will showcase applications of the other proposed estimators and
analyse the distributions of efficiency scores within and between regimes.

References
Aigner, D. J., Lovell, C. A. K. and Schmidt, P. (1977), Formulation and estimation of stochastic
frontier production functions, Journal of Econometrics 6, 21-37.

Andrews, D. W. K. (1993), Tests for parameter instability and structural change with unknown
change point, Econometrica 61, 821-856.

Bai, J., Lumsdaine, R. L. and Stock, J. H. (1998), Testing and dating common breaks in multivariate
time series, The Review of Economic Studies 65(3), 395-432.

Battese, G. E. and Coelli, T. J. (1992), Frontier production functions, technical efficiency and panel
data with application to paddy farmers in India, Journal of Productivity Analysis 3, 153-169.

Battese, G. E. and Coelli, T. J. (1995), A model for technical inefficiency effects in a stochastic
frontier production function for panel data, Empirical Economics 20, 325-332.

Cornwell, C. and Schmidt, P. (1995), Production frontiers and efficiency measurement, in L. Matyas
and P. Sevestre, eds, Econometrics of Panel Data: Handbook of Theory and Applications,
2nd Edition, Kluwer Academic Publishers, Boston.

Cornwell, C., Schmidt, P. and Sickles, R. C. (1990), Production frontiers with cross-sectional and
time-series variation in efficiency levels, Journal of Econometrics 46(1-2), 185-200.

Davies, R. B. (1977), Hypothesis testing when a nuisance parameter is present only under the
alternative, Biometrika 64, 247-254.

Davies, R. B. (1987), Hypothesis testing when a nuisance parameter is present only under the
alternative, Biometrika 74, 33-43.

Enders, W. and Granger, C. W. J. (1998), Unit-root tests and asymmetric adjustment with an
example using the term structure of interest rates, Journal of Business & Economic Statistics
16(3), 304-11.

Greene, W. H. (1997), Frontier production functions, in H. M. Pesaran and P. Schmidt, eds,
Handbook of Applied Econometrics, Volume II: Microeconomics, Blackwell Publishers, Great
Britain, pp. 81-166.

Greene, W. H. (2002), Alternative panel data estimators for stochastic frontier models, Working
papers, Department of Economics, Stern School of Business, NYU.

Greene, W. H. (2005), Reconsidering heterogeneity in panel data estimators of the stochastic
frontier model, Journal of Econometrics 126, 269-303.

Hall, P., Härdle, W. and Simar, L. (1995), Iterated bootstrap with applications to frontier models,
Journal of Productivity Analysis 6, 63-76.

Hansen, B. E. (1996), Inference when a nuisance parameter is not identified under the null
hypothesis, Econometrica 64, 413-430.

Hansen, B. E. (1999), Threshold effects in non-dynamic panels: Estimation, testing and inference,
Journal of Econometrics 93, 345-368.

Hansen, B. E. (2000a), Sample splitting and threshold estimation, Econometrica 68, 575-603.

Hansen, B. E. (2000b), Testing for structural change in conditional models, Journal of
Econometrics 97, 93-115.

Horrace, W. C. and Schmidt, P. (1996), Confidence statements for efficiency estimates from
stochastic frontier models, Journal of Productivity Analysis 7, 257-282.

Jondrow, J., Lovell, C. A. K., Materov, I. S. and Schmidt, P. (1982), On the estimation of technical
inefficiency in the stochastic frontier production function model, Journal of Econometrics
19, 233-38.

Kim, M., Kim, Y. and Schmidt, P. (2006), On the accuracy of bootstrap confidence intervals for
efficiency levels in stochastic frontier models with panel data, Technical Report October 2006,
Michigan State University, USA.

Kumbhakar, S. C. (1990), Production frontiers, panel data, and time varying technical inefficiency,
Journal of Econometrics 46, 201-211.

Larue, B., Gervais, J. and Pouliot, S. (2007), Should tariff-rate quotas mimic quotas? Implications
for liberalization under a supply management policy, North American Journal of Economics
and Finance, forthcoming.

Lee, Y. and Schmidt, P. (1993), A production frontier model with flexible temporal variation in
technical efficiency, in H. K. Fried, K. Lovell and S. Schmidt, eds, The Measurement of
Productive Efficiency, Oxford University Press, New York.

Orea, L. and Kumbhakar, S. C. (2004), Efficiency measurement using a stochastic frontier latent
class model, Empirical Economics 29, 69-83.

Pitt, M. M. and Lee, L.-F. (1981), The measurement and sources of technical inefficiency in the
Indonesian weaving industry, Journal of Development Economics 9, 43-64.

Romain, R. and Sumner, D. (2001), Dairy economic and policy issues between Canada and the
United States, Canadian Journal of Agricultural Economics 49, 479-492.

Schmidt, P. and Sickles, R. C. (1984), Production frontiers and panel data, Journal of Business
and Economic Statistics 2, 367-374.

Seo, M. H. and Linton, O. (2007), A smoothed least squares estimator for threshold regression
models, Journal of Econometrics, forthcoming.

Simar, L. and Wilson, P. W. (2000), A general methodology for bootstrapping in non-parametric
frontier models, Journal of Applied Statistics 27(6), 779-802.

Tran, K. C. and Tsionas, E. G. (2006), Fixed effect threshold stochastic frontier model with an
application, Technical report, Department of Economics, Athens University of Economics and
Business, Athens, Greece.

Tsay, R. S. (1989), Testing and modeling threshold autoregressive processes, Journal of the
American Statistical Association 84, 231-240.

Tsay, R. S. (1998), Testing and modeling multivariate threshold models, Journal of the American
Statistical Association 93(443), 1188-1202.

Tsionas, E. G. (2002), Stochastic frontier models with random coefficients, Journal of Applied
Econometrics 17, 127-147.

Tsionas, E. G. and Kumbhakar, S. C. (2004), Markov switching stochastic frontier model, The
Econometrics Journal 7, 1-28.

Tsionas, E. G. and Tran, K. C. (2006), Bayesian inference in threshold stochastic frontier models,
Technical report, Department of Economics, Athens University of Economics and Business,
Athens, Greece.
