0% found this document useful (0 votes)
1 views64 pages

sfaR

Download as pdf or txt
Download as pdf or txt
Download as pdf or txt
You are on page 1/ 64

Package ‘sfaR’

October 29, 2024


Title Stochastic Frontier Analysis Routines
Version 1.0.1
Description Maximum likelihood estimation for stochastic frontier
analysis (SFA) of production (profit) and cost functions. The package
includes the basic stochastic frontier for cross-sectional or pooled
data with several distributions for the one-sided error term (i.e.,
Rayleigh, gamma, Weibull, lognormal, uniform, generalized exponential
and truncated skewed Laplace), the latent class stochastic frontier
model (LCM) as described in Dakpo et al. (2021)
<doi:10.1111/1477-9552.12422>, for cross-sectional and pooled data,
and the sample selection model as described in Greene (2010)
<doi:10.1007/s11123-009-0159-1>, and applied in Dakpo et al. (2021)
<doi:10.1111/agec.12683>. Several possibilities in terms of
optimization algorithms are proposed.
License GPL (>= 3)

URL https://github.com/hdakpo/sfaR

BugReports https://github.com/hdakpo/sfaR/issues
Depends R (>= 3.5.0)
Imports cubature, fastGHQuad, Formula, marqLevAlg, maxLik, methods,
mnorm, nleqslv, plm, qrng, randtoolbox, sandwich, stats,
texreg, trustOptim, ucminf
Suggests lmtest
Encoding UTF-8
Language en-US
LazyData true
RoxygenNote 7.3.1
NeedsCompilation no
Author K Hervé Dakpo [aut, cre],
Yann Desjeux [aut],
Arne Henningsen [aut],
Laure Latruffe [aut]

1
2 sfaR-package

Maintainer K Hervé Dakpo <k-herve.dakpo@inrae.fr>


Repository CRAN
Date/Publication 2024-10-29 08:40:01 UTC

Contents
sfaR-package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
coef . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
dairynorway . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
dairyspain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
efficiencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
electricity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
extract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
fitted . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
ic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
logLik . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
marginal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
nobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
residuals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
ricephil . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
sfacross . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
sfalcmcross . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
sfaR-deprecated . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
sfaselectioncross . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
skewnessTest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
swissrailways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
vcov . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
worldprod . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

Index 63

sfaR-package sfaR: A package for estimating stochastic frontier models

Description
The sfaR package provides a set of tools (maximum likelihood - ML and maximum simulated
likelihood - MSL) for various specifications of stochastic frontier analysis (SFA).

Details
Three categories of functions are available: sfacross, sfalcmcross, sfaselectioncross, which
estimate different types of frontiers and offer eleven alternative optimization algorithms (i.e., "bfgs",
"bhhh", "nr", "nm", "cg", "sann", "ucminf", "mla", "sr1", "sparse", "nlminb").
coef 3

sfacross

sfacross estimates the basic stochastic frontier analysis (SFA) for cross-sectional or pooled data
and allows for ten different distributions for the one-sided error term. These distributions include the
exponential, the gamma, the generalized exponential, the half normal, the lognormal, the truncated
normal, the truncated skewed Laplace, the Rayleigh, the uniform, and the Weibull distributions. In
the case of the gamma, lognormal, and Weibull distributions, maximum simulated likelihood (MSL)
is used with the possibility of four specific distributions to construct the draws: halton, generalized
halton, sobol and uniform. Heteroscedasticity in both error terms can be implemented, in addition
to heterogeneity in the truncated mean parameter in the case of the truncated normal and lognormal
distributions. In addition, in the case of the truncated normal distribution, the scaling property can
be estimated.

sfalcmcross

sfalcmcross estimates latent class stochastic frontier models (LCM) for cross-sectional or pooled
data. It accounts for technological heterogeneity by splitting the observations into a maximum num-
ber of five classes. The classification operates based on a logit functional form that can be specified
using some covariates (namely, the separating variables allowing the separation of observations in
several classes). Only the half normal distribution is available for the one-sided error term. Het-
eroscedasticity in both error terms is possible. The choice of the number of classes can be guided
by several information criteria (i.e., AIC, BIC, or HQIC).

sfaselectioncross

sfaselectioncross estimates the frontier for cross-sectional or pooled data in the presence of
sample selection. The model solves the selection bias due to the correlation between the two-
sided error terms in both the selection and the frontier equations. The likelihood can be estimated
using five different possibilities: gauss-kronrod quadrature, adaptive integration over hypercubes
(hcubature and pcubature), gauss-hermite quadrature, and maximum simulated likelihood. Only
the half normal distribution is available for the one-sided error term. Heteroscedasticity in both
error terms is possible.

Bugreport

Any bug or suggestion can be reported using the sfaR tracker facilities at: https://github.com/
hdakpo/sfaR/issues

Author(s)

K Hervé Dakpo, Yann Desjeux, Arne Henningsen and Laure Latruffe

coef Extract coefficients of stochastic frontier models


4 coef

Description

From an object of class 'summary.sfacross', 'summary.sfalcmcross', or 'summary.sfaselectioncross',


coef extracts the coefficients, their standard errors, z-values, and (asymptotic) P-values.
From on object of class 'sfacross', 'sfalcmcross', or 'sfaselectioncross', it extracts only
the estimated coefficients.

Usage

## S3 method for class 'sfacross'


coef(object, extraPar = FALSE, ...)

## S3 method for class 'summary.sfacross'


coef(object, ...)

## S3 method for class 'sfalcmcross'


coef(object, extraPar = FALSE, ...)

## S3 method for class 'summary.sfalcmcross'


coef(object, ...)

## S3 method for class 'sfaselectioncross'


coef(object, extraPar = FALSE, ...)

## S3 method for class 'summary.sfaselectioncross'


coef(object, ...)

Arguments

object A stochastic frontier model returned by sfacross, sfalcmcross, or sfaselectioncross,


or an object of class 'summary.sfacross', 'summary.sfalcmcross', or
'summary.sfaselectioncross'.
extraPar Logical (default = FALSE). If TRUE, additional parameters are returned:
sigmaSq = sigmauSq + sigmavSq
lambdaSq = sigmauSq/sigmavSq
sigmauSq = exp (W u) = exp (δ ′ Zu )
sigmavSq = exp (W v) = exp (ϕ′ Zv )
sigma = sigmaSq^0.5
lambda = lambdaSq^0.5
sigmau = sigmauSq^0.5
sigmav = sigmavSq^0.5
gamma = sigmauSq/(sigmauSq + sigmavSq)
... Currently ignored.
dairynorway 5

Value
For objects of class 'summary.sfacross', 'summary.sfalcmcross', or 'summary.sfaselectioncross',
coef returns a matrix with four columns. Namely, the estimated coefficients, their standard errors,
z-values, and (asymptotic) P-values.
For objects of class 'sfacross', 'sfalcmcross', or 'sfaselectioncross', coef returns a nu-
meric vector of the estimated coefficients. If extraPar = TRUE, additional parameters, detailed in
the section ‘Arguments’, are also returned. In the case of object of class 'sfalcmcross', each
additional parameter ends with '#' that represents the class number.

See Also
sfacross, for the stochastic frontier analysis model fitting function using cross-sectional or pooled
data.
sfalcmcross, for the latent class stochastic frontier analysis model fitting function using cross-
sectional or pooled data.
sfaselectioncross for sample selection in stochastic frontier model fitting function using cross-
sectional or pooled data.

Examples
## Not run:
## Using data on fossil fuel fired steam electric power generation plants in the U.S.
# Translog SFA (cost function) truncated normal with scaling property
tl_u_ts <- sfacross(formula = log(tc/wf) ~ log(y) + I(1/2 * (log(y))^2) +
log(wl/wf) + log(wk/wf) + I(1/2 * (log(wl/wf))^2) + I(1/2 * (log(wk/wf))^2) +
I(log(wl/wf) * log(wk/wf)) + I(log(y) * log(wl/wf)) + I(log(y) * log(wk/wf)),
udist = 'tnormal', muhet = ~ regu, uhet = ~ regu, data = utility, S = -1,
scaling = TRUE, method = 'mla')
coef(tl_u_ts, extraPar = TRUE)
coef(summary(tl_u_ts))

## End(Not run)

dairynorway Data on Norwegian dairy farms

Description
This dataset contains nine years (1998-2006) of information on Norwegian dairy farms.

Format
A data frame with 2,727 observations on the following 23 variables.

farmid Farm identification.


year Year identification.
6 dairynorway

y1 Milk sold (1000 liters).


y2 Meat (1000 NOK).
y3 Support payments (1000 NOK).
y4 Other outputs (1000 NOK).
p1 Milk price (NOK/liter).
p2 Meat price (cattle index).
p3 Support payments price (CP index).
p4 Other outputs price index.
x1 Land (decare (daa) = 0.1 ha).
x2 Labour (1000 hours).
x3 Purchase feed (1000 NOK).
x4 Other variable costs (1000 NOK).
x5 Cattle capital (1000 NOK).
x6 Other capital (1000 NOK).
w1 Land price (NOK/daa).
w2 Labour price (NOK/hour).
w3 Feed price index.
w4 Other variable cost index.
w5 Cattle capital rent.
w6 Other capital rent and depreciation.
tc Total cost.

Source

https://sites.google.com/view/sfbook-stata/home

References

Kumbhakar, S.C., H.J. Wang, and A. Horncastle. 2014. A Practitioner’s Guide to Stochastic Fron-
tier Analysis Using Stata. Cambridge University Press.

Examples
str(dairynorway)
summary(dairynorway)
dairyspain 7

dairyspain Data on Spanish dairy farms

Description
This dataset contains six years of observations on 247 dairy farms in northern Spain, drawn from
1993-1998. The original data consist in the farm and year identifications, plus measurements on
one output (i.e. milk), and four inputs (i.e. cows, land, labor and feed).

Format
A data frame with 1,482 observations on the following 29 variables.
FARM Farm identification.
AGEF Age of the farmer.
YEAR Year identification.
COWS Number of milking cows.
LAND Agricultural area.
MILK Milk production.
LABOR Labor.
FEED Feed.
YIT Log of MILK.
X1 Log of COWS.
X2 Log of LAND.
X3 Log of LABOR.
X4 Log of FEED.
X11 1/2 * X1^2.
X22 1/2 * X2^2.
X33 1/2 * X3^2.
X44 1/2 * X4^2.
X12 X1 * X2.
X13 X1 * X3.
X14 X1 * X4.
X23 X2 * X3.
X24 X2 * X4.
X34 X3 * X4.
YEAR93 Dummy for YEAR = 1993.
YEAR94 Dummy for YEAR = 1994.
YEAR95 Dummy for YEAR = 1995.
YEAR96 Dummy for YEAR = 1996.
YEAR97 Dummy for YEAR = 1997.
YEAR98 Dummy for YEAR = 1998.
8 efficiencies

Details
This dataset has been used in Alvarez et al. (2004). The data have been normalized so that the logs
of the inputs sum to zero over the 1,482 observations.

Source
https://pages.stern.nyu.edu/~wgreene/Text/Edition7/tablelist8new.htm

References
Alvarez, A., C. Arias, and W. Greene. 2004. Accounting for unobservables in production models:
management and inefficiency. Econometric Society, 341:1–20.

Examples
str(dairyspain)
summary(dairyspain)

efficiencies Compute conditional (in-)efficiency estimates of stochastic frontier


models

Description
efficiencies returns (in-)efficiency estimates of models estimated with sfacross, sfalcmcross,
or sfaselectioncross.

Usage
## S3 method for class 'sfacross'
efficiencies(object, level = 0.95, newData = NULL, ...)

## S3 method for class 'sfalcmcross'


efficiencies(object, level = 0.95, newData = NULL, ...)

## S3 method for class 'sfaselectioncross'


efficiencies(object, level = 0.95, newData = NULL, ...)

Arguments
object A stochastic frontier model returned by sfacross, sfalcmcross, or sfaselectioncross.
level A number between between 0 and 0.9999 used for the computation of (in-
)efficiency confidence intervals (defaut = 0.95). Only used when udist =
'hnormal', 'exponential', 'tnormal' or 'uniform' in sfacross.
newData Optional data frame that is used to calculate the efficiency estimates. If NULL
(the default), the efficiency estimates are calculated for the observations that
were used in the estimation. In the case of object of class sfaselectioncross
... Currently ignored.
efficiencies 9

Details
In general, the conditional inefficiency is obtained following Jondrow et al. (1982) and the condi-
tional efficiency is computed following Battese and Coelli (1988). In some cases the conditional
mode is also returned (Jondrow et al. 1982). The confidence interval is computed following Horrace
and Schmidt (1996), Hjalmarsson et al. (1996), or Berra and Sharma (1999) (see ‘Value’ section).
In the case of the half normal distribution for the one-sided error term, the formulae are as follows
(for notations, see the ‘Details’ section of sfacross or sfalcmcross):

• The conditional inefficiency is:

 
µi∗
ϕ σ∗
E [ui |ϵi ] = µi∗ + σ∗  
µi∗
Φ σ∗

where

−Sϵi σu2
µi∗ =
σu2 + σv2

and

σu2 σv2
σ∗2 =
σu2 + σv2

• The Battese and Coelli (1988) conditional efficiency is obtained with:

 
µi∗
1 2 Φ σ∗ − σ∗
 
E [exp (−ui )|ϵi ] = exp −µi∗ + σ∗  
2 Φ µσi∗∗

• The reciprocal of the Battese and Coelli (1988) conditional efficiency is obtained with:

 
µi∗
1 2 Φ σ∗ + σ∗
 
E [exp (ui )|ϵi ] = exp µi∗ + σ∗  
2 Φ µσi∗∗

• The conditional mode is computed using:

M [ui |ϵi ] = µi∗ For µi∗ > 0

and

M [ui |ϵi ] = 0 For µi∗ ≤ 0

• The confidence intervals are obtained with:


10 efficiencies

µi∗ + IL σ∗ ≤ E [ui |ϵi ] ≤ µi∗ + IU σ∗

with LBi = µi∗ + IL σ∗ and U Bi = µi∗ + IU σ∗


and
   
−1
 α µi∗
IL = Φ 1− 1− 1−Φ −
2 σ∗

and
   
α µi∗
IU = Φ−1 1 − 1−Φ −
2 σ∗

Thus

exp (−U Bi ) ≤ E [exp (−ui )|ϵi ] ≤ exp (−LBi )

In the case of the sample selection, as underlined in Greene (2010), the conditional inefficiency
could be computed using Jondrow et al. (1982). However, here the conditionanl (in)efficiency
is obtained using the properties of the closed skew-normal (CSN) distribution (Lai, 2015). The
conditional efficiency can be obtained using the moment generating functions of a CSN distribution
(see Gonzalez-Farias et al. (2004)). We have:
 
˜ + D̃Σ̃D̃′
Φ2 D̃Σ̃t; κ̃, ∆  
1 2
E [exp (tui )|ϵi ] = Mu|ϵ (t) =   exp tπ̃ + t Σ̃
Φ2 0; κ̃, ∆˜ + D̃Σ̃D̃′ 2

ρσv ϵi !
2
−Sϵi σu σv2 σu
2
 Sρ  −Z′si γ − σv2 +σu
2

1 − ρ2 0

where π̃ = 2 , Σ̃ = 2 , D̃ = σv , κ̃ = Sσu 2
ϵi
˜ =
,∆ .
σv2 +σu σv2 +σu 1 0 0
σv2 +σu2

The derivation of the efficiency and the reciprocal efficiency is obtained by replacing t = −1 and
t = 1, respectively. To obtain the inefficiency as E [ui |ϵi ] is more complicated as it requires the
derivation of a multivariate normal cdf. We have:

∂Mu|ϵ (t)
E [ui |ϵi ] =
∂t t=0

Then
 
¨
′ Φ∗2 0; κ̃, ∆

E [ui |ϵi ] = π̃ + D̃Σ̃  
Φ2 0; κ̃, ∆¨

  ¨)
∂Φ2 (s;κ̃,∆
¨ =
where Φ∗2 s; κ̃, ∆ ∂s
efficiencies 11

Value
A data frame that contains individual (in-)efficiency estimates. These are ordered in the same way
as the corresponding observations in the dataset used for the estimation.
- For object of class ’sfacross’ the following elements are returned:

u Conditional inefficiency. In the case argument udist of sfacross is set to 'uniform',


two conditional inefficiency estimates are returned: u1 for the classic condi-
tional inefficiency following Jondrow et al. (1982), and u2 which is obtained
when θ/σv −→ ∞ (see Nguyen, 2010).
uLB Lower bound for conditional inefficiency. Only when the argument udist of
sfacross is set to 'hnormal', 'exponential', 'tnormal' or 'uniform'.
uUB Upper bound for conditional inefficiency. Only when the argument udist of
sfacross is set to 'hnormal', 'exponential', 'tnormal' or 'uniform'.
teJLMS exp (−E[u|ϵ]). When the argument udist of sfacross is set to 'uniform',
teJLMS1 = exp (−E[u1 |ϵ]) and teJLMS2 = exp (−E[u2 |ϵ]). Only when logDepVar
= TRUE.
m Conditional model. Only when the argument udist of sfacross is set to 'hnormal',
'exponential', 'tnormal', or 'rayleigh'.
teMO exp (−m). Only when, in the function sfacross, logDepVar = TRUE and udist
= 'hnormal', 'exponential', 'tnormal', 'uniform', or 'rayleigh'.
teBC Battese and Coelli (1988) conditional efficiency. Only when, in the function
sfacross, logDepVar = TRUE. In the case udist = 'uniform', two conditional
efficiency estimates are returned: teBC1 which is the classic conditional effi-
ciency following Battese and Coelli (1988) and teBC2 when θ/σv −→ ∞ (see
Nguyen, 2010).
teBC_reciprocal
Reciprocal of Battese and Coelli (1988) conditional efficiency. Similar to teBC
except that it is computed as E [exp (u)|ϵ].
teBCLB Lower bound for Battese and Coelli (1988) conditional efficiency. Only when, in
the function sfacross, logDepVar = TRUE and udist = 'hnormal', 'exponential',
'tnormal', or 'uniform'.
teBCUB Upper bound for Battese and Coelli (1988) conditional efficiency. Only when, in
the function sfacross, logDepVar = TRUE and udist = 'hnormal', 'exponential',
'tnormal', or 'uniform'.
theta In the case udist = 'uniform'. u ∈ [0, θ].

- For object of class ’sfalcmcross’ the following elements are returned:

Group_c Most probable class for each observation.


PosteriorProb_c
Highest posterior probability.
u_c Conditional inefficiency of the most probable class given the posterior probabil-
ity.
teJLMS_c exp (−E[uc |ϵc ]). Only when, in the function sfalcmcross logDepVar = TRUE.
12 efficiencies

teBC_c E [exp (−uc )|ϵc ]. Only when, in the function sfalcmcross logDepVar = TRUE.
teBC_reciprocal_c
E [exp (uc )|ϵc ]. Only when, in the function sfalcmcross logDepVar = TRUE.
PosteriorProb_c#
Posterior probability of class #.
PriorProb_c# Prior probability of class #.
u_c# Conditional inefficiency associated to class #, regardless of Group_c.
teBC_c# Conditional efficiency (E [exp (−uc )|ϵc ]) associated to class #, regardless of
Group_c. Only when, in the function sfalcmcross logDepVar = TRUE.
teBC_reciprocal_c#
Reciprocal conditional efficiency (E [exp (uc )|ϵc ]) associated to class #, regard-
less of Group_c. Only when, in the function sfalcmcross logDepVar = TRUE.
ineff_c# Conditional inefficiency (u_c) for observations in class # only.
effBC_c# Conditional efficiency (teBC_c) for observations in class # only.
ReffBC_c# Reciprocal conditional efficiency (teBC_reciprocal_c) for observations in class
# only.
theta_c# In the case udist = 'uniform'. u ∈ [0, θc# ].

- For object of class ’sfaselectioncross’ the following elements are returned:

u Conditional inefficiency.
teJLMS exp (−E[u|ϵ]). Only when logDepVar = TRUE.
teBC Battese and Coelli (1988) conditional efficiency. Only when, in the function
sfaselectioncross, logDepVar = TRUE.
teBC_reciprocal
Reciprocal of Battese and Coelli (1988) conditional efficiency. Similar to teBC
except that it is computed as E [exp (u)|ϵ].

References
Battese, G.E., and T.J. Coelli. 1988. Prediction of firm-level technical efficiencies with a general-
ized frontier production function and panel data. Journal of Econometrics, 38:387–399.
Bera, A.K., and S.C. Sharma. 1999. Estimating production uncertainty in stochastic frontier pro-
duction function models. Journal of Productivity Analysis, 12:187-210.
Gonzalez-Farias, G., Dominguez-Molina, A., Gupta, A. K., 2004. Additive properties of skew
normal random vectors. Journal of Statistical Planning and Inference. 126: 521-534.
Greene, W., 2010. A stochastic frontier model with correction for sample selection. Journal of
Productivity Analysis. 34, 15–24.
Hjalmarsson, L., S.C. Kumbhakar, and A. Heshmati. 1996. DEA, DFA and SFA: A comparison.
Journal of Productivity Analysis, 7:303-327.
Horrace, W.C., and P. Schmidt. 1996. Confidence statements for efficiency estimates from stochas-
tic frontier models. Journal of Productivity Analysis, 7:257-282.
electricity 13

Jondrow, J., C.A.K. Lovell, I.S. Materov, and P. Schmidt. 1982. On the estimation of technical
inefficiency in the stochastic frontier production function model. Journal of Econometrics, 19:233–
238.
Lai, H. P., 2015. Maximum likelihood estimation of the stochastic frontier model with endogenous
switching or sample selection. Journal of Productivity Analysis, 43: 105-117.
Nguyen, N.B. 2010. Estimation of technical efficiency in stochastic frontier analysis. PhD Disser-
tation, Bowling Green State University, August.

See Also
sfalcmcross, for the latent class stochastic frontier analysis model fitting function using cross-
sectional or pooled data.
sfacross, for the stochastic frontier analysis model fitting function using cross-sectional or pooled
data.
sfaselectioncross for sample selection in stochastic frontier model fitting function using cross-
sectional or pooled data.

Examples
## Not run:
## Using data on fossil fuel fired steam electric power generation plants in the U.S.
# Translog SFA (cost function) truncated normal with scaling property
tl_u_ts <- sfacross(formula = log(tc/wf) ~ log(y) + I(1/2 * (log(y))^2) + log(wl/wf) +
log(wk/wf) + I(1/2 * (log(wl/wf))^2) + I(1/2 * (log(wk/wf))^2) + I(log(wl/wf) *
log(wk/wf)) + I(log(y) * log(wl/wf)) + I(log(y) * log(wk/wf)), udist = 'tnormal',
muhet = ~ regu, uhet = ~ regu, data = utility, S = -1, scaling = TRUE, method = 'mla')
eff.tl_u_ts <- efficiencies(tl_u_ts)
head(eff.tl_u_ts)
summary(eff.tl_u_ts)

## End(Not run)

electricity Data on U.S. electric power generation

Description
This dataset is on electric power generation in the United States.

Format
A data frame with 123 observations on the following 9 variables.
firm Firm identification.
cost Total cost in 1970, MM USD.
output Output in million KwH.
14 extract

lprice Labor price.


lshare Labor’s cost share.
cprice Capital price.
cshare Capital’s cost share.
fprice Fuel price.
fshare Fuel’s cost share.

Details
The dataset is from Christensen and Greene (1976) and has also been used in Greene (1990).

Source
https://pages.stern.nyu.edu/~wgreene/Text/Edition7/tablelist8new.htm

References
Christensen, L.R., and W.H. Greene. 1976. Economies of scale in US electric power generation.
The Journal of Political Economy, 84:655–676.
Greene, W.H. 1990. A Gamma-distributed stochastic frontier model. Journal of Econometrics,
46:141–163.

Examples
str(electricity)
summary(electricity)

extract Extract frontier information to be used with texreg package

Description
Extract coefficients and additional information for stochastic frontier models returned by sfacross,
sfalcmcross, or sfaselectioncross.

Usage
extract.sfacross(model, ...)

extract.sfalcmcross(model, ...)

extract.sfaselectioncross(model, ...)

Arguments
model objects of class 'sfacross', 'sfalcmcross', or 'sfaselectioncross'
... Currently ignored
fitted 15

Value

A texreg object representing the statistical model.

See Also

sfacross, for the stochastic frontier analysis model fitting function using cross-sectional or pooled
data.
sfalcmcross, for the latent class stochastic frontier analysis model fitting function using cross-
sectional or pooled data.
sfaselectioncross for sample selection in stochastic frontier model fitting function using cross-
sectional data.

Examples
hlf <- sfacross(formula = log(tc/wf) ~ log(y) + I(1/2 * (log(y))^2) +
log(wl/wf) + log(wk/wf) + I(1/2 * (log(wl/wf))^2) + I(1/2 * (log(wk/wf))^2) +
I(log(wl/wf) * log(wk/wf)) + I(log(y) * log(wl/wf)) + I(log(y) * log(wk/wf)),
udist = 'hnormal', uhet = ~ regu, data = utility, S = -1, method = 'bfgs')
trnorm <- sfacross(formula = log(tc/wf) ~ log(y) + I(1/2 * (log(y))^2) +
log(wl/wf) + log(wk/wf) + I(1/2 * (log(wl/wf))^2) + I(1/2 * (log(wk/wf))^2) +
I(log(wl/wf) * log(wk/wf)) + I(log(y) * log(wl/wf)) + I(log(y) * log(wk/wf)),
udist = 'tnormal', muhet = ~ regu, data = utility, S = -1, method = 'bfgs')

tscal <- sfacross(formula = log(tc/wf) ~ log(y) + I(1/2 * (log(y))^2) +


log(wl/wf) + log(wk/wf) + I(1/2 * (log(wl/wf))^2) + I(1/2 * (log(wk/wf))^2) +
I(log(wl/wf) * log(wk/wf)) + I(log(y) * log(wl/wf)) + I(log(y) * log(wk/wf)),
udist = 'tnormal', muhet = ~ regu, uhet = ~ regu, data = utility,
S = -1, method = 'bfgs', scaling = TRUE)

expo <- sfacross(formula = log(tc/wf) ~ log(y) + I(1/2 * (log(y))^2) +


log(wl/wf) + log(wk/wf) + I(1/2 * (log(wl/wf))^2) + I(1/2 * (log(wk/wf))^2) +
I(log(wl/wf) * log(wk/wf)) + I(log(y) * log(wl/wf)) + I(log(y) * log(wk/wf)),
udist = 'exponential', uhet = ~ regu, data = utility, S = -1, method = 'bfgs')

texreg::screenreg(list(hlf, trnorm, tscal, expo))

fitted Extract fitted values of stochastic frontier models

Description

fitted returns the fitted frontier values from stochastic frontier models estimated with sfacross,
sfalcmcross, or sfaselectioncross.
16 fitted

Usage
## S3 method for class 'sfacross'
fitted(object, ...)

## S3 method for class 'sfalcmcross'


fitted(object, ...)

## S3 method for class 'sfaselectioncross'


fitted(object, ...)

Arguments
object A stochastic frontier model returned by sfacross, sfalcmcross, or sfaselectioncross.
... Currently ignored.

Value
In the case of an object of class 'sfacross', or 'sfaselectioncross', a vector of fitted values is
returned.
In the case of an object of class 'sfalcmcross', a data frame containing the fitted values for each
class is returned where each variable ends with '_c#', '#' being the class number.

Note
The fitted values are ordered in the same way as the corresponding observations in the dataset used
for the estimation.

See Also
sfacross, for the stochastic frontier analysis model fitting function using cross-sectional or pooled
data.
sfalcmcross, for the latent class stochastic frontier analysis model fitting function using cross-
sectional or pooled data.
sfaselectioncross for sample selection in stochastic frontier model fitting function using cross-
sectional or pooled data.

Examples
## Not run:
## Using data on eighty-two countries production (GDP)
# LCM Cobb Douglas (production function) half normal distribution
cb_2c_h <- sfalcmcross(formula = ly ~ lk + ll + yr, udist = 'hnormal',
data = worldprod)
fit.cb_2c_h <- fitted(cb_2c_h)
head(fit.cb_2c_h)

## End(Not run)
ic 17

ic Extract information criteria of stochastic frontier models

Description
ic returns information criterion from stochastic frontier models estimated with sfacross, sfalcmcross,
or sfaselectioncross.

Usage
## S3 method for class 'sfacross'
ic(object, IC = "AIC", ...)

## S3 method for class 'sfalcmcross'


ic(object, IC = "AIC", ...)

## S3 method for class 'sfaselectioncross'


ic(object, IC = "AIC", ...)

Arguments
object A stochastic frontier model returned by sfacross, sfalcmcross, or sfaselectioncross.
IC Character string. Information criterion measure. Three criteria are available:
• 'AIC' for Akaike information criterion (default)
• 'BIC' for Bayesian information criterion
• 'HQIC' for Hannan-Quinn information criterion
.
... Currently ignored.

Details
The different information criteria are computed as follows:

• AIC: −2 log LL + 2 ∗ K
• BIC: −2 log LL + log N ∗ K
• HQIC: −2 log LL + 2 log [log N ] ∗ K

where LL is the maximum likelihood value, K the number of parameters estimated and N the
number of observations.

Value
ic returns the value of the information criterion (AIC, BIC or HQIC) of the maximum likelihood
coefficients.
18 logLik

See Also
sfacross, for the stochastic frontier analysis model fitting function using cross-sectional or pooled
data.
sfalcmcross, for the latent class stochastic frontier analysis model fitting function using cross-
sectional or pooled data.
sfaselectioncross for sample selection in stochastic frontier model fitting function using cross-
sectional or pooled data.

Examples
## Not run:
## Using data on Swiss railway
# LCM (cost function) half normal distribution
cb_2c_u <- sfalcmcross(formula = LNCT ~ LNQ2 + LNQ3 + LNNET + LNPK + LNPL,
udist = 'hnormal', uhet = ~ 1, data = swissrailways, S = -1, method='ucminf')
ic(cb_2c_u)
ic(cb_2c_u, IC = 'BIC')
ic(cb_2c_u, IC = 'HQIC')

## End(Not run)

logLik Extract log-likelihood value of stochastic frontier models

Description
logLik extracts the log-likelihood value(s) from stochastic frontier models estimated with sfacross,
sfalcmcross, or sfaselectioncross.

Usage
## S3 method for class 'sfacross'
logLik(object, individual = FALSE, ...)

## S3 method for class 'sfalcmcross'


logLik(object, individual = FALSE, ...)

## S3 method for class 'sfaselectioncross'


logLik(object, individual = FALSE, ...)

Arguments
object A stochastic frontier model returned by sfacross, sfalcmcross, or sfaselectioncross.
individual Logical. If FALSE (default), the sum of all observations’ log-likelihood values is
returned. If TRUE, a vector of each observation’s log-likelihood value is returned.
... Currently ignored.
marginal 19

Value

logLik returns either an object of class 'logLik', which is the log-likelihood value with the
total number of observations (nobs) and the number of free parameters (df) as attributes, when
individual = FALSE, or a list of elements, containing the log-likelihood of each observation (logLik),
the total number of observations (Nobs) and the number of free parameters (df), when individual
= TRUE.

See Also

sfacross, for the stochastic frontier analysis model fitting function using cross-sectional or pooled
data.
sfalcmcross, for the latent class stochastic frontier analysis model fitting function using cross-
sectional or pooled data.
sfaselectioncross for sample selection in stochastic frontier model fitting function using cross-
sectional or pooled data.

Examples
## Not run:
## Using data on fossil fuel fired steam electric power generation plants in the U.S.
# Translog SFA (cost function) truncated normal with scaling property
tl_u_ts <- sfacross(formula = log(tc/wf) ~ log(y) + I(1/2 * (log(y))^2) +
log(wl/wf) + log(wk/wf) + I(1/2 * (log(wl/wf))^2) + I(1/2 * (log(wk/wf))^2) +
I(log(wl/wf) * log(wk/wf)) + I(log(y) * log(wl/wf)) + I(log(y) * log(wk/wf)),
udist = 'tnormal', muhet = ~ regu, uhet = ~ regu, data = utility, S = -1,
scaling = TRUE, method = 'mla')
logLik(tl_u_ts)

## Using data on eighty-two countries production (GDP)


# LCM Cobb Douglas (production function) half normal distribution
cb_2c_h <- sfalcmcross(formula = ly ~ lk + ll + yr, udist = 'hnormal',
data = worldprod, S = 1)
logLik(cb_2c_h, individual = TRUE)

## End(Not run)

marginal Marginal effects of the inefficiency drivers in stochastic frontier mod-


els

Description

This function returns marginal effects of the inefficiency drivers from stochastic frontier models
estimated with sfacross, sfalcmcross, or sfaselectioncross.
20 marginal

Usage
## S3 method for class 'sfacross'
marginal(object, newData = NULL, ...)

## S3 method for class 'sfalcmcross'


marginal(object, newData = NULL, ...)

## S3 method for class 'sfaselectioncross'


marginal(object, newData = NULL, ...)

Arguments
object A stochastic frontier model returned by sfacross, sfalcmcross, or sfaselectioncross.
newData Optional data frame that is used to calculate the marginal effect of Z variables
on inefficiency. If NULL (the default), the marginal estimates are calculated for
the observations that were used in the estimation.
... Currently ignored.

Details
marginal operates in the presence of exogenous variables that explain inefficiency, namely the
inefficiency drivers (uhet = Zu or muhet = Zmu ).
Two components are computed for each variable: the marginal effects on the expected inefficiency
∂E[u] ∂V [u]
( ∂Zmu
) and the marginal effects on the variance of inefficiency ( ∂Z mu
).
The model also allows the Wang (2002) parametrization of µ and σu2 by the same vector of exoge-
nous variables. This double parameterization accounts for non-monotonic relationships between
the inefficiency and its drivers.

Value
marginal returns a data frame containing the marginal effects of the Zu variables on the expected
inefficiency (each variable has the prefix 'Eu_') and on the variance of the inefficiency (each vari-
able has the prefix 'Vu_').
In the case of the latent class stochastic frontier (LCM), each variable ends with '_c#' where '#'
is the class number.

References
Wang, H.J. 2002. Heteroscedasticity and non-monotonic efficiency effects of a stochastic frontier
model. Journal of Productivity Analysis, 18:241–253.

See Also
sfacross, for the stochastic frontier analysis model fitting function using cross-sectional or pooled
data.
sfalcmcross, for the latent class stochastic frontier analysis model fitting function using cross-
sectional or pooled data.
nobs 21

sfaselectioncross for sample selection in stochastic frontier model fitting function using cross-
sectional or pooled data.

Examples
## Not run:
## Using data on fossil fuel fired steam electric power generation plants in the U.S.
# Translog SFA (cost function) truncated normal with scaling property
tl_u_ts <- sfacross(formula = log(tc/wf) ~ log(y) + I(1/2 * (log(y))^2) +
log(wl/wf) + log(wk/wf) + I(1/2 * (log(wl/wf))^2) + I(1/2 * (log(wk/wf))^2) +
I(log(wl/wf) * log(wk/wf)) + I(log(y) * log(wl/wf)) + I(log(y) * log(wk/wf)),
udist = 'tnormal', muhet = ~ regu + wl, uhet = ~ regu + wl, data = utility,
S = -1, scaling = TRUE, method = 'mla')
marg.tl_u_ts <- marginal(tl_u_ts)
summary(marg.tl_u_ts)

## Using data on eighty-two countries production (GDP)


# LCM Cobb Douglas (production function) half normal distribution
cb_2c_h <- sfalcmcross(formula = ly ~ lk + ll + yr, udist = 'hnormal',
data = worldprod, uhet = ~ initStat + h, S = 1, method = 'mla')
marg.cb_2c_h <- marginal(cb_2c_h)
summary(marg.cb_2c_h)

## End(Not run)

nobs Extract total number of observations used in frontier models

Description
This function extracts the total number of ’observations’ from a fitted frontier model.

Usage
## S3 method for class 'sfacross'
nobs(object, ...)

## S3 method for class 'sfalcmcross'


nobs(object, ...)

## S3 method for class 'sfaselectioncross'


nobs(object, ...)

Arguments
object a sfacross, sfalcmcross, or sfaselectioncross object for which the num-
ber of total observations is to be extracted.
... Currently ignored.
22 residuals

Details

nobs gives the number of observations actually used by the estimation procedure. It is not necessar-
ily the number of observations of the model frame (number of rows in the model frame), because
sometimes the model frame is further reduced by the estimation procedure especially in the pres-
ence of NA. In the case of sfaselectioncross, nobs returns the number of observations used in
the frontier equation.

Value

A single number, normally an integer.

See Also

sfacross, for the stochastic frontier analysis model fitting function using cross-sectional or pooled
data.
sfalcmcross, for the latent class stochastic frontier analysis model fitting function using cross-
sectional or pooled data.
sfaselectioncross for sample selection in stochastic frontier model fitting function using cross-
sectional or pooled data.

Examples

## Not run:
## Using data on fossil fuel fired steam electric power generation plants in the U.S.
# Translog (cost function) half normal with heteroscedasticity
tl_u_h <- sfacross(formula = log(tc/wf) ~ log(y) + I(1/2 * (log(y))^2) +
log(wl/wf) + log(wk/wf) + I(1/2 * (log(wl/wf))^2) + I(1/2 * (log(wk/wf))^2) +
I(log(wl/wf) * log(wk/wf)) + I(log(y) * log(wl/wf)) + I(log(y) * log(wk/wf)),
udist = 'hnormal', uhet = ~ regu, data = utility, S = -1, method = 'bfgs')
nobs(tl_u_h)

## End(Not run)

residuals Extract residuals of stochastic frontier models

Description

This function returns the residuals’ values from stochastic frontier models estimated with sfacross,
sfalcmcross, or sfaselectioncross.
residuals 23

Usage
## S3 method for class 'sfacross'
residuals(object, ...)

## S3 method for class 'sfalcmcross'


residuals(object, ...)

## S3 method for class 'sfaselectioncross'


residuals(object, ...)

Arguments
object A stochastic frontier model returned by sfacross, sfalcmcross, or sfaselectioncross.
... Currently ignored.

Value
When the object is of class 'sfacross', or 'sfaselectioncross', residuals returns a vector
of residuals values.
When the object is of 'sfalcmcross', residuals returns a data frame containing the residuals
values for each latent class, where each variable ends with '_c#', '#' being the class number.

Note
The residuals values are ordered in the same way as the corresponding observations in the dataset
used for the estimation.

See Also
sfacross, for the stochastic frontier analysis model fitting function using cross-sectional or pooled
data.
sfalcmcross, for the latent class stochastic frontier analysis model fitting function using cross-
sectional or pooled data.
sfaselectioncross for sample selection in stochastic frontier model fitting function using cross-
sectional or pooled data.

Examples
## Not run:
## Using data on fossil fuel fired steam electric power generation plants in the U.S.
# Translog SFA (cost function) truncated normal with scaling property
tl_u_ts <- sfacross(formula = log(tc/wf) ~ log(y) + I(1/2 * (log(y))^2) +
log(wl/wf) + log(wk/wf) + I(1/2 * (log(wl/wf))^2) + I(1/2 * (log(wk/wf))^2) +
I(log(wl/wf) * log(wk/wf)) + I(log(y) * log(wl/wf)) + I(log(y) * log(wk/wf)),
udist = 'tnormal', muhet = ~ regu, uhet = ~ regu, data = utility, S = -1,
scaling = TRUE, method = 'mla')
resid.tl_u_ts <- residuals(tl_u_ts)
head(resid.tl_u_ts)
24 ricephil

## Using data on eighty-two countries production (GDP)


# LCM Cobb Douglas (production function) half normal distribution
cb_2c_h <- sfalcmcross(formula = ly ~ lk + ll + yr, udist = 'hnormal',
data = worldprod, S = 1)
resid.cb_2c_h <- residuals(cb_2c_h)
head(resid.cb_2c_h)

## End(Not run)

ricephil Data on rice production in the Philippines

Description
This dataset contains annual data collected from 43 smallholder rice producers in the Tarlac region
of the Philippines between 1990 and 1997.

Format
A data frame with 344 observations on the following 17 variables.
YEARDUM Time period (1= 1990, ..., 8 = 1997).
FARMERCODE Farmer code (1, ..., 43).
PROD Output (tonnes of freshly threshed rice).
AREA Area planted (hectares).
LABOR Labor used (man-days of family and hired labor).
NPK Fertiliser used (kg of active ingredients).
OTHER Other inputs used (Laspeyres index = 100 for Farm 17 in 1991).
PRICE Output price (pesos per kg).
AREAP Rental price of land (pesos per hectare).
LABORP Labor price (pesos per hired man-day).
NPKP Fertiliser price (pesos per kg of active ingredient).
OTHERP Price of other inputs (implicit price index).
AGE Age of the household head (years).
EDYRS Education of the household head (years).
HHSIZE Household size.
NADULT Number of adults in the household.
BANRAT Percentage of area classified as bantog (upland) fields.

Details
This dataset is published as supplement to Coelli et al. (2005). While most variables of this dataset
were supplied by the International Rice Research Institute (IRRI), some were calculated by Coelli
et al. (2005, see p. 325–326). The survey is described in Pandey et al. (1999).
sfacross 25

References
Coelli, T. J., Rao, D. S. P., O’Donnell, C. J., and Battese, G. E. 2005. An Introduction to Efficiency
and Productivity Analysis, Springer, New York.
Pandey, S., Masciat, P., Velasco, L, and Villano, R. 1999. Risk analysis of a rainfed rice production
system system in Tarlac, Central Luzon, Philippines. Experimental Agriculture, 35:225–237.

Examples
str(ricephil)
summary(ricephil)

sfacross Stochastic frontier estimation using cross-sectional data

Description
sfacross is a symbolic formula-based function for the estimation of stochastic frontier models in
the case of cross-sectional or pooled cross-sectional data, using maximum (simulated) likelihood -
M(S)L.
The function accounts for heteroscedasticity in both one-sided and two-sided error terms as in
Reifschneider and Stevenson (1991), Caudill and Ford (1993), Caudill et al. (1995) and Hadri
(1999), but also heterogeneity in the mean of the pre-truncated distribution as in Kumbhakar et al.
(1991), Huang and Liu (1994) and Battese and Coelli (1995).
Ten distributions are possible for the one-sided error term and eleven optimization algorithms are
available.
The truncated normal - normal distribution with scaling property as in Wang and Schmidt (2002) is
also implemented.

Usage
sfacross(
formula,
muhet,
uhet,
vhet,
logDepVar = TRUE,
data,
subset,
weights,
wscale = TRUE,
S = 1L,
udist = "hnormal",
scaling = FALSE,
start = NULL,
method = "bfgs",
26 sfacross

hessianType = 1L,
simType = "halton",
Nsim = 100,
prime = 2L,
burn = 10,
antithetics = FALSE,
seed = 12345,
itermax = 2000,
printInfo = FALSE,
tol = 1e-12,
gradtol = 1e-06,
stepmax = 0.1,
qac = "marquardt"
)

## S3 method for class 'sfacross'


print(x, ...)

## S3 method for class 'sfacross'


bread(x, ...)

## S3 method for class 'sfacross'


estfun(x, ...)

Arguments
formula A symbolic description of the model to be estimated based on the generic func-
tion formula (see section ‘Details’).
muhet A one-part formula to consider heterogeneity in the mean of the pre-truncated
distribution (see section ‘Details’).
uhet A one-part formula to consider heteroscedasticity in the one-sided error variance
(see section ‘Details’).
vhet A one-part formula to consider heteroscedasticity in the two-sided error variance
(see section ‘Details’).
logDepVar Logical. Informs whether the dependent variable is logged (TRUE) or not (FALSE).
Default = TRUE.
data The data frame containing the data.
subset An optional vector specifying a subset of observations to be used in the opti-
mization process.
weights An optional vector of weights to be used for weighted log-likelihood. Should be
NULL or numeric vector with positive values. When NULL, a numeric vector of 1
is used.
wscale Logical. When weights is not NULL, a scaling transformation is used such that
the weights sum to the sample size. Default TRUE. When FALSE no scaling is
used.
S If S = 1 (default), a production (profit) frontier is estimated: ϵi = vi − ui . If S =
-1, a cost frontier is estimated: ϵi = vi + ui .
sfacross 27

udist Character string. Default = 'hnormal'. Distribution specification for the one-
sided error term. 10 different distributions are available:
• 'hnormal', for the half normal distribution (Aigner et al. 1977, Meeusen
and Vandenbroeck 1977)
• 'exponential', for the exponential distribution
• 'tnormal' for the truncated normal distribution (Stevenson 1980)
• 'rayleigh', for the Rayleigh distribution (Hajargasht 2015)
• 'uniform', for the uniform distribution (Li 1996, Nguyen 2010)
• 'gamma', for the Gamma distribution (Greene 2003)
• 'lognormal', for the log normal distribution (Migon and Medici 2001,
Wang and Ye 2020)
• 'weibull', for the Weibull distribution (Tsionas 2007)
• 'genexponential', for the generalized exponential distribution (Papadopou-
los 2020)
• 'tslaplace', for the truncated skewed Laplace distribution (Wang 2012).
scaling Logical. Only when udist = 'tnormal' and scaling = TRUE, the scaling prop-
erty model (Wang and Schmidt 2002) is estimated. Default = FALSE. (see section
‘Details’).
start Numeric vector. Optional starting values for the maximum likelihood (ML)
estimation.
method Optimization algorithm used for the estimation. Default = 'bfgs'. 11 algo-
rithms are available:
• 'bfgs', for Broyden-Fletcher-Goldfarb-Shanno (see maxBFGS)
• 'bhhh', for Berndt-Hall-Hall-Hausman (see maxBHHH)
• 'nr', for Newton-Raphson (see maxNR)
• 'nm', for Nelder-Mead (see maxNM)
• 'cg', for Conjugate Gradient (see maxCG)
• 'sann', for Simulated Annealing (see maxSANN)
• 'ucminf', for a quasi-Newton type optimisation with BFGS updating of
the inverse Hessian and soft line search with a trust region type monitoring
of the input to the line search algorithm (see ucminf)
• 'mla', for general-purpose optimization based on Marquardt-Levenberg al-
gorithm (see mla)
• 'sr1', for Symmetric Rank 1 (see trust.optim)
• 'sparse', for trust regions and sparse Hessian (see trust.optim)
• 'nlminb', for optimization using PORT routines (see nlminb)
hessianType Integer. If 1 (Default), analytic Hessian is returned for all the distributions. If 2,
bhhh Hessian is estimated (g ′ g).
simType Character string. If simType = 'halton' (Default), Halton draws are used for
maximum simulated likelihood (MSL). If simType = 'ghalton', Generalized-
Halton draws are used for MSL. If simType = 'sobol', Sobol draws are used for
MSL. If simType = 'uniform', uniform draws are used for MSL. (see section
‘Details’).
28 sfacross

Nsim Number of draws for MSL. Default 100.


prime Prime number considered for Halton and Generalized-Halton draws. Default =
2.
burn Number of the first observations discarded in the case of Halton draws. Default
= 10.
antithetics Logical. Default = FALSE. If TRUE, antithetics counterpart of the uniform draws
is computed. (see section ‘Details’).
seed Numeric. Seed for the random draws.
itermax Maximum number of iterations allowed for optimization. Default = 2000.
printInfo Logical. Print information during optimization. Default = FALSE.
tol Numeric. Convergence tolerance. Default = 1e-12.
gradtol Numeric. Convergence tolerance for gradient. Default = 1e-06.
stepmax Numeric. Step max for ucminf algorithm. Default = 0.1.
qac Character. Quadratic Approximation Correction for 'bhhh' and 'nr' algo-
rithms. If 'stephalving', the step length is decreased but the direction is kept.
If 'marquardt' (default), the step length is decreased while also moving closer
to the pure gradient direction. See maxBHHH and maxNR.
x an object of class sfacross (returned by the function sfacross).
... additional arguments of frontier are passed to sfacross; additional arguments of
the print, bread, estfun, nobs methods are currently ignored.

Details
The stochastic frontier model for the cross-sectional data is defined as:

yi = α + x′i β + vi − Sui

with

ϵi = vi − Sui

where i is the observation, y is the output (cost, revenue, profit), x is the vector of main explanatory
variables (inputs and other control variables), u is the one-sided error term with variance σu2 , and v
is the two-sided error term with variance σv2 .
S = 1 in the case of production (profit) frontier function and S = -1 in the case of cost frontier
function.
The model is estimated using maximum likelihood (ML) for most distributions except the Gamma,
Weibull and log-normal distributions for which maximum simulated likelihood (MSL) is used. For
this latter, several draws can be implemented namely Halton, Generalized Halton, Sobol and uni-
form. In the case of uniform draws, antithetics can also be computed: first Nsim/2 draws are
obtained, then the Nsim/2 other draws are obtained as counterpart of one (1-draw).
To account for heteroscedasticity in the variance parameters of the error terms, a single part (right)
formula can also be specified. To impose the positivity to these parameters, the variances are mod-
elled as: σu2 = exp (δ ′ Zu ) or σv2 = exp (ϕ′ Zv ), where Zu and Zv are the heteroscedasticity
sfacross 29

variables (inefficiency drivers in the case of Zu ) and δ and ϕ the coefficients. In the case of het-
erogeneity in the truncated mean µ, it is modelled as µ = ω ′ Zµ . The scaling property can be
applied for the truncated normal distribution: u ∼ h(Zu , δ)u where u follows a truncated normal
distribution N + (τ, exp (cu)).
In the case of the truncated normal distribution, the convolution of ui and vi is:
!  .  
1 Sϵ + µ µi∗ µ
f (ϵi ) = p ϕ p i Φ Φ
σu + σv2
2 σu2 + σv2 σ∗ σu

where

µσv2 − Sϵi σu2


µi∗ =
σu2 + σv2
and

σu2 σv2
σ∗2 =
σu2 + σv2
In the case of the half normal distribution the convolution is obtained by setting µ = 0.
sfacross allows for the maximization of weighted log-likelihood. When option weights is speci-
fied and wscale = TRUE, the weights are scaled as:

oldweights
newweights = samplesize × P
(oldweights )
For complex problems, non-gradient methods (e.g. nm or sann) can be used to warm start the
optimization and zoom in the neighborhood of the solution. Then a gradient-based methods is
recommended in the second step. In the case of sann, we recommend to significantly increase the
iteration limit (e.g. itermax = 20000). The Conjugate Gradient (cg) can also be used in the first
stage.
A set of extractor functions for fitted model objects is available for objects of class 'sfacross'
including methods to the generic functions print, summary, coef, fitted, logLik, residuals,
vcov, efficiencies, ic, marginal, skewnessTest, estfun and bread (from the sandwich pack-
age), lmtest::coeftest() (from the lmtest package).

Value
sfacross returns a list of class 'sfacross' containing the following elements:

call The matched call.


formula The estimated model.
S The argument 'S'. See the section ‘Arguments’.
typeSfa Character string. ’Stochastic Production/Profit Frontier, e = v - u’ when S = 1
and ’Stochastic Cost Frontier, e = v + u’ when S = -1.
Nobs Number of observations used for optimization.
nXvar Number of explanatory variables in the production or cost frontier.
30 sfacross

nmuZUvar Number of variables explaining heterogeneity in the truncated mean, only if


udist = 'tnormal' or 'lognormal'.
scaling The argument 'scaling'. See the section ‘Arguments’.
logDepVar The argument 'logDepVar'. See the section ‘Arguments’.
nuZUvar Number of variables explaining heteroscedasticity in the one-sided error term.
nvZVvar Number of variables explaining heteroscedasticity in the two-sided error term.
nParm Total number of parameters estimated.
udist The argument 'udist'. See the section ‘Arguments’.
startVal Numeric vector. Starting value for M(S)L estimation.
dataTable A data frame (tibble format) containing information on data used for optimiza-
tion along with residuals and fitted values of the OLS and M(S)L estimations,
and the individual observation log-likelihood. When weights is specified an
additional variable is also provided in dataTable.
olsParam Numeric vector. OLS estimates.
olsStder Numeric vector. Standard errors of OLS estimates.
olsSigmasq Numeric. Estimated variance of OLS random error.
olsLoglik Numeric. Log-likelihood value of OLS estimation.
olsSkew Numeric. Skewness of the residuals of the OLS estimation.
olsM3Okay Logical. Indicating whether the residuals of the OLS estimation have the ex-
pected skewness.
CoelliM3Test Coelli’s test for OLS residuals skewness. (See Coelli, 1995).
AgostinoTest D’Agostino’s test for OLS residuals skewness. (See D’Agostino and Pearson,
1973).
isWeights Logical. If TRUE weighted log-likelihood is maximized.
optType Optimization algorithm used.
nIter Number of iterations of the ML estimation.
optStatus Optimization algorithm termination message.
startLoglik Log-likelihood at the starting values.
mlLoglik Log-likelihood value of the M(S)L estimation.
mlParam Parameters obtained from M(S)L estimation.
gradient Each variable gradient of the M(S)L estimation.
gradL_OBS Matrix. Each variable individual observation gradient of the M(S)L estimation.
gradientNorm Gradient norm of the M(S)L estimation.
invHessian Covariance matrix of the parameters obtained from the M(S)L estimation.
hessianType The argument 'hessianType'. See the section ‘Arguments’.
mlDate Date and time of the estimated model.
simDist The argument 'simDist', only if udist = 'gamma', 'lognormal' or , 'weibull'.
See the section ‘Arguments’.
Nsim The argument 'Nsim', only if udist = 'gamma', 'lognormal' or , 'weibull'.
See the section ‘Arguments’.
FiMat Matrix of random draws used for MSL, only if udist = 'gamma', 'lognormal'
or , 'weibull'.
sfacross 31

Note
For the Halton draws, the code is adapted from the mlogit package.

References
Aigner, D., Lovell, C. A. K., and Schmidt, P. 1977. Formulation and estimation of stochastic frontier
production function models. Journal of Econometrics, 6(1), 21–37.
Battese, G. E., and Coelli, T. J. 1995. A model for technical inefficiency effects in a stochastic
frontier production function for panel data. Empirical Economics, 20(2), 325–332.
Caudill, S. B., and Ford, J. M. 1993. Biases in frontier estimation due to heteroscedasticity. Eco-
nomics Letters, 41(1), 17–20.
Caudill, S. B., Ford, J. M., and Gropper, D. M. 1995. Frontier estimation and firm-specific ineffi-
ciency measures in the presence of heteroscedasticity. Journal of Business & Economic Statistics,
13(1), 105–111.
Coelli, T. 1995. Estimators and hypothesis tests for a stochastic frontier function - a Monte-Carlo
analysis. Journal of Productivity Analysis, 6:247–268.
D’Agostino, R., and E.S. Pearson.
√ 1973. Tests for departure from normality. Empirical results for
the distributions of b2 and b1 . Biometrika, 60:613–622.
Greene, W. H. 2003. Simulated likelihood estimation of the normal-Gamma stochastic frontier
function. Journal of Productivity Analysis, 19(2-3), 179–190.
Hadri, K. 1999. Estimation of a doubly heteroscedastic stochastic frontier cost function. Journal of
Business & Economic Statistics, 17(3), 359–363.
Hajargasht, G. 2015. Stochastic frontiers with a Rayleigh distribution. Journal of Productivity
Analysis, 44(2), 199–208.
Huang, C. J., and Liu, J.-T. 1994. Estimation of a non-neutral stochastic frontier production func-
tion. Journal of Productivity Analysis, 5(2), 171–180.
Kumbhakar, S. C., Ghosh, S., and McGuckin, J. T. 1991) A generalized production frontier ap-
proach for estimating determinants of inefficiency in U.S. dairy farms. Journal of Business &
Economic Statistics, 9(3), 279–286.
Li, Q. 1996. Estimating a stochastic production frontier when the adjusted error is symmetric.
Economics Letters, 52(3), 221–228.
Meeusen, W., and Vandenbroeck, J. 1977. Efficiency estimation from Cobb-Douglas production
functions with composed error. International Economic Review, 18(2), 435–445.
Migon, H. S., and Medici, E. V. 2001. Bayesian hierarchical models for stochastic production
frontier. Lacea, Montevideo, Uruguay.
Nguyen, N. B. 2010. Estimation of technical efficiency in stochastic frontier analysis. PhD disser-
tation, Bowling Green State University, August.
Papadopoulos, A. 2021. Stochastic frontier models using the generalized exponential distribution.
Journal of Productivity Analysis, 55:15–29.
Reifschneider, D., and Stevenson, R. 1991. Systematic departures from the frontier: A framework
for the analysis of firm inefficiency. International Economic Review, 32(3), 715–723.
Stevenson, R. E. 1980. Likelihood Functions for Generalized Stochastic Frontier Estimation. Jour-
nal of Econometrics, 13(1), 57–66.
32 sfacross

Tsionas, E. G. 2007. Efficiency measurement with the Weibull stochastic frontier. Oxford Bulletin
of Economics and Statistics, 69(5), 693–706.
Wang, K., and Ye, X. 2020. Development of alternative stochastic frontier models for estimating
time-space prism vertices. Transportation.
Wang, H.J., and Schmidt, P. 2002. One-step and two-step estimation of the effects of exogenous
variables on technical efficiency levels. Journal of Productivity Analysis, 18:129–144.
Wang, J. 2012. A normal truncated skewed-Laplace model in stochastic frontier analysis. Master
thesis, Western Kentucky University, May.

See Also
print for printing sfacross object.
summary for creating and printing summary results.
coef for extracting coefficients of the estimation.
efficiencies for computing (in-)efficiency estimates.
fitted for extracting the fitted frontier values.
ic for extracting information criteria.
logLik for extracting log-likelihood value(s) of the estimation.
marginal for computing marginal effects of inefficiency drivers.
residuals for extracting residuals of the estimation.
skewnessTest for conducting residuals skewness test.
vcov for computing the variance-covariance matrix of the coefficients.
bread for bread for sandwich estimator.
estfun for gradient extraction for each observation.
skewnessTest for implementing skewness test.

Examples
## Using data on fossil fuel fired steam electric power generation plants in the U.S.
# Translog (cost function) half normal with heteroscedasticity
tl_u_h <- sfacross(formula = log(tc/wf) ~ log(y) + I(1/2 * (log(y))^2) +
log(wl/wf) + log(wk/wf) + I(1/2 * (log(wl/wf))^2) + I(1/2 * (log(wk/wf))^2) +
I(log(wl/wf) * log(wk/wf)) + I(log(y) * log(wl/wf)) + I(log(y) * log(wk/wf)),
udist = 'hnormal', uhet = ~ regu, data = utility, S = -1, method = 'bfgs')
summary(tl_u_h)

# Translog (cost function) truncated normal with heteroscedasticity


tl_u_t <- sfacross(formula = log(tc/wf) ~ log(y) + I(1/2 * (log(y))^2) +
log(wl/wf) + log(wk/wf) + I(1/2 * (log(wl/wf))^2) + I(1/2 * (log(wk/wf))^2) +
I(log(wl/wf) * log(wk/wf)) + I(log(y) * log(wl/wf)) + I(log(y) * log(wk/wf)),
udist = 'tnormal', muhet = ~ regu, data = utility, S = -1, method = 'bhhh')
summary(tl_u_t)

# Translog (cost function) truncated normal with scaling property


tl_u_ts <- sfacross(formula = log(tc/wf) ~ log(y) + I(1/2 * (log(y))^2) +
sfalcmcross 33

log(wl/wf) + log(wk/wf) + I(1/2 * (log(wl/wf))^2) + I(1/2 * (log(wk/wf))^2) +


I(log(wl/wf) * log(wk/wf)) + I(log(y) * log(wl/wf)) + I(log(y) * log(wk/wf)),
udist = 'tnormal', muhet = ~ regu, uhet = ~ regu, data = utility, S = -1,
scaling = TRUE, method = 'mla')
summary(tl_u_ts)

## Using data on Philippine rice producers


# Cobb Douglas (production function) generalized exponential, and Weibull
# distributions

cb_p_ge <- sfacross(formula = log(PROD) ~ log(AREA) + log(LABOR) + log(NPK) +


log(OTHER), udist = 'genexponential', data = ricephil, S = 1, method = 'bfgs')
summary(cb_p_ge)

## Using data on U.S. electric utility industry


# Cost frontier Gamma distribution
tl_u_g <- sfacross(formula = log(cost/fprice) ~ log(output) + I(log(output)^2) +
I(log(lprice/fprice)) + I(log(cprice/fprice)), udist = 'gamma', uhet = ~ 1,
data = electricity, S = -1, method = 'bfgs', simType = 'halton', Nsim = 200,
hessianType = 2)
summary(tl_u_g)

sfalcmcross Latent class stochastic frontier using cross-sectional data

Description
sfalcmcross is a symbolic formula based function for the estimation of the latent class stochastic
frontier model (LCM) in the case of cross-sectional or pooled cross-sectional data. The model is
estimated using maximum likelihood (ML). See Orea and Kumbhakar (2004), Parmeter and Kumb-
hakar (2014, p282).
Only the half-normal distribution is possible for the one-sided error term. Eleven optimization
algorithms are available.
The function also accounts for heteroscedasticity in both one-sided and two-sided error terms, as
in Reifschneider and Stevenson (1991), Caudill and Ford (1993), Caudill et al. (1995) and Hadri
(1999).
The model can estimate up to five classes.

Usage
sfalcmcross(
formula,
uhet,
vhet,
thet,
logDepVar = TRUE,
data,
34 sfalcmcross

subset,
weights,
wscale = TRUE,
S = 1L,
udist = "hnormal",
start = NULL,
whichStart = 2L,
initAlg = "nm",
initIter = 100,
lcmClasses = 2,
method = "bfgs",
hessianType = 1,
itermax = 2000L,
printInfo = FALSE,
tol = 1e-12,
gradtol = 1e-06,
stepmax = 0.1,
qac = "marquardt"
)

## S3 method for class 'sfalcmcross'


print(x, ...)

## S3 method for class 'sfalcmcross'


bread(x, ...)

## S3 method for class 'sfalcmcross'


estfun(x, ...)

Arguments
formula A symbolic description of the model to be estimated based on the generic func-
tion formula (see section ‘Details’).
uhet A one-part formula to account for heteroscedasticity in the one-sided error vari-
ance (see section ‘Details’).
vhet A one-part formula to account for heteroscedasticity in the two-sided error vari-
ance (see section ‘Details’).
thet A one-part formula to account for technological heterogeneity in the construc-
tion of the classes.
logDepVar Logical. Informs whether the dependent variable is logged (TRUE) or not (FALSE).
Default = TRUE.
data The data frame containing the data.
subset An optional vector specifying a subset of observations to be used in the opti-
mization process.
weights An optional vector of weights to be used for weighted log-likelihood. Should be
NULL or numeric vector with positive values. When NULL, a numeric vector of 1
is used.
sfalcmcross 35

wscale Logical. When weights is not NULL, a scaling transformation is used such that
the weights sums to the sample size. Default TRUE. When FALSE no scaling is
used.
S If S = 1 (default), a production (profit) frontier is estimated: ϵi = vi − ui . If S =
-1, a cost frontier is estimated: ϵi = vi + ui .
udist Character string. Distribution specification for the one-sided error term. Only
the half normal distribution 'hnormal' (Aigner et al., 1977, Meeusen and Van-
denbroeck, 1977) is currently implemented.
start Numeric vector. Optional starting values for the maximum likelihood (ML)
estimation.
whichStart Integer. If 'whichStart = 1', the starting values are obtained from the method
of moments. When 'whichStart = 2' (Default), the model is initialized by
solving the homoscedastic pooled cross section SFA model.
initAlg Character string specifying the algorithm used for initialization and obtain the
starting values (when 'whichStart = 2'). Only maxLik package algorithms
are available:
• 'bfgs', for Broyden-Fletcher-Goldfarb-Shanno (see maxBFGS)
• 'bhhh', for Berndt-Hall-Hall-Hausman (see maxBHHH)
• 'nr', for Newton-Raphson (see maxNR)
• 'nm', for Nelder-Mead - Default - (see maxNM)
• 'cg', for Conjugate Gradient (see maxCG)
• 'sann', for Simulated Annealing (see maxSANN)
initIter Maximum number of iterations for initialization algorithm. Default 100.
lcmClasses Number of classes to be estimated (default = 2). A maximum of five classes can
be estimated.
method Optimization algorithm used for the estimation. Default = 'bfgs'. 11 algo-
rithms are available:
• 'bfgs', for Broyden-Fletcher-Goldfarb-Shanno (see maxBFGS)
• 'bhhh', for Berndt-Hall-Hall-Hausman (see maxBHHH)
• 'nr', for Newton-Raphson (see maxNR)
• 'nm', for Nelder-Mead (see maxNM)
• 'cg', for Conjugate Gradient (see maxCG)
• 'sann', for Simulated Annealing (see maxSANN)
• 'ucminf', for a quasi-Newton type optimization with BFGS updating of
the inverse Hessian and soft line search with a trust region type monitoring
of the input to the line search algorithm (see ucminf)
• 'mla', for general-purpose optimization based on Marquardt-Levenberg al-
gorithm (see mla)
• 'sr1', for Symmetric Rank 1 (see trust.optim)
• 'sparse', for trust regions and sparse Hessian (see trust.optim)
• 'nlminb', for optimization using PORT routines (see nlminb)
hessianType Integer. If 1 (default), analytic Hessian is returned. If 2, bhhh Hessian is esti-
mated (g ′ g).
36 sfalcmcross

itermax Maximum number of iterations allowed for optimization. Default = 2000.


printInfo Logical. Print information during optimization. Default = FALSE.
tol Numeric. Convergence tolerance. Default = 1e-12.
gradtol Numeric. Convergence tolerance for gradient. Default = 1e-06.
stepmax Numeric. Step max for ucminf algorithm. Default = 0.1.
qac Character. Quadratic Approximation Correction for 'bhhh' and 'nr' algo-
rithms. If 'qac = stephalving', the step length is decreased but the direction
is kept. If 'qac = marquardt' (default), the step length is decreased while also
moving closer to the pure gradient direction. See maxBHHH and maxNR.
x an object of class sfalcmcross (returned by the function sfalcmcross).
... additional arguments of frontier are passed to sfalcmcross; additional arguments
of the print, bread, estfun, nobs methods are currently ignored.

Details
LCM is an estimation of a finite mixture of production functions:

yi = αj + x′i βj + vi|j − Sui|j

ϵi|j = vi|j − Sui|j

where i is the observation, j is the class, y is the output (cost, revenue, profit), x is the vector of
main explanatory variables (inputs and other control variables), u is the one-sided error term with
variance σu2 , and v is the two-sided error term with variance σv2 .
S = 1 in the case of production (profit) frontier function and S = -1 in the case of cost frontier
function.
The contribution of observation i to the likelihood conditional on class j is defined as:
 
 
2 Sϵi|j µi∗|j
P (i|j) = q ϕ q Φ
2
σu|j 2
+ σv|j 2 + σ2
σu|j σ∗|j
v|j

where

2
−Sϵi|j σu|j
µi∗|j = 2 + σ2
σu|j v|j

and

2 2
σu|j σv|j
σ∗2 = 2 + σ2
σu|j v|j

The prior probability of using a particular technology can depend on some covariates (namely the
variables separating the observations into classes) using a logit specification:
sfalcmcross 37

exp (θj′ Zhi )


π(i, j) = PJ
′ Z )
exp (θm
m=1 hi

with Zh the covariates, θ the coefficients estimated for the covariates, and exp(θJ′ Zh ) = 1.
The unconditional likelihood of observation i is simply the average over the J classes:

J
X
P (i) = π(i, m)P (i|m)
m=1

The number of classes to retain can be based on information criterion (see for instance ic).
Class assignment is based on the largest posterior probability. This probability is obtained using
Bayes’ rule, as follows for class j:

P (i|j) π (i, j)
w (j|i) = PJ
m=1 P (i|m) π (i, m)

To accommodate heteroscedasticity in the variance parameters of the error terms, a single part
(right) formula can also be specified. To impose the positivity on these parameters, the variances
2
are modelled respectively as: σu|j = exp (δj′ Zu ) and σv|j
2
= exp (ϕ′j Zv ), where Zu and Zv are
the heteroscedasticity variables (inefficiency drivers in the case of Zu ) and δ and ϕ the coefficients.
'sfalcmcross' only supports the half-normal distribution for the one-sided error term.
sfalcmcross allows for the maximization of weighted log-likelihood. When option weights is
specified and wscale = TRUE, the weights are scaled as:

oldweights
newweights = samplesize × P
(oldweights )

For complex problems, non-gradient methods (e.g. nm or sann) can be used to warm start the
optimization and zoom in the neighborhood of the solution. Then a gradient-based methods is
recommended in the second step. In the case of sann, we recommend to significantly increase the
iteration limit (e.g. itermax = 20000). The Conjugate Gradient (cg) can also be used in the first
stage.
A set of extractor functions for fitted model objects is available for objects of class 'sfalcmcross'
including methods to the generic functions print, summary, coef, fitted, logLik, residuals,
vcov, efficiencies, ic, marginal, estfun and bread (from the sandwich package), lmtest::coeftest()
(from the lmtest package).

Value
sfalcmcross returns a list of class 'sfalcmcross' containing the following elements:

call The matched call.


formula Multi parts formula describing the estimated model.
S The argument 'S'. See the section ‘Arguments’.
typeSfa Character string. ’Latent Class Production/Profit Frontier, e = v - u’ when S = 1
and ’Latent Class Cost Frontier, e = v + u’ when S = -1.
38 sfalcmcross

Nobs Number of observations used for optimization.


nXvar Number of main explanatory variables.
nZHvar Number of variables in the logit specification of the finite mixture model (i.e.
number of covariates).
logDepVar The argument 'logDepVar'. See the section ‘Arguments’.
nuZUvar Number of variables explaining heteroscedasticity in the one-sided error term.
nvZVvar Number of variables explaining heteroscedasticity in the two-sided error term.
nParm Total number of parameters estimated.
udist The argument 'udist'. See the section ‘Arguments’.
startVal Numeric vector. Starting value for ML estimation.
dataTable A data frame (tibble format) containing information on data used for optimiza-
tion along with residuals and fitted values of the OLS and ML estimations, and
the individual observation log-likelihood. When weights is specified an addi-
tional variable is also provided in dataTable.
initHalf When start = NULL and whichStart == 2L. Initial ML estimation with half
normal distribution for the one-sided error term. Model to construct the starting
values for the latent class estimation. Object of class 'maxLik' and 'maxim'
returned.
isWeights Logical. If TRUE weighted log-likelihood is maximized.
optType The optimization algorithm used.
nIter Number of iterations of the ML estimation.
optStatus An optimization algorithm termination message.
startLoglik Log-likelihood at the starting values.
nClasses The number of classes estimated.
mlLoglik Log-likelihood value of the ML estimation.
mlParam Numeric vector. Parameters obtained from ML estimation.
mlParamMatrix Double. Matrix of ML parameters by class.
gradient Numeric vector. Each variable gradient of the ML estimation.
gradL_OBS Matrix. Each variable individual observation gradient of the ML estimation.
gradientNorm Numeric. Gradient norm of the ML estimation.
invHessian The covariance matrix of the parameters obtained from the ML estimation.
hessianType The argument 'hessianType'. See the section ‘Arguments’.
mlDate Date and time of the estimated model.

Note
In the case of panel data, sfalcmcross estimates a pooled cross-section where the probability of
belonging to a class a priori is not permanent (not fixed over time).
sfalcmcross 39

References
Aigner, D., Lovell, C. A. K., and P. Schmidt. 1977. Formulation and estimation of stochastic
frontier production function models. Journal of Econometrics, 6(1), 21–37.
Caudill, S. B., and J. M. Ford. 1993. Biases in frontier estimation due to heteroscedasticity. Eco-
nomics Letters, 41(1), 17–20.
Caudill, S. B., Ford, J. M., and D. M. Gropper. 1995. Frontier estimation and firm-specific ineffi-
ciency measures in the presence of heteroscedasticity. Journal of Business & Economic Statistics,
13(1), 105–111.
Hadri, K. 1999. Estimation of a doubly heteroscedastic stochastic frontier cost function. Journal of
Business & Economic Statistics, 17(3), 359–363.
Meeusen, W., and J. Vandenbroeck. 1977. Efficiency estimation from Cobb-Douglas production
functions with composed error. International Economic Review, 18(2), 435–445.
Orea, L., and S.C. Kumbhakar. 2004. Efficiency measurement using a latent class stochastic frontier
model. Empirical Economics, 29, 169–183.
Parmeter, C.F., and S.C. Kumbhakar. 2014. Efficiency analysis: A primer on recent advances.
Foundations and Trends in Econometrics, 7, 191–385.
Reifschneider, D., and R. Stevenson. 1991. Systematic departures from the frontier: A framework
for the analysis of firm inefficiency. International Economic Review, 32(3), 715–723.

See Also
print for printing sfalcmcross object.
summary for creating and printing summary results.
coef for extracting coefficients of the estimation.
efficiencies for computing (in-)efficiency estimates.
fitted for extracting the fitted frontier values.
ic for extracting information criteria.
logLik for extracting log-likelihood value(s) of the estimation.
marginal for computing marginal effects of inefficiency drivers.
residuals for extracting residuals of the estimation.
vcov for computing the variance-covariance matrix of the coefficients.
bread for bread for sandwich estimator.
estfun for gradient extraction for each observation.

Examples
## Using data on eighty-two countries production (GDP)
# LCM Cobb Douglas (production function) half normal distribution
# Intercept and initStat used as separating variables
cb_2c_h1 <- sfalcmcross(formula = ly ~ lk + ll + yr, thet = ~initStat,
data = worldprod)
summary(cb_2c_h1)
40 sfaR-deprecated

# summary of the initial ML model


summary(cb_2c_h1$InitHalf)

# Only the intercept is used as the separating variable


# and only variable initStat is used as inefficiency driver
cb_2c_h3 <- sfalcmcross(formula = ly ~ lk + ll + yr, uhet = ~initStat,
data = worldprod)
summary(cb_2c_h3)

sfaR-deprecated Deprecated functions of sfaR

Description
These functions are provided for compatibility with older versions of ‘sfaR’ only, and could be
defunct at a future release.

Usage
lcmcross(
formula,
uhet,
vhet,
thet,
logDepVar = TRUE,
data,
subset,
weights,
wscale = TRUE,
S = 1L,
udist = "hnormal",
start = NULL,
whichStart = 2L,
initAlg = "nm",
initIter = 100,
lcmClasses = 2,
method = "bfgs",
hessianType = 1,
itermax = 2000L,
printInfo = FALSE,
tol = 1e-12,
gradtol = 1e-06,
stepmax = 0.1,
qac = "marquardt"
)

## S3 method for class 'lcmcross'


sfaR-deprecated 41

print(x, ...)

## S3 method for class 'lcmcross'


bread(x, ...)

## S3 method for class 'lcmcross'


estfun(x, ...)

## S3 method for class 'lcmcross'


coef(object, extraPar = FALSE, ...)

## S3 method for class 'summary.lcmcross'


coef(object, ...)

## S3 method for class 'lcmcross'


fitted(object, ...)

## S3 method for class 'lcmcross'


ic(object, IC = "AIC", ...)

## S3 method for class 'lcmcross'


logLik(object, individual = FALSE, ...)

## S3 method for class 'lcmcross'


marginal(object, newData = NULL, ...)

## S3 method for class 'lcmcross'


nobs(object, ...)

## S3 method for class 'lcmcross'


residuals(object, ...)

## S3 method for class 'lcmcross'


summary(object, grad = FALSE, ci = FALSE, ...)

## S3 method for class 'summary.lcmcross'


print(x, digits = max(3, getOption("digits") - 2), ...)

## S3 method for class 'lcmcross'


efficiencies(object, level = 0.95, newData = NULL, ...)

## S3 method for class 'lcmcross'


vcov(object, ...)

Arguments

formula A symbolic description of the model to be estimated based on the generic func-
tion formula (see section ‘Details’).
42 sfaR-deprecated

uhet A one-part formula to account for heteroscedasticity in the one-sided error vari-
ance (see section ‘Details’).
vhet A one-part formula to account for heteroscedasticity in the two-sided error vari-
ance (see section ‘Details’).
thet A one-part formula to account for technological heterogeneity in the construc-
tion of the classes.
logDepVar Logical. Informs whether the dependent variable is logged (TRUE) or not (FALSE).
Default = TRUE.
data The data frame containing the data.
subset An optional vector specifying a subset of observations to be used in the opti-
mization process.
weights An optional vector of weights to be used for weighted log-likelihood. Should be
NULL or numeric vector with positive values. When NULL, a numeric vector of 1
is used.
wscale Logical. When weights is not NULL, a scaling transformation is used such that
the weights sums to the sample size. Default TRUE. When FALSE no scaling is
used.
S If S = 1 (default), a production (profit) frontier is estimated: ϵi = vi − ui . If S =
-1, a cost frontier is estimated: ϵi = vi + ui .
udist Character string. Distribution specification for the one-sided error term. Only
the half normal distribution 'hnormal' (Aigner et al., 1977, Meeusen and Van-
denbroeck, 1977) is currently implemented.
start Numeric vector. Optional starting values for the maximum likelihood (ML)
estimation.
whichStart Integer. If 'whichStart = 1', the starting values are obtained from the method
of moments. When 'whichStart = 2' (Default), the model is initialized by
solving the homoscedastic pooled cross section SFA model. 'whichStart = 1'
can be fast.
initAlg Character string specifying the algorithm used for initialization and obtain the
starting values (when 'whichStart = 2'). Only maxLik package algorithms
are available:
• 'bfgs', for Broyden-Fletcher-Goldfarb-Shanno (see maxBFGS)
• 'bhhh', for Berndt-Hall-Hall-Hausman (see maxBHHH)
• 'nr', for Newton-Raphson (see maxNR)
• 'nm', for Nelder-Mead - Default - (see maxNM)
• 'cg', for Conjugate Gradient (see maxCG)
• 'sann', for Simulated Annealing (see maxSANN)
initIter Maximum number of iterations for initialization algorithm. Default 100.
lcmClasses Number of classes to be estimated (default = 2). A maximum of five classes can
be estimated.
method Optimization algorithm used for the estimation. Default = 'bfgs'. 11 algo-
rithms are available:
• 'bfgs', for Broyden-Fletcher-Goldfarb-Shanno (see maxBFGS)
sfaR-deprecated 43

• 'bhhh', for Berndt-Hall-Hall-Hausman (see maxBHHH)


• 'nr', for Newton-Raphson (see maxNR)
• 'nm', for Nelder-Mead (see maxNM)
• 'cg', for Conjugate Gradient (see maxCG)
• 'sann', for Simulated Annealing (see maxSANN)
• 'ucminf', for a quasi-Newton type optimization with BFGS updating of
the inverse Hessian and soft line search with a trust region type monitoring
of the input to the line search algorithm (see ucminf)
• 'mla', for general-purpose optimization based on Marquardt-Levenberg al-
gorithm (see mla)
• 'sr1', for Symmetric Rank 1 (see trust.optim)
• 'sparse', for trust regions and sparse Hessian (see trust.optim)
• 'nlminb', for optimization using PORT routines (see nlminb)
hessianType Integer. If 1 (default), analytic Hessian is returned. If 2, bhhh Hessian is esti-
mated (g ′ g).
itermax Maximum number of iterations allowed for optimization. Default = 2000.
printInfo Logical. Print information during optimization. Default = FALSE.
tol Numeric. Convergence tolerance. Default = 1e-12.
gradtol Numeric. Convergence tolerance for gradient. Default = 1e-06.
stepmax Numeric. Step max for ucminf algorithm. Default = 0.1.
qac Character. Quadratic Approximation Correction for 'bhhh' and 'nr' algo-
rithms. If 'qac = stephalving', the step length is decreased but the direction
is kept. If 'qac = marquardt' (default), the step length is decreased while also
moving closer to the pure gradient direction. See maxBHHH and maxNR.
x an object of class lcmcross (returned by the function lcmcross).
... additional arguments of frontier are passed to lcmcross; additional arguments of
the print, bread, estfun, nobs methods are currently ignored.
object an object of class lcmcross (returned by the function lcmcross).
extraPar Logical (default = FALSE). If TRUE, additional parameters are returned (see coef
or vcov).
IC Character string. Information criterion measure. Three criteria are available:
• 'AIC' for Akaike information criterion (default)
• 'BIC' for Bayesian information criterion
• 'HQIC' for Hannan-Quinn information criterion
.
individual Logical. If FALSE (default), the sum of all observations’ log-likelihood values is
returned. If TRUE, a vector of each observation’s log-likelihood value is returned.
newData Optional data frame that is used to calculate the efficiency estimates. If NULL
(the default), the efficiency estimates are calculated for the observations that
were used in the estimation.
grad Logical. Default = FALSE. If TRUE, the gradient for the maximum likelihood
(ML) estimates of the different parameters is returned.
44 sfaselectioncross

ci Logical. Default = FALSE. If TRUE, the 95% confidence interval for the different
parameters (OLS or/and ML estimates) is returned.
digits Numeric. Number of digits displayed in values.
level A number between between 0 and 0.9999 used for the computation of (in-
)efficiency confidence intervals (defaut = 0.95). Not used in the case of lcmcross.

Details
The following functions are deprecated and could be removed from sfaR in a near future. Use the
replacement indicated below:
• lcmcross: sfalcmcross
• bread.lcmcross: bread.sfalcmcross
• coef.lcmcross: coef.sfalcmcross
• coef.summary.lcmcross: coef.summary.sfalcmcross
• efficiencies.lcmcross: efficiencies.sfalcmcross
• estfun.lcmcross: estfun.sfalcmcross
• fitted.lcmcross: fitted.sfalcmcross
• ic.lcmcross: ic.sfalcmcross
• logLik.lcmcross: logLik.sfalcmcross
• marginal.lcmcross: marginal.sfalcmcross
• nobs.lcmcross: nobs.sfalcmcross
• print.lcmcross: print.sfalcmcross
• print.summary.lcmcross: print.summary.sfalcmcross
• residuals.lcmcross: residuals.sfalcmcross
• summary.lcmcross: summary.sfalcmcross
• vcov.lcmcross: vcov.sfalcmcross

sfaselectioncross Sample selection in stochastic frontier estimation using cross-section


data

Description
sfaselectioncross is a symbolic formula based function for the estimation of the stochastic fron-
tier model in the presence of sample selection. The model accommodates cross-sectional or pooled
cross-sectional data. The model can be estimated using different quadrature approaches or maxi-
mum simulated likelihood (MSL). See Greene (2010).
Only the half-normal distribution is possible for the one-sided error term. Eleven optimization
algorithms are available.
The function also accounts for heteroscedasticity in both one-sided and two-sided error terms, as
in Reifschneider and Stevenson (1991), Caudill and Ford (1993), Caudill et al. (1995) and Hadri
(1999).
sfaselectioncross 45

Usage
sfaselectioncross(
selectionF,
frontierF,
uhet,
vhet,
modelType = "greene10",
logDepVar = TRUE,
data,
subset,
weights,
wscale = TRUE,
S = 1L,
udist = "hnormal",
start = NULL,
method = "bfgs",
hessianType = 2L,
lType = "ghermite",
Nsub = 100,
uBound = Inf,
simType = "halton",
Nsim = 100,
prime = 2L,
burn = 10,
antithetics = FALSE,
seed = 12345,
itermax = 2000,
printInfo = FALSE,
intol = 1e-06,
tol = 1e-12,
gradtol = 1e-06,
stepmax = 0.1,
qac = "marquardt"
)

## S3 method for class 'sfaselectioncross'


print(x, ...)

## S3 method for class 'sfaselectioncross'


bread(x, ...)

## S3 method for class 'sfaselectioncross'


estfun(x, ...)

Arguments
selectionF A symbolic (formula) description of the selection equation.
frontierF A symbolic (formula) description of the outcome (frontier) equation.
46 sfaselectioncross

uhet A one-part formula to consider heteroscedasticity in the one-sided error variance


(see section ‘Details’).
vhet A one-part formula to consider heteroscedasticity in the two-sided error variance
(see section ‘Details’).
modelType Character string. Model used to solve the selection bias. Only the model dis-
cussed in Greene (2010) is currently available.
logDepVar Logical. Informs whether the dependent variable is logged (TRUE) or not (FALSE).
Default = TRUE.
data The data frame containing the data.
subset An optional vector specifying a subset of observations to be used in the opti-
mization process.
weights An optional vector of weights to be used for weighted log-likelihood. Should be
NULL or numeric vector with positive values. When NULL, a numeric vector of 1
is used.
wscale Logical. When weights is not NULL, a scaling transformation is used such that
the weights sum to the sample size. Default TRUE. When FALSE no scaling is
used.
S If S = 1 (default), a production (profit) frontier is estimated: ϵi = vi − ui . If S =
-1, a cost frontier is estimated: ϵi = vi + ui .
udist Character string. Distribution specification for the one-sided error term. Only
the half normal distribution 'hnormal' is currently implemented.
start Numeric vector. Optional starting values for the maximum likelihood (ML)
estimation.
method Optimization algorithm used for the estimation. Default = 'bfgs'. 11 algo-
rithms are available:
• 'bfgs', for Broyden-Fletcher-Goldfarb-Shanno (see maxBFGS)
• 'bhhh', for Berndt-Hall-Hall-Hausman (see maxBHHH)
• 'nr', for Newton-Raphson (see maxNR)
• 'nm', for Nelder-Mead (see maxNM)
• 'cg', for Conjugate Gradient (see maxCG)
• 'sann', for Simulated Annealing (see maxSANN)
• 'ucminf', for a quasi-Newton type optimization with BFGS updating of
the inverse Hessian and soft line search with a trust region type monitoring
of the input to the line search algorithm (see ucminf)
• 'mla', for general-purpose optimization based on Marquardt-Levenberg al-
gorithm (see mla)
• 'sr1', for Symmetric Rank 1 (see trust.optim)
• 'sparse', for trust regions and sparse Hessian (see trust.optim)
• 'nlminb', for optimization using PORT routines (see nlminb)
hessianType Integer. If 1, analytic Hessian is returned. If 2, bhhh Hessian is estimated (g ′ g).
bhhh hessian is estimated by default as the estimation is conducted in two steps.
sfaselectioncross 47

lType Specifies the way the likelihood is estimated. Five possibilities are available:
kronrod for Gauss-Kronrod quadrature (see integrate), hcubature and pcubature
for adaptive integration over hypercubes (see hcubature and pcubature), ghermite
for Gauss-Hermite quadrature (see gaussHermiteData), and msl for maximum
simulated likelihood. Default ghermite.
Nsub Integer. Number of subdivisions/nodes used for quadrature approaches. Default
Nsub = 100.
uBound Numeric. Upper bound for the inefficiency component when solving integrals
using quadrature approaches except Gauss-Hermite for which the upper bound
is automatically infinite (Inf). Default uBound = Inf.
simType Character string. If simType = 'halton' (Default), Halton draws are used for
maximum simulated likelihood (MSL). If simType = 'ghalton', Generalized-
Halton draws are used for MSL. If simType = 'sobol', Sobol draws are used for
MSL. If simType = 'uniform', uniform draws are used for MSL. (see section
‘Details’).
Nsim Number of draws for MSL (default 100).
prime Prime number considered for Halton and Generalized-Halton draws. Default =
2.
burn Number of the first observations discarded in the case of Halton draws. Default
= 10.
antithetics Logical. Default = FALSE. If TRUE, antithetics counterpart of the uniform draws
is computed. (see section ‘Details’).
seed Numeric. Seed for the random draws.
itermax Maximum number of iterations allowed for optimization. Default = 2000.
printInfo Logical. Print information during optimization. Default = FALSE.
intol Numeric. Integration tolerance for quadrature approaches (kronrod, hcubature,
pcubature).
tol Numeric. Convergence tolerance. Default = 1e-12.
gradtol Numeric. Convergence tolerance for gradient. Default = 1e-06.
stepmax Numeric. Step max for ucminf algorithm. Default = 0.1.
qac Character. Quadratic Approximation Correction for 'bhhh' and 'nr' algo-
rithms. If 'stephalving', the step length is decreased but the direction is kept.
If 'marquardt' (default), the step length is decreased while also moving closer
to the pure gradient direction. See maxBHHH and maxNR.
x an object of class sfaselectioncross (returned by the function sfaselectioncross).
... additional arguments of frontier are passed to sfaselectioncross; additional argu-
ments of the print, bread, estfun, nobs methods are currently ignored.

Details
The current model is an extension of Heckman (1976, 1979) sample selection model to nonlinear
models particularly stochastic frontier model. The model has first been discussed in Greene (2010),
and an application can be found in Dakpo et al. (2021). Practically, we have:
48 sfaselectioncross



1 if y1i >0
y1i = ∗
0 if y1i ≤0

where


y1i = Z′si γ + wi , wi ∼ N (0, 1)

and

∗ ∗

y2i if y1i >0
y2i = ∗
N A if y1i ≤0

where


y2i = x′i β + vi − Sui , vi = σv Vi ∧ Vi ∼ N (0, 1), ui = σu |Ui | ∧ Ui ∼ N (0, 1)

y1i describes the selection equation while y2i represents the frontier equation. The selection bias
arises from the correlation between the two symmetric random components vi and wi :

(vi , wi ) ∼ N2 (0, 0), (1, ρσv , σv2 )


 

Conditionaly on |Ui |, the probability associated to each observation is:

∗ 1−y1i ∗ ∗ y1i
P r [y1i ≤ 0] · {f (y2i |y1i > 0) × P r [y1i > 0]}

Using the conditional probability formula:

P (A ∩ B) = P (A) · P (B|A) = P (B) · P (A|B)

Therefore:

∗ ∗ ∗
f (y2i |y1i ≥ 0) · P r [y1i ≥ 0] = f (y2i ) · P r(y1i ≥ 0|y2i )

Using the properties of a bivariate normal distribution, we have:


 
∗ ρ
yi1 |yi2 ∼N Z′si γ + vi , 1 − ρ 2
σv

Hence conditionally on |Ui |, we have:

Z′si γ + σρv vi
  !
∗ ∗ 1 vi
f (y2i |y1i ≥ 0) · P r [y1i ≥ 0] = ϕ Φ p
σv σv 1 − ρ2

The conditional likelihood is equal to:


sfaselectioncross 49

( ρ
!)y1i
y2i − x′i β + Sσu |Ui | Z′si γ + (y2i − x′i β + Sσu |Ui |)
 
1 σv
Li |Ui | = Φ(−Z′si γ)1−y1i × ϕ Φ p
σv σv 1 − ρ2

Since the non-selected observations bring no additional information, the conditional likelihood to
be considered is:

ρ
!
y2i − x′i β + Sσu |Ui | Z′si γ + (y2i − x′i β + Sσu |Ui |)
 
1 σv
Li |Ui | = ϕ Φ p
σv σv 1 − ρ2

The unconditional likelihood is obtained by integrating |Ui | out of the conditional likelihood. Thus

ρ
!
y2i − x′i β + Sσu |Ui | Z′si γ + (y2i − x′i β + Sσu |Ui |)
Z  
1 σv
Li = ϕ Φ p p (|Ui |) d|Ui |
|Ui | σv σv 1 − ρ2

To simplifiy the estimation, the likelihood can be estimated using a two-step approach. In the first
step, the probit model can be run and estimate of γ can be obtained. Then, in the second step, the
following model is estimated:

ρ
!
y2i − x′i β + Sσu |Ui | ai + (y2i − x′i β + Sσu |Ui |)
Z  
1 σv
Li = ϕ Φ p p (|Ui |) d|Ui |
|Ui | σv σv 1 − ρ2

where ai = Z′si γ̂. This likelihood can be estimated using five different approaches: Gauss-Kronrod
quadrature, adaptive integration over hypercubes (hcubature and pcubature), Gauss-Hermite quadra-
ture, and maximum simulated likelihood. We also use the BHHH estimator to obtain the asymptotic
standard errors for the parameter estimators.
sfaselectioncross allows for the maximization of weighted log-likelihood. When option weights
is specified and wscale = TRUE, the weights are scaled as:

oldweights
newweights = samplesize × P
(oldweights )

For complex problems, non-gradient methods (e.g. nm or sann) can be used to warm start the
optimization and zoom in the neighborhood of the solution. Then a gradient-based methods is
recommended in the second step. In the case of sann, we recommend to significantly increase the
iteration limit (e.g. itermax = 20000). The Conjugate Gradient (cg) can also be used in the first
stage.
A set of extractor functions for fitted model objects is available for objects of class 'sfaselectioncross'
including methods to the generic functions print, summary, coef, fitted, logLik, residuals,
vcov, efficiencies, ic, marginal, estfun and bread (from the sandwich package), lmtest::coeftest()
(from the lmtest package).
50 sfaselectioncross

Value
sfaselectioncross returns a list of class 'sfaselectioncross' containing the following ele-
ments:

call The matched call.


selectionF The selection equation formula.
frontierF The frontier equation formula.
S The argument 'S'. See the section ‘Arguments’.
typeSfa Character string. ’Stochastic Production/Profit Frontier, e = v - u’ when S = 1
and ’Stochastic Cost Frontier, e = v + u’ when S = -1.
Ninit Number of initial observations in all samples.
Nobs Number of observations used for optimization.
nXvar Number of explanatory variables in the production or cost frontier.
logDepVar The argument 'logDepVar'. See the section ‘Arguments’.
nuZUvar Number of variables explaining heteroscedasticity in the one-sided error term.
nvZVvar Number of variables explaining heteroscedasticity in the two-sided error term.
nParm Total number of parameters estimated.
udist The argument 'udist'. See the section ‘Arguments’.
startVal Numeric vector. Starting value for M(S)L estimation.
dataTable A data frame (tibble format) containing information on data used for optimiza-
tion along with residuals and fitted values of the OLS and M(S)L estimations,
and the individual observation log-likelihood. When argument weights is spec-
ified, an additional variable is provided in dataTable.
lpmObj Linear probability model used for initializing the first step probit model.
probitObj Probit model. Object of class 'maxLik' and 'maxim'.
ols2stepParam Numeric vector. OLS second step estimates for selection correction. Inverse
Mills Ratio is introduced as an additional explanatory variable.
ols2stepStder Numeric vector. Standard errors of OLS second step estimates.
ols2stepSigmasq
Numeric. Estimated variance of OLS second step random error.
ols2stepLoglik Numeric. Log-likelihood value of OLS second step estimation.
ols2stepSkew Numeric. Skewness of the residuals of the OLS second step estimation.
ols2stepM3Okay Logical. Indicating whether the residuals of the OLS second step estimation
have the expected skewness.
CoelliM3Test Coelli’s test for OLS residuals skewness. (See Coelli, 1995).
AgostinoTest D’Agostino’s test for OLS residuals skewness. (See D’Agostino and Pearson,
1973).
isWeights Logical. If TRUE weighted log-likelihood is maximized.
lType Type of likelihood estimated. See the section ‘Arguments’.
optType Optimization algorithm used.
sfaselectioncross 51

nIter Number of iterations of the ML estimation.


optStatus Optimization algorithm termination message.
startLoglik Log-likelihood at the starting values.
mlLoglik Log-likelihood value of the M(S)L estimation.
mlParam Parameters obtained from M(S)L estimation.
gradient Each variable gradient of the M(S)L estimation.
gradL_OBS Matrix. Each variable individual observation gradient of the M(S)L estimation.
gradientNorm Gradient norm of the M(S)L estimation.
invHessian Covariance matrix of the parameters obtained from the M(S)L estimation.
hessianType The argument 'hessianType'. See the section ‘Arguments’.
mlDate Date and time of the estimated model.
simDist The argument 'simDist', only if lType = 'msl'. See the section ‘Arguments’.
Nsim The argument 'Nsim', only if lType = 'msl'. See the section ‘Arguments’.
FiMat Matrix of random draws used for MSL, only if lType = 'msl'.
gHermiteData List. Gauss-Hermite quadrature rule as provided by gaussHermiteData. Only
if lType = 'ghermite'.
Nsub Number of subdivisions used for quadrature approaches.
uBound Upper bound for the inefficiency component when solving integrals using quadra-
ture approaches except Gauss-Hermite for which the upper bound is automati-
cally infinite (Inf).
intol Integration tolerance for quadrature approaches except Gauss-Hermite.

Note
For the Halton draws, the code is adapted from the mlogit package.

References
Caudill, S. B., and Ford, J. M. 1993. Biases in frontier estimation due to heteroscedasticity. Eco-
nomics Letters, 41(1), 17–20.
Caudill, S. B., Ford, J. M., and Gropper, D. M. 1995. Frontier estimation and firm-specific ineffi-
ciency measures in the presence of heteroscedasticity. Journal of Business & Economic Statistics,
13(1), 105–111.
Coelli, T. 1995. Estimators and hypothesis tests for a stochastic frontier function - a Monte-Carlo
analysis. Journal of Productivity Analysis, 6:247–268.
D’Agostino, R., and E.S. Pearson.
√ 1973. Tests for departure from normality. Empirical results for
the distributions of b2 and b1 . Biometrika, 60:613–622.
Dakpo, K. H., Latruffe, L., Desjeux, Y., Jeanneaux, P., 2022. Modeling heterogeneous technologies
in the presence of sample selection: The case of dairy farms and the adoption of agri-environmental
schemes in France. Agricultural Economics, 53(3), 422-438.
Greene, W., 2010. A stochastic frontier model with correction for sample selection. Journal of
Productivity Analysis. 34, 15–24.
52 sfaselectioncross

Hadri, K. 1999. Estimation of a doubly heteroscedastic stochastic frontier cost function. Journal of
Business & Economic Statistics, 17(3), 359–363.
Heckman, J., 1976. Discrete, qualitative and limited dependent variables. Ann Econ Soc Meas. 4,
475–492.
Heckman, J., 1979. Sample Selection Bias as a Specification Error. Econometrica. 47, 153–161.
Reifschneider, D., and Stevenson, R. 1991. Systematic departures from the frontier: A framework
for the analysis of firm inefficiency. International Economic Review, 32(3), 715–723.

See Also
print for printing sfaselectioncross object.
summary for creating and printing summary results.
coef for extracting coefficients of the estimation.
efficiencies for computing (in-)efficiency estimates.
fitted for extracting the fitted frontier values.
ic for extracting information criteria.
logLik for extracting log-likelihood value(s) of the estimation.
marginal for computing marginal effects of inefficiency drivers.
residuals for extracting residuals of the estimation.
vcov for computing the variance-covariance matrix of the coefficients.
bread for bread for sandwich estimator.
estfun for gradient extraction for each observation.

Examples
## Not run:

## Simulated example

N <- 2000 # sample size


set.seed(12345)
z1 <- rnorm(N)
z2 <- rnorm(N)
v1 <- rnorm(N)
v2 <- rnorm(N)
e1 <- v1
e2 <- 0.7071 * (v1 + v2)
ds <- z1 + z2 + e1
d <- ifelse(ds > 0, 1, 0)
u <- abs(rnorm(N))
x1 <- rnorm(N)
x2 <- rnorm(N)
y <- x1 + x2 + e2 - u
data <- cbind(y = y, x1 = x1, x2 = x2, z1 = z1, z2 = z2, d = d)

## Estimation using quadrature (Gauss-Kronrod)


skewnessTest 53

selecRes1 <- sfaselectioncross(selectionF = d ~ z1 + z2, frontierF = y ~ x1 + x2,


modelType = 'greene10', method = 'bfgs',
logDepVar = TRUE, data = as.data.frame(data),
S = 1L, udist = 'hnormal', lType = 'kronrod', Nsub = 100, uBound = Inf,
simType = 'halton', Nsim = 300, prime = 2L, burn = 10, antithetics = FALSE,
seed = 12345, itermax = 2000, printInfo = FALSE)

summary(selecRes1)

## Estimation using maximum simulated likelihood

selecRes2 <- sfaselectioncross(selectionF = d ~ z1 + z2, frontierF = y ~ x1 + x2,


modelType = 'greene10', method = 'bfgs',
logDepVar = TRUE, data = as.data.frame(data),
S = 1L, udist = 'hnormal', lType = 'msl', Nsub = 100, uBound = Inf,
simType = 'halton', Nsim = 300, prime = 2L, burn = 10, antithetics = FALSE,
seed = 12345, itermax = 2000, printInfo = FALSE)

summary(selecRes2)

## End(Not run)

skewnessTest Skewness test for stochastic frontier models

Description
skewnessTest computes skewness test for stochastic frontier models (i.e. objects of class 'sfacross').

Usage
skewnessTest(object, test = "agostino")

Arguments
object An object of class 'sfacross', returned by sfacross.
test A character string specifying the test to implement. If 'agostino' (default),
D’Agostino skewness test is implemented (D’Agostino and Pearson, 1973). If
'coelli', Coelli skewness test is implemented (Coelli, 1995).

Value
skewnessTest returns the results of either the D’Agostino’s or the Coelli’s skewness test.

Note
skewnessTest is currently only available for object of class 'sfacross'.
54 summary

References
Coelli, T. 1995. Estimators and hypothesis tests for a stochastic frontier function - a Monte-Carlo
analysis. Journal of Productivity Analysis, 6:247–268.
D’Agostino, R., and E.S. Pearson.
√ 1973. Tests for departure from normality. Empirical results for
the distributions of b2 and b1 . Biometrika, 60:613–622.

Examples
## Not run:
## Using data on fossil fuel fired steam electric power generation plants in the U.S.
# Translog SFA (cost function) truncated normal with scaling property
tl_u_ts <- sfacross(formula = log(tc/wf) ~ log(y) + I(1/2 * (log(y))^2) +
log(wl/wf) + log(wk/wf) + I(1/2 * (log(wl/wf))^2) + I(1/2 * (log(wk/wf))^2) +
I(log(wl/wf) * log(wk/wf)) + I(log(y) * log(wl/wf)) + I(log(y) * log(wk/wf)),
udist = 'tnormal', muhet = ~ regu, uhet = ~ regu, data = utility, S = -1,
scaling = TRUE, method = 'mla')
skewnessTest(tl_u_ts)
skewnessTest(tl_u_ts, test = 'coelli')

## End(Not run)

summary Summary of results for stochastic frontier models

Description
Create and print summary results for stochastic frontier models returned by sfacross, sfalcmcross,
or sfaselectioncross.

Usage
## S3 method for class 'sfacross'
summary(object, grad = FALSE, ci = FALSE, ...)

## S3 method for class 'summary.sfacross'


print(x, digits = max(3, getOption("digits") - 2), ...)

## S3 method for class 'sfalcmcross'


summary(object, grad = FALSE, ci = FALSE, ...)

## S3 method for class 'summary.sfalcmcross'


print(x, digits = max(3, getOption("digits") - 2), ...)

## S3 method for class 'sfaselectioncross'


summary(object, grad = FALSE, ci = FALSE, ...)

## S3 method for class 'summary.sfaselectioncross'


print(x, digits = max(3, getOption("digits") - 2), ...)
summary 55

Arguments
object An object of either class 'sfacross' returned by the function sfacross, or
'sfalcmcross' returned by the function sfalcmcross, or class 'sfaselectioncross'
returned by the function sfaselectioncross.
grad Logical. Default = FALSE. If TRUE, the gradient for the maximum likelihood
(ML) estimates of the different parameters is returned.
ci Logical. Default = FALSE. If TRUE, the 95% confidence interval for the different
parameters (OLS or/and ML estimates) is returned.
... Currently ignored.
x An object of either class 'summary.sfacross', 'summary.sfalcmcross', or
'summary.sfaselectioncross'.
digits Numeric. Number of digits displayed in values.

Value
The summary method returns a list of class 'summary.sfacross', 'summary.sfalcmcross', or
'summary.sfaselectioncross' that contains the same elements as an object returned by sfacross,
sfalcmcross, or sfaselectioncross with the following additional elements:

AIC Akaike information criterion.


BIC Bayesian information criterion.
HQIC Hannan-Quinn information criterion.
sigmavSq For object of class 'sfacross' or 'sfaselectioncross'. Variance of the
two-sided error term (σv2 ).
sigmauSq For object of class 'sfacross' or 'sfaselectioncross'. Parametrization of
the variance of the one-sided error term (σu2 ).
Varu For object of class 'sfacross' or 'sfaselectioncross'. Variance of the
one-sided error term.
theta For object of class 'sfacross' with 'udist = uniform'. Θ value in the case
the uniform distribution is defined as: ui ∈ [0, Θ].
Eu For object of class 'sfacross' or 'sfaselectioncross'. Expected uncon-
ditional inefficiency (E[u]).
Expu For object of class 'sfacross' or 'sfaselectioncross'. Expected uncon-
ditional efficiency (E[exp(u)]).
olsRes For object of class 'sfacross'. Matrix of OLS estimates, their standard er-
rors, t-values, P-values, and when ci = TRUE their confidence intervals.
ols2StepRes For object of class 'sfaselectioncross'. Matrix of OLS 2 step estimates,
their standard errors, t-values, P-values, and when ci = TRUE their confidence
intervals.
mlRes Matrix of ML estimates, their standard errors, z-values, asymptotic P-values,
and when grad = TRUE their gradient, ci = TRUE their confidence intervals.
chisq For object of class 'sfacross'. Chi-square statistics of the difference between
the stochastic frontier and the OLS.
df Degree of freedom for the inefficiency model.
56 swissrailways

See Also

sfacross, for the stochastic frontier analysis model fitting function for cross-sectional or pooled
data.
sfalcmcross, for the latent class stochastic frontier analysis model fitting function for cross-
sectional or pooled data.
sfaselectioncross for sample selection in stochastic frontier model fitting function for cross-
sectional or pooled data.
print for printing sfacross object.
coef for extracting coefficients of the estimation.
efficiencies for computing (in-)efficiency estimates.
fitted for extracting the fitted frontier values.
ic for extracting information criteria.
logLik for extracting log-likelihood value(s) of the estimation.
marginal for computing marginal effects of inefficiency drivers.
residuals for extracting residuals of the estimation.
vcov for computing the variance-covariance matrix of the coefficients.
bread for bread for sandwich estimator.
estfun for gradient extraction for each observation.
skewnessTest for implementing skewness test.

Examples
## Using data on fossil fuel fired steam electric power generation plants in the U.S.
# Translog SFA (cost function) truncated normal with scaling property
tl_u_ts <- sfacross(formula = log(tc/wf) ~ log(y) + I(1/2 * (log(y))^2) +
log(wl/wf) + log(wk/wf) + I(1/2 * (log(wl/wf))^2) + I(1/2 * (log(wk/wf))^2) +
I(log(wl/wf) * log(wk/wf)) + I(log(y) * log(wl/wf)) + I(log(y) * log(wk/wf)),
udist = 'tnormal', muhet = ~ regu, uhet = ~ regu, data = utility, S = -1,
scaling = TRUE, method = 'mla')
summary(tl_u_ts, grad = TRUE, ci = TRUE)

swissrailways Data on Swiss railway companies

Description

This dataset is an unbalanced panel of 50 Swiss railway companies over the period 1985-1997.
swissrailways 57

Format
A data frame with 605 observations on the following 42 variables.

ID Firm identification.
YEAR Year identification.
NI Number of years observed.
STOPS Number of stops in network.
NETWORK Network length (in meters).
NARROW_T Dummy variable for railroads with narrow track.
RACK Dummy variable for ‘rack rail’ in network.
TUNNEL Dummy variable for network with tunnels over 300 meters on average.
T Time indicator, first year = 0.
Q2 Passenger output – passenger km.
Q3 Freight output – ton km.
CT Total cost (1,000 Swiss franc).
PL Labor price.
PE Electricity price.
PK Capital price.
VIRAGE 1 for railroads with curvy tracks.
LNCT Log of CT/PE.
LNQ2 Log of Q2.
LNQ3 Log of Q3.
LNNET Log of NETWORK/1000.
LNPL Log of PL/PE.
LNPE Log of PE.
LNPK Log of PK/PE.
LNSTOP Log of STOPS.
MLNQ2 Mean of LNQ2.
MLNQ3 Mean of LNQ3.
MLNNET Mean of LNNET.
MLNPL Mean of LNPL.
MLNPK Mean of LNPK.
MLNSTOP Mean of LNSTOP.

Details
The dataset is extracted from the annual reports of the Swiss Federal Office of Statistics on public
transport companies and has been used in Farsi et al. (2005).
58 utility

Source
https://pages.stern.nyu.edu/~wgreene/Text/Edition7/tablelist8new.htm

References
Farsi, M., M. Filippini, and W. Greene. 2005. Efficiency measurement in network industries:
Application to the Swiss railway companies. Journal of Regulatory Economics, 28:69–90.

Examples
str(swissrailways)

utility Data on U.S. electricity generating plants

Description
This dataset contains data on fossil fuel fired steam electric power generation plants in the United
States between 1986 and 1996.

Format
A data frame with 791 observations on the following 11 variables.
firm Plant identification.
year Year identification.
y Net-steam electric power generation in megawatt-hours.
regu Dummy variable which takes a value equal to 1 if the power plant is in a state which enacted
legislation or issued a regulatory order to implement retail access during the sample period,
and 0 otherwise.
k Capital stock.
labor Labor and maintenance.
fuel Fuel.
wl Labor price.
wf Fuel price.
wk Capital price.
tc Total cost.

Details
The dataset has been used in Kumbhakar et al. (2014).

Source
https://sites.google.com/view/sfbook-stata/home
vcov 59

References
Kumbhakar, S.C., H.J. Wang, and A. Horncastle. 2014. A Practitioner’s Guide to Stochastic Fron-
tier Analysis Using Stata. Cambridge University Press.

Examples
str(utility)
summary(utility)

vcov Compute variance-covariance matrix of stochastic frontier models

Description
vcov computes the variance-covariance matrix of the maximum likelihood (ML) coefficients from
stochastic frontier models estimated with sfacross, sfalcmcross, or sfaselectioncross.

Usage
## S3 method for class 'sfacross'
vcov(object, extraPar = FALSE, ...)

## S3 method for class 'sfalcmcross'


vcov(object, ...)

## S3 method for class 'sfaselectioncross'


vcov(object, extraPar = FALSE, ...)

Arguments
object A stochastic frontier model returned by sfacross, sfalcmcross, or sfaselectioncross.
extraPar Logical. Only available for non heteroscedastic models returned by sfacross
and sfaselectioncross. Default = FALSE. If TRUE, variances and covariances
of additional parameters are returned:
sigmaSq = sigmauSq + sigmavSq
lambdaSq = sigmauSq/sigmavSq
sigmauSq = exp (W u) = exp (δZu )
sigmavSq = exp (W v) = exp (ϕZv )
sigma = sigmaSq^0.5
lambda = lambdaSq^0.5
sigmau = sigmauSq^0.5
sigmav = sigmavSq^0.5
gamma = sigmauSq/(sigmauSq + sigmavSq)
... Currently ignored
60 vcov

Details
The variance-covariance matrix is obtained by the inversion of the negative Hessian matrix. De-
pending on the distribution and the 'hessianType' option, the analytical/numeric Hessian or the
bhhh Hessian is evaluated.
The argument extraPar, is currently available only for objects of class 'sfacross' and 'sfaselectioncross'.
When 'extraPar = TRUE', the variance-covariance of the additional parameters is obtained using
the delta method.

Value
The variance-covariance matrix of the maximum likelihood coefficients is returned.

See Also
sfacross, for the stochastic frontier analysis model fitting function using cross-sectional or pooled
data.
sfalcmcross, for the latent class stochastic frontier analysis model fitting function using cross-
sectional or pooled data.
sfaselectioncross for sample selection in stochastic frontier model fitting function using cross-
sectional data.

Examples
## Using data on Spanish dairy farms
# Cobb Douglas (production function) half normal distribution
cb_s_h <- sfacross(formula = YIT ~ X1 + X2 + X3 + X4, udist = 'hnormal',
data = dairyspain, S = 1, method = 'bfgs')
vcov(cb_s_h)
vcov(cb_s_h, extraPar = TRUE)

# Other variance-covariance matrices can be obtained using the sandwich package

# Robust variance-covariance matrix

requireNamespace('sandwich', quietly = TRUE)

sandwich::vcovCL(cb_s_h)

# Coefficients and standard errors can be obtained using lmtest package

requireNamespace('lmtest', quietly = TRUE)

lmtest::coeftest(cb_s_h, vcov. = sandwich::vcovCL)

# Clustered standard errors

lmtest::coeftest(cb_s_h, vcov. = sandwich::vcovCL, cluster = ~ FARM)

# Doubly clustered standard errors


worldprod 61

lmtest::coeftest(cb_s_h, vcov. = sandwich::vcovCL, cluster = ~ FARM + YEAR)

# BHHH standard errors

lmtest::coeftest(cb_s_h, vcov. = sandwich::vcovOPG)

# Adjusted BHHH standard errors

lmtest::coeftest(cb_s_h, vcov. = sandwich::vcovOPG, adjust = TRUE)

## Using data on eighty-two countries production (GDP)


# LCM Cobb Douglas (production function) half normal distribution
cb_2c_h <- sfalcmcross(formula = ly ~ lk + ll + yr, udist = 'hnormal',
data = worldprod, uhet = ~ initStat, S = 1)
vcov(cb_2c_h)

worldprod Data on world production

Description
This dataset provides information on production related variables for eighty-two countries over the
period 1960–1987.

Format
A data frame with 2,296 observations on the following 12 variables.
country Country name.
code Country identification.
yr Year identification.
y GDP in 1987 U.S. dollars.
k Physical capital stock in 1987 U.S. dollars.
l Labor (number of individuals in the workforce between the age of 15 and 64).
h Human capital-adjusted labor.
ly Log of y.
lk Log of k.
ll Log of l.
lh Log of h.
initStat Log of the initial capital to labor ratio of each country, lk - ll, measured at the beginning
of the sample period.

Details
The dataset is from the World Bank STARS database and has been used in Kumbhakar et al. (2014).
62 worldprod

Source
https://sites.google.com/view/sfbook-stata/home

References
Kumbhakar, S.C., H.J. Wang, and A. Horncastle. 2014. A Practitioner’s Guide to Stochastic Fron-
tier Analysis Using Stata. Cambridge University Press.

Examples
str(worldprod)
summary(worldprod)
Index

∗ AIC ic, 17
ic, 17 logLik, 18
∗ BIC marginal, 19
ic, 17 residuals, 22
∗ HQIC skewnessTest, 53
ic, 17 summary, 54
∗ attribute vcov, 59
nobs, 21 ∗ models
∗ coefficients sfacross, 25
coef, 3 sfalcmcross, 33
∗ cross-section sfaselectioncross, 44
sfacross, 25 ∗ optimize
sfalcmcross, 33 sfacross, 25
sfaselectioncross, 44 sfalcmcross, 33
∗ datasets sfaselectioncross, 44
dairynorway, 5 ∗ residuals
dairyspain, 7 residuals, 22
electricity, 13 ∗ summary
ricephil, 24 summary, 54
swissrailways, 56 ∗ vcov
utility, 58 vcov, 59
worldprod, 61 _SFAR (sfaR-package), 2
∗ extract
bread, 29, 32, 37, 39, 49, 52, 56
extract, 14
bread.lcmcross (sfaR-deprecated), 40
∗ fitted
bread.sfacross (sfacross), 25
fitted, 15
bread.sfalcmcross, 44
∗ latent-class
bread.sfalcmcross (sfalcmcross), 33
sfalcmcross, 33 bread.sfaselectioncross
∗ likelihood (sfaselectioncross), 44
logLik, 18
sfacross, 25 coef, 3, 4, 5, 29, 32, 37, 39, 43, 49, 52, 56
sfalcmcross, 33 coef.lcmcross (sfaR-deprecated), 40
sfaselectioncross, 44 coef.sfalcmcross, 44
∗ marginal coef.summary.lcmcross
marginal, 19 (sfaR-deprecated), 40
∗ methods coef.summary.sfalcmcross, 44
coef, 3
extract, 14 dairynorway, 5
fitted, 15 dairyspain, 7

63
64 INDEX

efficiencies, 8, 8, 29, 32, 37, 39, 49, 52, 56 pcubature, 47


efficiencies.lcmcross print, 29, 32, 37, 39, 49, 52, 56
(sfaR-deprecated), 40 print.lcmcross (sfaR-deprecated), 40
efficiencies.sfalcmcross, 44 print.sfacross (sfacross), 25
electricity, 13 print.sfalcmcross, 44
estfun, 29, 32, 37, 39, 49, 52, 56 print.sfalcmcross (sfalcmcross), 33
estfun.lcmcross (sfaR-deprecated), 40 print.sfaselectioncross
estfun.sfacross (sfacross), 25 (sfaselectioncross), 44
estfun.sfalcmcross, 44 print.summary.lcmcross
estfun.sfalcmcross (sfalcmcross), 33 (sfaR-deprecated), 40
estfun.sfaselectioncross print.summary.sfacross (summary), 54
(sfaselectioncross), 44 print.summary.sfalcmcross, 44
extract, 14 print.summary.sfalcmcross (summary), 54
print.summary.sfaselectioncross
fitted, 15, 15, 29, 32, 37, 39, 49, 52, 56 (summary), 54
fitted.lcmcross (sfaR-deprecated), 40
fitted.sfalcmcross, 44 residuals, 22, 23, 29, 32, 37, 39, 49, 52, 56
residuals.lcmcross (sfaR-deprecated), 40
gaussHermiteData, 47, 51 residuals.sfalcmcross, 44
ricephil, 24
hcubature, 47
sfacross, 2–5, 8, 9, 11, 13–20, 22, 23, 25, 25,
ic, 17, 17, 29, 32, 37, 39, 49, 52, 56 28, 29, 53–56, 59, 60
ic.lcmcross (sfaR-deprecated), 40 sfalcmcross, 2–5, 8, 9, 11–20, 22, 23, 33, 33,
ic.sfalcmcross, 44 36–38, 44, 54–56, 59, 60
integrate, 47 sfaR (sfaR-package), 2
sfaR-deprecated, 40
lcmcross, 43 sfaR-package, 2
lcmcross (sfaR-deprecated), 40 sfaselectioncross, 2–5, 8, 12–23, 44, 44,
lmtest::coeftest(), 29, 37, 49 47, 50, 54–56, 59, 60
logLik, 18, 18, 19, 29, 32, 37, 39, 49, 52, 56 skewnessTest, 29, 32, 53, 53, 56
logLik.lcmcross (sfaR-deprecated), 40 summary, 29, 32, 37, 39, 49, 52, 54, 55
logLik.sfalcmcross, 44 summary.lcmcross (sfaR-deprecated), 40
summary.sfalcmcross, 44
marginal, 19, 20, 29, 32, 37, 39, 49, 52, 56 swissrailways, 56
marginal.lcmcross (sfaR-deprecated), 40
marginal.sfalcmcross, 44 trust.optim, 27, 35, 43, 46
maxBFGS, 27, 35, 42, 46
ucminf, 27, 35, 43, 46
maxBHHH, 27, 28, 35, 36, 42, 43, 46, 47
utility, 58
maxCG, 27, 35, 42, 43, 46
maxNM, 27, 35, 42, 43, 46 vcov, 29, 32, 37, 39, 43, 49, 52, 56, 59, 59
maxNR, 27, 28, 35, 36, 42, 43, 46, 47 vcov.lcmcross (sfaR-deprecated), 40
maxSANN, 27, 35, 42, 43, 46 vcov.sfalcmcross, 44
mla, 27, 35, 43, 46
worldprod, 61
nlminb, 27, 35, 43, 46
nobs, 21
nobs.lcmcross (sfaR-deprecated), 40
nobs.sfalcmcross, 44

You might also like