Likelihood-Based Inference For Multivariate Skew Scale Mixtures of Normal Distributions
DOI 10.1007/s10182-016-0266-z
ORIGINAL PAPER
Received: 29 March 2015 / Accepted: 5 January 2016 / Published online: 19 January 2016
© Springer-Verlag Berlin Heidelberg 2016
Abstract Scale mixtures of normal distributions are frequently used as a flexible class for the statistical analysis of symmetrical data. Recently, Ferreira et al. (Stat Methodol 8:154–171, 2011) defined the univariate skew scale mixtures of normal distributions, which offer much-needed flexibility by combining skewness with heavy tails. In this paper, we develop a multivariate version of the skew scale mixtures of normal distributions, with emphasis on the multivariate skew-Student-t, skew-slash and skew-contaminated normal distributions. The main virtue of the members of this family of distributions is that they are easy to simulate from, and they also lend themselves to genuine expectation/conditional maximisation either (ECME) algorithms for maximum likelihood estimation. The observed information matrix is derived analytically to provide standard errors. Results obtained from real and simulated datasets are reported to illustrate the usefulness of the proposed method.
Víctor H. Lachos (corresponding author)
[email protected]
Clécio S. Ferreira
[email protected]
Heleno Bolfarine
[email protected]
1 Department of Statistics, Federal University of Juiz de Fora, Juiz de Fora, Minas Gerais, Brazil
2 Departamento de Estatística, Universidade Estadual de Campinas, Cidade Universitaria
“Zeferino Vaz”, Campinas, São Paulo, Brazil
3 Departamento de Estatística, Universidade de São Paulo, São Paulo, Brazil
123
422 C. S. Ferreira et al.
1 Introduction
Scale mixtures of normal distributions (Andrews and Mallows 1974) compose a group
of thick-tailed distributions that are often used for robust inference of symmetrical data.
Moreover, this class includes distributions such as the Student-t, slash and contami-
nated normal, among others. However, the theory and application (through simulation
or experimentation) often generate a large number of datasets that are skewed and
have heavy tails, such as datasets of family income (Azzalini et al. 2003) or substance
concentration (Bolfarine and Lachos 2007). Thus, appropriate distributions are needed to fit such skewed and heavy-tailed data. The skew-normal (SN) distribution is a class of density functions that depends on an additional shape parameter and includes the normal density as a special case.
Azzalini (1985) proposed the univariate SN distribution and it was recently gener-
alized to the multivariate case by Azzalini and Dalla-Valle (1996) and Arellano-Valle
et al. (2005). The multivariate SN density extends the multivariate normal model by
allowing a shape parameter to account for skewness. The probability density function
(pdf) of the generic element of a multivariate skew-normal distribution is given by

f(y|μ, Σ, λ) = 2 φ_p(y|μ, Σ) Φ₁(λ⊤Σ^{−1/2}(y − μ)), y ∈ R^p, (1)

where φ_p(·|μ, Σ) stands for the pdf of the p-variate normal distribution with mean vector μ and covariance matrix Σ, Φ₁(·) represents the cumulative distribution function (cdf) of the standard normal distribution, and Σ^{−1/2} satisfies Σ^{−1/2}Σ^{−1/2} = Σ^{−1}.
When λ = 0, the skew-normal distribution reduces to the normal distribution
(Y ∼ N p (μ, Σ)). A p-dimensional random vector Y with pdf as in (1) will be denoted
by SN_p(μ, Σ, λ). Its stochastic representation, which can be used to derive several of its properties, is given by

Y =ᵈ μ + Σ^{1/2}(δ|T₀| + (I_p − δδ⊤)^{1/2} T₁), δ = λ/(1 + λ⊤λ)^{1/2}, (2)

where |T₀| denotes the absolute value of T₀, T₀ ∼ N₁(0, 1) and T₁ ∼ N_p(0, I_p) are independent, I_p denotes the identity matrix of order p and "=ᵈ" means "distributed as".
A conditional representation of Y can be obtained as

Y|T = t ∼ N_p(μ + Σ^{1/2}δ t, Σ^{1/2}(I_p − δδ⊤)Σ^{1/2}), T ∼ TN(0, 1; (0, +∞)), (3)
where TN(μ, σ²; (a, b)) represents the univariate N(μ, σ²) distribution truncated to the interval (a, b) (Johnson et al. 1994). From (2), it follows that the expectation and covariance of Y are given, respectively, by

E[Y] = μ + (2/π)^{1/2} Σ^{1/2}δ and Cov[Y] = Σ − (2/π) Σ^{1/2}δδ⊤Σ^{1/2}. (4)
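As a numerical sanity check, the representation (2) and the mean formula in (4) can be verified by simulation. The sketch below is illustrative (parameter values are arbitrary) and assumes a symmetric matrix square root for Σ^{1/2}:

```python
import numpy as np

def sqrtm_sym(S):
    """Symmetric square root of a positive (semi)definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.sqrt(w)) @ V.T

def rmsn(n, mu, Sigma, lam, rng):
    """Sample SN_p(mu, Sigma, lam) via representation (2):
    Y = mu + Sigma^{1/2}(delta*|T0| + (I - delta delta')^{1/2} T1)."""
    p = len(mu)
    S_half = sqrtm_sym(Sigma)
    delta = lam / np.sqrt(1.0 + lam @ lam)
    M_half = sqrtm_sym(np.eye(p) - np.outer(delta, delta))
    T0 = np.abs(rng.standard_normal(n))      # |T0|, half-normal factor
    T1 = rng.standard_normal((n, p))         # T1 ~ N_p(0, I_p)
    return mu + (np.outer(T0, delta) + T1 @ M_half) @ S_half

rng = np.random.default_rng(42)
mu = np.array([0.5, -1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
lam = np.array([2.0, 1.0])
Y = rmsn(200_000, mu, Sigma, lam, rng)
delta = lam / np.sqrt(1.0 + lam @ lam)
EY = mu + np.sqrt(2.0 / np.pi) * sqrtm_sym(Sigma) @ delta   # formula (4)
print(np.allclose(Y.mean(axis=0), EY, atol=0.02))  # True
```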
Reasoning as in Azzalini and Dalla-Valle (1996), it is natural to construct multi-
variate distributions that combine skewness with heavy tails. Notice that the main idea
Likelihood-based inference for multivariate skew scale… 423
behind the construction in (1) involves a density function defined as the product of the
normal density with its cdf. Following Lin et al. (2013), a more general set-up can be
considered if we multiply a scale mixture of normal (SMN) density by the cdf of the normal distribution (as the skewing function). This approach leads to a family of asymmetric distributions that will be called skew scale mixtures of normal (SSMN) distributions. For this new class of SSMN distributions, we study some of its
probabilistic and inferential properties and discuss applications to real data. One inter-
esting and simplifying aspect of the family defined is that the implementation of the expectation/conditional maximisation either (ECME) algorithm is facilitated by the fact that the E-step is exactly as in the scale mixtures of normal distributions class of
models proposed in Andrews and Mallows (1974) (see also Osorio et al. 2007). More-
over, the M-step involves closed form expressions facilitating the implementation of
the EM-type algorithm. The multivariate SSMN class proposed here is fundamentally
different from the scale mixtures of skew-normal distributions (SMSN) developed by
Lachos et al. (2010) because we start our construction from the SMN densities and
not from the stochastic representation of a skew-normal random variable as presented
in Branco and Dey (2001) and Lachos et al. (2010).
The rest of the article is organized as follows. In Sect. 2, the SSMN class of distributions is defined by extending the elliptical class of SMN distributions. Properties like moments
and a stochastic representation of the proposed distributions are also discussed. More-
over, some examples of SSMN distributions are presented. In Sect. 3, we discuss
how to compute maximum likelihood (ML) estimates via the ECME algorithm, which
presents advantages over the direct maximization approach, especially in terms of
robustness with respect to starting values. The observed information matrix is derived
analytically. Section 4 reports applications to simulated and real data sets, indicating
the usefulness of the proposed methodology. Finally, Sect. 5 concludes with some
discussions, citing avenues for future research.
Andrews and Mallows (1974) used the Laplace transform technique to characterize when a standardized continuous random variable Y admits a scale mixture of normal (SMN) representation. This symmetric family of distributions has attracted much attention in the
last few years, mainly because it includes distributions such as the Student-t, slash,
power exponential and contaminated normal distributions. All these distributions have
heavier tails than the normal one.
We say that a p-dimensional vector Y has an SMN distribution (Lange and Sin-
sheimer 1993), with location parameter μ ∈ R p , a positive definite scale matrix Σ
and a hyperparameter τ, if its density function assumes the form

f₀(y) = ∫₀^∞ φ_p(y|μ, u^{−1}Σ) dH(u; τ) = (2π)^{−p/2}|Σ|^{−1/2} ∫₀^∞ u^{p/2} exp{−(u/2)(y − μ)⊤Σ^{−1}(y − μ)} dH(u; τ), y ∈ R^p. (5)
In analogy with (1), the pdf of an SSMN distribution is obtained by multiplying the symmetric density f₀ in (5) by the normal cdf:

f(y) = 2 f₀(y) Φ₁(λ⊤Σ^{−1/2}(y − μ)), y ∈ R^p, (6)

where f₀(·) is as in (5). For a random vector with a pdf as in (6), we shall use the notation Y ∼ SSMN_p(μ, Σ, λ; H).
Finally, integrating out u in (10), we obtain the marginal pdf of Y as follows:

f(y) = 2 ∫₀^{+∞} φ_p(y|μ, Σ/u) h(u; τ) du Φ₁(λ⊤Σ^{−1/2}(y − μ)) = 2 f₀(y) Φ₁(λ⊤Σ^{−1/2}(y − μ)). (11)
The proof follows by dividing (10) by (11). The conditional distributions U|Y = y for each element of the SSMN class are given in Sect. 2.1. For an SSMN random vector, a convenient hierarchical representation is given next; it can be used to simulate realizations of Y quickly, to implement the EM-type algorithm and to study many of its properties.
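A quick one-dimensional numerical check of (11) (illustrative, taking p = 1 and f₀ a Student-t density): skewing any symmetric density by the normal cdf preserves total mass.

```python
import numpy as np
from scipy import integrate, stats

# p = 1, f0 = Student-t density: f(y) = 2*f0(y)*Phi1(lambda*y) integrates to 1
nu, lam = 3.0, 2.5
f = lambda y: 2.0 * stats.t.pdf(y, df=nu) * stats.norm.cdf(lam * y)
mass, _ = integrate.quad(f, -np.inf, np.inf)
print(round(mass, 6))  # 1.0
```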
From (12), to generate an SSMN random variable we proceed in two steps: we first generate from the distribution of U and then from the conditional distribution of Y|U using, for instance, the stochastic representation given in (2).
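The two-step scheme can be sketched as follows for the skew-Student-t normal case, where U ∼ Gamma(ν/2, ν/2); the helper functions and parameter values below are illustrative, not the authors' code:

```python
import numpy as np

def sqrtm_sym(S):
    """Symmetric square root via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.sqrt(w)) @ V.T

def rmsn_one(mu, Sigma, lam, rng):
    """One draw from SN_p(mu, Sigma, lam) via representation (2)."""
    p = len(mu)
    delta = lam / np.sqrt(1.0 + lam @ lam)
    t0 = abs(rng.standard_normal())
    t1 = rng.standard_normal(p)
    M_half = sqrtm_sym(np.eye(p) - np.outer(delta, delta))
    return mu + sqrtm_sym(Sigma) @ (delta * t0 + M_half @ t1)

def rssmn_stn(n, mu, Sigma, lam, nu, rng):
    """Step 1: U ~ Gamma(nu/2, rate nu/2); step 2: Y|U=u ~ SN_p(mu, Sigma/u, lam/sqrt(u))."""
    u = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    return np.array([rmsn_one(mu, Sigma / ui, lam / np.sqrt(ui), rng) for ui in u])

rng = np.random.default_rng(7)
Y = rssmn_stn(2000, np.zeros(2), np.eye(2), np.array([2.0, 1.0]), nu=5.0, rng=rng)
print(Y.shape)  # (2000, 2)
```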
The moment generating function of Y is given by

M_Y(s) = E[e^{s⊤Y}] = 2 ∫₀^{+∞} exp{s⊤μ + s⊤Σs/(2u)} Φ₁( λ⊤Σ^{1/2}s / [u(u + λ⊤λ)]^{1/2} ) dH(u; τ), s ∈ R^p.
Proof From Proposition 2, we have that Y|U = u ∼ SN_p(μ, Σ/u, λ/√u).
Moreover, from well-known properties of conditional expectations, it follows that
M_Y(s) = E_U[E(e^{s⊤Y}|U)], and the proof concludes by using the fact that U is a positive random variable with cdf H and that, if Z ∼ SN_p(μ, Σ, λ), then M_Z(s) = 2 e^{s⊤μ + s⊤Σs/2} Φ₁(δ⊤Σ^{1/2}s).
Let D = (Y − μ)⊤Σ^{−1}(Y − μ) denote the (squared) Mahalanobis distance; as stated in Proposition 4, its distribution is the same as in the symmetric SMN class.
• The skew Student-t normal (StN) distribution with ν > 0 degrees of freedom,
denoted by StN p (μ, Σ, λ; ν).
The use of the Student-t distribution as an alternative to the normal distribution has
been frequently suggested in the literature. For instance, Little (1988) and Lange et al.
(1989) recommend using the Student-t distribution for robust modeling. Considering
U ∼ Gamma(ν/2, ν/2), the pdf of Y takes the form

f(y) = 2 t_p(y|μ, Σ; ν) Φ₁(λ⊤Σ^{−1/2}(y − μ)), y ∈ R^p, (13)

where t_p(y|μ, Σ; ν) = [Γ((ν + p)/2) / (Γ(ν/2)(νπ)^{p/2}|Σ|^{1/2})] (1 + d/ν)^{−(ν+p)/2}, with d = (y − μ)⊤Σ^{−1}(y − μ), is the density function of a p-dimensional Student-t variate with ν degrees of freedom.
The univariate skew Student-t normal (StN) distribution was developed by Gómez
et al. (2007), where the authors showed that the StN distribution can present a much wider asymmetry range than the ordinary skew-normal distribution (Azzalini 1985). Lin et al. (2013) used the multivariate StN distribution
in the context of finite mixture models, including the implementation of an interesting
EM-type algorithm for ML estimation. When ν ↑ ∞, we get the skew-normal distribution as the limiting case. The quadratic form satisfies D/p ∼ F(p, ν). Finally, U|Y = y ∼ Gamma((ν + p)/2, (ν + d)/2), so that

E[U^k|Y = y] = Γ((ν + p)/2 + k) / [Γ((ν + p)/2)((ν + d)/2)^k].
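This conditional moment can be checked against direct integration of the unnormalized posterior u^{p/2} e^{−ud/2} h(u; ν), where h is the Gamma(ν/2, ν/2) mixing density (the values of ν, p, d and k below are arbitrary; normalizing constants cancel in the ratio):

```python
import numpy as np
from math import gamma
from scipy import integrate

nu, p, d, k = 4.0, 2, 1.7, 1
# closed form: U|Y=y ~ Gamma((nu+p)/2, (nu+d)/2), so E[U^k|y] = G(a+k)/(G(a)*b^k)
closed = gamma((nu + p) / 2 + k) / (gamma((nu + p) / 2) * ((nu + d) / 2) ** k)
# unnormalized posterior: u^{p/2} e^{-u d/2} times the Gamma(nu/2, nu/2) kernel
post = lambda u: u ** (p / 2) * np.exp(-u * d / 2) * u ** (nu / 2 - 1) * np.exp(-u * nu / 2)
num, _ = integrate.quad(lambda u: u ** k * post(u), 0, np.inf)
den, _ = integrate.quad(post, 0, np.inf)
print(abs(num / den - closed) < 1e-8)  # True
```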
• The skew-slash (SSL) distribution, denoted by SSL_p(μ, Σ, λ; ν). Here the distribution of U is Beta(ν, 1), 0 < u < 1, ν > 0, and the pdf of Y is given by

f(y) = 2ν ∫₀¹ u^{ν−1} φ_p(y|μ, u^{−1}Σ) du Φ₁(λ⊤Σ^{−1/2}(y − μ)), y ∈ R^p. (14)
It is easy to see that U|Y = y ∼ TG(ν + p/2, d/2, 1), where TG(a, b, t) denotes the right-truncated gamma distribution, with pdf

f(x|a, b, t) = [b^a/γ(a, bt)] x^{a−1} exp(−bx) I_{(0,t)}(x),

and γ(a, b) = ∫₀^b u^{a−1}e^{−u} du is the incomplete gamma function. So, we have that

E[U^k|Y = y] = Γ(ν + p/2 + k) P₁(ν + p/2 + k, d/2) / [(d/2)^k Γ(ν + p/2) P₁(ν + p/2, d/2)],

where P_x(a, b) denotes the cdf of the Gamma(a, b) distribution (with mean a/b) evaluated at x.
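Since U|Y = y is a right-truncated gamma, its moments reduce to incomplete-gamma ratios; a numerical check with arbitrary values, writing a = ν + p/2 and b = d/2:

```python
import numpy as np
from scipy import integrate
from scipy.stats import gamma as gamma_dist
from math import gamma

nu, p, d, k = 2.0, 2, 1.3, 1
a = nu + p / 2                     # shape of TG(a, d/2, 1)
b = d / 2                          # rate
# P_1(shape, rate): gamma cdf evaluated at 1
P1 = lambda shape, rate: gamma_dist.cdf(1.0, shape, scale=1.0 / rate)
closed = gamma(a + k) * P1(a + k, b) / (b ** k * gamma(a) * P1(a, b))
# direct integration of the unnormalized posterior u^{a-1} e^{-bu} on (0, 1)
post = lambda u: u ** (a - 1) * np.exp(-b * u)
num, _ = integrate.quad(lambda u: u ** k * post(u), 0, 1)
den, _ = integrate.quad(post, 0, 1)
print(abs(num / den - closed) < 1e-8)  # True
```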
Fig. 1 Contour plot of some elements of the standard bivariate SSMN family. a SN2 (λ), b StN2 (λ; 2), c
SCN2 (λ; 0.5, 0.5) and d SSL2 (λ; 1), where λ = (2, 1)
Figure 1 provides contour plots of some elements of the standard bivariate SSMN family. Figure 2 displays the contours of these SSMN bivariate densities at two levels (c = 0.01 and c = 0.1). Note that the SN contour lies inside the StN, SSL and SCN ones for c = 0.01, while the opposite situation occurs for c = 0.1.
The EM algorithm originally proposed by Dempster et al. (1977) has several appealing features, such as stable monotone convergence, with each iteration increasing the likelihood, and simplicity of implementation. However, ML estimation in SSMN
models is complicated and the EM algorithm is less advisable due to the computational
difficulty in the M-step. To cope with this problem, we apply an extension of the
EM algorithm, called the ECME algorithm (Liu and Rubin 1994), that shares the
appealing features of the EM and has a typically faster convergence rate than the
2.0
4
3
1.5
2
1.0
1
0.5
0
0.0
−1
−0.5
−2
SN SN
StN StN
SSL SSL
−1.0
−3
SCN SCN
EM. The ECME algorithm replaces some conditional maximization (CM) steps, which maximize the constrained Q-function (the expected complete-data log-likelihood), by steps that maximize the corresponding constrained actual marginal likelihood function, called CML steps. In the following, we show how to employ the ECME algorithm for ML estimation of SSMN models.
Given Y ∼ SSMN_p(μ, Σ, λ; H), the joint distribution of (Y, U, T) (see Appendix 2) is given by

f(y, u, t) = 2 φ_p(y|μ, Σ/u) h(u; τ) φ₁(t|λ⊤Σ^{−1/2}(y − μ), 1). (16)
Hence, the complete-data log-likelihood function is

ℓ_c(θ|y_c) ∝ −(n/2) log|Σ| + ∑_{i=1}^n [ −(u_i/2)(y_i − μ)⊤Σ^{−1}(y_i − μ) + t_i λ⊤Σ^{−1/2}(y_i − μ) − (1/2)(λ⊤Σ^{−1/2}(y_i − μ))² + log h(u_i; τ) ]. (17)
The E-step computes the Q-function

Q(θ|θ̂^{(k)}) = E[ℓ_c(θ|y_c) | y, θ = θ̂^{(k)}] = ∑_{i=1}^n Q₁ᵢ(θ|θ̂^{(k)}) + ∑_{i=1}^n Q₂ᵢ(θ|θ̂^{(k)}), (18)
with Q₂ᵢ(θ|θ̂^{(k)}) = E[log h(Uᵢ; τ) | yᵢ, θ = θ̂^{(k)}] and

Q₁ᵢ(θ|θ̂^{(k)}) = −(1/2) log|Σ| − (û_i^{(k)}/2)(y_i − μ)⊤Σ^{−1}(y_i − μ) + t̂_i^{(k)} Δ⊤(y_i − μ) − (1/2)[Δ⊤(y_i − μ)]²,

where Δ = Σ^{−1/2}λ, û_i^{(k)} = E[U_i|y_i, θ = θ̂^{(k)}] and t̂_i^{(k)} = E[T_i|y_i, θ = θ̂^{(k)}].
The (k + 1)th M-step then finds θ̂^{(k+1)} maximizing Q(θ|θ̂^{(k)}). From (25), it follows that T|Y = y ∼ TN(λ⊤Σ^{−1/2}(y − μ), 1; (0, +∞)), so t̂_i^{(k)} = E[T_i|y_i, θ = θ̂^{(k)}] can be expressed as t̂_i^{(k)} = Â_i^{(k)} + W_Φ(Â_i^{(k)}), where Â_i^{(k)} = (λ̂^{(k)})⊤(Σ̂^{(k)})^{−1/2}(y_i − μ̂^{(k)}) and W_Φ(x) = φ₁(x)/Φ₁(x).
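For TN(a, 1; (0, +∞)) the mean is a + φ₁(a)/Φ₁(a), a standard truncated-normal fact; a quick check against scipy's truncated normal (the value of a is arbitrary):

```python
from scipy.stats import norm, truncnorm

a_loc = 0.8  # a = lambda' Sigma^{-1/2}(y - mu); any real value works
# TN(a, 1; (0, inf)): standardized lower bound is (0 - a)/1 = -a
mean_scipy = truncnorm.mean(-a_loc, float('inf'), loc=a_loc, scale=1.0)
mean_formula = a_loc + norm.pdf(a_loc) / norm.cdf(a_loc)  # a + W_Phi(a)
print(abs(mean_scipy - mean_formula) < 1e-10)  # True
```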
The CM-steps then update μ and Δ in closed form:

μ̂^{(k+1)} = ( ∑_{i=1}^n û_i^{(k)} (Σ̂^{(k)})^{−1} + n Δ̂^{(k)}(Δ̂^{(k)})⊤ )^{−1} ∑_{i=1}^n ( û_i^{(k)} (Σ̂^{(k)})^{−1} y_i − t̂_i^{(k)} Δ̂^{(k)} + Δ̂^{(k)}(Δ̂^{(k)})⊤ y_i ),

Δ̂^{(k+1)} = [ ∑_{i=1}^n (y_i − μ̂^{(k)})(y_i − μ̂^{(k)})⊤ ]^{−1} ∑_{i=1}^n t̂_i^{(k)} (y_i − μ̂^{(k)}),
Σ̂^{(k+1)} = (1/n) ∑_{i=1}^n û_i^{(k)} (y_i − μ̂^{(k)})(y_i − μ̂^{(k)})⊤,

and the CML-step updates τ by maximizing the actual marginal log-likelihood,

τ̂^{(k+1)} = argmax_τ ∑_{i=1}^n log f₀(y_i | μ̂^{(k+1)}, Σ̂^{(k+1)}, τ), (22)
where f₀(y) is the respective symmetric pdf defined in (5). This step requires a one-dimensional search for the StN and SSL models and a two-dimensional search for the SCN model, which can be easily accomplished by using, for example, the "optim" or "optimize" routines in R (R Core Team 2015) or "fmincon" in Matlab. The iterations of the above algorithm are repeated until the difference between two successive log-likelihood values, |ℓ(θ̂^{(k+1)}) − ℓ(θ̂^{(k)})|, is sufficiently small, say 10^{−5}, where ℓ(θ) = ∑_{i=1}^n log f(y_i), with f(y_i) as defined in (11). The initial values used in the ECME algorithm are the sample mean vector for μ, the sample covariance matrix for Σ and the vector of sample skewness coefficients for λ (see, for instance, Cabral et al. 2012).
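As an illustrative sketch (not the authors' implementation), one ECME iteration for the multivariate StN model can be written as follows; the E-step moments are those given above for the StN case, and the CML-step maximizes the marginal Student-t log-likelihood over ν. The demo data and starting values are arbitrary:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln
from scipy.stats import norm

def mat_pow(S, q):
    """Symmetric matrix power via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** q) @ V.T

def ecme_step_stn(Y, mu, Sigma, lam, nu):
    """One ECME iteration for the multivariate StN model (sketch)."""
    n, p = Y.shape
    Sinv, Sinvh = mat_pow(Sigma, -1.0), mat_pow(Sigma, -0.5)
    R = Y - mu
    d = np.einsum('ij,jk,ik->i', R, Sinv, R)     # Mahalanobis distances d_i
    A = R @ (Sinvh @ lam)                        # A_i = lam' Sigma^{-1/2}(y_i - mu)
    u = (nu + p) / (nu + d)                      # E[U|y]: mean of Gamma((nu+p)/2,(nu+d)/2)
    t = A + norm.pdf(A) / norm.cdf(A)            # E[T|y]: truncated-normal mean
    Delta = Sinvh @ lam
    # CM-steps: closed-form maximizers of the Q-function
    lhs = u.sum() * Sinv + n * np.outer(Delta, Delta)
    rhs = Sinv @ (u[:, None] * Y).sum(axis=0) - t.sum() * Delta + np.outer(Delta, Delta) @ Y.sum(axis=0)
    mu_new = np.linalg.solve(lhs, rhs)
    Delta_new = np.linalg.solve(R.T @ R, R.T @ t)
    Sigma_new = (u[:, None] * R).T @ R / n
    lam_new = mat_pow(Sigma_new, 0.5) @ Delta_new
    # CML-step: maximize the marginal Student-t log-likelihood over nu
    Rn = Y - mu_new
    dn = np.einsum('ij,jk,ik->i', Rn, mat_pow(Sigma_new, -1.0), Rn)
    _, logdet = np.linalg.slogdet(Sigma_new)
    def neg_ll(v):
        return -np.sum(gammaln((v + p) / 2) - gammaln(v / 2) - (p / 2) * np.log(v * np.pi)
                       - 0.5 * logdet - ((v + p) / 2) * np.log1p(dn / v))
    nu_new = minimize_scalar(neg_ll, bounds=(1.0, 100.0), method='bounded').x
    return mu_new, Sigma_new, lam_new, nu_new

rng = np.random.default_rng(0)
Y = rng.standard_t(df=5, size=(300, 2)) + np.array([0.5, -1.0])
theta = (np.array([0.0, 0.0]), np.eye(2), np.array([0.1, 0.1]), 10.0)
for _ in range(5):
    theta = ecme_step_stn(Y, *theta)
print(np.round(theta[0], 1))
```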
Let

K_i = K_i(θ) = ∫₀^∞ u^{p/2} exp{−(u/2) d_i} h(u; τ) du,

where d_i = (y_i − μ)⊤Σ^{−1}(y_i − μ) and A_i = A_i(θ) = λ⊤Σ^{−1/2}(y_i − μ). The score vector is given by ∂ℓ(θ)/∂θ = ∑_{i=1}^n ∂ℓ_i(θ)/∂θ, where

∂ℓ_i(θ)/∂θ = −(1/2) ∂ log|Σ|/∂θ + (1/K_i) ∂K_i/∂θ + W_Φ(A_i) ∂A_i/∂θ,
with W_Φ(x) = φ₁(x)/Φ₁(x). The second derivatives of ℓ_i(θ) with respect to θ take the form

∂²ℓ_i(θ)/∂θ∂θ⊤ = −(1/2) ∂²log|Σ|/∂θ∂θ⊤ + (1/K_i) ∂²K_i/∂θ∂θ⊤ − (1/K_i²)(∂K_i/∂θ)(∂K_i/∂θ⊤) + W_Φ(A_i) ∂²A_i/∂θ∂θ⊤ + W′_Φ(A_i)(∂A_i/∂θ)(∂A_i/∂θ⊤),

with W′_Φ(x) = −W_Φ(x)(x + W_Φ(x)). The first and second derivatives of K_i(θ) with respect to θ are given by

∂K_i(θ)/∂θ = −(1/2) K_i(θ) (∂d_i/∂θ) E[U|Y = y_i],

∂²K_i(θ)/∂θ∂θ⊤ = −(K_i(θ)/2) { (∂²d_i/∂θ∂θ⊤) E[U|Y = y_i] − (1/2)(∂d_i/∂θ)(∂d_i/∂θ⊤) E[U²|Y = y_i] }.

Combining these expressions, we obtain

∂²ℓ_i(θ)/∂θ∂θ⊤ = −(1/2) ∂²log|Σ|/∂θ∂θ⊤ − (1/2) E[U|Y = y_i] ∂²d_i/∂θ∂θ⊤ + (1/4){E[U²|Y = y_i] − E[U|Y = y_i]²}(∂d_i/∂θ)(∂d_i/∂θ⊤) + W_Φ(A_i) ∂²A_i/∂θ∂θ⊤ + W′_Φ(A_i)(∂A_i/∂θ)(∂A_i/∂θ⊤).
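The identity W′_Φ(x) = −W_Φ(x)(x + W_Φ(x)) used above can be verified numerically against a central finite difference (the evaluation point is arbitrary):

```python
import numpy as np
from scipy.stats import norm

W = lambda x: norm.pdf(x) / norm.cdf(x)          # W_Phi(x) = phi1(x)/Phi1(x)
Wp = lambda x: -W(x) * (x + W(x))                # claimed derivative of W_Phi
x, h = 0.7, 1e-6
numeric = (W(x + h) - W(x - h)) / (2 * h)        # central difference
print(abs(numeric - Wp(x)) < 1e-6)  # True
```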
3 Simulation experiments
Table 1 Mean and standard deviations (SD) of the EM estimates and empirical standard error estimates (SE_emp) based on 1000 samples from the StN₂ model. Columns give Mean, SD and SE_emp for each of three sample sizes (increasing left to right)

Param  True | Mean    SD     SE_emp | Mean    SD     SE_emp | Mean    SD     SE_emp
μ1     0.5  | 0.505   0.123  0.112  | 0.500   0.088  0.086  | 0.501   0.062  0.061
μ2     −1   | −0.999  0.122  0.112  | −1.001  0.089  0.086  | −1.000  0.062  0.061
α1     1    | 1.002   0.089  0.076  | 1.001   0.067  0.059  | 1.000   0.048  0.042
α2     0    | −0.004  0.046  0.043  | −0.002  0.034  0.033  | −0.001  0.024  0.023
α3     1    | 1.004   0.092  0.076  | 1.003   0.069  0.059  | 1.000   0.048  0.042
λ1     2    | 2.150   0.686  0.549  | 2.085   0.446  0.406  | 2.044   0.325  0.281
λ2     2    | 2.157   0.674  0.552  | 2.093   0.465  0.407  | 2.040   0.313  0.280
Table 2 Mean and standard deviations (SD) of the EM estimates and empirical standard error estimates (SE_emp) based on 1000 samples from the SSL₂ model. Columns give Mean, SD and SE_emp for each of three sample sizes (increasing left to right)

Param  True | Mean    SD     SE_emp | Mean    SD     SE_emp | Mean    SD     SE_emp
μ1     0.5  | 0.526   0.161  0.202  | 0.508   0.068  0.069  | 0.499   0.071  0.069
μ2     −1   | −0.977  0.170  0.203  | −1.003  0.067  0.069  | −1.000  0.074  0.069
α1     1    | 1.009   0.104  0.077  | 1.009   0.062  0.041  | 1.014   0.063  0.041
α2     0    | −0.014  0.053  0.041  | −0.001  0.023  0.021  | −0.001  0.022  0.021
α3     1    | 1.009   0.106  0.075  | 1.018   0.064  0.042  | 1.013   0.066  0.041
λ1     2    | 2.054   0.691  0.602  | 2.045   0.323  0.271  | 2.073   0.321  0.276
λ2     2    | 2.052   0.715  0.599  | 2.064   0.311  0.272  | 2.065   0.333  0.275
From Tables 1 and 2, we can say that the bias of all parameter estimates tends to zero as the sample size increases, indicating that the ML estimates based on the proposed EM-type algorithm have good large-sample properties. These tables also provide the average values of the approximate standard errors of the EM estimates obtained through the information-based method described in Sect. 2.3 (IM SE) and the Monte Carlo standard deviation (MC SD) of the parameter estimates. As expected, the results summarized in these tables suggest that the approximation produced by the information method is reliable.
Here, the experiment is designed to show the flexibility of our proposed SSMN class. Our strategy is to generate artificial data from a linear regression model where the errors follow a distribution that is totally different in nature from the class of SSMN distributions studied here. Specifically, we consider the normal inverse Gaussian (NIG) distribution, a mean-variance mixture of a normal distribution with an inverse Gaussian mixing distribution, which produces asymmetry and heavy tails (Cabral et al.
2014). We say that a random variable U has an inverse Gaussian (IG) distribution when its density is given by

f(u|ρ, δ) = (δ/√(2π)) u^{−3/2} exp{ −(ρ²/(2u))(u − δ/ρ)² }, u > 0.
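The (ρ, δ) parametrization above corresponds to an IG law with mean δ/ρ; both the normalization and the mean can be confirmed numerically (values of ρ and δ taken from the simulation setting):

```python
import numpy as np
from scipy import integrate

rho, delta = 1.0, 0.7
f = lambda u: delta / np.sqrt(2 * np.pi) * u ** -1.5 * np.exp(-rho ** 2 / (2 * u) * (u - delta / rho) ** 2)
mass, _ = integrate.quad(f, 0, np.inf)
mean, _ = integrate.quad(lambda u: u * f(u), 0, np.inf)
print(abs(mass - 1) < 1e-8, abs(mean - delta / rho) < 1e-8)  # True True
```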
Definition 2 We say that the random vector X has a p-dimensional NIG distribution if it admits the mean-variance mixture representation X = μ + γU + U^{1/2}Σ^{1/2}Z₀, where Z₀ ∼ N_p(0, I_p) and U ∼ IG(ρ, δ) are independent.
To evaluate the performance of the SSMN distributions for multivariate data with heavy tails, we also use the multivariate skew-t (ST) distribution proposed by Azzalini and Capitanio (2003).
We generated a sample of size n = 300 from X ∼ NIG₂((−3, 1), I₂, (−4, 3); 1, 0.7), and the results of fitting the aforementioned distributions (normal, SN, StN, SSL and SCN) are presented in Table 3. Note that the StN distribution presented the best fit according to the AIC and BIC measures. The ST distribution was also fitted, yielding a log-likelihood value ℓ(θ̂) = −1133.039, AIC = 2282.079 and BIC = 2311.709. These values are higher than those obtained for the StN and SCN members of the SSMN class.
A second simulation study was conducted by generating 1000 Monte Carlo samples of size n = 300 from X ∼ NIG₂((−3, 1), I₂, (−4, 3); 1, 0.7). Under these parameter values, the sample skewness and kurtosis measures ranged from (−9, −1.5) to (1.4, 9.7) and from (5.2, 105) to (4.8, 118.9) in each coordinate, respectively. For each generated sample, we fitted the StN, SSL and SCN distributions from the SSMN class and also the ST model of Azzalini and Capitanio (2003). Table 4 presents the results of this simulation experiment, where we show the percentage of best fits for each model according to the AIC and BIC. Note that the AIC and BIC pick the StN model in more than 90% of the Monte Carlo samples, indicating the flexibility of the StN distribution from our proposed SSMN class.
4 Application
This example concerns the Australian Institute of Sport (AIS) dataset, which includes 11 physical and hematological attributes measured on 202 athletes, together with a binary indicator of gender (100 females and 102 males). The data were originally collected by Cook and Weisberg (1994) and have previously been analyzed by various authors, including Azzalini and Dalla-Valle (1996) and Azzalini and Capitanio (2003). Now, we revisit this dataset with the aim of expanding the inferential results to the SSMN family. Specifically, we focus on the SN, StN, SSL and SCN distributions. The
Table 3 ML estimates (MLE) for the SSMN models fitted to samples from X ∼ NIG₂((−3, 1), I₂, (−4, 3); 1, 0.7), n = 300. The SE values are the average of the estimated asymptotic standard errors
Table 4 Percentages of best performance for each SSMN distribution based on 1000 samples generated
from X ∼ NIG2 ((−3, 1), I2 , (−4, 3); 1, 0.7)
dataset is available in the R package “sn” (R Core Team 2015). For illustration, we
consider a subset of variables: Y1 body mass index (BMI), Y2 percentage of body fat
(Bfat) and Y3 lean body mass (LBM).
Table 5 presents the ML estimates of the parameters of the SN, StN, SSL and SCN models, along with their corresponding standard errors (SE) calculated via the observed information matrix (see Sect. 2.3). We also compare the fitted models by inspecting some information selection criteria. Comparing the models through the information criteria presented in Table 5, we observe that the SCN presents the best fit, followed closely by the StN and SSL models. The fit of the SN is the worst, indicating a lack of adequacy of the SN assumptions for this dataset.
The Q-Q plots and envelopes shown in Fig. 3 are based on the distribution of the Mahalanobis distance given in Proposition 4, which is the same as for the SMN class (see Lange and Sinsheimer 1993). The lines in these figures represent the 5th percentile, the mean and the 95th percentile of 100 simulated points for each observation. These figures clearly show once again that the SSMN distributions with heavy tails provide a better fit than the SN model to the AIS dataset. Figure 4 presents the contour plots of the skew-contaminated normal model fitted to the AIS data. We can see from this figure
Table 5 ML estimation results (MLE) for the SSMN models. SE are the estimated standard errors
that the fitted SCN density has reasonable ability to capture the asymmetry present in
the data.
5 Conclusions
In this work, we defined a new family of asymmetric models by extending the sym-
metric scale mixtures of normal family. Our proposal extends recent results found
in Ferreira et al. (2011) to a multivariate context. In addition, we developed a very
general method based on the EM algorithm for estimating the parameters of the skew
scale mixtures of normal distributions. Closed-form expressions were derived for the
iterative estimation processes. This was a consequence of the fact that the proposed distributions possess a stochastic representation that can be used to represent them conditionally. This stochastic representation also allows us to study many of their properties easily. We believe that the approaches proposed here can also be used to study other asymmetric multivariate models, like those proposed by Lachos et al. (2009, 2010). The models proposed in the latter articles have a stochastic representation of the form Y = μ + κ^{1/2}(U)Z, and include elements like the skew-t, skew-slash, skew-contaminated normal, skew-logistic, skew-stable and skew-exponential power distributions. The assessment of the influence of the data and model assumptions on the results of a statistical analysis is a key aspect of any new class of distributions.
Fig. 3 Simulated envelope for SSMN distributions adjusted to AIS data: a skew-normal, b skew Student-t
normal, c skew-slash and d skew-contaminated normal
We are currently exploring the local influence and residual analysis to address this
issue.
One anonymous referee suggested generalizing the presented results to "multivariate skew scale mixtures of t distributions" by replacing φ_p in f₀ in (5) by the Student-t density. However, the theoretical development does not appear to be as easy, elegant and clear as for our proposal. For instance, having different and independent u_i for the t-density (t_p(·; ν) instead of φ_p(·)) and for H(·; τ) makes the estimation quite complicated, and we could not find an easy way to estimate the parameters in this scenario. Note also that this generalization includes an additional parameter ν, which needs to be included in the estimation process. For fitting SSMN models, we presented feasible ECME algorithms with simple analytical expressions (also for the observed information matrix) based on the three-level hierarchical representation (8) of the model. In addition, numerical results show that the SSMN class of distributions is very flexible and has already been applied successfully in models of practical interest. For instance, Lin et al. (2013)
Fig. 4 Contour plots for the skew-contaminated normal fitted to the AIS data
Acknowledgments We thank the editor, associate editor and two referees whose constructive comments
led to an improved presentation of the paper. C.S. acknowledges support from FAPEMIG (Minas Gerais
State Foundation for Research Development), Grant CEX APQ 01845/14. V.H. acknowledges support from
CNPq-Brazil (Grant 305054/2011-2) and FAPESP-Brazil (Grant 2014/02938-9).
Appendix 1: Derivatives of log|Σ|, A_i and d_i

Considering α = vech(B), where Σ^{1/2} = B = B(α), the first and second derivatives of log|Σ|, A_i and d_i are obtained below. The notation used is that of Sect. 2, and for a p-dimensional vector ρ = (ρ₁, . . . , ρ_p)⊤ we write Ḃ_r = ∂B(α)/∂α_r, with r = 1, 2, . . . , p(p + 1)/2. Thus,
• log|Σ|:

∂ log|Σ|/∂α_k = 2 tr(B^{−1}Ḃ_k), ∂² log|Σ|/∂α_k∂α_s = −2 tr(B^{−1}Ḃ_s B^{−1}Ḃ_k),
• A_i:

∂A_i/∂μ = −B^{−1}λ, ∂A_i/∂α_k = −λ⊤B^{−1}Ḃ_k B^{−1}(y_i − μ), ∂A_i/∂λ = B^{−1}(y_i − μ),

∂²A_i/∂μ∂μ⊤ = 0, ∂²A_i/∂μ∂α_k = B^{−1}Ḃ_k B^{−1}λ, ∂²A_i/∂μ∂λ⊤ = −B^{−1},

∂²A_i/∂α_k∂α_s = −λ⊤B^{−1}[Ḃ_s B^{−1}Ḃ_k + Ḃ_k B^{−1}Ḃ_s]B^{−1}(y_i − μ),

∂²A_i/∂α_k∂λ = −B^{−1}Ḃ_k B^{−1}(y_i − μ), ∂²A_i/∂λ∂λ⊤ = 0,
• d_i:

∂d_i/∂μ = −2B^{−2}(y_i − μ), ∂d_i/∂α_k = −(y_i − μ)⊤B^{−1}[Ḃ_k B^{−1} + B^{−1}Ḃ_k]B^{−1}(y_i − μ), ∂d_i/∂λ = 0,

∂²d_i/∂μ∂μ⊤ = 2B^{−2}, ∂²d_i/∂μ∂α_k = 2B^{−1}[Ḃ_k B^{−1} + B^{−1}Ḃ_k]B^{−1}(y_i − μ),

∂²d_i/∂μ∂λ⊤ = 0, ∂²d_i/∂α_k∂λ = 0, ∂²d_i/∂λ∂λ⊤ = 0,

∂²d_i/∂α_k∂α_s = (y_i − μ)⊤B^{−1}[Ḃ_s B^{−1}Ḃ_k B^{−1} + Ḃ_k B^{−1}Ḃ_s B^{−1} + Ḃ_k B^{−2}Ḃ_s + Ḃ_s B^{−2}Ḃ_k + B^{−1}Ḃ_s B^{−1}Ḃ_k + B^{−1}Ḃ_k B^{−1}Ḃ_s]B^{−1}(y_i − μ).
Appendix 2: Hierarchical representation

Y|T = t, U = u ∼ N_p( μ + u^{−1/2} t Σ^{1/2}δ_u, (1/u) Σ^{1/2}(I_p + λ_u λ_u⊤)^{−1}Σ^{1/2} ),
U ∼ H(τ), T ∼ TN(0, 1; (0, +∞)), (23)

with U and T independent, δ_u = λ/(u + λ⊤λ)^{1/2} and λ_u = λ/√u.
Using some results given in Lachos et al. (2010), it follows that the joint distribution of (Y, U, T) is given by

f(y, u, t) = 2 φ_p(y|μ, Σ/u) h(u; τ) φ₁(t|λ⊤Σ^{−1/2}(y − μ), 1), (24)

and

f(t|y) = φ₁(t|λ⊤Σ^{−1/2}(y − μ), 1) / Φ₁(λ⊤Σ^{−1/2}(y − μ)), (25)

so that T|Y = y ∼ TN(λ⊤Σ^{−1/2}(y − μ), 1; (0, +∞)).
References
Andrews, D.F., Mallows, C.L.: Scale mixtures of normal distributions. J. R. Stat. Soc. Ser. B 36, 99–102
(1974)
Arellano-Valle, R.B., Bolfarine, H., Lachos, V.H.: Skew-normal linear mixed models. J. Data Sci. 3, 415–438
(2005)
Azzalini, A.: A class of distributions which includes the normal ones. Scand. J. Stat. 12, 171–178 (1985)
Azzalini, A., Capitanio, A.: Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. J. R. Stat. Soc. Ser. B 65, 367–389 (2003)
Azzalini, A., Dalla-Valle, A.: The multivariate skew-normal distribution. Biometrika 83(4), 715–726 (1996)
Azzalini, A., dal Cappello, T., Kotz, S.: Log-skew-normal and log-skew-t distributions as models for family income data. J. Income Distrib. 11, 13–21 (2003)
Bolfarine, H., Lachos, V.: Skew probit error-in-variables models. Stat. Methodol. 3, 1–12 (2007)
Branco, M.D., Dey, D.K.: A general class of multivariate skew-elliptical distributions. J. Multivar. Anal.
79, 99–113 (2001)
Cabral, C.R.B., Lachos, V.H., Prates, M.O.: Multivariate mixture modeling using skew-normal independent
distributions. Comput. Stat. Data Anal. 56(1), 126–142 (2012)
Cabral, C.R.B., Lachos, V.H., Zeller, C.B.: Multivariate measurement error models using finite mixtures of
skew-Student t distributions. J. Multivar. Anal. 124, 179–198 (2014)
Cook, R.D., Weisberg, S.: An Introduction to Regression Graphics. Wiley, Hoboken (1994)
Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. J. R.
Stat. Soc. Ser. B 39(1), 1–38 (1977)
Ferreira, C.S., Bolfarine, H., Lachos, V.H.: Skew scale mixtures of normal distributions: properties and
estimation. Stat. Methodol. 8, 154–171 (2011)
Gómez, H.W., Venegas, O., Bolfarine, H.: Skew-symmetric distributions generated by the normal distribu-
tion function. Environmetrics 18, 395–407 (2007)
Harville, D.: Matrix Algebra From a Statistician’s Perspective. Springer, New York (1997)
Johnson, N.L., Kotz, S., Balakrishnan, N.: Continuous Univariate Distributions, vol. 1. Wiley, New York
(1994)
Lachos, V.H., Vilca, L.F., Bolfarine, H., Ghosh, P.: Robust multivariate measurement error models with
scale mixtures of skew-normal distributions. Statistics 44(6), 541–556 (2009)
Lachos, V.H., Ghosh, P., Arellano-Valle, R.B.: Likelihood based inference for skew-normal independent linear mixed models. Stat. Sin. 20(1), 303–322 (2010)
Lange, K.L., Sinsheimer, J.S.: Normal/independent distributions and their applications in robust regression.
J. Comput. Graph. Stat. 2, 175–198 (1993)
Lange, K.L., Little, R., Taylor, J.: Robust statistical modeling using the t distribution. J. Am. Stat. Assoc. 84, 881–896 (1989)
Lin, T.I., Ho, H.J., Lee, C.R.: Flexible mixture modelling using the multivariate skew-t-normal distribution.
Stat. Comput. 24, 531–546 (2013)
Little, R.J.A.: Robust estimation of the mean and covariance matrix from data with missing values. Appl.
Stat. 37, 23–38 (1988)
Liu, C., Rubin, D.B.: The ECME algorithm: a simple extension of EM and ECM with faster monotone
convergence. Biometrika 80, 267–278 (1994)
Osorio, F., Paula, G.A., Galea, M.: Assessment of local influence in elliptical linear models with longitudinal
structure. Comput. Stat. Data Anal. 51(9), 4354–4368 (2007)
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical
Computing, Vienna (2015). http://www.R-project.org/
Sahu, S.K., Dey, D.K., Branco, M.D.: A new class of multivariate distributions with applications to Bayesian
regression models. Can. J. Stat. 31, 129–150 (2003)
Wang, J., Boyer, J., Genton, M.: A skew-symmetric representation of multivariate distributions. Stat. Sin.
14, 1259–1270 (2004)