Likelihood-Based Inference For Multivariate Skew Scale Mixtures of Normal Distributions
DOI 10.1007/s10182-016-0266-z
ORIGINAL PAPER
Received: 29 March 2015 / Accepted: 5 January 2016 / Published online: 19 January 2016
© Springer-Verlag Berlin Heidelberg 2016
Abstract Scale mixtures of normal distributions are frequently used as a flexible class for the statistical analysis of symmetrical data. Recently, Ferreira et al. (Stat Methodol 8:154–171, 2011) defined the univariate skew scale mixtures of normal distributions, which offer much-needed flexibility by combining skewness with heavy tails. In this paper, we develop a multivariate version of the skew scale mixtures of normal distributions, with emphasis on the multivariate skew-Student-t, skew-slash and skew-contaminated normal distributions. The main virtue of the members of this family of distributions is that they are easy to simulate from, and they also lend themselves to genuine expectation/conditional maximisation either (ECME) algorithms for maximum likelihood estimation. The observed information matrix is derived analytically to provide standard errors. Results obtained from real and simulated datasets are reported to illustrate the usefulness of the proposed method.
Víctor H. Lachos (corresponding author)
[email protected]
Clécio S. Ferreira
[email protected]
Heleno Bolfarine
[email protected]
1 Department of Statistics, Federal University of Juiz de Fora, Juiz de Fora, Minas Gerais, Brazil
2 Departamento de Estatística, Universidade Estadual de Campinas, Cidade Universitaria
“Zeferino Vaz”, Campinas, São Paulo, Brazil
3 Departamento de Estatística, Universidade de São Paulo, São Paulo, Brazil
123
422 C. S. Ferreira et al.
1 Introduction
Scale mixtures of normal distributions (Andrews and Mallows 1974) compose a group
of thick-tailed distributions that are often used for robust inference of symmetrical data.
Moreover, this class includes distributions such as the Student-t, slash and contami-
nated normal, among others. However, the theory and application (through simulation
or experimentation) often generate a large number of datasets that are skewed and
have heavy tails, such as datasets of family income (Azzalini et al. 2003) or substance
concentration (Bolfarine and Lachos 2007). Thus, appropriate distributions are needed to fit such skewed and heavy-tailed data. The skew-normal (SN) distribution is a class of density functions that depends on an additional shape parameter and includes the normal density as a special case.
Azzalini (1985) proposed the univariate SN distribution and it was recently gener-
alized to the multivariate case by Azzalini and Dalla-Valle (1996) and Arellano-Valle
et al. (2005). The multivariate SN density extends the multivariate normal model by
allowing a shape parameter to account for skewness. The probability density function
(pdf) of the generic element of a multivariate skew-normal distribution is given by

f(y|μ, Σ, λ) = 2 φ_p(y|μ, Σ) Φ₁(λ⊤Σ^{−1/2}(y − μ)), y ∈ R^p, (1)

where φ_p(·|μ, Σ) stands for the pdf of the p-variate normal distribution with mean vector μ and covariance matrix Σ, Φ₁(·) represents the cumulative distribution function (cdf) of the standard normal distribution, and Σ^{−1/2} satisfies Σ^{−1/2}Σ^{−1/2} = Σ^{−1}.
When λ = 0, the skew-normal distribution reduces to the normal distribution
(Y ∼ N p (μ, Σ)). A p-dimensional random vector Y with pdf as in (1) will be denoted
by SN_p(μ, Σ, λ). Its stochastic representation, which can be used to derive several of its properties, is given by

Y =ᵈ μ + Σ^{1/2}(δ|T₀| + (I_p − δδ⊤)^{1/2} T₁), δ = λ/(1 + λ⊤λ)^{1/2}, (2)

where |T₀| denotes the absolute value of T₀, T₀ ∼ N₁(0, 1) and T₁ ∼ N_p(0, I_p) are independent, I_p denotes the identity matrix of order p and "=ᵈ" means "distributed as".
A conditional representation of Y can be obtained as

Y|T = t ∼ N_p(μ + Σ^{1/2}δ t, Σ^{1/2}(I_p − δδ⊤)Σ^{1/2}), T ∼ TN(0, 1; (0, +∞)), (3)
where TN(μ, σ²; (a, b)) represents the univariate N(μ, σ²) distribution truncated to the interval (a, b) (Johnson et al. 1994). From (2), it follows that the expectation and covariance of Y are given, respectively, by

E[Y] = μ + (2/π)^{1/2} Σ^{1/2}δ and Cov[Y] = Σ − (2/π) Σ^{1/2}δδ⊤Σ^{1/2}. (4)
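As a numerical sanity check, the representation (2) and the mean formula in (4) can be verified by simulation. The sketch below is illustrative (parameter values are arbitrary) and assumes a symmetric matrix square root for Σ^{1/2}:

```python
import numpy as np

def sqrtm_sym(S):
    """Symmetric square root of a positive (semi)definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.sqrt(w)) @ V.T

def rmsn(n, mu, Sigma, lam, rng):
    """Sample SN_p(mu, Sigma, lam) via representation (2):
    Y = mu + Sigma^{1/2}(delta*|T0| + (I - delta delta')^{1/2} T1)."""
    p = len(mu)
    S_half = sqrtm_sym(Sigma)
    delta = lam / np.sqrt(1.0 + lam @ lam)
    M_half = sqrtm_sym(np.eye(p) - np.outer(delta, delta))
    T0 = np.abs(rng.standard_normal(n))      # |T0|, half-normal factor
    T1 = rng.standard_normal((n, p))         # T1 ~ N_p(0, I_p)
    return mu + (np.outer(T0, delta) + T1 @ M_half) @ S_half

rng = np.random.default_rng(42)
mu = np.array([0.5, -1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
lam = np.array([2.0, 1.0])
Y = rmsn(200_000, mu, Sigma, lam, rng)
delta = lam / np.sqrt(1.0 + lam @ lam)
EY = mu + np.sqrt(2.0 / np.pi) * sqrtm_sym(Sigma) @ delta   # formula (4)
print(np.allclose(Y.mean(axis=0), EY, atol=0.02))  # True
```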
Reasoning as in Azzalini and Dalla-Valle (1996), it is natural to construct multi-
variate distributions that combine skewness with heavy tails. Notice that the main idea
Likelihood-based inference for multivariate skew scale… 423
behind the construction in (1) involves a density function defined as the product of the
normal density with its cdf. Following Lin et al. (2013), a more general set-up can be
considered if we multiply a scale mixture of normal (SMN) density by the cdf of the normal distribution (as the skewing function). This approach leads to a family of asymmetric distributions that will be called skew scale mixtures of normal (SSMN) distributions. For this new class of SSMN distributions, we study some of its
probabilistic and inferential properties and discuss applications to real data. One inter-
esting and simplifying aspect of the family defined is that the implementation of the expectation/conditional maximisation either (ECME) algorithm is facilitated by the fact that the E-step is exactly as in the scale mixtures of normal distributions class of
models proposed in Andrews and Mallows (1974) (see also Osorio et al. 2007). More-
over, the M-step involves closed form expressions facilitating the implementation of
the EM-type algorithm. The multivariate SSMN class proposed here is fundamentally
different from the scale mixtures of skew-normal distributions (SMSN) developed by
Lachos et al. (2010) because we start our construction from the SMN densities and
not from the stochastic representation of a skew-normal random variable as presented
in Branco and Dey (2001) and Lachos et al. (2010).
The rest of the article is organized as follows. In Sect. 2, the SSMN class of distributions is defined by extending the elliptical class of SMN distributions. Properties like moments
and a stochastic representation of the proposed distributions are also discussed. More-
over, some examples of SSMN distributions are presented. In Sect. 3, we discuss
how to compute maximum likelihood (ML) estimates via the ECME algorithm, which
presents advantages over the direct maximization approach, especially in terms of
robustness with respect to starting values. The observed information matrix is derived
analytically. Section 4 reports applications to simulated and real data sets, indicating
the usefulness of the proposed methodology. Finally, Sect. 5 concludes with some
discussions, citing avenues for future research.
Andrews and Mallows (1974) used the Laplace transform technique to characterize when a standardized continuous random variable Y admits a scale mixture of normal (SMN) representation. This symmetric family of distributions has attracted much attention in the
last few years, mainly because it includes distributions such as the Student-t, slash,
power exponential and contaminated normal distributions. All these distributions have
heavier tails than the normal one.
We say that a p-dimensional vector Y has an SMN distribution (Lange and Sin-
sheimer 1993), with location parameter μ ∈ R p , a positive definite scale matrix Σ
and a hyperparameter τ, if its density function assumes the form

f₀(y) = ∫₀^∞ φ_p(y|μ, u^{−1}Σ) dH(u; τ) = (2π)^{−p/2}|Σ|^{−1/2} ∫₀^∞ u^{p/2} exp{−(u/2)(y − μ)⊤Σ^{−1}(y − μ)} dH(u; τ), y ∈ R^p. (5)
In analogy with (1), the pdf of an SSMN distribution is obtained by multiplying the symmetric density f₀ in (5) by the normal cdf:

f(y) = 2 f₀(y) Φ₁(λ⊤Σ^{−1/2}(y − μ)), y ∈ R^p, (6)

where f₀(·) is as in (5). For a random vector with a pdf as in (6), we shall use the notation Y ∼ SSMN_p(μ, Σ, λ; H).
Finally, integrating out u in (10), we obtain the marginal pdf of Y as follows:

f(y) = 2 ∫₀^{+∞} φ_p(y|μ, Σ/u) h(u; τ) du Φ₁(λ⊤Σ^{−1/2}(y − μ)) = 2 f₀(y) Φ₁(λ⊤Σ^{−1/2}(y − μ)). (11)
The proof follows by dividing (10) by (11). The conditional distributions U|Y = y for each element of the SSMN class are given in Sect. 2.1. For an SSMN random vector, a convenient hierarchical representation is given next; it can be used to simulate realizations of Y quickly, to implement the EM-type algorithm and to study many of its properties.
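A quick one-dimensional numerical check of (11) (illustrative, taking p = 1 and f₀ a Student-t density): skewing any symmetric density by the normal cdf preserves total mass.

```python
import numpy as np
from scipy import integrate, stats

# p = 1, f0 = Student-t density: f(y) = 2*f0(y)*Phi1(lambda*y) integrates to 1
nu, lam = 3.0, 2.5
f = lambda y: 2.0 * stats.t.pdf(y, df=nu) * stats.norm.cdf(lam * y)
mass, _ = integrate.quad(f, -np.inf, np.inf)
print(round(mass, 6))  # 1.0
```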
From (12), to generate an SSMN random variable we proceed in two steps: we first generate from the distribution of U and then from the conditional distribution of Y|U using, for instance, the stochastic representation given in (2).
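The two-step scheme can be sketched as follows for the skew-Student-t normal case, where U ∼ Gamma(ν/2, ν/2); the helper functions and parameter values below are illustrative, not the authors' code:

```python
import numpy as np

def sqrtm_sym(S):
    """Symmetric square root via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.sqrt(w)) @ V.T

def rmsn_one(mu, Sigma, lam, rng):
    """One draw from SN_p(mu, Sigma, lam) via representation (2)."""
    p = len(mu)
    delta = lam / np.sqrt(1.0 + lam @ lam)
    t0 = abs(rng.standard_normal())
    t1 = rng.standard_normal(p)
    M_half = sqrtm_sym(np.eye(p) - np.outer(delta, delta))
    return mu + sqrtm_sym(Sigma) @ (delta * t0 + M_half @ t1)

def rssmn_stn(n, mu, Sigma, lam, nu, rng):
    """Step 1: U ~ Gamma(nu/2, rate nu/2); step 2: Y|U=u ~ SN_p(mu, Sigma/u, lam/sqrt(u))."""
    u = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    return np.array([rmsn_one(mu, Sigma / ui, lam / np.sqrt(ui), rng) for ui in u])

rng = np.random.default_rng(7)
Y = rssmn_stn(2000, np.zeros(2), np.eye(2), np.array([2.0, 1.0]), nu=5.0, rng=rng)
print(Y.shape)  # (2000, 2)
```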
The moment generating function of Y is given by

M_Y(s) = E[e^{s⊤Y}] = 2 ∫₀^{+∞} exp{s⊤μ + s⊤Σs/(2u)} Φ₁( λ⊤Σ^{1/2}s / [u(u + λ⊤λ)]^{1/2} ) dH(u; τ), s ∈ R^p.
Proof From Proposition 2, we have that Y|U = u ∼ SN_p(μ, Σ/u, λ/√u).
Moreover, from well-known properties of conditional expectations, it follows that
M_Y(s) = E_U[E(e^{s⊤Y}|U)], and the proof concludes by using the fact that U is a positive random variable with cdf H and that, if Z ∼ SN_p(μ, Σ, λ), then M_Z(s) = 2 e^{s⊤μ + s⊤Σs/2} Φ₁(δ⊤Σ^{1/2}s).
Let D = (Y − μ)⊤Σ^{−1}(Y − μ) denote the (squared) Mahalanobis distance; as stated in Proposition 4, its distribution is the same as in the symmetric SMN class.
• The skew Student-t normal (StN) distribution with ν > 0 degrees of freedom,
denoted by StN p (μ, Σ, λ; ν).
The use of the Student-t distribution as an alternative to the normal distribution has
been frequently suggested in the literature. For instance, Little (1988) and Lange et al.
(1989) recommend using the Student-t distribution for robust modeling. Considering
U ∼ Gamma(ν/2, ν/2), the pdf of Y takes the form

f(y) = 2 t_p(y|μ, Σ; ν) Φ₁(λ⊤Σ^{−1/2}(y − μ)), y ∈ R^p, (13)

where t_p(y|μ, Σ; ν) = [Γ((ν + p)/2) / (Γ(ν/2)(νπ)^{p/2}|Σ|^{1/2})] (1 + d/ν)^{−(ν+p)/2}, with d = (y − μ)⊤Σ^{−1}(y − μ), is the density function of a p-dimensional Student-t variate with ν degrees of freedom.
The univariate skew Student-t normal (StN) distribution was developed by Gómez
et al. (2007), where the authors showed that the StN distribution can present a much wider asymmetry range than the ordinary skew-normal distribution (Azzalini 1985). Lin et al. (2013) used the multivariate StN distribution
in the context of finite mixture models, including the implementation of an interesting
EM-type algorithm for ML estimation. When ν ↑ ∞, we get the skew-normal distribution as the limiting case. The quadratic form satisfies D/p ∼ F(p, ν). Finally, U|Y = y ∼ Gamma((ν + p)/2, (ν + d)/2), so that

E[U^k|Y = y] = Γ((ν + p)/2 + k) / [Γ((ν + p)/2)((ν + d)/2)^k].
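This conditional moment can be checked against direct integration of the unnormalized posterior u^{p/2} e^{−ud/2} h(u; ν), where h is the Gamma(ν/2, ν/2) mixing density (the values of ν, p, d and k below are arbitrary; normalizing constants cancel in the ratio):

```python
import numpy as np
from math import gamma
from scipy import integrate

nu, p, d, k = 4.0, 2, 1.7, 1
# closed form: U|Y=y ~ Gamma((nu+p)/2, (nu+d)/2), so E[U^k|y] = G(a+k)/(G(a)*b^k)
closed = gamma((nu + p) / 2 + k) / (gamma((nu + p) / 2) * ((nu + d) / 2) ** k)
# unnormalized posterior: u^{p/2} e^{-u d/2} times the Gamma(nu/2, nu/2) kernel
post = lambda u: u ** (p / 2) * np.exp(-u * d / 2) * u ** (nu / 2 - 1) * np.exp(-u * nu / 2)
num, _ = integrate.quad(lambda u: u ** k * post(u), 0, np.inf)
den, _ = integrate.quad(post, 0, np.inf)
print(abs(num / den - closed) < 1e-8)  # True
```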
• The skew-slash (SSL) distribution, denoted by SSL_p(μ, Σ, λ; ν). Here the distribution of U is Beta(ν, 1), 0 < u < 1, ν > 0, and the pdf of Y is given by

f(y) = 2ν ∫₀¹ u^{ν−1} φ_p(y|μ, u^{−1}Σ) du Φ₁(λ⊤Σ^{−1/2}(y − μ)), y ∈ R^p. (14)
It is easy to see that U|Y = y ∼ TG(ν + p/2, d/2, 1), where TG(a, b, t) denotes the right-truncated gamma distribution, with pdf

f(x|a, b, t) = [b^a/γ(a, bt)] x^{a−1} exp(−bx) I_{(0,t)}(x),

and γ(a, b) = ∫₀^b u^{a−1}e^{−u} du is the incomplete gamma function. So, we have that

E[U^k|Y = y] = Γ(ν + p/2 + k) P₁(ν + p/2 + k, d/2) / [(d/2)^k Γ(ν + p/2) P₁(ν + p/2, d/2)],

where P_x(a, b) denotes the cdf of the Gamma(a, b) distribution (with mean a/b) evaluated at x.
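Since U|Y = y is a right-truncated gamma, its moments reduce to incomplete-gamma ratios; a numerical check with arbitrary values, writing a = ν + p/2 and b = d/2:

```python
import numpy as np
from scipy import integrate
from scipy.stats import gamma as gamma_dist
from math import gamma

nu, p, d, k = 2.0, 2, 1.3, 1
a = nu + p / 2                     # shape of TG(a, d/2, 1)
b = d / 2                          # rate
# P_1(shape, rate): gamma cdf evaluated at 1
P1 = lambda shape, rate: gamma_dist.cdf(1.0, shape, scale=1.0 / rate)
closed = gamma(a + k) * P1(a + k, b) / (b ** k * gamma(a) * P1(a, b))
# direct integration of the unnormalized posterior u^{a-1} e^{-bu} on (0, 1)
post = lambda u: u ** (a - 1) * np.exp(-b * u)
num, _ = integrate.quad(lambda u: u ** k * post(u), 0, 1)
den, _ = integrate.quad(post, 0, 1)
print(abs(num / den - closed) < 1e-8)  # True
```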
Fig. 1 Contour plot of some elements of the standard bivariate SSMN family. a SN2 (λ), b StN2 (λ; 2), c
SCN2 (λ; 0.5, 0.5) and d SSL2 (λ; 1), where λ = (2, 1)
Figure 1 provides contour plots of some elements of the standard bivariate SSMN family. Figure 2 displays the contours of these SSMN bivariate densities at two levels (c = 0.01 and c = 0.1). Note that the SN contour lies inside the StN, SSL and SCN ones for c = 0.01, while the opposite situation occurs for c = 0.1.
The EM algorithm originally proposed by Dempster et al. (1977) has several appealing features, such as stable monotone convergence, with each iteration increasing the likelihood, and simplicity of implementation. However, ML estimation in SSMN
models is complicated and the EM algorithm is less advisable due to the computational
difficulty in the M-step. To cope with this problem, we apply an extension of the
EM algorithm, called the ECME algorithm (Liu and Rubin 1994), that shares the
appealing features of the EM and has a typically faster convergence rate than the
2.0
4
3
1.5
2
1.0
1
0.5
0
0.0
−1
−0.5
−2
SN SN
StN StN
SSL SSL
−1.0
−3
SCN SCN
EM. The ECME algorithm replaces some conditional maximization (CM) steps, which maximize the constrained Q-function (the expected complete-data log-likelihood), by steps that maximize the corresponding constrained actual marginal likelihood function, called CML steps. In the following, we show how to employ the ECME algorithm for ML estimation of SSMN models.
Given Y ∼ SSMN_p(μ, Σ, λ; H), the joint distribution of (Y, U, T) (see Appendix 2) is given by

f(y, u, t) = 2 φ_p(y|μ, Σ/u) h(u; τ) φ₁(t|λ⊤Σ^{−1/2}(y − μ), 1). (16)
Hence, the complete-data log-likelihood function is

ℓ_c(θ|y_c) ∝ −(n/2) log|Σ| + ∑_{i=1}^n [ −(u_i/2)(y_i − μ)⊤Σ^{−1}(y_i − μ) + t_i λ⊤Σ^{−1/2}(y_i − μ) − (1/2)(λ⊤Σ^{−1/2}(y_i − μ))² + log h(u_i; τ) ]. (17)
The E-step computes the Q-function

Q(θ|θ̂^{(k)}) = E[ℓ_c(θ|y_c) | y, θ = θ̂^{(k)}] = ∑_{i=1}^n Q₁ᵢ(θ|θ̂^{(k)}) + ∑_{i=1}^n Q₂ᵢ(θ|θ̂^{(k)}), (18)
with Q₂ᵢ(θ|θ̂^{(k)}) = E[log h(Uᵢ; τ) | yᵢ, θ = θ̂^{(k)}] and

Q₁ᵢ(θ|θ̂^{(k)}) = −(1/2) log|Σ| − (û_i^{(k)}/2)(y_i − μ)⊤Σ^{−1}(y_i − μ) + t̂_i^{(k)} Δ⊤(y_i − μ) − (1/2)[Δ⊤(y_i − μ)]²,

where Δ = Σ^{−1/2}λ, û_i^{(k)} = E[U_i|y_i, θ = θ̂^{(k)}] and t̂_i^{(k)} = E[T_i|y_i, θ = θ̂^{(k)}].
The (k + 1)th M-step then finds θ̂^{(k+1)} maximizing Q(θ|θ̂^{(k)}). From (25), it follows that T|Y = y ∼ TN(λ⊤Σ^{−1/2}(y − μ), 1; (0, +∞)), so t̂_i^{(k)} = E[T_i|y_i, θ = θ̂^{(k)}] can be expressed as t̂_i^{(k)} = Â_i^{(k)} + W_Φ(Â_i^{(k)}), where Â_i^{(k)} = (λ̂^{(k)})⊤(Σ̂^{(k)})^{−1/2}(y_i − μ̂^{(k)}) and W_Φ(x) = φ₁(x)/Φ₁(x).
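For TN(a, 1; (0, +∞)) the mean is a + φ₁(a)/Φ₁(a), a standard truncated-normal fact; a quick check against scipy's truncated normal (the value of a is arbitrary):

```python
from scipy.stats import norm, truncnorm

a_loc = 0.8  # a = lambda' Sigma^{-1/2}(y - mu); any real value works
# TN(a, 1; (0, inf)): standardized lower bound is (0 - a)/1 = -a
mean_scipy = truncnorm.mean(-a_loc, float('inf'), loc=a_loc, scale=1.0)
mean_formula = a_loc + norm.pdf(a_loc) / norm.cdf(a_loc)  # a + W_Phi(a)
print(abs(mean_scipy - mean_formula) < 1e-10)  # True
```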
The CM-steps then update μ and Δ in closed form:

μ̂^{(k+1)} = ( ∑_{i=1}^n û_i^{(k)} (Σ̂^{(k)})^{−1} + n Δ̂^{(k)}(Δ̂^{(k)})⊤ )^{−1} ∑_{i=1}^n ( û_i^{(k)} (Σ̂^{(k)})^{−1} y_i − t̂_i^{(k)} Δ̂^{(k)} + Δ̂^{(k)}(Δ̂^{(k)})⊤ y_i ),

Δ̂^{(k+1)} = [ ∑_{i=1}^n (y_i − μ̂^{(k)})(y_i − μ̂^{(k)})⊤ ]^{−1} ∑_{i=1}^n t̂_i^{(k)} (y_i − μ̂^{(k)}),
Σ̂^{(k+1)} = (1/n) ∑_{i=1}^n û_i^{(k)} (y_i − μ̂^{(k)})(y_i − μ̂^{(k)})⊤,

and the CML-step updates τ by maximizing the actual marginal log-likelihood,

τ̂^{(k+1)} = argmax_τ ∑_{i=1}^n log f₀(y_i | μ̂^{(k+1)}, Σ̂^{(k+1)}, τ), (22)
where f₀(y) is the respective symmetric pdf defined in (5). This step requires a one-dimensional search for the StN and SSL models and a two-dimensional search for the SCN model, which can be easily accomplished by using, for example, the "optim" or "optimize" routines in R (R Core Team 2015) or "fmincon" in Matlab. The iterations of the above algorithm are repeated until the difference between two successive log-likelihood values, |ℓ(θ̂^{(k+1)}) − ℓ(θ̂^{(k)})|, is sufficiently small, say 10^{−5}, where ℓ(θ) = ∑_{i=1}^n log f(y_i), with f(y_i) as defined in (11). The initial values used in the ECME algorithm are the sample mean vector for μ, the sample covariance matrix for Σ and the vector of sample skewness coefficients for λ (see, for instance, Cabral et al. 2012).
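As an illustrative sketch (not the authors' implementation), one ECME iteration for the multivariate StN model can be written as follows; the E-step moments are those given above for the StN case, and the CML-step maximizes the marginal Student-t log-likelihood over ν. The demo data and starting values are arbitrary:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln
from scipy.stats import norm

def mat_pow(S, q):
    """Symmetric matrix power via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** q) @ V.T

def ecme_step_stn(Y, mu, Sigma, lam, nu):
    """One ECME iteration for the multivariate StN model (sketch)."""
    n, p = Y.shape
    Sinv, Sinvh = mat_pow(Sigma, -1.0), mat_pow(Sigma, -0.5)
    R = Y - mu
    d = np.einsum('ij,jk,ik->i', R, Sinv, R)     # Mahalanobis distances d_i
    A = R @ (Sinvh @ lam)                        # A_i = lam' Sigma^{-1/2}(y_i - mu)
    u = (nu + p) / (nu + d)                      # E[U|y]: mean of Gamma((nu+p)/2,(nu+d)/2)
    t = A + norm.pdf(A) / norm.cdf(A)            # E[T|y]: truncated-normal mean
    Delta = Sinvh @ lam
    # CM-steps: closed-form maximizers of the Q-function
    lhs = u.sum() * Sinv + n * np.outer(Delta, Delta)
    rhs = Sinv @ (u[:, None] * Y).sum(axis=0) - t.sum() * Delta + np.outer(Delta, Delta) @ Y.sum(axis=0)
    mu_new = np.linalg.solve(lhs, rhs)
    Delta_new = np.linalg.solve(R.T @ R, R.T @ t)
    Sigma_new = (u[:, None] * R).T @ R / n
    lam_new = mat_pow(Sigma_new, 0.5) @ Delta_new
    # CML-step: maximize the marginal Student-t log-likelihood over nu
    Rn = Y - mu_new
    dn = np.einsum('ij,jk,ik->i', Rn, mat_pow(Sigma_new, -1.0), Rn)
    _, logdet = np.linalg.slogdet(Sigma_new)
    def neg_ll(v):
        return -np.sum(gammaln((v + p) / 2) - gammaln(v / 2) - (p / 2) * np.log(v * np.pi)
                       - 0.5 * logdet - ((v + p) / 2) * np.log1p(dn / v))
    nu_new = minimize_scalar(neg_ll, bounds=(1.0, 100.0), method='bounded').x
    return mu_new, Sigma_new, lam_new, nu_new

rng = np.random.default_rng(0)
Y = rng.standard_t(df=5, size=(300, 2)) + np.array([0.5, -1.0])
theta = (np.array([0.0, 0.0]), np.eye(2), np.array([0.1, 0.1]), 10.0)
for _ in range(5):
    theta = ecme_step_stn(Y, *theta)
print(np.round(theta[0], 1))
```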
Let

K_i = K_i(θ) = ∫₀^∞ u^{p/2} exp{−(u/2) d_i} h(u; τ) du,

where d_i = (y_i − μ)⊤Σ^{−1}(y_i − μ) and A_i = A_i(θ) = λ⊤Σ^{−1/2}(y_i − μ). The score vector is given by ∂ℓ(θ)/∂θ = ∑_{i=1}^n ∂ℓ_i(θ)/∂θ, where

∂ℓ_i(θ)/∂θ = −(1/2) ∂ log|Σ|/∂θ + (1/K_i) ∂K_i/∂θ + W_Φ(A_i) ∂A_i/∂θ,
with W_Φ(x) = φ₁(x)/Φ₁(x). The second derivatives of ℓ_i(θ) with respect to θ take the form

∂²ℓ_i(θ)/∂θ∂θ⊤ = −(1/2) ∂²log|Σ|/∂θ∂θ⊤ + (1/K_i) ∂²K_i/∂θ∂θ⊤ − (1/K_i²)(∂K_i/∂θ)(∂K_i/∂θ⊤) + W_Φ(A_i) ∂²A_i/∂θ∂θ⊤ + W′_Φ(A_i)(∂A_i/∂θ)(∂A_i/∂θ⊤),

with W′_Φ(x) = −W_Φ(x)(x + W_Φ(x)). The first and second derivatives of K_i(θ) with respect to θ are given by

∂K_i(θ)/∂θ = −(1/2) K_i(θ) (∂d_i/∂θ) E[U|Y = y_i],

∂²K_i(θ)/∂θ∂θ⊤ = −(K_i(θ)/2) { (∂²d_i/∂θ∂θ⊤) E[U|Y = y_i] − (1/2)(∂d_i/∂θ)(∂d_i/∂θ⊤) E[U²|Y = y_i] }.

Combining these expressions, we obtain

∂²ℓ_i(θ)/∂θ∂θ⊤ = −(1/2) ∂²log|Σ|/∂θ∂θ⊤ − (1/2) E[U|Y = y_i] ∂²d_i/∂θ∂θ⊤ + (1/4){E[U²|Y = y_i] − E[U|Y = y_i]²}(∂d_i/∂θ)(∂d_i/∂θ⊤) + W_Φ(A_i) ∂²A_i/∂θ∂θ⊤ + W′_Φ(A_i)(∂A_i/∂θ)(∂A_i/∂θ⊤).
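The identity W′_Φ(x) = −W_Φ(x)(x + W_Φ(x)) used above can be verified numerically against a central finite difference (the evaluation point is arbitrary):

```python
import numpy as np
from scipy.stats import norm

W = lambda x: norm.pdf(x) / norm.cdf(x)          # W_Phi(x) = phi1(x)/Phi1(x)
Wp = lambda x: -W(x) * (x + W(x))                # claimed derivative of W_Phi
x, h = 0.7, 1e-6
numeric = (W(x + h) - W(x - h)) / (2 * h)        # central difference
print(abs(numeric - Wp(x)) < 1e-6)  # True
```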
3 Simulation experiments
Table 1 Mean and standard deviations (SD) of the EM estimates and empirical standard error estimates (SE_emp) based on 1000 samples from the StN₂ model. Columns give Mean, SD and SE_emp for each of three sample sizes (increasing left to right)

Param  True | Mean    SD     SE_emp | Mean    SD     SE_emp | Mean    SD     SE_emp
μ1     0.5  | 0.505   0.123  0.112  | 0.500   0.088  0.086  | 0.501   0.062  0.061
μ2     −1   | −0.999  0.122  0.112  | −1.001  0.089  0.086  | −1.000  0.062  0.061
α1     1    | 1.002   0.089  0.076  | 1.001   0.067  0.059  | 1.000   0.048  0.042
α2     0    | −0.004  0.046  0.043  | −0.002  0.034  0.033  | −0.001  0.024  0.023
α3     1    | 1.004   0.092  0.076  | 1.003   0.069  0.059  | 1.000   0.048  0.042
λ1     2    | 2.150   0.686  0.549  | 2.085   0.446  0.406  | 2.044   0.325  0.281
λ2     2    | 2.157   0.674  0.552  | 2.093   0.465  0.407  | 2.040   0.313  0.280
Table 2 Mean and standard deviations (SD) of the EM estimates and empirical standard error estimates (SE_emp) based on 1000 samples from the SSL₂ model. Columns give Mean, SD and SE_emp for each of three sample sizes (increasing left to right)

Param  True | Mean    SD     SE_emp | Mean    SD     SE_emp | Mean    SD     SE_emp
μ1     0.5  | 0.526   0.161  0.202  | 0.508   0.068  0.069  | 0.499   0.071  0.069
μ2     −1   | −0.977  0.170  0.203  | −1.003  0.067  0.069  | −1.000  0.074  0.069
α1     1    | 1.009   0.104  0.077  | 1.009   0.062  0.041  | 1.014   0.063  0.041
α2     0    | −0.014  0.053  0.041  | −0.001  0.023  0.021  | −0.001  0.022  0.021
α3     1    | 1.009   0.106  0.075  | 1.018   0.064  0.042  | 1.013   0.066  0.041
λ1     2    | 2.054   0.691  0.602  | 2.045   0.323  0.271  | 2.073   0.321  0.276
λ2     2    | 2.052   0.715  0.599  | 2.064   0.311  0.272  | 2.065   0.333  0.275
From Tables 1 and 2, we can say that the bias of all parameter estimates tends to zero as the sample size increases, indicating that the ML estimates based on the proposed EM-type algorithm have good large-sample properties. These tables also provide the average values of the approximate standard errors of the EM estimates obtained through the information-based method described in Sect. 2.3 (IM SE) and the Monte Carlo standard deviation (MC SD) of the parameter estimates. As expected, the results summarized in these tables suggest that the approximation produced by the information method is reliable.
Here, the experiment is designed to show the flexibility of our proposed SSMN class. Our strategy is to generate artificial data from a linear regression model where the errors follow a distribution that is totally different in nature from the class of SSMN distributions studied here. Specifically, we consider the normal inverse Gaussian (NIG) distribution, a mean-variance mixture of a normal distribution with an inverse Gaussian mixing distribution, which produces asymmetry and heavy tails (Cabral et al.
2014). We say that a random variable U has an inverse Gaussian (IG) distribution when its density is given by

f(u|ρ, δ) = (δ/√(2π)) u^{−3/2} exp{ −(ρ²/(2u))(u − δ/ρ)² }, u > 0.
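The (ρ, δ) parametrization above corresponds to an IG law with mean δ/ρ; both the normalization and the mean can be confirmed numerically (values of ρ and δ taken from the simulation setting):

```python
import numpy as np
from scipy import integrate

rho, delta = 1.0, 0.7
f = lambda u: delta / np.sqrt(2 * np.pi) * u ** -1.5 * np.exp(-rho ** 2 / (2 * u) * (u - delta / rho) ** 2)
mass, _ = integrate.quad(f, 0, np.inf)
mean, _ = integrate.quad(lambda u: u * f(u), 0, np.inf)
print(abs(mass - 1) < 1e-8, abs(mean - delta / rho) < 1e-8)  # True True
```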
Definition 2 We say that the random vector X has a p-dimensional NIG distribution if it admits the mean-variance mixture representation X = μ + γU + U^{1/2}Σ^{1/2}Z₀, where Z₀ ∼ N_p(0, I_p) and U ∼ IG(ρ, δ) are independent.
To evaluate the performance of the SSMN distributions for multivariate data with heavy tails, we also use the multivariate skew-t (ST) distribution proposed by Azzalini and Capitanio (2003).
We generated a sample of size n = 300 from X ∼ NIG₂((−3, 1), I₂, (−4, 3); 1, 0.7), and the results of fitting the aforementioned distributions (normal, SN, StN, SSL and SCN) are presented in Table 3. Note that the StN distribution presented the best fit according to the AIC and BIC measures. The ST distribution was also fitted, yielding a log-likelihood value ℓ(θ̂) = −1133.039, AIC = 2282.079 and BIC = 2311.709. These values are higher than those obtained for the StN and SCN members of the SSMN class.
A second simulation study was conducted by generating 1000 Monte Carlo samples of size n = 300 from X ∼ NIG₂((−3, 1), I₂, (−4, 3); 1, 0.7). Under these parameter values, the sample skewness and kurtosis measures ranged from (−9, −1.5) to (1.4, 9.7) and from (5.2, 105) to (4.8, 118.9) in each coordinate, respectively. For each generated sample, we fitted the StN, SSL and SCN distributions from the SSMN class and also the ST model of Azzalini and Capitanio (2003). Table 4 presents the results of this simulation experiment, where we show the percentage of best fits for each model according to the AIC and BIC. Note that the AIC and BIC pick the StN model in more than 90% of the Monte Carlo samples, indicating the flexibility of the StN distribution from our proposed SSMN class.
4 Application
This example concerns the Australian Institute of Sport (AIS) dataset, which includes 11 physical and hematological attributes measured on 202 athletes, together with a binary indicator of gender (100 females and 102 males). The data were originally collected by Cook and Weisberg (1994) and have previously been analyzed by various authors, including Azzalini and Dalla-Valle (1996) and Azzalini and Capitanio (2003). Now, we revisit this dataset with the aim of expanding the inferential results to the SSMN family. Specifically, we focus on the SN, StN, SSL and SCN distributions. The
Table 3 ML estimates (MLE) for the SSMN models fitted to samples from X ∼ NIG₂((−3, 1), I₂, (−4, 3); 1, 0.7), n = 300. The SE values are the average of the estimated asymptotic standard errors
Table 4 Percentages of best performance for each SSMN distribution based on 1000 samples generated
from X ∼ NIG2 ((−3, 1), I2 , (−4, 3); 1, 0.7)
dataset is available in the R package “sn” (R Core Team 2015). For illustration, we
consider a subset of variables: Y1 body mass index (BMI), Y2 percentage of body fat
(Bfat) and Y3 lean body mass (LBM).
Table 5 presents the ML estimates of the parameters of the SN, StN, SSL and SCN models, along with their corresponding standard errors (SE) calculated via the observed information matrix (see Sect. 2.3). We also compare the fitted models by inspecting some information selection criteria. Comparing the models through the information criteria presented in Table 5, we observe that the SCN presents the best fit, followed closely by the StN and SSL models. The fit of the SN is the worst, indicating a lack of adequacy of the SN assumptions for this dataset.
The Q-Q plots and envelopes shown in Fig. 3 are based on the distribution of the Mahalanobis distance given in Proposition 4, which is the same as for the SMN class (see Lange and Sinsheimer 1993). The lines in these figures represent the 5th percentile, the mean and the 95th percentile of 100 simulated points for each observation. These figures clearly show once again that the SSMN distributions with heavy tails provide a better fit than the SN model to the AIS dataset. Figure 4 presents the contour plots of the skew-contaminated normal model fitted to the AIS data. We can see from this figure
Table 5 ML estimation results (MLE) for the SSMN models. SE are the estimated standard errors
that the fitted SCN density has reasonable ability to capture the asymmetry present in
the data.
5 Conclusions
In this work, we defined a new family of asymmetric models by extending the sym-
metric scale mixtures of normal family. Our proposal extends recent results found
in Ferreira et al. (2011) to a multivariate context. In addition, we developed a very
general method based on the EM algorithm for estimating the parameters of the skew
scale mixtures of normal distributions. Closed-form expressions were derived for the
iterative estimation processes. This was a consequence of the fact that the proposed distributions possess a stochastic representation that can be used to represent them conditionally. This stochastic representation also allows us to study many of their properties easily. We believe that the approaches proposed here can also be used to study other asymmetric multivariate models, like those proposed by Lachos et al. (2009, 2010). The models proposed in the latter articles have a stochastic representation of the form Y = μ + κ^{1/2}(U)Z, and include elements like the skew-t, skew-slash, skew-contaminated normal, skew-logistic, skew-stable and skew-exponential power distributions. The assessment of the influence of the data and model assumptions on the results of a statistical analysis is a key aspect of any new class of distributions.
Fig. 3 Simulated envelope for SSMN distributions adjusted to AIS data: a skew-normal, b skew Student-t
normal, c skew-slash and d skew-contaminated normal
We are currently exploring the local influence and residual analysis to address this
issue.
One anonymous referee suggested generalizing the presented results to "multivariate skew scale mixtures of t distributions" by replacing φ_p in f₀ in (5) by the Student-t density. However, the theoretical development does not appear to be as easy, elegant and clear as for our proposal. For instance, having different and independent u_i for the t-density (t_p(·; ν) instead of φ_p(·)) and for H(·; τ) makes the estimation quite complicated, and we could not find an easy way to estimate the parameters in this scenario. Note also that this generalization includes an additional parameter ν, which needs to be included in the estimation process. For fitting SSMN models, we presented feasible ECME algorithms with simple analytical expressions (also for the observed information matrix) based on the three-level hierarchical representation (8) of the model. In addition, numerical results show that the SSMN class of distributions is very flexible and has already been applied successfully in models of practical interest. For instance, Lin et al. (2013)
Fig. 4 Contour plots for the skew-contaminated normal fitted to the AIS data
Acknowledgments We thank the editor, associate editor and two referees whose constructive comments
led to an improved presentation of the paper. C.S. acknowledges support from FAPEMIG (Minas Gerais
State Foundation for Research Development), Grant CEX APQ 01845/14. V.H. acknowledges support from
CNPq-Brazil (Grant 305054/2011-2) and FAPESP-Brazil (Grant 2014/02938-9).
Appendix 1: Derivatives of log|Σ|, A_i and d_i

Considering α = vech(B), where Σ^{1/2} = B = B(α), the first and second derivatives of log|Σ|, A_i and d_i are obtained below. The notation used is that of Sect. 2, and for a p-dimensional vector ρ = (ρ₁, . . . , ρ_p)⊤ we write Ḃ_r = ∂B(α)/∂α_r, with r = 1, 2, . . . , p(p + 1)/2. Thus,
• log|Σ|:

∂ log|Σ|/∂α_k = 2 tr(B^{−1}Ḃ_k), ∂² log|Σ|/∂α_k∂α_s = −2 tr(B^{−1}Ḃ_s B^{−1}Ḃ_k),
• A_i:

∂A_i/∂μ = −B^{−1}λ, ∂A_i/∂α_k = −λ⊤B^{−1}Ḃ_k B^{−1}(y_i − μ), ∂A_i/∂λ = B^{−1}(y_i − μ),

∂²A_i/∂μ∂μ⊤ = 0, ∂²A_i/∂μ∂α_k = B^{−1}Ḃ_k B^{−1}λ, ∂²A_i/∂μ∂λ⊤ = −B^{−1},

∂²A_i/∂α_k∂α_s = −λ⊤B^{−1}[Ḃ_s B^{−1}Ḃ_k + Ḃ_k B^{−1}Ḃ_s]B^{−1}(y_i − μ),

∂²A_i/∂α_k∂λ = −B^{−1}Ḃ_k B^{−1}(y_i − μ), ∂²A_i/∂λ∂λ⊤ = 0,
• d_i:

∂d_i/∂μ = −2B^{−2}(y_i − μ), ∂d_i/∂α_k = −(y_i − μ)⊤B^{−1}[Ḃ_k B^{−1} + B^{−1}Ḃ_k]B^{−1}(y_i − μ), ∂d_i/∂λ = 0,

∂²d_i/∂μ∂μ⊤ = 2B^{−2}, ∂²d_i/∂μ∂α_k = 2B^{−1}[Ḃ_k B^{−1} + B^{−1}Ḃ_k]B^{−1}(y_i − μ),

∂²d_i/∂μ∂λ⊤ = 0, ∂²d_i/∂α_k∂λ = 0, ∂²d_i/∂λ∂λ⊤ = 0,

∂²d_i/∂α_k∂α_s = (y_i − μ)⊤B^{−1}[Ḃ_s B^{−1}Ḃ_k B^{−1} + Ḃ_k B^{−1}Ḃ_s B^{−1} + Ḃ_k B^{−2}Ḃ_s + Ḃ_s B^{−2}Ḃ_k + B^{−1}Ḃ_s B^{−1}Ḃ_k + B^{−1}Ḃ_k B^{−1}Ḃ_s]B^{−1}(y_i − μ).
Appendix 2: Hierarchical representation

Y|T = t, U = u ∼ N_p( μ + u^{−1/2} t Σ^{1/2}δ_u, (1/u) Σ^{1/2}(I_p + λ_u λ_u⊤)^{−1}Σ^{1/2} ),
U ∼ H(τ), T ∼ TN(0, 1; (0, +∞)), (23)

with U and T independent, δ_u = λ/(u + λ⊤λ)^{1/2} and λ_u = λ/√u.
Using some results given in Lachos et al. (2010), it follows that the joint distribution of (Y, U, T) is given by

f(y, u, t) = 2 φ_p(y|μ, Σ/u) h(u; τ) φ₁(t|λ⊤Σ^{−1/2}(y − μ), 1), (24)

and

f(t|y) = φ₁(t|λ⊤Σ^{−1/2}(y − μ), 1) / Φ₁(λ⊤Σ^{−1/2}(y − μ)), (25)

so that T|Y = y ∼ TN(λ⊤Σ^{−1/2}(y − μ), 1; (0, +∞)).
References
Andrews, D.F., Mallows, C.L.: Scale mixtures of normal distributions. J. R. Stat. Soc. Ser. B 36, 99–102
(1974)
Arellano-Valle, R.B., Bolfarine, H., Lachos, V.H.: Skew-normal linear mixed models. J. Data Sci. 3, 415–438
(2005)
Azzalini, A.: A class of distributions which includes the normal ones. Scand. J. Stat. 12, 171–178 (1985)
Azzalini, A., Capitanio, A.: Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. J. R. Stat. Soc. Ser. B 65, 367–389 (2003)
Azzalini, A., Dalla-Valle, A.: The multivariate skew-normal distribution. Biometrika 83(4), 715–726 (1996)
Azzalini, A., dal Cappello, T., Kotz, S.: Log-skew-normal and log-skew-t distributions as models for family income data. J. Income Distrib. 11, 13–21 (2003)
Bolfarine, H., Lachos, V.: Skew probit error-in-variables models. Stat. Methodol. 3, 1–12 (2007)
Branco, M.D., Dey, D.K.: A general class of multivariate skew-elliptical distributions. J. Multivar. Anal.
79, 99–113 (2001)
Cabral, C.R.B., Lachos, V.H., Prates, M.O.: Multivariate mixture modeling using skew-normal independent
distributions. Comput. Stat. Data Anal. 56(1), 126–142 (2012)
Cabral, C.R.B., Lachos, V.H., Zeller, C.B.: Multivariate measurement error models using finite mixtures of
skew-Student t distributions. J. Multivar. Anal. 124, 179–198 (2014)
Cook, R.D., Weisberg, S.: An Introduction to Regression Graphics. Wiley, Hoboken (1994)
Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. J. R.
Stat. Soc. Ser. B 39(1), 1–38 (1977)
Ferreira, C.S., Bolfarine, H., Lachos, V.H.: Skew scale mixtures of normal distributions: properties and
estimation. Stat. Methodol. 8, 154–171 (2011)
Gómez, H.W., Venegas, O., Bolfarine, H.: Skew-symmetric distributions generated by the normal distribu-
tion function. Environmetrics 18, 395–407 (2007)
Harville, D.: Matrix Algebra From a Statistician’s Perspective. Springer, New York (1997)
Johnson, N.L., Kotz, S., Balakrishnan, N.: Continuous Univariate Distributions, vol. 1. Wiley, New York
(1994)
Lachos, V.H., Vilca, L.F., Bolfarine, H., Ghosh, P.: Robust multivariate measurement error models with
scale mixtures of skew-normal distributions. Statistics 44(6), 541–556 (2009)
Lachos, V.H., Ghosh, P., Arellano-Valle, R.B.: Likelihood based inference for skew-normal independent linear mixed models. Stat. Sin. 20(1), 303–322 (2010)
Lange, K.L., Sinsheimer, J.S.: Normal/independent distributions and their applications in robust regression.
J. Comput. Graph. Stat. 2, 175–198 (1993)
Lange, K.L., Little, R., Taylor, J.: Robust statistical modeling using the t distribution. J. Am. Stat. Assoc. 84, 881–896 (1989)
Lin, T.I., Ho, H.J., Lee, C.R.: Flexible mixture modelling using the multivariate skew-t-normal distribution.
Stat. Comput. 24, 531–546 (2013)
Little, R.J.A.: Robust estimation of the mean and covariance matrix from data with missing values. Appl.
Stat. 37, 23–38 (1988)
Liu, C., Rubin, D.B.: The ECME algorithm: a simple extension of EM and ECM with faster monotone
convergence. Biometrika 80, 267–278 (1994)
Osorio, F., Paula, G.A., Galea, M.: Assessment of local influence in elliptical linear models with longitudinal
structure. Comput. Stat. Data Anal. 51(9), 4354–4368 (2007)
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical
Computing, Vienna (2015). http://www.R-project.org/
Sahu, S.K., Dey, D.K., Branco, M.D.: A new class of multivariate distributions with applications to Bayesian
regression models. Can. J. Stat. 31, 129–150 (2003)
Wang, J., Boyer, J., Genton, M.: A skew-symmetric representation of multivariate distributions. Stat. Sin.
14, 1259–1270 (2004)