Austrian Journal of Statistics

2025, Volume 54, 55–70.

http://www.ajs.or.at/
doi:10.17713/ajs.v54i3.2169

Simultaneous Inference in Finite Mixtures of Regression Models¹

Friedrich Leisch Torsten Hothorn

LMU München LMU München

Abstract
A general framework for simultaneous inference in finite mixtures of generalized linear
regression models is presented. Assuming asymptotic normality of the maximum likelihood
estimate of all interesting model parameters, confidence regions and p-values using
a maximum norm for the multivariate t-statistic are derived. This makes it possible to test
simultaneously whether each regression coefficient is zero. Another application is to test
for constant effects across mixture components. Size and power of the new methods are
evaluated using artificial data. A real-world data set on the productivity of PhD students
is used to demonstrate the application of the procedures.

Keywords: R, simultaneous inference, finite mixture model, EM algorithm, latent class regression.

1. Introduction
We consider finite mixture models with K components of form

$$h(y \mid x, P, B, \Gamma) = \sum_{k=1}^{K} \pi_k f(y \mid x, \beta_k, \gamma_k), \qquad (1)$$

$$\pi_k > 0, \qquad \sum_{k=1}^{K} \pi_k = 1,$$

where y is a dependent variable with conditional density h, $x \in \mathbb{R}^D$ is a D-dimensional
vector of independent variables, $\pi_k$ is the component proportion of component k, $\beta_k$ is the
component specific vector of regression coefficients and $\gamma_k$ a component specific vector of
nuisance parameters for density function f. Further, let $P = (\pi_1, \ldots, \pi_K)^\top$ be the vector of
all component proportions, $B = (\beta_1^\top, \ldots, \beta_K^\top)^\top$ the vector of all regression coefficients and
$\Gamma = (\gamma_1^\top, \ldots, \gamma_K^\top)^\top$ the vector of all other parameters in the model. The reason for splitting
the set of all parameters into vectors P, B and Γ is that we are only interested in inference
on B in this paper.

¹ This manuscript was written between 2010 and 2011 while both authors taught at LMU München. It
has not been published before and is printed here with only minor editorial changes (typos, table formatting,
addressing feedback from the review process, etc.). Hothorn (2025) explains the history and reflects on issues
regarding reproducibility of numerical results using this work as an example.

If f is a normal density with component-specific mean $\beta_k^\top x$ and variance $\sigma_k^2$, we have $\gamma_k = (\sigma_k^2)$
as component specific nuisance parameters and Equation (1) describes a mixture of standard
linear regression models, also called latent class regression. If f is another member of the
exponential family (binomial, gamma, Poisson, . . . ), we get a mixture of generalized linear
models (Wedel and DeSarbo 1995; Wang, Puterman, Cockburn, and Le 1996).
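
As a minimal sketch of Equation (1) in the Gaussian case, the following Python function evaluates the mixture density for one observation. The function and argument names are illustrative assumptions and are not part of any package used in the paper.

```python
import math

def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_density(y, x, props, betas, variances):
    """Evaluate h(y | x) = sum_k pi_k * f(y | x, beta_k, sigma_k^2) for Gaussian f."""
    if abs(sum(props) - 1.0) > 1e-12 or min(props) <= 0.0:
        raise ValueError("component proportions must be positive and sum to one")
    total = 0.0
    for pi_k, beta_k, var_k in zip(props, betas, variances):
        # Linear predictor beta_k' x of component k.
        mean_k = sum(b * xd for b, xd in zip(beta_k, x))
        total += pi_k * normal_pdf(y, mean_k, var_k)
    return total

# Two components with intercept and one covariate (values chosen for illustration only).
print(mixture_density(2.0, [1.0, 0.5], [0.4, 0.6], [[3.0, -2.0], [0.0, 1.0]], [0.25, 0.25]))
```

Replacing `normal_pdf` by a Poisson or binomial probability function with the appropriate link would give the GLM mixtures mentioned above.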
The corresponding log-likelihood cannot be maximized analytically and hence numerical
methods have to be used. Direct optimization using gradient descent methods is feasible
only in the simplest cases, and the most popular methods for maximum likelihood estimation
of the parameters are variations of the EM algorithm (Dempster, Laird, and Rubin
1977). Below we use the EM algorithm to find a maximum of the log-likelihood. After EM
has converged, the full log-likelihood of all components is used to obtain a numerical estimate
of the covariance matrix of all model parameters. We refer to Grün and Leisch (2008a) for a recent
introduction to EM estimation of mixtures of GLMs and identification problems of the model
class. Bayesian estimation of GLM mixtures is shown, e.g., in Lenk and DeSarbo (2000) or
Frühwirth-Schnatter (2006), but will not be considered below.
A lot of research effort has been devoted to the selection of the right number of mixture
components K, also called order selection. In a maximum likelihood (ML) context the most
popular method is probably the use of an information criterion like AIC or BIC; see also the
introduction of Chen and Khalili (2009) for an overview.
In this manuscript we are not concerned with order selection, but with inference on the
regression coefficients of a mixture model with a fixed number of components. We will adapt
general theory for linear hypothesis tests to finite mixture models and demonstrate it on
two important goals of inference procedures:

• Which coefficients are zero such that the corresponding independent variable makes no
significant contribution to the respective component?

• Do the coefficients for a single independent variable have the same value over two or
more components?

Both types of tests have immediate implications for model selection and interpretation. If
coefficients do not significantly differ from zero, one may consider omitting them from the
model. If a coefficient has the same value in two or more components, we can reduce the
number of estimated parameters by fixing the parameter across these components.

2. Simultaneous inference procedures

In this section we present the underlying model assumptions and review some asymptotic
results necessary in the subsequent sections. The concepts presented in this section form the
basis for our new software implementation of simultaneous inference procedures. The set of
n observations is denoted as $\{(y, x)_1, \ldots, (y, x)_n\}$. The model contains fixed but unknown
regression coefficients $B \in \mathbb{R}^p$, where $p = K \cdot D$.
We are primarily interested in linear functions $\vartheta := CB$ of the parameter vector B
as specified through the constant matrix $C \in \mathbb{R}^{c \times p}$. We describe the underlying
model assumptions, the limiting distribution of estimates of our parameters of interest ϑ,
as well as the corresponding test statistics for hypotheses about ϑ and their limiting joint
distribution following Hothorn, Bretz, and Westfall (2008).
Suppose $\hat{B}_n \in \mathbb{R}^p$ is, e.g., the ML estimate of the unknown true $B_0$ and $S_n \in \mathbb{R}^{p \times p}$ is an
estimate of $\mathrm{cov}(\hat{B}_n)$ with

$$a_n S_n \xrightarrow{P} \Sigma \in \mathbb{R}^{p \times p} \qquad (2)$$

for some positive, nondecreasing sequence $a_n$. Note that convergence rates and asymptotic
properties of ML estimators are non-trivial for finite mixture models, see Zhu and Zhang
(2004), Chen and Li (2009) and references therein. We assume in the following that all
parameters are in the interior of the parameter space, we do not have too many components
and the final estimate has been obtained by direct optimization of the full likelihood (see
Section 3 below). Note that we only propose inference on the regression coefficients, hence
avoiding the more problematic inference for mixture proportions (null hypothesis of πk = 0 on
boundary of parameter space) and variances (unbounded likelihood for decreasing variances).
Furthermore, we assume that a multivariate central limit theorem holds, i.e.,

$$a_n^{1/2} (\hat{B}_n - B_0) \xrightarrow{d} \mathcal{N}_p(0, \Sigma). \qquad (3)$$

Assuming that (2) and (3) hold for the ML estimate $\hat{B}_n$, we get $\hat{B}_n \overset{a}{\sim} \mathcal{N}_p(B_0, S_n)$. Then,
by Theorem 3.3.A in Serfling (1980), the linear function $\hat{\vartheta}_n = C\hat{B}_n$, i.e., an estimate of our
parameters of interest, also follows an approximate multivariate normal distribution

$$\hat{\vartheta}_n = C\hat{B}_n \overset{a}{\sim} \mathcal{N}_c(\vartheta_0, S_n^\star)$$

with covariance matrix $S_n^\star := C S_n C^\top$ for any fixed matrix $C \in \mathbb{R}^{c \times p}$. Thus we need not
distinguish between elemental parameters B and derived parameters ϑ = CB that are of
interest to the researcher. Instead we assume (in analogy to (2) and (3)) that

$$\hat{\vartheta}_n \overset{a}{\sim} \mathcal{N}_c(\vartheta_0, S_n^\star) \qquad (4)$$

with

$$a_n S_n^\star \xrightarrow{P} \Sigma^\star := C \Sigma C^\top \in \mathbb{R}^{c \times c}$$

and that the c parameters in ϑ are themselves the parameters of interest to the researcher.
It is assumed that the diagonal elements of the covariance matrix are positive, i.e., $\Sigma^\star_{jj} > 0$
for j = 1, . . . , c.
Then, the standardized estimator $\hat{\vartheta}_n$ is again asymptotically normally distributed

$$T_n := D_n^{-1/2} (\hat{\vartheta}_n - \vartheta_0) \overset{a}{\sim} \mathcal{N}_c(0, R_n), \qquad (5)$$

where $D_n = \mathrm{diag}(S_n^\star)$ is the diagonal matrix given by the diagonal elements of $S_n^\star$ and

$$R_n = D_n^{-1/2} S_n^\star D_n^{-1/2} \in \mathbb{R}^{c \times c}$$

is the correlation matrix of the c-dimensional statistic $T_n$. To finish note that

$$T_n = D_n^{-1/2} (\hat{\vartheta}_n - \vartheta_0) = (a_n D_n)^{-1/2} a_n^{1/2} (\hat{\vartheta}_n - \vartheta_0) \xrightarrow{d} \mathcal{N}_c(0, R)$$

with correlation matrix R.
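
The chain from the coefficient estimate to the standardized statistic can be sketched in a few lines of Python. This is only an illustration under the assumption that an estimate `B_hat` and a covariance estimate `S` are already available; the helper names are hypothetical and the numbers are made up.

```python
import math

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def standardize(C, B_hat, S, m):
    """Return (T, R) with T = D^{-1/2} (C B_hat - m) and R = D^{-1/2} S* D^{-1/2}."""
    theta = [sum(c * b for c, b in zip(row, B_hat)) for row in C]  # theta_hat = C B_hat
    S_star = matmul(matmul(C, S), transpose(C))                   # S* = C S C'
    sd = [math.sqrt(S_star[j][j]) for j in range(len(C))]          # square roots of diag(S*)
    T = [(theta[j] - m[j]) / sd[j] for j in range(len(C))]
    R = [[S_star[i][j] / (sd[i] * sd[j]) for j in range(len(C))] for i in range(len(C))]
    return T, R

# Made-up example: two coefficients and a single contrast for their difference.
T, R = standardize([[1, -1]], [1.0, -2.0], [[0.25, 0.1], [0.1, 1.0]], [0.0])
print(T, R)
```

Only the diagonal of $S_n^\star$ enters the scaling, while the off-diagonal elements survive in $R_n$ and drive the joint distribution used for the tests.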

We now focus on the derivation of suitable inference procedures. We start considering the
general linear hypothesis (Searle 1971) formulated in terms of our parameters of interest ϑ

H0 : ϑ := CB = m.

Under the conditions of H0 it holds that

$$T_n = D_n^{-1/2} (\hat{\vartheta}_n - m) \overset{a}{\sim} \mathcal{N}_c(0, R_n).$$

Note that a small global p-value leading to a rejection of H0 does not give further indication
about the nature of the significant result. Therefore, one is often interested in the individual
null hypotheses
H0j : ϑj = mj .

Testing the set of hypotheses $\{H_{01}, \ldots, H_{0c}\}$ simultaneously thus requires the individual
assessments while maintaining the familywise error rate.
A suitable scalar test statistic for testing the global hypothesis H0 is the maximum
of the individual test statistics $T_{1,n}, \ldots, T_{c,n}$ of the multivariate statistic $T_n = (T_{1,n}, \ldots, T_{c,n})$,
leading to a max-t type test statistic $\max(|T_n|)$. The distribution of this statistic under the
conditions of H0 can be handled through the c-dimensional distribution

$$g_\nu(R, t) := P(\max(|T_n|) \le t) \approx \int_{-t}^{t} \cdots \int_{-t}^{t} \varphi_c(x_1, \ldots, x_c; R, \nu) \, dx_1 \cdots dx_c \qquad (6)$$

for some $t \in \mathbb{R}$, where $\varphi_c$ is the density function of either the limiting c-dimensional multivariate
normal (with ν = ∞ and the ‘≈’ operator) or the exact multivariate $t_c(\nu, R)$-distribution
(with ν < ∞, in which case ‘≈’ is replaced by ‘=’). Since R is usually unknown, we plug in the consistent
estimate $R_n$. The resulting global p-value (exact or approximate, depending on context) for
H0 is $1 - g_\nu(R_n, \max |t|)$ when T = t has been observed. Efficient methods for approximating
the above multivariate normal and t integrals are described in Genz (1992); Genz and Bretz
(1999); Bretz, Genz, and Hothorn (2001) and Genz and Bretz (2002).
This max-t type test based on the test statistic $\max(|T_n|)$ also provides information about which
of the c individual null hypotheses $H_{0j}$, j = 1, . . . , c, can be rejected. Consider testing the c
null hypotheses $H_{01}, \ldots, H_{0c}$ individually. We require that the familywise error rate, i.e., the
probability of falsely rejecting at least one true null hypothesis, is bounded by the nominal
significance level α ∈ (0, 1). In what follows we use adjusted p-values to describe the decision
rules. Adjusted p-values are defined as the smallest significance level for which one still rejects
an individual hypothesis $H_{0j}$, given a particular multiple test procedure. In the present context
of single-step tests, the (at least asymptotic) adjusted p-value for the jth individual two-sided
hypothesis $H_{0j}: \vartheta_j = m_j$, j = 1, . . . , c, is given by

$$p_j = 1 - g_\nu(R_n, |t_j|),$$

where $t_1, \ldots, t_c$ denote the observed test statistics. By construction, we can reject an
individual null hypothesis $H_{0j}$, j = 1, . . . , c, whenever the associated adjusted p-value is less
than or equal to the pre-specified significance level α, i.e., $p_j \le \alpha$. The adjusted p-values are
calculated from expression (6).
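
To make the role of the correlation matrix concrete, the sketch below approximates $1 - g_\nu(R_n, |t_j|)$ for the normal case (ν = ∞) by plain Monte Carlo simulation. This is a deliberately simple stand-in for the numerical integration algorithms cited above, not the method used in the paper; all function names are hypothetical.

```python
import math
import random

def cholesky(R):
    """Lower-triangular L with L L' = R for a positive definite matrix R."""
    c = len(R)
    L = [[0.0] * c for _ in range(c)]
    for i in range(c):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(R[i][i] - s)
            else:
                L[i][j] = (R[i][j] - s) / L[j][j]
    return L

def max_abs_draws(R, n_draws, seed=1):
    """Simulate max_j |T_j| for T ~ N_c(0, R)."""
    rng = random.Random(seed)
    L = cholesky(R)
    c = len(R)
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(c)]
        t = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(c)]
        draws.append(max(abs(v) for v in t))
    return draws

def adjusted_p_values(t_obs, R, n_draws=50000, seed=1):
    """Monte Carlo estimate of p_j = P(max |T| > |t_j|) = 1 - g(R, |t_j|)."""
    draws = max_abs_draws(R, n_draws, seed)
    return [sum(d > abs(t) for d in draws) / n_draws for t in t_obs]

# Same observed statistics under independence and under strong positive correlation.
print(adjusted_p_values([1.96, 0.5], [[1.0, 0.0], [0.0, 1.0]]))
print(adjusted_p_values([1.96, 0.5], [[1.0, 0.99], [0.99, 1.0]]))
```

With strongly correlated statistics the adjusted p-value for the larger statistic stays close to the unadjusted one, whereas under independence the adjustment is much stronger. The global p-value is the smallest adjusted p-value, i.e., the one belonging to max |t|.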
Similar results also hold for one-sided testing problems. The adjusted p-values for one-sided
cases are defined analogously, using one-sided multidimensional integrals instead of the
two-sided integrals (6). Again, we refer to Genz (1992); Genz and Bretz (1999); Bretz et al. (2001)
and Genz and Bretz (2002) for the numerical details. As specific examples for the general
procedure we now give explicit definitions for C and m for the two most important groups of
tests.

2.1. Tests for zero coefficients

Using $x = (x_1, \ldots, x_D)^\top \in \mathbb{R}^D$ and $\beta_k = (\beta_{k1}, \ldots, \beta_{kD})^\top$ we get the linear predictor of the
generalized linear regression model in component k

$$\eta_k = \beta_k^\top x = \sum_{d=1}^{D} \beta_{kd} x_d.$$

We are now interested in the $c = K \cdot D$ simultaneous tests

$$H_0: \beta_{kd} = 0, \qquad H_1: \beta_{kd} \neq 0$$

for k = 1, . . . , K and d = 1, . . . , D. In this case the contrast matrix C is an identity matrix
of dimension c × c and m is a vector of c zeros.

2.2. Tests for constant effects

To test whether coefficients have the same value for two or more components we perform
pairwise tests of equality on the set $\{\beta_{1d}, \ldots, \beta_{Kd}\}$. For fixed d we get $K \cdot (K - 1)/2$ tests of
form

$$H_0: \beta_{kd} = \beta_{ld}, \qquad H_1: \beta_{kd} \neq \beta_{ld}$$

for $1 \le k < l \le K$. An equivalent set of hypotheses is of course given by

$$H_0: \beta_{kd} - \beta_{ld} = 0, \qquad H_1: \beta_{kd} - \beta_{ld} \neq 0.$$

Overall we have D such groups of pairwise tests and hence a total of $c = D \cdot K \cdot (K - 1)/2$
simultaneous tests. In this case m is again a vector of zeros, and each row of C
contains only zeros with the exception of the elements belonging to $\beta_{kd}$ and $\beta_{ld}$, which are +1 and −1, respectively.
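
The contrast matrix for constant effects can be built mechanically. The sketch below assumes that B stacks the component vectors β1, . . . , βK, so that βkd sits at (zero-based) position (k − 1)·D + (d − 1); the function name is hypothetical.

```python
from itertools import combinations

def constant_effect_contrasts(K, D):
    """Rows of C for all pairwise tests beta_kd - beta_ld = 0 with k < l, grouped by d."""
    rows, labels = [], []
    for d in range(D):
        for k, l in combinations(range(K), 2):
            row = [0] * (K * D)
            row[k * D + d] = 1    # position of beta_kd in the stacked vector B
            row[l * D + d] = -1   # position of beta_ld in the stacked vector B
            rows.append(row)
            labels.append((k + 1, l + 1, d + 1))  # one-based (k, l, d) for readability
    return rows, labels

C, labels = constant_effect_contrasts(K=3, D=7)
print(len(C), len(C[0]))
```

For tests of zero coefficients the corresponding C is simply the identity matrix of dimension K·D, so no helper is needed there.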

3. Size and power simulations

In order to empirically validate our theoretical results, we conducted a comprehensive series
of simulation experiments for standard linear models with Gaussian noise and generalized
linear models with Poisson dependent variable. We first describe the general design of both
series of experiments, then discuss some computational aspects, and finish this section with
size and power simulations for tests for zero coefficients and constant effects.

3.1. Simulation design

We consider a finite mixture of standard linear regression models with three components,
intercept x1 ≡ 1, and six standard uniform explanatory variables x2 , . . . , x7 ∼ U (0, 1). With
x = (x1 , x2 , . . . , x7 )⊤ = (1, x2 , . . . , x7 )⊤ we get the model for component k as

y = βk⊤ x + ϵ, ϵ ∼ N (0, 1/4).

The matrix of regression coefficients βkd for the three components is

$$[\beta_{kd}] = [\beta_1, \beta_2, \beta_3] = \begin{pmatrix} 3 & 1 & 1 \\ 2 & 2 & 2 \\ 1 & 3 & 3 \\ 0 & 2 & 4 \\ 0 & 1 & 5 \\ 0 & 0 & 6 \\ 0 & 0 & 0 \end{pmatrix}.$$

Note that the first row corresponds to the intercepts, the remaining six rows are the coefficients
for x2 , . . . , x7 . If we stack the three columns of the matrix we get our vector B of all regression
coefficients.
The set βkd was chosen to contain 1/3 zeros (7 entries), available for size evaluation for the test
of zero coefficients. The remaining 2/3 non-zero entries (14 in total) will be used to evaluate
the power of the test. The asymmetry between the two groups (1/3 vs. 2/3) is deliberate to
make room for size and power simulations for the pairwise tests of constant effects, where 3/7
(9 equal pairs) can be used to evaluate size and 4/7 (12 unequal pairs) to evaluate power.
E.g., row one (intercept of the model) contains two unequal pairs (first versus second and
third column) and one equal pair (second and third column). For both types of test we have
more cases to evaluate power because we want to have different levels of disagreement with
the null hypothesis.
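
The counts quoted above can be checked directly from the coefficient matrix. The short Python snippet below is only a bookkeeping aid; the variable and function names are ours.

```python
from itertools import combinations

# Rows: intercept and x2, ..., x7; columns: components 1, 2, 3.
BETA = [
    [3, 1, 1],
    [2, 2, 2],
    [1, 3, 3],
    [0, 2, 4],
    [0, 1, 5],
    [0, 0, 6],
    [0, 0, 0],
]

def design_counts(beta):
    """Count zero/non-zero coefficients and equal/unequal within-row pairs."""
    zeros = sum(v == 0 for row in beta for v in row)
    nonzeros = sum(v != 0 for row in beta for v in row)
    equal = sum(a == b for row in beta for a, b in combinations(row, 2))
    unequal = sum(a != b for row in beta for a, b in combinations(row, 2))
    return zeros, nonzeros, equal, unequal

print(design_counts(BETA))  # (7, 14, 9, 12)
```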
In addition to the mixture of standard linear regression models we also use a mixture of
generalized linear models (GLMs) with Poisson distribution for the dependent variable and
log link. The regression coefficients are the same as above; the only difference is that the
distribution of the explanatory variables x2 , . . . , x7 is now U (−0.5, 0.5). The reason for the
shift is the log link of the GLM: the sum of coefficients in the third component is 21, so standard
uniform xi would result in a possible mean value of $e^{21}$ for the Poisson dependent variable.
All computations in this paper were done using the statistical computing environment R (R
Core Team 2024) with extension packages flexmix (Grün and Leisch 2025; Leisch 2004; Grün
and Leisch 2008b) and multcomp (Hothorn, Bretz, and Westfall 2025; Hothorn et al. 2008).
Samples from the data generating processes (DGPs) described above can be obtained using
function ExLinear() in package flexmix.
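
For readers without R at hand, the Gaussian DGP of this section can be sketched in plain Python as follows. This is our own illustration of the design described above, not a port of the flexmix function; it assumes equally sized components, and all names are hypothetical.

```python
import random

# Component-specific coefficient vectors (columns of the coefficient matrix above).
BETAS = [
    [3, 2, 1, 0, 0, 0, 0],
    [1, 2, 3, 2, 1, 0, 0],
    [1, 2, 3, 4, 5, 6, 0],
]

def sample_gaussian_mixture(n_per_component, seed=1):
    """Draw (y, x, component) triples from the three-component Gaussian design."""
    rng = random.Random(seed)
    data = []
    for k, beta in enumerate(BETAS):
        for _ in range(n_per_component):
            # Intercept plus six standard uniform explanatory variables.
            x = [1.0] + [rng.random() for _ in range(6)]
            eta = sum(b * xd for b, xd in zip(beta, x))
            # Noise variance 1/4, i.e. standard deviation 0.5.
            y = eta + rng.gauss(0.0, 0.5)
            data.append((y, x, k + 1))
    return data

data = sample_gaussian_mixture(100)
print(len(data), data[0][2], data[-1][2])
```

The Poisson variant would differ in two places: the covariates are drawn from U(−0.5, 0.5), and y is a Poisson draw with mean exp(η) instead of η plus Gaussian noise.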

3.2. Speeding up EM simulations

The two biggest disadvantages of the EM algorithm are its slow convergence and that it
may get stuck in a local maximum of the likelihood. Both are problematic for size and power
simulations, where we have to replicate the procedure hundreds of times. Slow convergence
can be overcome by brute force of computing power, but getting stuck in local maxima will
bias simulation results. When fitting the model to a real data set the analyst can manually
check for convergence, restart if the algorithm did not converge, and also have a look at the
differences between several restarts of the algorithm.
In simulations manual inspection is not an option, hence we have to take care that these
problems occur as rarely as possible. To get a valid estimate of the covariance matrix of the
model parameters after convergence, only the distribution at convergence is of interest; the
path the EM algorithm took to get there has no influence. So in theory it should make no
difference if we

1. start the EM algorithm using the true cluster memberships of the observations and then
find the maximum likelihood estimate for a given new data set, or

2. repeatedly start the EM algorithm with a random initialization and keep the estimates
with the best likelihood, assuming at least one run found the global maximum.

Assume $B_0$ is the true parameter vector of the data generating process, and $\hat{B}_n$ is the maximum
likelihood estimate (MLE) for a given data set of size n. Then the distributions of $\hat{B}_n$
and of the difference

$$\delta(B_0, \hat{B}_n) = \|B_0 - \hat{B}_n\|$$

for some appropriate norm $\|\cdot\|$ are only functions of n and the data generating process, not
of the particular method used to obtain the MLE (assuming the method is capable of finding
it).
Starting the EM algorithm in the true solution has several computational advantages:

1. usually only a few iterations are needed to get from B0 to the MLE B̂n ,

2. with very high probability we cannot get stuck in a local optimum, hence only one EM
replication is needed, and

3. the components of B̂n have the same order as in B0 , no relabeling is needed.
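
The idea of starting EM from the true cluster memberships can be sketched for a toy mixture of simple linear regressions. The code below is a simplified illustration with one covariate and a made-up two-line DGP; it is not the flexmix implementation used for the simulations, and all names are hypothetical.

```python
import math
import random

def normal_pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def weighted_line_fit(x, y, w):
    """Weighted ML fit of y = b0 + b1 * x; returns (b0, b1, variance)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    var = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / sw
    return b0, b1, var

def em_from_labels(x, y, labels, K, max_iter=200, tol=1e-8):
    """EM initialized with hard memberships; returns parameters and log-likelihood path."""
    n = len(y)
    post = [[1.0 if labels[i] == k else 0.0 for k in range(K)] for i in range(n)]
    loglik_path = []
    for _ in range(max_iter):
        # M-step: proportions and weighted regression fits given current posteriors.
        params = []
        for k in range(K):
            w = [post[i][k] for i in range(n)]
            params.append((sum(w) / n,) + weighted_line_fit(x, y, w))
        # E-step: posterior probabilities and log-likelihood at the new parameters.
        loglik = 0.0
        for i in range(n):
            dens = [p * normal_pdf(y[i], b0 + b1 * x[i], v) for p, b0, b1, v in params]
            total = sum(dens)
            loglik += math.log(total)
            post[i] = [dj / total for dj in dens]
        loglik_path.append(loglik)
        if len(loglik_path) > 1 and loglik_path[-1] - loglik_path[-2] < tol:
            break
    return params, loglik_path

def simulate(n_per_component, lines, sd, seed):
    """Made-up DGP with one line (intercept, slope) per component, used only for this demo."""
    rng = random.Random(seed)
    x, y, labels = [], [], []
    for k, (b0, b1) in enumerate(lines):
        for _ in range(n_per_component):
            xi = rng.random()
            x.append(xi)
            y.append(b0 + b1 * xi + rng.gauss(0.0, sd))
            labels.append(k)
    return x, y, labels

x, y, labels = simulate(100, [(4.0, 2.0), (0.0, -2.0)], sd=0.5, seed=0)
params, path = em_from_labels(x, y, labels, K=2)
print(len(path), [round(b1, 2) for _, _, b1, _ in params])
```

Because the components keep the order given by the initial labels, no relabeling step is needed afterwards, which mirrors the third advantage listed above.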

Despite the theoretical arguments given above, we ran an extensive simulation study to
confirm them empirically for our DGP and to be on the safe side. We sample 1000 data
sets from the process described above and fit mixture models to obtain regression coefficient
estimates

B̂1 : start the EM algorithm with 20 random initializations, run each until convergence, keep
solution with best likelihood and relabel components by order of coefficients βk4 .

B̂2 : initialize EM algorithm with true cluster memberships and run until convergence.

Figure 1: Scatterplot of δ1 (horizontal axis) versus δ2 (vertical axis): in approx. 85% of all replications estimates B̂1 and B̂2
coincide

We then compute the Euclidean distance between the solutions found and the true matrix
B0 :

δ1 = δ(B̂1 , B0 ) = ||B̂1 − B0 ||
δ2 = δ(B̂2 , B0 ) = ||B̂2 − B0 ||

Figure 1 shows a scatter plot of δ1 versus δ2 for the 1000 replications. The vast majority of
points (approx. 85%) lie on a straight line passing through the origin with slope 1. Points with
large δ1 correspond to runs where 20 runs of EM were not enough to find the true solution.
Figure 2 shows the logarithm of δ1 (left panel) and δ2 (right panel) versus the replication
number. A log scale was chosen to get a clear view of the majority of points and reduce the
influence of outliers. It can clearly be seen that the variation within the horizontal bands at
the bottom of both panels is identical, the only structural difference is the presence of 15%
outliers in the left panel.
Summarizing both figures we conclude that starting the EM algorithm in the true solution
preserves the variance of the MLE (the only quantity we are interested in in this paper) and is
a very effective measure against getting stuck in local maxima of the likelihood. A nice side
effect is that this speeds up simulations on artificial data considerably.

3.3. Tests for zero coefficients

To test size and power of our simultaneous inference procedures we draw 10000 data sets of
size n = 300 and n = 600 from the DGPs described above (with 100 or 200 observations in
each of the three mixture components). The EM algorithm was initialized using true cluster
memberships, then the covariance matrix of all model parameters was numerically estimated
from the full likelihood of the model.

Figure 2: Scatterplot of δ1 (left panel, random start) and δ2 (right panel, start with true clusters) versus replication number; vertical axis log10(δi)

Table 1: Empirical size of tests for zero coefficients for a significance level of α = 0.05

| | Normal, n = 300 | Normal, n = 600 | Poisson, n = 300 | Poisson, n = 600 |
|---|---|---|---|---|
| raw p-values | 0.5029 | 0.3683 | 0.3292 | 0.3205 |
| Holm-adj. | 0.1804 | 0.0832 | 0.0709 | 0.0581 |
| mult. comp. | 0.1162 | 0.0384 | 0.0319 | 0.0230 |

Table 2: Empirical power of p-values from multiple comparisons procedure for zero coefficients

| | Normal, n = 300 | Normal, n = 600 | Poisson, n = 300 | Poisson, n = 600 |
|---|---|---|---|---|
| βkd = 1 | 0.8162 | 0.9761 | 0.9888 | 1.0000 |
| βkd > 1 | 0.9997 | 1.0000 | 0.9999 | 1.0000 |
| found all | 0.4165 | 0.9058 | 0.9548 | 0.9998 |

Table 3: Empirical power of Holm-adjusted p-values for zero coefficients

| | Normal, n = 300 | Normal, n = 600 | Poisson, n = 300 | Poisson, n = 600 |
|---|---|---|---|---|
| βkd = 1 | 0.8622 | 0.9854 | 0.9926 | 1.0000 |
| βkd > 1 | 0.9998 | 1.0000 | 1.0000 | 1.0000 |
| found all | 0.5358 | 0.9423 | 0.9701 | 1.0000 |

The model contains 21 regression coefficients (vector B). Which of these are not significantly
different from zero was tested using the simultaneous inference procedure. For comparison
we also performed separate t-tests of which parameters are zero, and adjusted the resulting 21
p-values for multiple testing using Holm’s method (Holm 1979).
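
For reference, Holm's step-down adjustment used as the comparison method can be written in a few lines. The sketch below is a generic implementation of the standard procedure with a hypothetical function name; the input p-values are made up.

```python
def holm_adjust(p_values):
    """Holm (1979) step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The (rank + 1)-th smallest p-value is multiplied by m - rank and capped at one.
        candidate = min(1.0, (m - rank) * p_values[i])
        # A running maximum keeps the adjusted p-values monotone in the raw p-values.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

print(holm_adjust([0.01, 0.04, 0.03]))
```

Unlike the max-t procedure, this adjustment uses only the marginal p-values and ignores the correlation between the test statistics.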
Table 1 shows the empirical size of the tests for a significance level of α = 0.05. Seven of
the 21 regression coefficients are zero; the table shows the proportion of replications where
at least one of the seven p-values was smaller than 0.05. The first row shows (as reference
only) unadjusted p-values of separate tests; these are not corrected for multiple testing and
hence the size is much larger than 0.05 in all cases. Both the Holm-adjusted p-values and
our simultaneous inference procedure have size problems for n = 300 and normal response.
Note that we estimate approximately 240 parameters (regression coefficients, component
proportions, covariance matrix of all parameters); n = 300 was deliberately chosen as a
borderline case. In all other cases (normal n = 600, Poisson n = 300 and n = 600) the
simultaneous inference procedure is slightly conservative and has empirical size of 0.02–0.04.
The Holm-adjusted p-values are above the nominal size in all cases.

Table 4: Empirical size and power of multiple comparisons procedure for constant effects

| | Normal, n = 300 | Normal, n = 600 | Poisson, n = 300 | Poisson, n = 600 |
|---|---|---|---|---|
| Size | 0.1189 | 0.0457 | 0.0449 | 0.0271 |
| Power | 0.4470 | 0.8185 | 0.9397 | 0.9995 |

Table 2 shows the power of the simultaneous inference procedure for tests of zero coefficients
at a significance level of α = 0.05. The first row shows how often βkd with a value of
one (two times the standard deviation of the noise) were found significant; the second row
shows that βkd that are larger than one were almost always significant. The last row shows the
proportion of replications where all nonzero parameters were correctly identified as significant. Except for
the problematic case with normal response and n = 300, power is generally very good. Table 3
shows the power of Holm-adjusted p-values, which is slightly better than that of the simultaneous
inference procedure. This was to be expected because the size of the Holm-adjusted p-values
is larger: the procedure tends to reject the null hypothesis of βkd = 0 too often even if it is true.
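
The quantities reported in Tables 1 to 3 are simple summaries over replications. The following sketch shows how the empirical size and the "found all" row could be computed from a list of adjusted p-value vectors, one per replication; the function names and the toy input are our own.

```python
def empirical_size(p_by_replication, null_index, alpha=0.05):
    """Share of replications with at least one true null hypothesis rejected."""
    hits = sum(any(p[j] <= alpha for j in null_index) for p in p_by_replication)
    return hits / len(p_by_replication)

def found_all(p_by_replication, alternative_index, alpha=0.05):
    """Share of replications in which every false null hypothesis is rejected."""
    hits = sum(all(p[j] <= alpha for j in alternative_index) for p in p_by_replication)
    return hits / len(p_by_replication)

# Toy input: three replications, hypotheses 0 and 1 are true nulls, hypothesis 2 is false.
toy = [[0.01, 0.50, 0.20], [0.30, 0.60, 0.001], [0.04, 0.03, 0.90]]
print(empirical_size(toy, [0, 1]), found_all(toy, [2]))
```

In the simulation design of Section 3.1, `null_index` would hold the positions of the seven zero coefficients in B and `alternative_index` the positions of the remaining 14.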

3.4. Tests for constant effects

Next we use the simultaneous inference procedure to test for constant effects. Our simulation
design contains nine pairs of parameters which are identical for different components, e.g.,
β21 = β31 = 1. These nine pairs can be used to empirically analyze the size of a test for
constant effects, the remaining 12 pairs can be used to analyze power.
Table 4 shows the empirical size and power of our simultaneous tests for constant effects.
The first row shows the size, i.e., in how many replications was at least one pair of identical
coefficients βkd = βld wrongly identified to have a significant difference. The second row gives
the power, i.e., in how many replications all unequal pairs βkd ̸= βld were correctly identified.
Results are similar to the tests for zero coefficients: except for the problematic case with
normal response and n = 300 the tests are close to nominal size or slightly conservative, and
power is very good.

4. Example: PhD students

In this example we use a sample of 915 biochemistry graduate students from Long (1990)
with the following six variables:

art: integer, count of articles produced by the student during last 3 years of Ph.D.
female: nominal, gender of student (Male or Female)
married: nominal, marital status of student (Single or Married)
kid5: integer, number of children aged 5 or younger
phd: metric, prestige of Ph.D. department
ment: integer, count of articles produced by Ph.D. mentor during last 3 years

The goal is to predict the number of articles produced by the students from the other variables.
The dependent variable is a count, hence we use a Poisson-GLM. Finite mixtures have become
a popular tool especially for Poisson-GLMs, because they can be used to account for
zero-inflation or over-dispersion (e.g., Wang et al. 1996).

Table 5: Summary statistics for the predicted number of papers per student in the two-component model

| | Comp. 1 | Comp. 2 |
|---|---|---|
| Min. | 0.48 | 1.68 |
| 1st Qu. | 0.79 | 2.62 |
| Median | 0.95 | 3.14 |
| Mean | 1.04 | 3.59 |
| 3rd Qu. | 1.15 | 3.84 |
| Max. | 5.16 | 32.93 |

Table 6: Test for significance of regression coefficients using standard asymptotic theory
for the two-component model. Estimate, standard error and unadjusted p-value for each
component.

| | $\hat{\beta}_1$ | SE($\hat{\beta}_1$) | p-value | $\hat{\beta}_2$ | SE($\hat{\beta}_2$) | p-value |
|---|---|---|---|---|---|---|
| (Intercept) | −0.509 | 0.222 | 0.022 | 1.184 | 0.211 | 0.000 |
| female | −0.140 | 0.102 | 0.170 | −0.302 | 0.098 | 0.002 |
| married | 0.251 | 0.115 | 0.028 | 0.090 | 0.110 | 0.415 |
| kid5 | −0.212 | 0.074 | 0.004 | −0.181 | 0.072 | 0.012 |
| phd | 0.092 | 0.050 | 0.067 | −0.026 | 0.049 | 0.590 |
| ment | 0.025 | 0.003 | 0.000 | 0.032 | 0.004 | 0.000 |

Fitting a two-component model to the data gives an AIC of 3148 and a BIC of 3211, for a
three-component model we get an AIC of 3142 and a BIC of 3238. Hence, the AIC favors
three components, the BIC two components, and we will have a look at both below. Models
with more components are discarded by both information criteria.
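
The trade-off between the two criteria can be made explicit with a little arithmetic. The sketch below assumes the usual definitions AIC = −2 log L + 2·df and BIC = −2 log L + df·log(n), and it assumes df = 6K + (K − 1) free parameters for K components (six coefficients per component plus K − 1 proportions). These parameter counts are our own reading and are not stated in the text.

```python
import math

N_STUDENTS = 915

def bic_from_aic(aic, df, n):
    """Recover -2 log L from the AIC and add the BIC penalty df * log(n)."""
    neg2_loglik = aic - 2 * df
    return neg2_loglik + df * math.log(n)

for K, aic in ((2, 3148), (3, 3142)):
    df = 6 * K + (K - 1)  # assumed number of free parameters
    print(K, df, round(bic_from_aic(aic, df, N_STUDENTS)))
```

Under these assumptions the recovered BIC values round to 3211 and 3238, in line with the figures quoted above. The seven additional parameters of the three-component model cost 14 units of AIC but about 48 units of BIC, which is why the two criteria disagree.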
First we have a look at the model with two components. Classifying students based on the
a-posteriori probabilities, the first component contains 746 students, the second 169. For each
component, the predicted number of papers per student is summarized in Table 5. Note that
the mean predicted number of publications in component 2 is about 3.5 times that in component 1
(3.59 versus 1.04).
A test for significance of regression coefficients using standard asymptotic theory is shown in
Table 6. Not surprisingly, the number of papers per student is positively associated with the
number of papers by the respective mentor, with very high significance in both components;
similarly, there is a negative association with the number of children aged five or younger. Gender has
no significant effect in component 1 (low number of publications), but has a significant effect
in component 2, where women belonging to the component are seemingly less productive
than men. Based on the classifications to components using the a-posteriori probabilities,
the proportion of men and women in both components is very similar, with 47% females in
component 1 and 41% females in component 2.
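
The unadjusted p-values in Table 6 can be approximately recovered from the reported estimates and standard errors. The sketch below assumes a two-sided Wald test with a standard normal reference distribution, which is our reading of "standard asymptotic theory"; small deviations in the third decimal are expected because the inputs are rounded.

```python
import math

def wald_p_value(estimate, std_error):
    """Two-sided p-value 2 * (1 - Phi(|z|)) for z = estimate / std_error."""
    z = abs(estimate / std_error)
    return math.erfc(z / math.sqrt(2.0))

# Component 1 of Table 6: (estimate, standard error) for intercept and gender.
for name, est, se in (("(Intercept)", -0.509, 0.222), ("female", -0.140, 0.102)):
    print(name, round(wald_p_value(est, se), 3))
```

These marginal p-values ignore both the number of tests and the correlation between estimates, which is exactly what the simultaneous procedure corrects for.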
Correcting the tests for multiple testing shows that the significance of gender in component 2
is only borderline, see Table 7. There is also no significant difference between the coefficient
for gender in components 1 and 2, see Table 8. Fitting a model with a constant effect for
gender across both components gives no significant gender effect (details omitted for brevity).
The reason for the slightly significant gender effect in component 2 above can clearly be seen
in the three-component model with component sizes 314 (152 female), 588 (269 female), and
13 (0 female) based on the classifications obtained using the a-posteriori probabilities. For
each component, the predicted number of papers per student is summarized in Table 9.

Table 7: Test for significance of regression coefficients using the simultaneous inference
procedure for the two-component model

| | $\hat{\beta}_1$ | SE($\hat{\beta}_1$) | p-value | $\hat{\beta}_2$ | SE($\hat{\beta}_2$) | p-value |
|---|---|---|---|---|---|---|
| (Intercept) | −0.509 | 0.222 | 0.209 | 1.184 | 0.211 | 0.000 |
| female | −0.140 | 0.102 | 0.848 | −0.302 | 0.098 | 0.023 |
| married | 0.251 | 0.115 | 0.264 | 0.090 | 0.110 | 0.995 |
| kid5 | −0.212 | 0.074 | 0.046 | −0.181 | 0.072 | 0.120 |
| phd | 0.092 | 0.050 | 0.514 | −0.026 | 0.049 | 1.000 |
| ment | 0.025 | 0.003 | 0.000 | 0.032 | 0.004 | 0.000 |

Table 8: Test for significant differences of regression coefficients for the two-component model

| Hypothesis | Contrast | SE | p-value |
|---|---|---|---|
| C2.(Intercept)−C1.(Intercept) = 0 | 1.693 | 0.274 | 0.000 |
| C2.female−C1.female = 0 | −0.162 | 0.142 | 0.769 |
| C2.married−C1.married = 0 | −0.161 | 0.158 | 0.840 |
| C2.kid5−C1.kid5 = 0 | 0.031 | 0.103 | 0.999 |
| C2.phd−C1.phd = 0 | −0.118 | 0.070 | 0.388 |
| C2.ment−C1.ment = 0 | 0.006 | 0.005 | 0.708 |

Table 9: Summary statistics for the predicted number of papers per student in the three-component model

| | Comp. 1 | Comp. 2 | Comp. 3 |
|---|---|---|---|
| Min. | 0.23 | 0.64 | 0.21 |
| 1st Qu. | 0.51 | 0.88 | 0.45 |
| Median | 0.80 | 1.76 | 3.42 |
| Mean | 1.39 | 1.73 | 3.70 |
| 3rd Qu. | 2.08 | 2.12 | 6.28 |
| Max. | 12.43 | 12.06 | 55.19 |

The model splits the students into three groups of low, medium and very high productivity
(although the intercepts in components 1 and 2 are not significantly different). The last group
is very small and consists of 13 outliers which happen to be all male. For the big mass of
more than 900 remaining students we observe no gender effect, see Table 10. Note that we
can still estimate a gender effect for component 3: although no woman has a higher posterior
probability for this component than for the first two, women still have positive probabilities of
belonging to it and hence contribute to the mixture likelihood.
What we do observe is a marriage effect. Components 1 and 2 both contain approximately
twice as many married students as singles. The main difference between the components
is that marriage has a negative impact on the number of publications in component 1, while
it is positive in component 2. Besides removing the zero coefficients, the three-component
model could be further reduced by estimating a constant coefficient for the mentor, because
the three coefficients are all approximately 0.026 and do not differ significantly, see Table 11.

Table 10: Test for significance of regression coefficients using the simultaneous inference
procedure for the three-component model

              β̂₁       SE(β̂₁)   p-value   β̂₂       SE(β̂₂)   p-value   β̂₃       SE(β̂₃)   p-value
(Intercept)    0.071   0.341    1.000     −0.373   0.223    0.762      2.166   0.536    0.001
female        −0.056   0.165    1.000     −0.021   0.109    1.000     −2.486   1.272    0.536
married       −1.261   0.317    0.001      1.037   0.235    0.000      0.322   0.385    0.999
kid5          −0.241   0.179    0.937     −0.162   0.062    0.130      0.011   0.256    1.000
phd            0.215   0.084    0.154     −0.011   0.044    1.000     −0.285   0.144    0.510
ment           0.027   0.006    0.000      0.026   0.004    0.000      0.026   0.009    0.047

Table 11: Test for significant differences of regression coefficients for the three-component
model

Contrast                             Estimate   SE      p-value
C2.(Intercept)−C1.(Intercept) = 0    −0.444     0.452   0.983
C3.(Intercept)−C1.(Intercept) = 0     2.095     0.635   0.014
C3.(Intercept)−C2.(Intercept) = 0     2.539     0.591   0.000
C2.female−C1.female = 0               0.035     0.185   1.000
C3.female−C1.female = 0              −2.430     1.312   0.533
C3.female−C2.female = 0              −2.465     1.293   0.492
C2.married−C1.married = 0             2.298     0.283   0.000
C3.married−C1.married = 0             1.583     0.526   0.036
C3.married−C2.married = 0            −0.715     0.461   0.758
C2.kid5−C1.kid5 = 0                   0.079     0.200   1.000
C3.kid5−C1.kid5 = 0                   0.252     0.314   0.996
C3.kid5−C2.kid5 = 0                   0.173     0.252   0.999
C2.phd−C1.phd = 0                    −0.227     0.100   0.260
C3.phd−C1.phd = 0                    −0.501     0.161   0.027
C3.phd−C2.phd = 0                    −0.274     0.154   0.588
C2.ment−C1.ment = 0                  −0.001     0.007   1.000
C3.ment−C1.ment = 0                  −0.000     0.011   1.000
C3.ment−C2.ment = 0                   0.000     0.010   1.000

5. Conclusions
We have presented a general framework for simultaneous inference in finite mixtures of regression
models. The asymptotic normality of the maximum likelihood estimate of all interesting
model parameters is assumed to derive confidence regions and p-values using a maximum
norm for the multivariate t-statistic. The new methods are much closer to nominal significance
levels in size simulations than classical p-values adjusted for multiple testing while
losing almost no power. An example shows that interpretation of model parameters can be
quite different when comparing the two approaches. The most important advantage of the
new method is that the (possibly very high) correlations between different model parameters
are correctly accounted for.
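
The difference between a classical adjustment and a max-type adjustment that uses the correlations can be illustrated with a small sketch. The code below is an assumption-laden illustration, not the authors' implementation: it replaces the multivariate t-distribution and its numerical integration by a plain Monte Carlo approximation under a multivariate normal reference distribution, and the test statistics and correlation matrix are hypothetical. All function names are illustrative.

```python
import math
import random

def holm_adjust(pvalues):
    """Holm (1979) step-down adjustment; returns adjusted p-values in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def max_type_adjust(tstats, corr, draws=20000, seed=1):
    """Monte Carlo estimate of P(max_j |Z_j| >= |t_j|) for Z ~ N(0, corr)."""
    rng = random.Random(seed)
    lower = cholesky(corr)
    n = len(tstats)
    exceed = [0] * n
    for _ in range(draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        peak = max(abs(v) for v in x)
        for j, t in enumerate(tstats):
            if peak >= abs(t):
                exceed[j] += 1
    return [e / draws for e in exceed]

def two_sided_p(t):
    """Raw two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(t) / math.sqrt(2.0))

# Hypothetical example: three highly correlated test statistics.
tstats = [2.5, 2.3, 0.4]
corr = [[1.0, 0.9, 0.9], [0.9, 1.0, 0.9], [0.9, 0.9, 1.0]]
raw = [two_sided_p(t) for t in tstats]
print("raw      ", [round(p, 4) for p in raw])
print("Holm     ", [round(p, 4) for p in holm_adjust(raw)])
print("max-type ", [round(p, 4) for p in max_type_adjust(tstats, corr)])
```

The Holm adjustment uses only the ordered raw p-values, whereas the max-type adjustment refers each statistic to the joint distribution of the maximum, which is where the correlation matrix enters.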

References

Bretz F, Genz A, Hothorn LA (2001). “On the Numerical Availability of Multiple Comparison
Procedures.” Biometrical Journal, 43(5), 645–656.
doi:10.1002/1521-4036(200109)43:5<645::AID-BIMJ645>3.0.CO;2-F.

Chen J, Khalili A (2009). “Order Selection in Finite Mixture Models with a Nonsmooth
Penalty.” Journal of the American Statistical Association, 104(485), 187–196.
doi:10.1198/jasa.2009.0103.

Chen J, Li P (2009). “Hypothesis Test for Normal Mixture Models: The EM Approach.” The
Annals of Statistics, 37(5a), 2523–2542. doi:10.1214/08-aos651.

Dempster AP, Laird NM, Rubin DB (1977). “Maximum Likelihood from Incomplete Data
via the EM Algorithm.” Journal of the Royal Statistical Society: Series B (Methodological),
39(1), 1–38. doi:10.1111/j.2517-6161.1977.tb01600.x.

Frühwirth-Schnatter S (2006). Finite Mixture and Markov Switching Models. Springer-Verlag,
New York, NY, U.S.A. doi:10.1007/978-0-387-35768-3.

Genz A (1992). “Numerical Computation of Multivariate Normal Probabilities.” Journal
of Computational and Graphical Statistics, 1(2), 141–149.
doi:10.1080/10618600.1992.10477010.

Genz A, Bretz F (1999). “Numerical Computation of Multivariate t-probabilities with
Application to Power Calculation of Multiple Contrasts.” Journal of Statistical Computation
and Simulation, 63(4), 103–117. doi:10.1080/00949659908811962.

Genz A, Bretz F (2002). “Methods for the Computation of Multivariate t-probabilities.”
Journal of Computational and Graphical Statistics, 11(4), 950–971.
doi:10.1198/106186002394.

Grün B, Leisch F (2008a). “Finite Mixtures of Generalized Linear Regression Models.” In
Shalabh, C Heumann (eds.), Recent Advances in Linear Models and Related Areas. Physica
Verlag, Heidelberg, Deutschland. doi:10.1007/978-3-7908-2064-5_11.

Grün B, Leisch F (2008b). “FlexMix Version 2: Finite Mixtures with Concomitant Variables
and Varying and Constant Parameters.” Journal of Statistical Software, 28(4), 1–35.
doi:10.18637/jss.v028.i04.

Grün B, Leisch F (2025). flexmix: Flexible Mixture Modeling.
doi:10.32614/CRAN.package.flexmix. R package version 2.3-20.

Holm S (1979). “A Simple Sequentially Rejective Multiple Test Procedure.” Scandinavian
Journal of Statistics, 6(2), 65–70.

Hothorn T (2025). “Did We Practice What We Preached?” Austrian Journal of Statistics.

Hothorn T, Bretz F, Westfall P (2008). “Simultaneous Inference in General Parametric
Models.” Biometrical Journal, 50(3), 346–363. doi:10.1002/bimj.200810425.

Hothorn T, Bretz F, Westfall P (2025). multcomp: Simultaneous Inference in General
Parametric Models. doi:10.32614/CRAN.package.multcomp. R package version 1.4-28.

Leisch F (2004). “FlexMix: A General Framework for Finite Mixture Models and Latent
Class Regression in R.” Journal of Statistical Software, 11(8), 1–18.
doi:10.18637/jss.v011.i08.

Leisch F, Hothorn T (2025). “Reproducibility Material for “Simultaneous Inference in Finite
Mixtures of Regression Models”.” doi:10.5281/zenodo.14950545.

Lenk PJ, DeSarbo WS (2000). “Bayesian Inference for Finite Mixtures of Generalized Linear
Models with Random Effects.” Psychometrika, 65(1), 93–119. doi:10.1007/bf02294188.

Long JS (1990). “The Origins of Sex Differences in Science.” Social Forces, 68(4), 1297–1315.
doi:10.2307/2579146.

R Core Team (2024). R: A Language and Environment for Statistical Computing. R Foundation
for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

Searle SR (1971). Linear Models. John Wiley & Sons, New York.

Serfling RJ (1980). Approximation Theorems of Mathematical Statistics. John Wiley & Sons,
New York. doi:10.1002/9780470316481.

Wang P, Puterman ML, Cockburn IM, Le ND (1996). “Mixed Poisson Regression Models
with Covariate Dependent Rates.” Biometrics, 52(2), 381–400. doi:10.2307/2532881.

Wedel M, DeSarbo WS (1995). “A Mixture Likelihood Approach for Generalized Linear
Models.” Journal of Classification, 12(1), 21–55. doi:10.1007/bf01202266.

Zhu HT, Zhang H (2004). “Hypothesis Testing in Mixture Regression Models.” Journal of
the Royal Statistical Society: Series B (Methodological), 66(1), 3–16.
doi:10.1046/j.1369-7412.2003.05379.x.

Reproducibility of simulation results

The main text contains simulation results computed at the time of the initial submission in
2010. The tables below report how these differ from the results obtained by re-running the
simulation code in fall 2024. Old results are struck through and new results are printed in
sans-serif. The code required to reproduce the numerical results is available from Leisch and
Hothorn (2025).

Table 12: Empirical size of tests for zero coefficients for a significance level of α = 0.05;
updated Table 1

Normal Poisson
n = 300 n = 600 n = 300 n = 600
raw p-values 0.5029 0.5052 0.3683 0.3726 0.3292 0.3326 0.3205 0.3123
Holm-adj. 0.1804 0.1832 0.0832 0.0853 0.0709 0.0699 0.0581 0.0531
mult. comp. 0.1162 0.1163 0.0384 0.0383 0.0319 0.0344 0.0230 0.0208
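
To gauge how closely the two runs agree, the value pairs in Table 12 can be compared directly. The following is a minimal sketch in Python; the dictionary `table12` and the helper `largest_change` are illustrative names, and the absolute difference is used so that the result does not depend on which value of a pair is the older one.

```python
# Value pairs copied from Table 12; each cell holds the two reported results.
table12 = {
    "raw p-values": [(0.5029, 0.5052), (0.3683, 0.3726), (0.3292, 0.3326), (0.3205, 0.3123)],
    "Holm-adj.": [(0.1804, 0.1832), (0.0832, 0.0853), (0.0709, 0.0699), (0.0581, 0.0531)],
    "mult. comp.": [(0.1162, 0.1163), (0.0384, 0.0383), (0.0319, 0.0344), (0.0230, 0.0208)],
}

def largest_change(pairs):
    """Largest absolute difference between the two values of any cell in a row."""
    return max(abs(a - b) for a, b in pairs)

changes = {row: largest_change(pairs) for row, pairs in table12.items()}
for row, change in changes.items():
    print(f"{row:<13} largest absolute change: {change:.4f}")
```

For every row of Table 12 the largest absolute change is below 0.01.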

Table 13: Empirical power of p-values from multiple comparisons procedure for zero coeffi-
cients; updated Table 2

Normal Poisson
n = 300 n = 600 n = 300 n = 600
βkd = 1 0.8162 0.8130 0.9761 0.9771 0.9888 0.9877 1.0000 0.9998
βkd > 1 0.9997 1.0000 0.9999 1.0000 1.0000
found all 0.4165 0.4095 0.9058 0.9094 0.9548 0.9505 0.9998 0.9994

Table 14: Empirical power of Holm-adjusted p-values for zero coefficients; updated Table 3

Normal Poisson
n = 300 n = 600 n = 300 n = 600
βkd = 1 0.8622 0.8626 0.9854 0.9860 0.9926 0.9918 1.0000
βkd > 1 0.9998 1.0000 1.0000 1.0000
found all 0.5358 0.5372 0.9423 0.9447 0.9701 0.9672 1.0000 0.9998

Table 15: Empirical size and power of multiple comparisons procedure for constant effects;
updated Table 4

Normal Poisson
n = 300 n = 600 n = 300 n = 600
Size 0.1189 0.1179 0.0457 0.0452 0.0449 0.0458 0.0271 0.0290
Power 0.4470 0.4551 0.8185 0.8167 0.9397 0.9375 0.9995 0.9998

The updated simulation results were obtained in the computing environment listed below.

R version 4.4.2 (2024-10-31)

Platform: x86_64-pc-linux-gnu
Running under: Debian GNU/Linux 12 (bookworm)

Matrix products: default

BLAS: /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0

locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

time zone: Europe/Berlin

tzcode source: system (glibc)

attached base packages:

[1] stats graphics grDevices utils datasets methods base

other attached packages:

[1] multcomp_1.4-28 TH.data_1.1-3 MASS_7.3-61 survival_3.8-3
[5] mvtnorm_1.3-3 flexmix_2.3-19 lattice_0.22-6

loaded via a namespace (and not attached):

[1] codetools_0.2-20 Matrix_1.7-2 nnet_7.3-19 splines_4.4.2
[5] modeltools_0.2-23 zoo_1.8-12 stats4_4.4.2 grid_4.4.2
[9] sandwich_3.1-1 compiler_4.4.2 tools_4.4.2

Affiliation:
Friedrich Leisch, Torsten Hothorn
Institut für Statistik
Ludwig-Maximilians-Universität München
Ludwigstraße 33, 80539 München, Deutschland

Austrian Journal of Statistics http://www.ajs.or.at/
published by the Austrian Society of Statistics http://www.osg.or.at/
Volume 54 Submitted: 2024-10-15
2025 Accepted: 2025-03-13

