
Mixture models in the social, behavioral, and education sciences:
Classification applications using Mplus

James A. Bovaird, PhD


Associate Professor of Educational Psychology
Courtesy Associate Professor of Survey Research & Methodology
Program Director, Quantitative, Qualitative & Psychometric Methods Program
Director, Nebraska Academy for Methodology, Analytics & Psychometrics
Faculty Fellow, University of Nebraska Public Policy Center
Statistical Classification
• Fundamental premise:
– Systematic intra-sample heterogeneity exists
– But information necessary to identify such heterogeneity has not been explicitly
measured
• Traditional distance-based methods of classification:
– Connectivity-based (i.e. hierarchical) clustering
– Centroid-based (i.e. k-means), or partitional, clustering
• Model-based classification:
– Finite mixture models
– Treats the unmeasured group information as a latent variable
• Applications:
– Latent profile analysis (LPA)
– Latent class analysis (LCA)
– Latent growth mixture models (LGMM)
– Latent Markov models (LMM)
– Latent transition analysis (LTA)
Estimator vs. Inferentiator
• Classification methods are a set of mathematical
algorithms
• The results of these algorithms can be interpreted as
evidence implying or not implying the presence of
multiple groups
• They are estimators, not inferentiators
• Do not confuse computer output with the truth, or even
with the best result
• As Rindskopf (2003) writes, “researchers may not know
what is right but only what model is most helpful in
achieving other scientific goals” (p. 367).
Prior Theory
• Rindskopf (2003), arguing that theory can guide class
extraction, writes that “no statistical theory will help;
it is subject-matter theory that must be used” (p. 366).
• Cudeck and Henly (2003) agree: “If latent classes are
being studied, no method can ever conclusively
demonstrate how many subpopulations exist nor
which individuals belong to which group” (p. 378).
• But: “[T]his approach reverses the normal
hypothetico-deductive process of science” (Bauer &
Curran, 2003, p. 358).
“Traditional” Cluster Analysis
• Cluster Analysis (CA) is the name given to a diverse collection of techniques that can be
used to classify objects
– The classification has the effect of reducing the dimensionality of a data table by reducing
the number of rows (cases).
– Think of it as “factor analyzing” persons instead of variables.

• Purpose: the classification of cases into different groups called clusters (or classes) so
that cases within a cluster are more similar to each other than they are to cases in
other clusters.
– The data set is partitioned into subsets (clusters), so that the data in each subset (ideally)
share some common trait
– often proximity according to some defined distance measure.

• The underlying mathematics of most of these methods is relatively simple, but large numbers of calculations are needed, which can put a heavy demand on the computer.

• Classification depends on the method used.
  – Similarity and dissimilarity can be measured in multiple ways
  – There is no single correct classification
• Attempts have been made to define 'optimal' classifications
Cluster Analysis Terminology
• Hierarchical
– resembles a phylogenetic classification
– Like exploratory non-iterative EFA
• Non-hierarchical
– Like iterative EFA where factors = k

• Divisive
– Begins with all cases in one cluster. This cluster is gradually broken down into smaller
and smaller clusters.
• Agglomerative
– Start with (usually) single member clusters. These are gradually fused until one large
cluster is formed.

• Monothetic scheme
– cluster membership is based on a single characteristic
• Polythetic scheme
– use more than one characteristic (variables)
Types of Traditional Clustering
• Hierarchical, or connectivity-
based, algorithms:
– find successive clusters using
previously established clusters
– Agglomerative (“bottom-up”)
algorithms begin with each
element as a separate cluster
and merge them into
successively larger clusters
– Divisive (“top-down”) algorithms
begin with the whole set and
proceed to divide it into
successively smaller clusters.

• Partitional, or centroid-based,
algorithms:
– determine all clusters at once
Distance Measures
• Determines how the similarity of
two elements is calculated.
– Influences the shape and size of the
clusters
– some elements may be close to one
another according to one distance and
further away according to another.
• Common distance functions:
– Euclidean (i.e. “as the crow flies”):
– Squared Euclidean
– Manhattan (also called “city block”)
– Mahalanobis
– Chebychev
• Alternatives to “distance”
– Semantic relatedness
• “Distance” based on databases and search
engines, learned from analysis of a corpus
[Figure: city block ("Manhattan") distance illustration]
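As a concrete illustration (not part of the original slides), the common distance functions listed above can be written in a few lines of Python/NumPy; the covariance matrix S passed to the Mahalanobis function is assumed to be estimated from the data:

import numpy as np

def euclidean(x, y):
    # "as the crow flies"
    return np.sqrt(np.sum((x - y) ** 2))

def squared_euclidean(x, y):
    return np.sum((x - y) ** 2)

def manhattan(x, y):
    # "city block": sum of absolute coordinate differences
    return np.sum(np.abs(x - y))

def chebychev(x, y):
    # largest single coordinate difference
    return np.max(np.abs(x - y))

def mahalanobis(x, y, S):
    # S is the covariance matrix of the variables (assumed estimated from the data)
    d = x - y
    return np.sqrt(d @ np.linalg.inv(S) @ d)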
Clustering Algorithms
• Complete linkage: the maximum distance between elements of each
cluster
• Single linkage: the minimum distance between elements of each cluster
• Average linkage: the mean distance between elements of each cluster
• Sum of all intra-cluster variance
• Ward’s criterion: the increase in variance for the cluster being merged

• Each agglomeration occurs at a greater distance between clusters than the previous agglomeration
  – Stop rules: Distance criterion (clusters are too far apart to be merged) vs. Number criterion (sufficiently small number of clusters)
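The linkage criteria and stop rules above map directly onto standard software; a minimal SciPy sketch (the data matrix X and the cut-off values are hypothetical):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(1).normal(size=(50, 4))   # hypothetical cases x variables

# method = 'single', 'complete', 'average', or 'ward' corresponds to the criteria above
Z = linkage(X, method='ward', metric='euclidean')

# Distance stop rule: stop merging once clusters are farther apart than a threshold
labels_distance = fcluster(Z, t=10.0, criterion='distance')

# Number stop rule: keep merging until a sufficiently small number of clusters remains
labels_number = fcluster(Z, t=3, criterion='maxclust')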
Algorithm & Distance Metric Matters
[Figure: four solutions for the same data —
  nearest neighbor, squared Euclidean distance, unstandardized variables;
  nearest neighbor, cosine distance, standardized variables;
  nearest neighbor, squared Euclidean distance, standardized variables;
  furthest neighbor, squared Euclidean distance, standardized variables]
Choosing the Number of Clusters
• Common guideline to determine what number of clusters should be chosen
  – Similar to using a "scree" plot in EFA
• Choose a number of clusters so that adding another cluster doesn't add any new meaningful information
  – The percentage of variance explained by the clusters (y-axis) against the number of clusters (x-axis)
  – The distance between the clusters (y-axis) against the stage when the cluster was created (x-axis)

Agglomeration Schedule
                Cluster Combined                       Stage Cluster First Appears
Stage     Cluster 1   Cluster 2   Coefficients      Cluster 1   Cluster 2     Next Stage
  1           7           8           68.980            0           0             6
  2           6          11           75.460            0           0             5
  3           1          12           90.440            0           0             4
  4           1           4          152.690            3           0             7
  5           2           6          240.740            0           2             7
  6           7          10          262.340            1           0             9
  7           1           2          315.490            4           5             8
  8           1           3          379.250            7           0             9
  9           1           7          450.490            8           6            10
 10           1           5          486.360            9           0            11
 11           1           9         1722.270           10           0             0

[Figure: agglomeration coefficients (y-axis, 0-2000) plotted against stage (x-axis, 1-11), with a pronounced jump at the final stage]
Partitional Clustering:
K-Means
• Assigns each point to the cluster whose center (centroid) is nearest
– Centroid is the average of all the points in the cluster
• Steps:
– Choose the number of clusters, k.
– Randomly generate k clusters and determine the cluster centers, or directly
generate k random points as cluster centers.
– Assign each point to the nearest cluster center.
– Re-compute the new cluster centers.
– Repeat the two previous steps until some convergence criterion is met
(usually that the assignment hasn't changed).
• Advantages:
– Simplicity
– speed (great with large datasets)
• Disadvantages:
– Clusters depend on the initial random assignments - different clusters for
different runs
– Minimizes intra-cluster variance - does not ensure a global minimum of
variance
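A minimal NumPy sketch of the k-means steps listed above (illustrative only; empty clusters and multiple random restarts are not handled):

import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: pick k random points from the data as initial cluster centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 3: assign each point to the nearest cluster center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: re-compute each cluster center as the mean of its points
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop when the centers (and hence the assignments) no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

Because the solution depends on the random initial centers (the disadvantage noted above), such a routine is typically run several times with different seeds and the solution with the smallest within-cluster variance retained.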
Partitional Clustering:
Fuzzy c-means
• Each point has a degree of belonging to clusters rather than
belonging completely to just one cluster
• Points on the edge of a cluster may be in the cluster to a lesser
degree than points in the center of cluster
• For each point x we have a coefficient giving the degree of being in
the kth cluster uk(x)
– Usually, the sum of those coefficients is defined to be 1 (think
probability):
• Centroid of a cluster is the mean of all points, weighted by their
degree of belonging to the cluster:
– The degree of belonging is related to the inverse of the distance to the
cluster
– Coefficients are normalized and fuzzified with a real parameter m > 1
– For m = 2, this is equivalent to normalizing the coefficients linearly so that they sum to 1. When m is close to 1, the cluster center closest to the point is given much more weight than the others, and the algorithm is similar to k-means.
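Written out in standard fuzzy c-means notation (filling in the formulas the bullets above refer to), the membership constraint, the membership coefficients, and the weighted centroid are:

\sum_{k} u_k(x) = 1, \qquad
u_k(x) = \frac{1}{\sum_{j} \left( \dfrac{\lVert x - c_k \rVert}{\lVert x - c_j \rVert} \right)^{2/(m-1)}}, \qquad
c_k = \frac{\sum_{x} u_k(x)^{m}\, x}{\sum_{x} u_k(x)^{m}}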
Model-Based Classification:
Finite Mixture Models

• “[Mixture modeling] may provide an


approximation to a complex but unitary
population distribution of individual
trajectories” (Bauer & Curran, 2003, p. 339)
• Consider two examples
– A lognormal distribution MAY BE correctly
approximated as being composed of two
simpler curves
– A normal distribution is correctly
approximated as being composed of one
simple curve
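In standard finite mixture notation (not from the slides), the observed density is a weighted sum of K component densities:

f(x) = \sum_{k=1}^{K} \pi_k\, g_k(x \mid \theta_k), \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1

so, as in the lognormal example, f(x) can often be approximated well by two normal components g_1 and g_2 even when only one population was sampled.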
Introduction to Mixture Modeling
• Model-based clustering
– Based on ML estimates of posterior membership probabilities rather than ad-hoc distance
measures
– Units in the same latent class share a common joint probability distribution among the
observed variables

• Empirical methods available to assist in model selection

• Modeling a “mixture” of subgroups from a population


– Population is a mixture of qualitatively different groups of individuals

• Representation of heterogeneity in a finite number of latent classes

• Identify these different groups by similarities in response patterns


Overview of Mixture Models

[Figure: overview of mixture modeling frameworks (Muthen, 2009)]
Mixture Model Parameters
1. Class membership (or latent class) probability: number of classes (K) & relative size of each class
  – The number of classes (K) in the latent variable (C) represents the number of latent types defined by the model
  – For example, if the latent variable has three classes, the population can be described as comprising either three types or three levels of the underlying latent continuum
• Minimum of 2 latent classes
– The relative size of each class indicates whether the population is relatively
evenly distributed among the K classes
• or whether some of the classes represent relatively large segments of the population
• or relatively small segments of the population (i.e. potential outliers)
2. A set of “traditional” parameters for each moment or association in the model
– means, variances, regression coefficients, covariances, factor loadings, etc.
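In standard latent class notation (an illustration consistent with the two parameter sets above, not taken from the slides), the two sets combine as

P(\mathbf{y}_i) = \sum_{k=1}^{K} P(C_i = k)\, P(\mathbf{y}_i \mid C_i = k),

where the class membership probabilities P(C_i = k) are parameter set 1 and the within-class distributions P(\mathbf{y}_i \mid C_i = k) carry the "traditional" parameters (means, variances, loadings, etc.) of parameter set 2. With categorical indicators and local independence, this reduces to the familiar LCA form P(\mathbf{y}_i \mid C_i = k) = \prod_{j} P(y_{ij} \mid C_i = k).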
Model Fit
• Log-likelihood
• G2 (likelihood ratio statistic)
• AIC
• BIC/SBC
• CAIC
• Adjusted BIC/SBC
• Entropy
Likelihood Ratio (G²)
• Like the Pearson χ² statistic, the G² statistic has an asymptotic chi-square distribution with respect to the degrees of freedom, and thus the probability of acceptance of the alternative hypothesis can be determined (McCutcheon, 2002, p. 68)
  – Can be used to evaluate nested models that vary in the number of parameters, but have the same number of latent classes.
• However…
  – χ² (or G²) values are not useful for determining the optimal model because the likelihood ratio between the k-class and (k-1)-class models does not follow a chi-square distribution.
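In its usual form for categorical response patterns (standard notation, not from the slides),

G^2 = 2 \sum_{r} f_r \ln\!\left( \frac{f_r}{\hat{f}_r} \right),

where f_r is the observed frequency of response pattern r and \hat{f}_r is the frequency expected under the fitted model.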
Parsimony Indices
• "Information criteria (IC) approaches penalize the likelihood for the increased number of parameters required to estimate more complex (i.e., less parsimonious) models" (McCutcheon, 2002, pp. 68-69)
  – Analogous to using closeness-of-fit indices (RMSEA, etc.) instead of the χ² test in SEM, or adjusted R² instead of R²
  – Without a parsimony penalty, one could simply increase complexity to improve model fit
• "AIC tends to overestimate the number of classes present, whereas the BIC (and by extension the CAIC) may underestimate the number of classes present, particularly in small samples" (McLachlan & Peel, 2000, p. 341)
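The usual definitions of these indices (with log-likelihood LL, p free parameters, and n cases; the adjusted BIC uses the n* shown later in the Mplus output):

AIC = -2LL + 2p
BIC = -2LL + p \ln(n)
CAIC = -2LL + p\,(\ln(n) + 1)
Adjusted BIC = -2LL + p \ln(n^*), \quad n^* = (n + 2)/24

In each case, smaller values indicate a better balance of fit and parsimony.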
Entropy
• Summary measure for the quality of the classification.
• Measures how clearly distinguishable the classes are, based on how distinct each individual's estimated class probabilities are.
• If each individual has a high probability of being in just one class,
this will be high.
• Ranges from 0 to 1. Values close to 1 indicate high classification
accuracy, whereas values close to 0 indicate low classification
certainty.
– Entropy values of .40, .60, and .80 represent low, medium
and high class separation.
– No criterion for “close-fitting” or “exact-fitting”
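The relative entropy reported by Mplus is typically computed as (standard formula, with \hat{p}_{ik} the estimated posterior probability that case i belongs to class k, n cases, and K classes):

E_K = 1 - \frac{\sum_{i=1}^{n} \sum_{k=1}^{K} \left( -\hat{p}_{ik} \ln \hat{p}_{ik} \right)}{n \ln K}

so E_K approaches 1 when every \hat{p}_{ik} is near 0 or 1 (clear assignment) and approaches 0 when the posterior probabilities are spread evenly across classes.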
Select the Optimal Class Model
• It is necessary to investigate multiple model fit indices in order to select the final optimal model. Various statistical indices include:
– Information criteria (IC) statistics
• Bayesian Information Criterion (BIC), Akaike Information
Criterion (AIC)
• Sample-Size Adjusted BIC (SSABIC);
– Entropy values
– Likelihood Ratio Tests (LRT)
• Lo-Mendell-Rubin Likelihood Ratio Test (LMR LRT; TECH11)
• Bootstrap Likelihood Ratio Test (BLRT; TECH 14)
Likelihood Ratio Tests (LMR-LRT & BLRT)
• Two LRTs are often used for model comparison when determining the optimal number of classes.
• Lo-Mendell-Rubin likelihood ratio test (LMR-LRT)
  – Tests whether a K-class model fits the data better than a (K-1)-class model
    • 2 vs. 1; 3 vs. 2; 4 vs. 3, etc.
• Bootstrapped Likelihood Ratio Test (BLRT)
  – Using BLRT, the likelihood ratio test between the k-1 and k-class models is conducted through a bootstrap procedure (Asparouhov & Muthen, 2012)

• Muthen (2002) suggests Lo, Mendell, and Rubin's (2001) LMR Likelihood Ratio Test (LMR-LRT)
• Nylund et al. (2007) recommend the BIC and the Bootstrap Likelihood Ratio Test.

• In Mplus, TECH11 requests the LMR-LRT and TECH14 requests the BLRT


Select the optimal class model
• Selecting the optimal class model involves
considering more than fit indices. When
selecting the optimal class model, we must also
take into account:
• The theoretical expectations
• The substantive meaning and interpretability of
each class solution
• The need for parsimony
• The sample size of the smallest class
Issues: Local Likelihood Maxima
• Parameters are estimated with ML and
are iterative in nature (e.g., EM
algorithm).
• Ideally, the iteration will result in
successful convergence on the global
maximum solution.
• However, the algorithm cannot
distinguish between a global maximum
and a local maximum.
• The iterative optimization process could
stop prematurely and return a sub-
optimal set of parameter values
depending on the choice of the initial
starting values.
• Avoid extracting a large number of latent
classes, because local maxima are more
likely to occur in models with more
classes.
Issues: Convergence
• When the model is not identified, the model does
not converge and standard errors, related p-values
and other meaningful estimates are not estimated.
• Models often fail to converge when too many
parameters are simultaneously estimated in the
model.
• Non-convergence may also occur due to the use of
inappropriate data, such as variables measured on
different scales.
Issues: Convergence
• Larger samples & smaller models help (more
restrictive models).
• Supply good starting values.
• Check convergence using the iteration history,
increase the number of iterations.
• Run several models to the end and compare
estimates.
False Positives
• "From this model, the researcher might be tempted to conclude that the sample data arise from two unobserved groups, one large with a mean around 6, the other smaller group with a mean around 10." (Bauer & Curran, 2003a, p. 344)
• "The AIC, the BIC, and the CAIC supported selection of two classes in almost 100% of the replications…" (p. 349)
• Actually, it's a lognormal distribution

False Negatives
• "What is not always appreciated about this model is that nonnormality of f(x) is a necessary condition for estimating the parameters of the normal components g1(x) and g2(x)." (Bauer & Curran, 2003a, p. 342)
• Consider the distribution of height between men and women
False Positives & False Negatives
• “Not only is nonnormality required for the
solution of the model to be nontrivial, it may
well also be a sufficient condition for extracting
multiple components.” (Bauer & Curran, 2003,
343)
• Consider the height data again:
– Not clear if it will extract sexes – two obvious groups
– But what if a more sensible division is between
socio-economic groups, or diet, or…
Multiple Overlapping Sets of Latent Classes
• “Girls on average are shorter
at maturity than boys,
obviously. But there are slow
growers and fast growers,
early spurters and late
developers. The list of
plausible distinctions would
also include ethnic groups,
age cohorts, and classes
based on health status that
affect growth” (Cudeck &
Henly, 2003, pp. 381-382)
No Right Answer
• Some of these drawbacks can be mitigated if one
abandons the belief that mixture modeling is able
to recover the “true” populations that have been
sampled
• Muthen (2003) writes that “there are many
examples of equivalent models in statistics” (p.
376). A better approach may be to view mixture
modeling as presenting a model of what
populations may have been sampled
• But what about when we need to know?
Using Mplus to
Model Mixtures
Mplus Example:
Detecting Examinee Strategy
• GOAL: to detect differential examinee
strategies based on RT and accuracy

• On the examinee level, can a graphical technique be used to detect different examinee strategies, and can the existence of such strategies be confirmed through a model-based approach?
Detecting Examinee Strategy: Behavior
Types
• “Solution” behavior
– Power tests: solely solution behavior

• “Rapid-guessing” behavior
– Incidence increases as time expires and item difficulty
increases
– Can lead to bias in test/item and person parameters

• Schnipke & Scrams (1997) identified these behaviors using RT
Mplus Syntax: 2 Classes

TITLE: Latent Class Modeling Example
DATA: FILE = RT.txt;
VARIABLE:
  NAMES = item1-item6;
  USEVARIABLES = item1-item6;
  CLASSES = c(2);         ! change the (#) to reflect the # of classes k
ANALYSIS:
  TYPE = MIXTURE;
  STARTS = 20 4;          ! default is 20 4
  STITERATIONS = 10;      ! default is 10
  LRTBOOTSTRAP = 50;      ! default determined by the program (between 2-100)
  LRTSTARTS = 2 1 40 8;   ! k-1 class model has 2 & 1 random sets of start values
                          ! k class model has 40 & 8 random sets of start values
MODEL:
  %OVERALL%

  %c#1%
  [item1-item6*1];        ! class-specific means (starting value 1)
  item1-item6;            ! class-specific variances

  %c#2%
  [item1-item6*2];        ! class-specific means (starting value 2)
  item1-item6;            ! class-specific variances
OUTPUT:
  tech11 tech14;          ! TECH11 = LMR-LRT test; TECH14 = bootstrap LRT test
SAVEDATA:
  FILE = RTsol.txt;
  SAVE = CPROB;           ! saves class probabilities
Convergence & Model Quality
RANDOM STARTS RESULTS RANKED FROM THE BEST TO THE WORST LOGLIKELIHOOD VALUES

1 perturbed starting value run(s) did not converge in the initial stage
optimizations.

Final stage loglikelihood values at local maxima, seeds, and initial stage start numbers:

-3170.320 76974 16
-3170.320 851945 18
-3170.320 27071 15
-3170.320 608496 4

THE BEST LOGLIKELIHOOD VALUE HAS BEEN REPLICATED. RERUN WITH AT LEAST TWICE THE
RANDOM STARTS TO CHECK THAT THE BEST LOGLIKELIHOOD IS STILL OBTAINED AND REPLICATED.

THE MODEL ESTIMATION TERMINATED NORMALLY


Model Fit
MODEL FIT INFORMATION

Number of Free Parameters 25

Loglikelihood

H0 Value -3170.320
H0 Scaling Correction Factor 1.2359
for MLR

Information Criteria

Akaike (AIC) 6390.640


Bayesian (BIC) 6496.005
Sample-Size Adjusted BIC 6416.653
(n* = (n + 2) / 24)
Class Counts & Proportions
FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES BASED ON THE ESTIMATED
MODEL
Latent Classes
1 400.46531 0.80093
2 99.53469 0.19907

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES BASED ON ESTIMATED
POSTERIOR PROBABILITIES
Latent Classes
1 400.46529 0.80093
2 99.53471 0.19907

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES BASED ON THEIR MOST
LIKELY LATENT CLASS MEMBERSHIP
Class Counts and Proportions

Latent Classes
1 405 0.81000
2 95 0.19000
Classification Quality
CLASSIFICATION QUALITY

Entropy 0.847

Average Latent Class Probabilities for Most Likely Latent Class Membership
(Row) by Latent Class (Column)

1 2
1 0.970 0.030
2 0.080 0.920

Classification Probabilities for the Most Likely Latent Class Membership


(Column) by Latent Class (Row)

1 2
1 0.981 0.019
2 0.122 0.878
Model Results
                  Latent Class 1                             Latent Class 2
           Estimate   S.E.  Est./S.E.  P-Value       Estimate   S.E.  Est./S.E.  P-Value

Means Means
ITEM1 3.027 0.033 91.883 0.000 ITEM1 2.773 0.063 44.191 0.000
ITEM2 3.202 0.038 84.323 0.000 ITEM2 2.790 0.069 40.403 0.000
ITEM3 2.966 0.038 77.543 0.000 ITEM3 2.411 0.125 19.234 0.000
ITEM4 2.896 0.036 80.627 0.000 ITEM4 2.315 0.142 16.303 0.000
ITEM5 3.979 0.053 75.078 0.000 ITEM5 2.158 0.279 7.748 0.000
ITEM6 4.089 0.064 63.537 0.000 ITEM6 1.984 0.270 7.346 0.000

Variances Variances
ITEM1 0.257 0.019 13.300 0.000 ITEM1 0.267 0.037 7.226 0.000
ITEM2 0.342 0.024 14.189 0.000 ITEM2 0.346 0.053 6.480 0.000
ITEM3 0.387 0.031 12.337 0.000 ITEM3 1.005 0.293 3.424 0.001
ITEM4 0.394 0.033 11.990 0.000 ITEM4 1.096 0.313 3.505 0.000
ITEM5 0.422 0.032 13.138 0.000 ITEM5 1.790 0.324 5.522 0.000
ITEM6 0.383 0.060 6.335 0.000 ITEM6 1.825 0.249 7.324 0.000
K vs K-1 Classes: LMR-LRT
TECHNICAL 11 OUTPUT

Random Starts Specifications for the k-1 Class Analysis Model


Number of initial stage random starts 20
Number of final stage optimizations 4

VUONG-LO-MENDELL-RUBIN LIKELIHOOD RATIO TEST FOR 1 (H0) VERSUS 2 CLASSES


H0 Loglikelihood Value -3532.398
2 Times the Loglikelihood Difference 724.156
Difference in the Number of Parameters 13
Mean 20.634
Standard Deviation 95.477
P-Value 0.0001 ** 2 versus 1 class

LO-MENDELL-RUBIN ADJUSTED LRT TEST


Value 715.302
P-Value 0.0002
K vs K-1 Classes: BLRT
TECHNICAL 14 OUTPUT

PARAMETRIC BOOTSTRAPPED LIKELIHOOD RATIO TEST FOR 1 (H0) VERSUS 2 CLASSES


H0 Loglikelihood Value -3532.398
2 Times the Loglikelihood Difference 724.156
Difference in the Number of Parameters 13
Approximate P-Value 0.0000 ** 2 versus 1 class
Successful Bootstrap Draws 49

WARNING: OF THE 49 BOOTSTRAP DRAWS, 42 DRAWS HAD BOTH A SMALLER LRT VALUE THAN
THE OBSERVED LRT VALUE AND NOT A REPLICATED BEST LOGLIKELIHOOD VALUE FOR THE 2-
CLASS MODEL. THIS MEANS THAT THE P-VALUE MAY NOT BE TRUSTWORTHY DUE TO LOCAL
MAXIMA. INCREASE THE NUMBER OF RANDOM STARTS USING THE LRTSTARTS OPTION.

WARNING: 1 OUT OF 50 BOOTSTRAP DRAWS DID NOT CONVERGE. INCREASE THE NUMBER OF
RANDOM STARTS USING THE LRTSTARTS OPTION.
Mplus Syntax: 3 Classes

TITLE: Latent Class Modeling Example
DATA: FILE = RT.txt;
VARIABLE:
  NAMES = ID item1-item6;
  USEVARIABLES = item1-item6;
  CLASSES = c(3);          ! change the (#) to reflect the # of classes k
ANALYSIS:
  TYPE = MIXTURE;
  STARTS = 50 10;          ! default is 20 4
  STITERATIONS = 10;       ! default is 10
  LRTBOOTSTRAP = 50;       ! default determined by the program (between 2-100)
  LRTSTARTS = 10 5 40 8;   ! k-1 class model has 10 & 5 random sets of start values
                           ! k class model has 40 & 8 random sets of start values
MODEL:
  %OVERALL%

  %c#1%
  [item1-item6*1]; item1-item6;
  %c#2%
  [item1-item6*2]; item1-item6;
  %c#3%
  [item1-item6*2.5]; item1-item6;
OUTPUT:
  tech11 tech14;           ! TECH11 = LMR-LRT test; TECH14 = bootstrap LRT test
SAVEDATA:
  FILE = RTsol.txt;
  SAVE = CPROB;            ! saves class probabilities
K vs K-1 Classes: 3 vs 2

TECHNICAL 11 OUTPUT

Random Starts Specifications for the k-1 Class Analysis Model
  Number of initial stage random starts                 50
  Number of final stage optimizations                   10

VUONG-LO-MENDELL-RUBIN LIKELIHOOD RATIO TEST FOR 2 (H0) VERSUS 3 CLASSES

  H0 Loglikelihood Value                         -1650.905
  2 Times the Loglikelihood Difference             125.401
  Difference in the Number of Parameters                 8
  Mean                                              22.145
  Standard Deviation                                 30.132
  P-Value                                            0.0110   ** 3 versus 2 class

LO-MENDELL-RUBIN ADJUSTED LRT TEST

  Value                                             122.625
  P-Value                                            0.0120

TECHNICAL 14 OUTPUT

Random Starts Specifications for the k-1 Class Analysis Model
  Number of initial stage random starts                 50
  Number of final stage optimizations                   10
Random Starts Specification for the k-1 Class Model for Generated Data
  Number of initial stage random starts                100
  Number of final stage optimizations                   20
Random Starts Specification for the k Class Model for Generated Data
  Number of initial stage random starts                100
  Number of final stage optimizations                   20
  Number of bootstrap draws requested                  100

PARAMETRIC BOOTSTRAPPED LIKELIHOOD RATIO TEST FOR 2 (H0) VERSUS 3 CLASSES

  H0 Loglikelihood Value                         -1650.905
  2 Times the Loglikelihood Difference             125.401
  Difference in the Number of Parameters                 8
  Approximate P-Value                                0.0000
  Successful Bootstrap Draws                            100

WARNING: OF THE 100 BOOTSTRAP DRAWS, 52 DRAWS HAD BOTH A SMALLER LRT VALUE THAN THE
OBSERVED LRT VALUE AND NOT A REPLICATED BEST LOGLIKELIHOOD VALUE FOR THE 3-CLASS MODEL.
THIS MEANS THAT THE P-VALUE MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA. INCREASE THE
NUMBER OF RANDOM STARTS USING THE LRTSTARTS OPTION.
Mplus Syntax: 4 Classes

TITLE: Latent Class Modeling Example
DATA: FILE = RT.txt;
VARIABLE:
  NAMES = ID item1-item6;
  USEVARIABLES = item1-item6;
  CLASSES = c(4);          ! change the (#) to reflect the # of classes k
ANALYSIS:
  TYPE = MIXTURE;
  STARTS = 50 10;          ! default is 20 4
  STITERATIONS = 10;       ! default is 10
  LRTBOOTSTRAP = 50;       ! default determined by the program (between 2-100)
  LRTSTARTS = 2 1 40 8;    ! k-1 class model has 2 & 1 random sets of start values
                           ! k class model has 40 & 8 random sets of start values
MODEL:
  %OVERALL%

  %c#1%
  [item1-item6*1]; item1-item6;
  %c#2%
  [item1-item6*2]; item1-item6;
  %c#3%
  [item1-item6*2.5]; item1-item6;
  %c#4%
  [item1-item6*3]; item1-item6;
OUTPUT: tech11 tech14;
SAVEDATA: FILE = RTsol.txt; SAVE = CPROB;
Model Fit & Number of Classes
# Classes VLMR Adj-LMR BLRT Entropy n1 n2 n3 n4
2 0.0001 0.0002 0.0000 0.847 405 95
3 0.0003 0.0003 0.0000 0.790 56 194 250
4 0.5412 0.5440 0.0000 0.773 58 104 207 131

[Figure: IC Indices — AIC, BIC, Adj-BIC, and -2LL (y-axis, approximately 5800-6600) plotted against the number of classes (2, 3, 4)]


Contextualizing the Results
[Figure: profile plots for the 2-class, 3-class, and 4-class solutions — one line per latent class across items 1-6, plotted on a 0-5 scale]
Further Contextualizing the Results:
Accuracy
[Figure: four panels (Sample 3 - Power, Sample 3 - 70%, Sample 3 - 70%, Sample 3 - 50%) plotting standardized log response time, z-ln(RT), against item number for each latent class, with class sizes (n) labeled; O marks items with proportion correct f >= 0.50 and X marks items with f < 0.50]
Mixture CFA Modeling
Structural Equation Mixture Modeling
Zero-Inflated Poisson (ZIP) Regression as a
Two-Class Model
Growth Mixture Modeling (GMM)
Hidden Markov Model
All Available to YOU Through the Program
Syntax & Simulation Files
Thank You!

[email protected]

Nebraska Academy for Methodology, Analytics & Psychometrics (MAP Academy)
http://mapacademy.unl.edu/
