Latent Clustering with Mplus v2
• Purpose: the classification of cases into different groups called clusters (or classes) so
that cases within a cluster are more similar to each other than they are to cases in
other clusters.
– The data set is partitioned into subsets (clusters), so that the data in each subset (ideally)
share some common trait
– often proximity according to some defined distance measure.
• The underlying mathematics of most of these methods is relatively simple, but the large number of calculations required can put a heavy demand on the computer.
• Divisive
– Begins with all cases in one cluster. This cluster is gradually broken down into smaller
and smaller clusters.
• Agglomerative
– Begins with (usually) single-member clusters. These are gradually fused until one large cluster is formed.
• Monothetic scheme
– cluster membership is based on a single characteristic (variable)
• Polythetic scheme
– cluster membership is based on more than one characteristic (variable)
Types of Traditional Clustering
• Hierarchical, or connectivity-based, algorithms:
– find successive clusters using previously established clusters
– Agglomerative ("bottom-up") algorithms begin with each element as a separate cluster and merge them into successively larger clusters
– Divisive ("top-down") algorithms begin with the whole set and proceed to divide it into successively smaller clusters
• Partitional, or centroid-based, algorithms:
– determine all clusters at once
Distance Measures
• Determines how the similarity of two elements is calculated.
– Influences the shape and size of the clusters
– Some elements may be close to one another according to one distance and further away according to another.
• Common distance functions (sketched in code below):
– Euclidean (i.e., "as the crow flies")
– Squared Euclidean
– Manhattan (also called "city block")
– Mahalanobis
– Chebychev
• Alternatives to "distance":
– Semantic relatedness
• "Distance" based on databases and search engines, learned from analysis of a corpus
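
For concreteness, a minimal Python/NumPy sketch of these distance functions (the function names are ours, not from the slides):

    import numpy as np

    def euclidean(x, y):
        # straight-line ("as the crow flies") distance
        return np.sqrt(np.sum((x - y) ** 2))

    def squared_euclidean(x, y):
        return np.sum((x - y) ** 2)

    def manhattan(x, y):
        # "city block" distance: sum of absolute coordinate differences
        return np.sum(np.abs(x - y))

    def chebychev(x, y):
        # largest single coordinate difference
        return np.max(np.abs(x - y))

    def mahalanobis(x, y, cov):
        # adjusts for the scale and correlation of the variables via cov
        d = x - y
        return np.sqrt(d @ np.linalg.inv(cov) @ d)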
City Block Distance
[Figure: illustration of city block (Manhattan) distance]
Clustering Algorithms
• Complete linkage: the maximum distance between elements of each cluster
• Single linkage: the minimum distance between elements of each cluster
• Average linkage: the mean distance between elements of each cluster
• Sum of all intra-cluster variance
• Ward's criterion: the increase in variance for the clusters being merged (a code sketch of these linkage options follows below)
[Figures: dendrograms for nearest neighbor and furthest neighbor linkage, squared Euclidean distance, standardized variables]
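
As an illustration (not from the slides), these linkage criteria map directly onto SciPy's hierarchical clustering; the data here are simulated:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(1)
    X = rng.normal(size=(20, 4))        # toy data: 20 cases, 4 standardized variables
    # method = "single" (nearest neighbor), "complete" (furthest neighbor),
    # "average" (mean distance), or "ward" (Ward's criterion)
    Z = linkage(X, method="complete")   # Euclidean distance by default
    labels = fcluster(Z, t=3, criterion="maxclust")   # cut the dendrogram into 3 clusters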
Choosing the Number of Clusters
• Agglomeration schedule:
– plots each agglomeration step (x-axis) against the distance between the clusters merged at that step (y-axis)
[Figure: agglomeration schedule; distances from 0 to 2000 (y-axis) across steps 1-11 (x-axis)]
Partitional Clustering:
K-Means
• Assigns each point to the cluster whose center (centroid) is nearest
– Centroid is the average of all the points in the cluster
• Steps (see the code sketch below):
– Choose the number of clusters, k.
– Randomly generate k clusters and determine the cluster centers, or directly generate k random points as cluster centers.
– Assign each point to the nearest cluster center.
– Re-compute the new cluster centers.
– Repeat the two previous steps until some convergence criterion is met (usually that the assignments haven't changed).
• Advantages:
– Simplicity
– Speed (works well with large data sets)
• Disadvantages:
– Clusters depend on the initial random assignments: different runs can produce different clusters
– Minimizes intra-cluster variance, but does not ensure a global minimum of variance
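
A bare-bones NumPy sketch of the k-means loop described in the steps above (our illustration; it does not handle the edge case of a cluster losing all its points):

    import numpy as np

    def kmeans(X, k, max_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        # directly generate k random points as cluster centers
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            # assign each point to the nearest center (squared Euclidean distance)
            dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # re-compute each center as the mean of its assigned points
            new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
            if np.allclose(new_centers, centers):   # convergence: centers stable
                break
            centers = new_centers
        return labels, centers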
Partitional Clustering:
Fuzzy c-means
• Each point has a degree of belonging to clusters, rather than belonging completely to just one cluster
• Points on the edge of a cluster may belong to it to a lesser degree than points in the center of the cluster
• For each point x we have a coefficient giving its degree of membership in the kth cluster, u_k(x)
– Usually, the sum of those coefficients is defined to be 1 (think probability): sum over k of u_k(x) = 1
• The centroid of a cluster is the mean of all points, weighted by their degree of belonging to the cluster: center_k = sum_x u_k(x)^m x / sum_x u_k(x)^m
– The degree of belonging is related to the inverse of the distance to the cluster center
– Coefficients are normalized and "fuzzified" with a real parameter m > 1: u_k(x) = 1 / sum_j ( d(center_k, x) / d(center_j, x) )^(2/(m-1))
– For m = 2, this is equivalent to normalizing the coefficients linearly to make their sum 1. When m is close to 1, the cluster center closest to the point is given much more weight than the others, and the algorithm is similar to k-means (see the code sketch below).
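
The same updates as a short NumPy sketch (our illustration; the small epsilon guards against division by zero when a point sits exactly on a center):

    import numpy as np

    def fuzzy_cmeans(X, k, m=2.0, n_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        u = rng.random((len(X), k))
        u /= u.sum(axis=1, keepdims=True)           # memberships sum to 1 per point
        for _ in range(n_iter):
            w = u ** m                              # fuzzified weights
            centers = (w.T @ X) / w.sum(axis=0)[:, None]   # degree-weighted means
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
            u = d ** (-2.0 / (m - 1))               # inverse-distance memberships
            u /= u.sum(axis=1, keepdims=True)       # renormalize to sum to 1
        return u, centers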
Model-Based Classification:
Finite Mixture Models
Muthen (2009)
Mixture Model Parameters
1. Class membership (or latent class) probability: number of classes (k) & relative
size of each class
– Where the number of classes (K) in the latent variable (C) represents the
number of latent types defined by the model
– For example, if the latent variable has three classes, the population can be described as either (a) three types or (b) three levels of an underlying latent continuum
• Minimum of 2 latent classes
– The relative size of each class indicates whether the population is relatively
evenly distributed among the K classes
• or whether some of the classes represent relatively large segments of the population
• or relatively small segments of the population (i.e. potential outliers)
2. A set of “traditional” parameters for each moment or association in the model
– means, variances, regression coefficients, covariances, factor loadings, etc.
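
In equation form (standard finite-mixture notation; the symbols are our labels, not from the slide), these two parameter sets define the model-implied density:

    f(y_i) = \sum_{k=1}^{K} \pi_k \, f_k(y_i \mid \theta_k),  with  \sum_{k=1}^{K} \pi_k = 1  and  \pi_k \ge 0,

where the class probabilities \pi_k are parameter set 1 and the within-class parameters \theta_k (means, variances, loadings, etc.) are parameter set 2.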
Model Fit
• Log-likelihood
• G² (likelihood ratio statistic)
• AIC
• BIC/SBC
• CAIC
• Adjusted BIC/SBC
• Entropy
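
For reference, the usual definitions of these indices (our statement of the standard formulas, with \ell the maximized log-likelihood, p the number of free parameters, and n the sample size):

    \mathrm{AIC}  = -2\ell + 2p
    \mathrm{BIC}  = -2\ell + p \ln n
    \mathrm{CAIC} = -2\ell + p(\ln n + 1)
    \mathrm{aBIC} = -2\ell + p \ln\bigl((n+2)/24\bigr)

Lower values indicate better fit relative to model complexity.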
Likelihood Ratio (G²)
• Muthen (2002) suggests Lo, Mendell, and Rubin’s (2001) LMR Likelihood
Ratio Test (LMR-LRT)
• Nylund et al. (2007) recommend the BIC and the bootstrap likelihood ratio test (BLRT).
• “Rapid-guessing” behavior
– Incidence increases as time expires and item difficulty
increases
– Can lead to bias in test/item and person parameters
Mplus Syntax: 2 Classes

TITLE: Latent Class Modeling Example
DATA: FILE = RT.txt;
VARIABLE:
NAMES = item1-item6;
USEVARIABLES = item1-item6;
CLASSES = c(2); ! change the (#) to reflect the # of classes k
ANALYSIS:
TYPE = MIXTURE;
STARTS = 20 4; ! default is 20 4
STITERATIONS = 10; ! default is 10
LRTBOOTSTRAP = 50; ! default determined by the program (between 2-100)
LRTSTARTS = 2 1 40 8; ! k-1 class model has 2 & 1 random sets of start values;
                      ! k class model has 40 & 8 random sets of start values
MODEL:
%OVERALL%
%c#1%
[item1-item6*1];
item1-item6;
%c#2%
[item1-item6*2];
item1-item6;
OUTPUT:
tech11   ! LMR-LRT test
tech14;  ! bootstrap-LRT test
SAVEDATA:
FILE = RTsol.txt;
SAVE = CPROB; ! saves out class probabilities
Convergence & Model Quality
RANDOM STARTS RESULTS RANKED FROM THE BEST TO THE WORST LOGLIKELIHOOD VALUES
1 perturbed starting value run(s) did not converge in the initial stage
optimizations.
Final stage loglikelihood values at local maxima, seeds, and initial stage start numbers:
-3170.320 76974 16
-3170.320 851945 18
-3170.320 27071 15
-3170.320 608496 4
THE BEST LOGLIKELIHOOD VALUE HAS BEEN REPLICATED. RERUN WITH AT LEAST TWICE THE
RANDOM STARTS TO CHECK THAT THE BEST LOGLIKELIHOOD IS STILL OBTAINED AND REPLICATED.
Loglikelihood
H0 Value -3170.320
H0 Scaling Correction Factor 1.2359
for MLR
Information Criteria [values not shown on the slide]
FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES BASED ON ESTIMATED
POSTERIOR PROBABILITIES
Latent Classes
1 400.46529 0.80093
2 99.53471 0.19907
FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES BASED ON THEIR MOST
LIKELY LATENT CLASS MEMBERSHIP
Class Counts and Proportions
Latent Classes
1 405 0.81000
2 95 0.19000
Classification Quality
CLASSIFICATION QUALITY
Entropy 0.847
Average Latent Class Probabilities for Most Likely Latent Class Membership (Row) by Latent Class (Column)

           1      2
    1  0.970  0.030
    2  0.080  0.920

[Second classification table; header lost in extraction]

           1      2
    1  0.981  0.019
    2  0.122  0.878
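
For context, the entropy reported above is typically the relative entropy (our statement of the standard formula, with \hat{p}_{ik} the estimated posterior probability that case i belongs to class k):

    E_K = 1 - \frac{\sum_{i=1}^{n} \sum_{k=1}^{K} (-\hat{p}_{ik} \ln \hat{p}_{ik})}{n \ln K}

Values near 1, like the 0.847 here, indicate clear class separation.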
Model Results

Latent Class 1
               Estimate    S.E.  Est./S.E.  P-Value
 Means
  ITEM1           3.027   0.033     91.883    0.000
  ITEM2           3.202   0.038     84.323    0.000
  ITEM3           2.966   0.038     77.543    0.000
  ITEM4           2.896   0.036     80.627    0.000
  ITEM5           3.979   0.053     75.078    0.000
  ITEM6           4.089   0.064     63.537    0.000
 Variances
  ITEM1           0.257   0.019     13.300    0.000
  ITEM2           0.342   0.024     14.189    0.000
  ITEM3           0.387   0.031     12.337    0.000
  ITEM4           0.394   0.033     11.990    0.000
  ITEM5           0.422   0.032     13.138    0.000
  ITEM6           0.383   0.060      6.335    0.000

Latent Class 2
               Estimate    S.E.  Est./S.E.  P-Value
 Means
  ITEM1           2.773   0.063     44.191    0.000
  ITEM2           2.790   0.069     40.403    0.000
  ITEM3           2.411   0.125     19.234    0.000
  ITEM4           2.315   0.142     16.303    0.000
  ITEM5           2.158   0.279      7.748    0.000
  ITEM6           1.984   0.270      7.346    0.000
 Variances
  ITEM1           0.267   0.037      7.226    0.000
  ITEM2           0.346   0.053      6.480    0.000
  ITEM3           1.005   0.293      3.424    0.001
  ITEM4           1.096   0.313      3.505    0.000
  ITEM5           1.790   0.324      5.522    0.000
  ITEM6           1.825   0.249      7.324    0.000
K vs K-1 Classes: LMR-LRT
TECHNICAL 11 OUTPUT
WARNING: OF THE 49 BOOTSTRAP DRAWS, 42 DRAWS HAD BOTH A SMALLER LRT VALUE THAN
THE OBSERVED LRT VALUE AND NOT A REPLICATED BEST LOGLIKELIHOOD VALUE FOR THE 2-
CLASS MODEL. THIS MEANS THAT THE P-VALUE MAY NOT BE TRUSTWORTHY DUE TO LOCAL
MAXIMA. INCREASE THE NUMBER OF RANDOM STARTS USING THE LRTSTARTS OPTION.
WARNING: 1 OUT OF 50 BOOTSTRAP DRAWS DID NOT CONVERGE. INCREASE THE NUMBER OF
RANDOM STARTS USING THE LRTSTARTS OPTION.
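
The logic behind these warnings (our schematic, not Mplus output): the bootstrap p-value is essentially the proportion of the B bootstrap draws whose LRT value is at least the observed one,

    p_{\mathrm{BLRT}} \approx \#\{\, b : \mathrm{LRT}_b \ge \mathrm{LRT}_{\mathrm{obs}} \,\} / B,

so draws whose best loglikelihood did not replicate contribute untrustworthy LRT values to that proportion.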
Mplus Syntax: 3 Classes

TITLE: Latent Class Modeling Example
DATA: FILE = RT.txt;
VARIABLE:
NAMES = ID item1-item6;
USEVARIABLES = item1-item6;
CLASSES = c(3); ! change the (#) to reflect the # of classes k
ANALYSIS:
TYPE = MIXTURE;
STARTS = 50 10; ! default is 20 4
STITERATIONS = 10; ! default is 10
LRTBOOTSTRAP = 50; ! default determined by the program (between 2-100)
LRTSTARTS = 10 5 40 8; ! k-1 class model has 10 & 5 random sets of start values;
                       ! k class model has 40 & 8 random sets of start values
MODEL:
%OVERALL%
%c#1%
[item1-item6*1];
item1-item6;
%c#2%
[item1-item6*2];
item1-item6;
%c#3%
[item1-item6*2.5];
item1-item6;
OUTPUT:
tech11   ! LMR-LRT test
tech14;  ! bootstrap-LRT test
SAVEDATA:
FILE = RTsol.txt;
SAVE = CPROB; ! saves out class probabilities
K vs K-1 Classes: 3 vs 2

TECHNICAL 14 OUTPUT

Random Starts Specifications for the k-1 Class Analysis Model
  Number of initial stage random starts                  50
  Number of final stage optimizations                    10
Random Starts Specification for the k-1 Class Model for Generated Data
  Number of initial stage random starts                 100
  Number of final stage optimizations                    20
Random Starts Specification for the k Class Model for Generated Data
  Number of initial stage random starts                 100
  Number of final stage optimizations                    20
Number of bootstrap draws requested                     100

PARAMETRIC BOOTSTRAPPED LIKELIHOOD RATIO TEST FOR 2 (H0) VERSUS 3 CLASSES
  H0 Loglikelihood Value                          -1650.905
  2 Times the Loglikelihood Difference              125.401
  Difference in the Number of Parameters                  8
  Approximate P-Value                                0.0000
  Successful Bootstrap Draws                            100

WARNING: OF THE 100 BOOTSTRAP DRAWS, 52 DRAWS HAD BOTH A SMALLER LRT VALUE THAN THE OBSERVED LRT VALUE AND NOT A REPLICATED BEST LOGLIKELIHOOD VALUE FOR THE 3-CLASS MODEL. THIS MEANS THAT THE P-VALUE MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA. INCREASE THE NUMBER OF RANDOM STARTS USING THE LRTSTARTS OPTION.

TECHNICAL 11 OUTPUT   ** 3 versus 2 class

Random Starts Specifications for the k-1 Class Analysis Model
  Number of initial stage random starts                  50
  Number of final stage optimizations                    10

VUONG-LO-MENDELL-RUBIN LIKELIHOOD RATIO TEST FOR 2 (H0) VERSUS 3 CLASSES
  H0 Loglikelihood Value                          -1650.905
  2 Times the Loglikelihood Difference              125.401
  Difference in the Number of Parameters                  8
  Mean                                               22.145
  Standard Deviation                                 30.132
  P-Value                                            0.0110

LO-MENDELL-RUBIN ADJUSTED LRT TEST
  Value                                             122.625
  P-Value                                            0.0120
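
As an arithmetic check on the output above (our derivation; the 3-class loglikelihood is implied by, not printed in, the output):

    \ell_3 = \ell_2 + 125.401/2 = -1650.905 + 62.7005 = -1588.2045,  so  2(\ell_3 - \ell_2) = 125.401.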
Mplus Syntax: 4 Classes

TITLE: Latent Class Modeling Example
DATA: FILE = RT.txt;
VARIABLE:
NAMES = ID item1-item6;
USEVARIABLES = item1-item6;
CLASSES = c(4); ! change the (#) to reflect the # of classes k
ANALYSIS:
TYPE = MIXTURE;
STARTS = 50 10; ! default is 20 4
STITERATIONS = 10; ! default is 10
LRTBOOTSTRAP = 50; ! default determined by the program (between 2-100)
LRTSTARTS = 2 1 40 8; ! k-1 class model has 2 & 1 random sets of start values;
                      ! k class model has 40 & 8 random sets of start values
MODEL:
%OVERALL%
%c#1%
[item1-item6*1];
item1-item6;
%c#2%
[item1-item6*2];
item1-item6;
%c#3%
[item1-item6*2.5];
item1-item6;
%c#4%
[item1-item6*3];
item1-item6;
OUTPUT: tech11 tech14;
SAVEDATA:
FILE = RTsol.txt;
SAVE = CPROB;
Model Fit & Number of Classes
# Classes   VLMR     Adj-LMR   BLRT     Entropy   n1    n2    n3    n4
2           0.0001   0.0002    0.0000   0.847     405    95
3           0.0003   0.0003    0.0000   0.790      56   194   250
4           0.5412   0.5440    0.0000   0.773      58   104   207   131
IC Indices

[Figure: information criteria (y-axis, approximately 5800-6600) plotted against the number of classes (2, 3, 4)]
[Figures: apparent class-profile plots across items 1-6 for the competing class solutions]
Further Contextualizing the Results:
Accuracy
[Figures: four panels of class profiles of z-ln(RT) across items for each sample (panel titles include "Sample 3 - Power" and "Sample 3 - 70%"); class sizes are annotated on each profile (e.g., n = 405, n = 95, n = 357, n = 115); marker legend: O = f >= 0.50, X = f < 0.50]
Mixture CFA Modeling
Structural Equation Mixture Modeling
Zero-Inflated Poisson (ZIP) Regression as a
Two-Class Model
Growth Mixture Modeling (GMM)
Hidden Markov Model
All Available to YOU Through the Program
Syntax & Simulation Files
Thank You!