Questions For Applied Multivariate Analysis

Short Answer

Factor Analysis
1. Define factor analysis. What is its main objective?
2. Explain the orthogonal factor model in factor analysis.
3. Describe the significance of factor loadings in factor analysis.
4. What is meant by the estimation of factor loadings?
5. Explain the concept of factor rotation in factor analysis. Why is it used?
6. Differentiate between orthogonal and oblique rotations in factor analysis.
7. What is the purpose of eigenvalues in factor analysis?
8. How does factor analysis differ from principal component analysis?

Discrimination and Classification


9. Define discrimination analysis. When is it used?
10. Explain the purpose of classification in multivariate analysis.
11. What assumptions are made for classifying a multivariate normal population?
12. Describe Fisher’s Linear Discriminant Function and its main objective.
13. How is Fisher’s linear discriminant function used in classification?
14. Explain the concept of quadratic discriminant analysis.
15. What is the main difference between linear and quadratic discriminant analysis?
16. Briefly describe logistic regression and its use in classification.

Logistic Regression
17. Explain how logistic regression handles binary classification.
18. Define the logit function in logistic regression.
19. What are the assumptions underlying logistic regression?
20. Differentiate between logistic regression and linear regression in terms of application.
21. Why is logistic regression considered a type of generalized linear model?

Factor Analysis and Orthogonal Factor Model


22. Given the following correlation matrix for three variables, calculate the communalities assuming a one-factor model.
R = \begin{pmatrix} 1.0 & 0.8 & 0.6 \\ 0.8 & 1.0 & 0.7 \\ 0.6 & 0.7 & 1.0 \end{pmatrix}

23. For a dataset with the factor loading matrix
L = \begin{pmatrix} 0.7 & 0.5 \\ 0.6 & 0.4 \\ 0.5 & 0.8 \end{pmatrix},
calculate the uniqueness for each variable.
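As a quick check for question 23, a minimal numpy sketch: the communality h_i² is the row sum of squared loadings, and the uniqueness is ψ_i = 1 − h_i².

import numpy as np

# Uniqueness for question 23: psi_i = 1 - h_i^2, where the communality
# h_i^2 is the sum of squared loadings in row i of L.
L = np.array([[0.7, 0.5],
              [0.6, 0.4],
              [0.5, 0.8]])
communality = (L ** 2).sum(axis=1)   # [0.74, 0.52, 0.89]
uniqueness = 1.0 - communality       # [0.26, 0.48, 0.11]
print(uniqueness)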
24. Suppose a two-factor model has the following estimated factor loadings:
L = \begin{pmatrix} 0.8 & 0.3 \\ 0.6 & 0.5 \\ 0.7 & 0.2 \end{pmatrix}
Perform an orthogonal rotation on these loadings using the varimax rotation method.

25. Factor Extraction: Given the covariance matrix below for three variables, extract the first two factors using principal component analysis. Calculate the factor loadings for each variable on the factors. Covariance matrix:
Σ = \begin{pmatrix} 2 & 0.5 & 0.8 \\ 0.5 & 1.5 & 0.3 \\ 0.8 & 0.3 & 1.8 \end{pmatrix}

26. Factor Rotation: For the factor loading matrix below, perform a varimax rotation and obtain the rotated factor loadings. Factor loading matrix:
L = \begin{pmatrix} 0.8 & 0.3 \\ 0.4 & 0.6 \\ 0.7 & 0.5 \end{pmatrix}

Estimation of Factor Loadings


27. Given a dataset with three variables and a correlation matrix, perform principal component analysis to extract the
first principal component as a factor and calculate its loadings.
28. In a factor analysis, if the initial factor loading for two variables on a single factor is 0.6 and 0.8 respectively,
calculate the variance explained by the factor.

Discrimination and Classification


29. Given the following mean vectors and covariance matrix for two classes in a two-dimensional space, calculate the linear discriminant function.
• Class 1 mean vector: µ1 = [2, 3]^T
• Class 2 mean vector: µ2 = [5, 7]^T
• Common covariance matrix: Σ = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
30. Suppose you have a two-class problem where the mean vectors are µ1 = [3, 4]^T and µ2 = [5, 6]^T, and the covariance matrices are identical. Calculate the boundary of classification using Fisher’s linear discriminant.

Fisher’s Linear Discriminant Function

31. Given two classes with sample means µ1 = [2, 3]^T and µ2 = [4, 6]^T and pooled within-class covariance matrix Σ = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}, find the Fisher discriminant vector.
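A minimal numpy sketch for question 31, using the standard form w ∝ Σ⁻¹(µ2 − µ1):

import numpy as np

# Fisher discriminant vector for question 31: solve Σ w = (µ2 - µ1).
mu1 = np.array([2.0, 3.0])
mu2 = np.array([4.0, 6.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
w = np.linalg.solve(Sigma, mu2 - mu1)
print(w)  # ≈ [0.667, 2.667]; any positive multiple of w is equivalent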
32. For two groups with the following covariance matrix and means, compute the Fisher discriminant score for the sample point x = [4, 5]^T.
• Group 1 mean: µ1 = [2, 3]^T
• Group 2 mean: µ2 = [6, 7]^T
• Pooled covariance matrix: Σ = \begin{pmatrix} 1 & 0.2 \\ 0.2 & 1 \end{pmatrix}

Quadratic Discriminant Analysis (QDA)

33. Given two classes with mean vectors µ1 = [2, 3]^T and µ2 = [5, 7]^T and covariance matrices Σ1 = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 2 \end{pmatrix} and Σ2 = \begin{pmatrix} 2 & 0.3 \\ 0.3 & 1 \end{pmatrix}, derive the QDA decision boundary.
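A sketch of the quadratic scores behind question 33, assuming equal priors; the decision boundary is the set of points where the two scores are equal, and the trial point below is only an illustration, not part of the question.

import numpy as np

# QDA score: log Gaussian density up to a constant shared by both classes.
def qda_score(x, mu, Sigma):
    diff = x - mu
    return (-0.5 * np.log(np.linalg.det(Sigma))
            - 0.5 * diff @ np.linalg.solve(Sigma, diff))

mu1, Sigma1 = np.array([2.0, 3.0]), np.array([[1.0, 0.5], [0.5, 2.0]])
mu2, Sigma2 = np.array([5.0, 7.0]), np.array([[2.0, 0.3], [0.3, 1.0]])

x = np.array([3.0, 4.0])  # illustrative trial point
print("class 1" if qda_score(x, mu1, Sigma1) > qda_score(x, mu2, Sigma2) else "class 2")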

Logistic Regression
34. For a logistic regression model with the equation \log\frac{p}{1-p} = 1.5 + 0.7x_1 - 0.3x_2, calculate the probability p of the event occurring when x1 = 2 and x2 = 1.
35. Suppose a logistic regression model gives the following coefficients: intercept = 0.5, x1 coefficient = 1.2. Calculate the odds ratio associated with a one-unit increase in x1.
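A numeric check for questions 34 and 35, using the inverse-logit and the odds-ratio identity OR = e^β:

import numpy as np

# Q34: log-odds = 1.5 + 0.7(2) - 0.3(1) = 2.6, then invert the logit.
log_odds = 1.5 + 0.7 * 2 - 0.3 * 1
p = 1.0 / (1.0 + np.exp(-log_odds))
print(round(p, 4))            # ≈ 0.9309

# Q35: odds ratio for a one-unit increase in x1 is exp(beta_1).
print(round(np.exp(1.2), 4))  # ≈ 3.3201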

Discrimination and Classification


36. Classification of Multivariate Normal Populations: Consider two multivariate normal populations with mean vectors µ1 = [3, 4]^T, µ2 = [6, 7]^T and common covariance matrix
Σ = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 2 \end{pmatrix}
Derive the linear discriminant function and use it to classify the point [5, 5]^T. Determine which population this point is most likely to belong to.
37. Quadratic Discriminant Analysis: Suppose two multivariate normal populations have covariance matrices
Σ1 = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix} and Σ2 = \begin{pmatrix} 1 & 0 \\ 0 & 1.5 \end{pmatrix}
and mean vectors µ1 = [1, 2]^T, µ2 = [3, 4]^T. Calculate the quadratic discriminant function and classify the point [2, 3]^T.

Fisher’s Linear Discriminant Function


38. Fisher’s Linear Discriminant Analysis: Given two classes with the following parameters:
• Class 1: µ1 = [4, 5]^T, Σ1 = \begin{pmatrix} 2 & 0.4 \\ 0.4 & 1.5 \end{pmatrix}
• Class 2: µ2 = [6, 8]^T, Σ2 = \begin{pmatrix} 2 & 0.3 \\ 0.3 & 1.7 \end{pmatrix}
Construct Fisher’s linear discriminant function. Use this function to classify the new observation [5, 6]^T and explain the results.
39. Fisher’s Discriminant for Unequal Covariance Matrices: Consider two classes with means µ1 = [1, 2]^T and µ2 = [4, 6]^T and unequal covariance matrices:
Σ1 = \begin{pmatrix} 1 & 0.3 \\ 0.3 & 1.5 \end{pmatrix} and Σ2 = \begin{pmatrix} 1.5 & 0.4 \\ 0.4 & 2 \end{pmatrix}
Develop Fisher’s linear discriminant function and discuss how you would apply it despite unequal covariances. Classify a new observation [3, 4]^T.

Logistic Regression
40. Logistic Regression Model Fitting: Suppose you are given the following data for a binary classification problem:

X1 X2 Y
2 3 0
4 6 0
5 7 1
6 8 1
8 9 1

Fit a logistic regression model of the form
P(Y = 1 | X) = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2}}
Estimate the coefficients β0, β1, and β2 using maximum likelihood estimation and interpret the results.
41. Prediction with Logistic Regression: A logistic regression model is given by
\ln\frac{P(Y = 1)}{1 - P(Y = 1)} = -3 + 0.5X_1 + 0.7X_2
Use this model to predict the probability P(Y = 1) for a new observation where X1 = 4 and X2 = 5.
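As a check on question 41: the linear predictor at X1 = 4, X2 = 5 is −3 + 0.5(4) + 0.7(5) = 2.5, so P(Y = 1) = 1/(1 + e^{−2.5}) ≈ 0.924.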

Factor Analysis and Factor Rotation
42. Suppose you have the following covariance matrix for four variables:
Σ = \begin{pmatrix} 4 & 2 & 1 & 0.5 \\ 2 & 3 & 0.8 & 1 \\ 1 & 0.8 & 2 & 0.6 \\ 0.5 & 1 & 0.6 & 1.5 \end{pmatrix}

Use maximum likelihood estimation to find the loadings of the first two factors and calculate the communalities of
each variable.
43. Varimax Rotation in Factor Analysis: Given the factor loading matrix below for three factors, perform a varimax rotation to obtain the rotated factor loadings.
L = \begin{pmatrix} 0.7 & 0.2 & 0.5 \\ 0.6 & 0.3 & 0.4 \\ 0.8 & 0.5 & 0.1 \\ 0.4 & 0.7 & 0.2 \end{pmatrix}

Describe the impact of the rotation on the interpretability of the factors.
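For questions 24, 26, and 43, a compact numpy sketch of the standard SVD-based varimax update (γ = 1 gives varimax); treat it as a reference implementation under those assumptions, not a library call:

import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate loadings L to maximize the varimax criterion."""
    p, k = L.shape
    R = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Gradient of the orthomax criterion with respect to the rotation.
        G = L.T @ (Lr ** 3 - (gamma / p) * Lr @ np.diag((Lr ** 2).sum(axis=0)))
        u, s, vt = np.linalg.svd(G)
        R = u @ vt
        if s.sum() < crit_old * (1 + tol):
            break
        crit_old = s.sum()
    return L @ R

L = np.array([[0.7, 0.2, 0.5],
              [0.6, 0.3, 0.4],
              [0.8, 0.5, 0.1],
              [0.4, 0.7, 0.2]])
print(varimax(L).round(3))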

Discrimination and Classification


44. Classification with Prior Probabilities: Suppose two populations have multivariate normal distributions with parameters:
• Population 1: µ1 = [3, 3]^T, Σ1 = \begin{pmatrix} 1 & 0.2 \\ 0.2 & 1 \end{pmatrix}, and prior probability P1 = 0.6.
• Population 2: µ2 = [5, 5]^T, Σ2 = \begin{pmatrix} 1 & -0.2 \\ -0.2 & 1 \end{pmatrix}, and prior probability P2 = 0.4.
Derive the Bayes discriminant function for classifying new observations and classify the point [4, 4]^T using this function.
45. Performance Evaluation in Quadratic Discriminant Analysis (QDA): Using QDA, classify a set of observations for two populations:
• Population 1: µ1 = [2, 3]^T, Σ1 = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}.
• Population 2: µ2 = [4, 6]^T, Σ2 = \begin{pmatrix} 2 & 0.5 \\ 0.5 & 1 \end{pmatrix}.
Classify the following points: [3, 4]^T, [5, 5]^T, and [2, 6]^T. Calculate and discuss the misclassification rate if known population labels are available for these points.

Logistic Regression
46. Multiclass Logistic Regression: Consider a dataset with three classes (Class 1, Class 2, Class 3) and predictor
variables X1 and X2 . For the following observations:

Class X1 X2
1 2 3
1 3 4
2 5 6
2 6 7
3 8 9
3 9 10

Fit a multinomial logistic regression model to predict the class label based on X1 and X2 . Provide the coefficient
estimates and interpret their meaning.
47. Regularized Logistic Regression: You are given a dataset with predictor variables X1 , X2 , and a binary response
variable Y where Y = 1 indicates success and Y = 0 indicates failure. The data is as follows:

X1 X2 Y
1 2 0
2 3 0
3 4 1
4 5 1
5 6 1

Fit a logistic regression model with L2 regularization (Ridge) to predict Y based on X1 and X2 . Discuss the effect
of the regularization parameter on the coefficient estimates.

48. Interpreting Model Coefficients: Consider a logistic regression model with the following form:
\ln\frac{P(Y = 1)}{1 - P(Y = 1)} = -1 + 0.3X_1 - 0.5X_2 + 0.7X_3

Using the data point (X1 , X2 , X3 ) = (4, 2, 5), calculate the probability P (Y = 1). Explain the impact of each
coefficient on the predicted probability for this data point.
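As a check on question 48: the linear predictor at (X1, X2, X3) = (4, 2, 5) is −1 + 0.3(4) − 0.5(2) + 0.7(5) = 2.7, so P(Y = 1) = 1/(1 + e^{−2.7}) ≈ 0.937; the positive coefficients on X1 and X3 push the probability up, while the negative coefficient on X2 pushes it down.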

Introduction to Cluster Analysis
1. Define cluster analysis and its primary goal.
2. Explain the importance of cluster analysis in data mining.
3. What are the main applications of cluster analysis?
4. Given a dataset of 10 samples with two features each, calculate the Euclidean and Manhattan distances between
all sample pairs. Based on these distances, suggest an appropriate clustering method and justify your choice.

Similarity and Dissimilarity Measures


5. Differentiate between similarity and dissimilarity measures.
6. List three common similarity measures used in cluster analysis.
7. Briefly describe Euclidean distance as a dissimilarity measure.
8. Suppose you have a dataset with 6 observations and three variables. Calculate the similarity matrix using cosine similarity and the dissimilarity matrix using the Jaccard distance (1 − Jaccard similarity). Interpret the results and explain which observations are more similar to each other.

Proximity Measures
9. Define a proximity measure and its role in clustering.
10. Mention any two proximity measures suitable for categorical data.
11. How does the choice of proximity measure impact clustering results?
12. Consider a dataset with the following five 2-dimensional points: (2, 3), (3, 3), (4, 4), (5, 5), and (6, 8). Compute the
proximity matrix using the Euclidean distance measure. Describe the role of proximity in clustering these points
and identify pairs of points with high similarity.

Types of Clustering
13. Differentiate between hierarchical and non-hierarchical clustering.
14. What is the primary characteristic of partitional clustering?
15. Describe one advantage and one disadvantage of hierarchical clustering.

Hierarchical Clustering: Agglomerative and Divisive Methods


16. Define agglomerative hierarchical clustering.
17. What is a dendrogram, and how is it used in hierarchical clustering?
18. Differentiate between agglomerative and divisive clustering methods.
19. Using the proximity matrix from the previous question, perform agglomerative clustering with the single-linkage
method. Create a dendrogram and indicate the clusters formed at each stage until only one cluster remains.
20. Given a dataset with 8 observations, divide the data iteratively using the divisive clustering approach. Show each
step in the division and calculate the Euclidean distances to split the clusters based on minimizing within-cluster
variance.

Non-Hierarchical Clustering: K-means Clustering Algorithm


21. Explain the K-means clustering algorithm in brief.
22. What are the main assumptions of the K-means algorithm?
23. Mention one advantage and one limitation of K-means clustering.
24. For the dataset consisting of points: (1, 1), (2, 1), (4, 3), (5, 4), initialize two centroids at (1, 1) and (5, 4).
Perform two iterations of the K-means algorithm to update the cluster centroids. Show the calculations for assigning
points to clusters and updating the centroids.
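A minimal numpy sketch of the two iterations asked for in question 24:

import numpy as np

# Two K-means iterations for question 24 (K = 2).
X = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)
centroids = np.array([[1, 1], [5, 4]], dtype=float)

for it in range(2):
    # Assignment step: nearest centroid by Euclidean distance.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Update step: each centroid becomes the mean of its assigned points.
    centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])
    print(f"iteration {it + 1}: labels = {labels}, centroids = {centroids.tolist()}")
# Settles at centroids (1.5, 1) and (4.5, 3.5) after the first iteration.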

K-nearest-neighbor Classifiers
25. Describe the K-nearest-neighbor (KNN) algorithm for classification.
26. What are the main requirements for implementing KNN classifiers?
27. How does the choice of K affect the performance of a KNN classifier?
28. You have a dataset with 6 observations in two-dimensional space. Four of these belong to Class A: (1, 2), (2, 3),
(3, 1), and (3, 2); and two belong to Class B: (6, 7) and (7, 8). Use K = 3 in the K-nearest-neighbor classification
algorithm to classify a new point at (5, 5). Explain the process and show all calculations.
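A sketch for question 28. Note the three-way distance tie at √13, which a hand-written answer should point out; the stable sort below breaks it by index order.

import numpy as np
from collections import Counter

# KNN (K = 3) for question 28: classify (5, 5) by majority vote.
X = np.array([[1, 2], [2, 3], [3, 1], [3, 2], [6, 7], [7, 8]], dtype=float)
y = np.array(["A", "A", "A", "A", "B", "B"])
query = np.array([5.0, 5.0])

dist = np.linalg.norm(X - query, axis=1)
nearest = np.argsort(dist, kind="stable")[:3]
print(list(zip(y[nearest], dist[nearest].round(3))))   # [('B', 2.236), ('A', 3.606), ('A', 3.606)]
print("predicted:", Counter(y[nearest]).most_common(1)[0][0])  # A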

K-medoids Clustering
29. Differentiate between K-means and K-medoids clustering.
30. What is the primary goal of the K-medoids algorithm?
31. Mention one advantage of K-medoids over K-means in terms of robustness to outliers.
32. For a dataset containing five points: (1, 1), (2, 2), (3, 3), (8, 8), and (9, 9), use the K-medoids algorithm to cluster
the data into two clusters. Choose initial medoids as (1, 1) and (9, 9), and perform one iteration of the algorithm.
Show how you calculate the cost and update the medoids.
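A brute-force sketch for question 32, using Manhattan distance as the cost (state your metric in the answer); at five points, scanning all medoid pairs stands in for a single PAM swap sweep:

import numpy as np
from itertools import combinations

# K-medoids cost for question 32: sum of distances to the nearest medoid.
X = np.array([[1, 1], [2, 2], [3, 3], [8, 8], [9, 9]], dtype=float)

def total_cost(medoid_idx):
    d = np.abs(X[:, None, :] - X[list(medoid_idx)][None, :, :]).sum(axis=2)
    return d.min(axis=1).sum()

print("initial cost:", total_cost([0, 4]))   # medoids (1,1), (9,9): cost 8
best = min(combinations(range(len(X)), 2), key=total_cost)
print("best medoids:", X[list(best)].tolist(), "cost:", total_cost(best))
# Cost drops from 8 to 6, e.g. with medoids (2,2) and (8,8).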

Long Answer
33. Given the following dataset of points in a two-dimensional space: (2, 3), (3, 3), (6, 5), (8, 8), and (7, 5), perform
K-means clustering with K = 2. Show all steps, including initialization, assignment, and updates to centroids, and
provide the final clusters and centroids.
34. Given a distance matrix for the following points: A(1, 2), B(2, 2), C(3, 5), D(5, 6), calculate the clustering using
agglomerative hierarchical clustering. Provide a dendrogram representation of the clusters formed.
35. Consider the following dataset with two classes:
• Class 1: (1, 1), (1, 2), (2, 1)
• Class 2: (6, 5), (7, 5), (7, 6)
Using a K-nearest neighbor classifier with K = 2, classify the point (5, 4). Show all calculations for distance
measures used.
36. Given the following data points: (1, 2), (1, 4), (1, 0), (10, 2), (10, 4), (10, 0), apply the K-medoids algorithm with
K = 2. Calculate the medoids, the assignments of points to clusters, and the total cost.
37. Consider the following two sets of data points:
• Set 1: (1, 2), (2, 3)
• Set 2: (3, 4), (5, 6)
Calculate the Euclidean and Manhattan dissimilarity measures between the two sets of data points and discuss the
differences in the results.
38. Given three points A(2, 3), B(3, 5), and C(5, 7), compute the proximity matrix using the cosine similarity measure.
Show all calculations and interpret the results.
39. Explain how agglomerative and divisive hierarchical clustering methods differ and provide a numerical example
where both methods are applied to the same dataset. Compare the clustering results in terms of the number of
clusters formed.
40. Given a clustering result from a K-means algorithm with three clusters, calculate the Silhouette Coefficient for each
point. Use the following distance values:
• Distances within the same cluster: (1, 0.5, 0.2)
• Distances to nearest cluster: (2, 2.5, 3)
Provide a conclusion based on the Silhouette Coefficients.
41. For the points P1(1, 2), P2(4, 6), and P3(5, 8), compute the pairwise distances using both the Euclidean and Chebyshev distance measures. Which measure suggests a stronger relationship between the points?
42. Using a dataset of 5 points: (1, 1), (1, 3), (2, 2), (8, 8), and (9, 9), run the K-means algorithm for K = 2 for two
iterations. Start with initial centroids at (1, 1) and (8, 8). Show the changes in centroids and assignments after
each iteration.

Unit 4

Short Answer
Density Search Clustering Techniques
1. Define density-based clustering. How does it differ from other clustering methods?
2. Explain the concept of core points, border points, and noise points in density-based clustering.
3. Describe the DBSCAN algorithm and its key parameters.
4. Given a dataset with coordinates of points, apply the DBSCAN algorithm with specified values for the parameters
ϵ and minPts. Identify the clusters and noise points.
5. Consider a dataset where the density threshold is defined as 3 points within a radius of 2 units. Determine whether
each point forms a core point, border point, or noise point.

Clustering with Constraints


6. What are clustering constraints? Provide examples of soft and hard constraints in clustering.
7. Discuss the impact of constraints on the clustering outcome. Why might they be necessary?
8. Explain the role of background knowledge in constrained clustering.
9. You are provided with a dataset and a set of must-link and cannot-link constraints. Given two clusters, assign
points to clusters while satisfying these constraints.
10. Apply constrained K-means clustering on a small dataset with must-link constraints between certain pairs of points.
Show the initial and final cluster assignments.

Fuzzy Clustering
11. Define fuzzy clustering. How does it differ from traditional clustering methods?
12. Describe the concept of membership degrees in fuzzy clustering.
13. Explain the Fuzzy C-Means algorithm and its objective function.
14. For a set of three data points, calculate the membership values for each cluster center using the Fuzzy C-means
algorithm with given cluster centers.
15. Given a dataset and two initial cluster centers, calculate one iteration of fuzzy membership values for each point.

Optimization Clustering Techniques


16. What is the objective of optimization in clustering? Give examples of optimization techniques used.
17. Discuss how the k-means clustering algorithm can be seen as an optimization problem.
18. Explain the concept of silhouette score and its role in optimizing clustering.
19. Given a dataset and a distance matrix, calculate the objective function for the K-medoids clustering algorithm for
a specific assignment.
20. A cluster optimization algorithm is run on a dataset, minimizing the sum of squared distances to cluster centers.
Given initial clusters, calculate the objective function before and after an iteration.

Discrete Data Clustering


21. What challenges arise in clustering discrete data compared to continuous data?
22. Describe a method for clustering categorical data, such as the k-modes algorithm.
23. Explain how similarity measures for discrete data differ from those for continuous data.
24. Given a dataset of categorical variables, use K-modes clustering to assign points to clusters. Calculate the mode of
each cluster at the first iteration.
25. For a categorical dataset, calculate the initial cluster assignments for two clusters using a similarity measure suitable
for categorical data.

Mixture for Categorical Data
26. What is a mixture model for categorical data? How does it differ from traditional clustering models?
27. Describe the role of the Expectation-Maximization (EM) algorithm in fitting mixture models.
28. Explain the concept of multinomial distributions in the context of categorical data mixture models.
29. Consider a dataset with two categories and apply a basic EM algorithm iteration to estimate cluster membership
probabilities.
30. Given a dataset and initial probabilities for two clusters, compute the expected counts of each category under the
current mixture model.

Latent Class Analysis


31. Define latent class analysis (LCA) and its applications in statistical modeling.
32. Discuss how LCA can be used to identify unobserved subgroups within a population.
33. Explain the relationship between LCA and mixture models.
34. Given a table of observed frequencies for binary variables, estimate the probability of each latent class under a
specified model.
35. For a dataset with two latent classes, calculate the expected probabilities for each observed combination of variables
under a specified model.

Mixture Models for Mixed Mode Data


36. What are mixture models for mixed mode data? Provide an example of their application.
37. Discuss the challenges in modeling data that contain both categorical and continuous variables.
38. Describe how the EM algorithm can be adapted for mixed-mode data.
39. A dataset has mixed categorical and continuous variables. Given initial parameters, perform one step of the EM
algorithm for a two-component mixture model.
40. For a mixed dataset, compute the likelihood of each point belonging to each component of a Gaussian-categorical
mixture model, given the initial parameters.

Long Answer
1. Density Search Clustering Techniques:
Given the following dataset of points in a two-dimensional space: (1, 2), (1, 4), (1, 0), (2, 2), (2, 3), (3, 3), apply the
DBSCAN algorithm with ϵ = 1 and a minimum number of points minPts = 2. Identify the clusters formed.
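A sketch using scikit-learn, assuming it is available; note that sklearn counts a point itself toward min_samples, which matches the usual textbook convention for minPts:

import numpy as np
from sklearn.cluster import DBSCAN

# DBSCAN with eps = 1, minPts = 2 for long-answer question 1.
X = np.array([[1, 2], [1, 4], [1, 0], [2, 2], [2, 3], [3, 3]])
labels = DBSCAN(eps=1, min_samples=2).fit_predict(X)
print(labels)  # [0 -1 -1 0 0 0]: one cluster {(1,2),(2,2),(2,3),(3,3)}, two noise points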
2. Clustering with Constraints:
Given a set of data points in a 2D space and a set of must-link and cannot-link constraints, describe how you would
modify the K-means algorithm to incorporate these constraints. Provide a hypothetical dataset and the modified
algorithm steps.
3. Fuzzy Clustering:
Given a dataset of three points (1, 1), (1, 2), (2, 1) and c = 2 clusters, calculate the membership values using the fuzzy C-means algorithm. Assume a fuzziness parameter m = 2.
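A one-step membership computation for long-answer question 3. The question does not fix the cluster centers, so the two used below are illustrative assumptions only; memberships follow u_ik = d_ik^{−2/(m−1)} / Σ_j d_ij^{−2/(m−1)}.

import numpy as np

# Fuzzy C-means memberships (m = 2) for three points and two assumed centers.
X = np.array([[1, 1], [1, 2], [2, 1]], dtype=float)
centers = np.array([[1.0, 1.5], [2.0, 1.0]])  # hypothetical initial centers
m = 2

d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
d = np.fmax(d, 1e-12)            # guard: (2,1) coincides with the second center
inv = d ** (-2.0 / (m - 1))
u = inv / inv.sum(axis=1, keepdims=True)
print(u.round(3))                # each row sums to 1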
4. Optimization Clustering Techniques:
Given the following data points: (1, 1), (1, 2), (1, 3), (10, 10), find the optimal clustering solution using K-means with
k = 2. Show your calculations for the centroid updates and final clusters.
5. Discrete Data Clustering:
Consider a dataset with the following discrete categorical attributes for 5 observations:
• Observation 1: (A, B)
• Observation 2: (A, C)
• Observation 3: (B, C)
• Observation 4: (B, D)

• Observation 5: (C, D)
Apply the K-medoids algorithm with k = 2 and show the medoids and clusters formed.
6. Mixture for Categorical Data:
Suppose you have a dataset of categorical data with the following two features: Color (Red, Blue) and Shape (Circle,
Square). Create a simple mixture model for this data and calculate the probabilities for each category under the
mixture model.
7. Latent Class Analysis:
You have responses from 100 individuals on a questionnaire with binary outcomes (Yes/No). Construct a hypothetical dataset and perform latent class analysis to identify the number of latent classes. Calculate the class probabilities and the expected frequencies for each class.
8. Mixture Models for Mixed Mode Data:
Given a dataset consisting of both continuous (age, income) and categorical (gender, occupation) features, describe
how you would implement a mixture model to cluster this data. Provide a numerical example to illustrate the
clustering process.

Short Answer
1. Define a finite mixture model and explain its purpose in cluster analysis.
2. What are the main advantages of using finite mixture densities in clustering?
3. Describe the key assumptions underlying finite mixture models in clustering.
4. What is the purpose of inference in finite mixture models?
5. Explain the difference between parameter estimation and inference in finite mixture models.
6. Describe how model selection is performed in finite mixture models.
7. List the common methods for estimating parameters in finite mixture models.
8. What is the role of the likelihood function in estimating finite mixture models?
9. Briefly explain the concept of the latent variable in the context of finite mixture models.
10. What is the purpose of the Expectation-Maximization (EM) algorithm in finite mixture models?
11. Describe the steps of the EM algorithm in the context of finite mixture models.
12. Explain why the EM algorithm is suitable for maximum likelihood estimation in mixture models.
13. Define the maximum likelihood estimation for mixtures of multivariate normal densities.
14. Describe the parameters involved in a multivariate normal mixture model.
15. Explain why multivariate normal mixtures are commonly used in clustering.
16. What is non-Gaussian model-based clustering?
17. Explain one advantage of using non-Gaussian models in clustering over Gaussian models.
18. Give an example of a non-Gaussian distribution that can be used for model-based clustering.
19. For a given data set with three clusters, calculate the finite mixture density given the following Gaussian distribu-
tions:
• Cluster 1: Mean = 2, Variance = 1, Weight = 0.3
• Cluster 2: Mean = 5, Variance = 1.5, Weight = 0.5
• Cluster 3: Mean = 8, Variance = 2, Weight = 0.2
Find the mixture density for x = 4.
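As a check on question 19 (writing φ(x; µ, σ²) for the normal density): f(4) = 0.3 φ(4; 2, 1) + 0.5 φ(4; 5, 1.5) + 0.2 φ(4; 8, 2) ≈ 0.3(0.0540) + 0.5(0.2334) + 0.2(0.0052) ≈ 0.134.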
20. Given a mixture model with two Gaussian components, N(µ1 = 0, σ1² = 1) and N(µ2 = 5, σ2² = 2), with weights 0.4 and 0.6 respectively, calculate the probability of data point x = 3 belonging to each component.
21. Given a mixture of two Gaussian distributions with weights 0.6 and 0.4, means 1 and 3, and variances 1.5 and 2
respectively, estimate the mean of the entire mixture model.
22. For a data set with two clusters following Gaussian distributions N(µ1, σ1²) and N(µ2, σ2²), initialize an EM algorithm by calculating the E-step using:
• Initial means µ1 = 1, µ2 = 4
• Variances σ1² = 2, σ2² = 3
• Weights w1 = 0.5, w2 = 0.5
Compute the responsibilities for data point x = 2.
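A numeric E-step check for question 22:

import numpy as np

# Responsibilities at x = 2 for components N(1, 2) and N(4, 3), weights 0.5 each.
def normal_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

x = 2.0
w = np.array([0.5, 0.5])
mu = np.array([1.0, 4.0])
var = np.array([2.0, 3.0])

weighted = w * normal_pdf(x, mu, var)
print((weighted / weighted.sum()).round(3))  # ≈ [0.65, 0.35]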
26. Suppose a data set is generated from a mixture of two multivariate normal distributions with parameters:
• Component 1: mean vector µ1 = [2, 3], covariance matrix Σ1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, weight = 0.4
• Component 2: mean vector µ2 = [5, 7], covariance matrix Σ2 = \begin{pmatrix} 2 & 0.5 \\ 0.5 & 1.5 \end{pmatrix}, weight = 0.6
Calculate the log-likelihood of the data point x = [3, 4].
27. In a mixture model with a Gaussian and a Poisson component, the Gaussian component has mean 3, variance 1,
and weight 0.7, while the Poisson component has mean 4 and weight 0.3. For a given data point x = 3, calculate
the probability density under the mixture model.

Long Answer
Finite Mixture Densities as Models for Cluster Analysis
1. A dataset consists of 100 points generated from a mixture of two Gaussian distributions with equal mixing proportions. The first Gaussian component has mean µ1 = 5 and variance σ1² = 1, while the second has mean µ2 = 10 and variance σ2² = 2.
(a) Write the probability density function for this mixture model.

(b) Calculate the likelihood of observing the data point x = 7 under this mixture model.

Inference in Finite Mixture Models


2. You have a mixture model with two normal distributions, where θ1 = (µ1, σ1²) and θ2 = (µ2, σ2²). The dataset given includes the following points: x = {2, 3, 5, 8, 9}.
(a) Assume equal mixing proportions and set up the log-likelihood function for this dataset.
(b) Using initial values µ1 = 2 and µ2 = 8, compute the initial log-likelihood and update the estimates of µ1 and µ2
for one iteration using maximum likelihood estimation.

Estimation in Finite Mixture Models


3. Given a sample dataset x = {1.5, 2.0, 3.5, 4.0, 5.5} from a two-component mixture of Gaussian distributions with unknown means µ1 and µ2 and variances σ1² = 1 and σ2² = 1.5, respectively:
(a) Assuming known mixing proportions of 0.6 and 0.4 for components 1 and 2, respectively, calculate the means µ1
and µ2 using the method of moments.
(b) Verify your solution by calculating the mixture distribution’s expected mean.

Likelihood Maximization via the Expectation-Maximization (EM) Algorithm


4. A sample dataset x = {3, 4, 5, 6, 7} is generated from a two-component Gaussian mixture with unknown means and variances. Use the EM algorithm to estimate the means µ1 and µ2 and variances σ1² and σ2², assuming equal mixing proportions.
(a) Initialize with µ1 = 4, µ2 = 6, σ1² = 1, and σ2² = 1, and compute the E-step to find the expected membership weights.
(b) Perform the M-step to update µ1, µ2, σ1², and σ2².

Maximum Likelihood Estimation of Mixtures of Multivariate Normal Densities


5. You are given a dataset with multivariate observations x1 = [2, 3], x2 = [3, 4], x3 = [5, 6] from a two-component
Gaussian mixture with unknown mean vectors µ1 , µ2 and covariance matrices Σ1 , Σ2 . Assume equal mixing proportions.
(a) Write down the log-likelihood function for this multivariate Gaussian mixture.
(b) With initial values µ1 = [2, 3] and µ2 = [5, 6] and identity matrices for Σ1 and Σ2 , calculate the log-likelihood for
the first iteration and provide an updated estimate for µ1 and µ2 .

Non-Gaussian Model-Based Clustering


6. A dataset x = {3, 6, 7, 9, 12} is assumed to follow a mixture of two Poisson distributions with rates λ1 and λ2 . Assume
equal mixing proportions and use the EM algorithm to estimate λ1 and λ2 .
(a) Initialize with λ1 = 3 and λ2 = 9, and perform the E-step to compute the expected assignment of each data point
to the components.
(b) Perform the M-step to update λ1 and λ2 for the next iteration.
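One full EM iteration for long-answer question 6, assuming scipy is available for the Poisson pmf:

import numpy as np
from scipy.stats import poisson

# Two-component Poisson mixture, equal weights, initialized at (3, 9).
x = np.array([3, 6, 7, 9, 12])
lam = np.array([3.0, 9.0])

# E-step: responsibility of each component for each point.
pmf = poisson.pmf(x[:, None], lam[None, :])
resp = pmf / pmf.sum(axis=1, keepdims=True)

# M-step: each rate becomes the responsibility-weighted mean of the data.
lam_new = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
print(lam_new.round(3))  # ≈ [4.24, 8.72]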
