
STAT3006

Tutorial 2
1. In introductory statistics courses, you will have come across rules describing
what happens to the mean and variance under linear combinations of univariate
random variables. You have likely seen the following.
Given random variables X1 and X2 and constants a and b, consider the linear
combination

Y = aX1 + bX2.

The mean of the linear combination is the linear combination of the means,

E(Y) = aE(X1) + bE(X2).

The variance of the linear combination is as follows:

Var(Y) = a^2 Var(X1) + b^2 Var(X2) + 2ab Cov(X1, X2).

Derive the corresponding expressions for the mean and covariance matrix of a linear
combination AX + BY of p-dimensional random vectors X and Y (assumed dependent),
where A and B are constant matrices.
Hints: let Z = (X^T, Y^T)^T, consider a suitable matrix C and the properties of
CZ. Assume E(X) = µX, E(Y) = µY, Cov(X, X) = ΣXX, Cov(Y, Y) = ΣYY,
Cov(X, Y) = ΣXY.
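
As a quick sanity check of the univariate rules above (not a substitute for the
derivation), the variance formula can be verified numerically in R; the constants
and dependence structure below are arbitrary choices:

set.seed(1)
n <- 1e5
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)   # dependent on x1 by construction
a <- 2; b <- -3
y <- a * x1 + b * x2
var(y)                                                    # empirical variance
a^2 * var(x1) + b^2 * var(x2) + 2 * a * b * cov(x1, x2)   # formula value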

2. Random mixture distributions


It is sometimes useful to simulate data from a multivariate normal distribution,
e.g. as part of simulating from a mixture distribution. However, if one wants to
simulate data from a randomly chosen but valid multivariate normal distribution,
this is more challenging.
Bayesian statistics provides one possible answer, with a variety of prior distributions
available for, e.g., proportions, means and covariance matrices. Particular sets of
parameters can be sampled from these distributions, and these are sufficient to
describe a randomly generated multivariate normal distribution.
Here the usual choices of prior distribution would be a Dirichlet distribution
for proportions, a normal distribution with high variances for the means and
a Wishart or inverse-Wishart distribution for the covariance matrices. Choosing
hyper-parameters for the last of these can be difficult.
One alternative is to produce a random covariance matrix by filling it with random
values and then correcting it so that it forms a valid covariance matrix. We
know that it has to be symmetric. That is easily achieved by filling in only the
diagonal and upper triangle of the matrix and then copying the upper triangle
to the lower triangle. The other requirement is that the matrix be positive
semi-definite, although in practice it needs to be positive definite to be of much use.

So that we don't have to think about the scaling of one variable relative to another,
it is often desirable to make the covariance matrix a correlation matrix, i.e. to
have all the variance terms (the diagonal entries) equal to 1. Then we could fill
in all the upper triangular terms with, for example, values drawn from a standard
normal distribution N(0, 1), rejecting any values with magnitude above 1 since
these cannot appear in a correlation matrix. Alternatively, we could draw from the
continuous uniform distribution U[−1, 1]. Having copied these values to the lower
triangle, we would have a symmetric matrix, but one with a low chance of being a
valid correlation or covariance matrix.
However, this can be corrected, albeit at the expense of not knowing the distri-
bution of the resulting matrices or their elements. Assume that the symmetric
matrix produced so far is B. This matrix will have eigenvalues and eigenvectors.
λ is an eigenvalue of B and v is an eigenvector of B if

Bv = λv

A statement of this type holds for every eigenvalue of B.


Now consider the eigenvalues of B + aI, a > 0.

(B + aI)v = Bv + av = λv + av = (λ + a)v

So (λ + a) is an eigenvalue of B + aI, with v being the corresponding eigenvector.


For B to be positive definite, all of its eigenvalues must be positive. If that is
already true for the randomly generated symmetric matrix, then we can use it
directly as a covariance matrix. If not, we take the smallest eigenvalue, λ1, and
replace B with B + (−λ1 + ϵ)I, where ϵ > 0 is a small value. The resulting
symmetric matrix has all positive eigenvalues and hence is positive definite and
suitable as a covariance matrix.
Note that the eigenvalues of B + (−λ1 + ϵ)I are those of B, each increased by
(−λ1 + ϵ), while the eigenvectors are the same as those of B.
A particular attraction of this method is that any zeroes which one desires to
put in the covariance matrix (or the inverse covariance matrix, if generating that
instead) remain after the eigenvalue "correction", since the shift only alters the
diagonal.
Note that it is easy to convert this into a correlation matrix if desired using the
conversion equation (1.17) in the lecture notes, implemented in R via cov2cor().
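
As a concrete illustration, here is a minimal R sketch of the procedure just
described; the function name random_corr and the default eps are our choices, not
part of the tutorial:

random_corr <- function(p, eps = 1e-6) {
  B <- diag(p)                                      # 1s on the diagonal
  B[upper.tri(B)] <- runif(p * (p - 1) / 2, -1, 1)  # random upper triangle
  B[lower.tri(B)] <- t(B)[lower.tri(B)]             # mirror it, so B is symmetric
  lam_min <- min(eigen(B, symmetric = TRUE, only.values = TRUE)$values)
  if (lam_min <= 0)
    B <- B + (-lam_min + eps) * diag(p)             # shift all eigenvalues above zero
  cov2cor(B)                                        # rescale the diagonal back to 1
}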
In the following, you will use this method as part of simulating from a mixture
distribution.

(a) Randomly generate three valid 3×3 correlation matrices. These will be used
as the covariance matrices of three components in a mixture.
(b) Draw three random three-dimensional vectors where each element is drawn
from a standard normal distribution. These will be the means of the three
components.
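
A possible sketch for (a) and (b), reusing the random_corr() function from the
sketch above (the object names Sigmas and mus are ours):

set.seed(3006)
Sigmas <- replicate(3, random_corr(3), simplify = FALSE)  # (a) three 3 x 3 correlation matrices
mus <- replicate(3, rnorm(3), simplify = FALSE)           # (b) three random mean vectors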

(c) Draw one observation from a Dirichlet distribution to use as the mixture
proportions. The Dirichlet distribution is the multivariate analogue of the
beta distribution - it produces a set of values which sum to 1. The following
code should work.

install.packages("MCMCpack")        # provides the rdirichlet() function
library(MCMCpack)
props <- rdirichlet(1, c(1, 1, 1))  # one draw of three proportions summing to 1

(d) Draw 100 observations from this mixture distribution.
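
One way to implement (d), assuming the objects mus, Sigmas and props from the
sketches above; mvrnorm() is from the MASS package:

library(MASS)
n <- 100
z <- sample(1:3, n, replace = TRUE, prob = as.vector(props))  # latent component labels
x <- t(sapply(z, function(k) mvrnorm(1, mu = mus[[k]], Sigma = Sigmas[[k]])))  # 100 x 3 data matrix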


(e) Plot two contours of the marginal distributions of each mixture component
in each possible pair of dimensions. Make sure to use the proportions in
doing this, and use the same weighted density contour levels for each component.
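
A sketch of (e) for the first pair of dimensions, using dmvnorm() from the mvtnorm
package; the grid range and the two contour levels are arbitrary choices, reused
for every component:

library(mvtnorm)
g <- seq(min(x) - 1, max(x) + 1, length.out = 200)
plot(x[, 1], x[, 2], col = "grey", xlab = "dimension 1", ylab = "dimension 2")
for (k in 1:3) {
  dens <- outer(g, g, function(u, v)
    props[k] * dmvnorm(cbind(u, v), mean = mus[[k]][1:2], sigma = Sigmas[[k]][1:2, 1:2]))
  contour(g, g, dens, levels = c(0.01, 0.05), add = TRUE, col = k)  # weighted density contours
}
# repeat with index pairs c(1, 3) and c(2, 3) for the other two panels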
(f) Choose the two clusters which seem to be closest together. For these, conduct
a Hotelling's T^2 test (assuming common covariance matrices) to compare their
means. Use R to determine the sample means and common sample covariance matrix
and calculate the F statistic. Show all working.
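
A sketch of the computation in (f), assuming xa and xb hold the observations
assigned to the two chosen clusters as matrices with 3 columns (these object names
are ours):

n1 <- nrow(xa); n2 <- nrow(xb); p <- ncol(xa)
d <- colMeans(xa) - colMeans(xb)                                  # difference in sample means
Sp <- ((n1 - 1) * cov(xa) + (n2 - 1) * cov(xb)) / (n1 + n2 - 2)   # pooled sample covariance
T2 <- (n1 * n2 / (n1 + n2)) * drop(t(d) %*% solve(Sp) %*% d)      # Hotelling's T^2
Fstat <- (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2             # ~ F(p, n1 + n2 - p - 1) under H0
pval <- pf(Fstat, p, n1 + n2 - p - 1, lower.tail = FALSE)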
(g) Try K-means clustering of the data with K varying from 1 to 5.
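
For (g), base R's kmeans() can be run over the candidate values of K; nstart
restarts the algorithm from several random initialisations to guard against poor
local optima:

km_fits <- lapply(1:5, function(K) kmeans(x, centers = K, nstart = 25))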
(h) Use the Gap statistic method to try to choose the optimal number of clusters.
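
For (h), one option is clusGap() from the cluster package (B is the number of
reference data sets; both it and nstart are arbitrary choices here):

library(cluster)
gap <- clusGap(x, FUNcluster = kmeans, K.max = 5, B = 100, nstart = 25)
plot(gap)   # inspect the gap curve against K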
(i) Try mixture model clustering of the data with 1 to 5 clusters.
(j) Use BIC to try to choose the optimal number of clusters for the mixture.
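
For (i) and (j), the mclust package fits Gaussian mixture models over a range of
component counts and reports BIC for each; note that mclust's convention is that
larger BIC is better:

library(mclust)
fit <- Mclust(x, G = 1:5)   # mixture model clustering with 1 to 5 components
plot(fit, what = "BIC")     # BIC across numbers of components and covariance models
fit$G                       # number of components chosen by BIC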
