
ST4250 23S1 Assignment 2

This document contains instructions for an assignment on multivariate data analysis. It provides 4 questions analyzing bivariate and multivariate normal distributions, including determining distributions of linear combinations of variables, performing hypothesis tests on mean vectors and covariance matrices, and constructing confidence intervals. Students are instructed to show all working and submit solutions by the given deadline as a PDF file.

Uploaded by

Loo Guan Yee

NATIONAL UNIVERSITY OF SINGAPORE

Department of Statistics and Data Science

2023/24 Semester 1 ST4250 Multivariate Data Analysis Assignment 2


Submit a soft copy of your solution in PDF format with filename “name student_number
Assign2.pdf” (e.g. Tan Ah Teck A1234567B Assign2.pdf) to the tutorial submission folder in
Canvas before 11:59 pm on 12/10/23 (Thursday). Late submissions and photos (e.g. jpg files)
will not be accepted. You may convert a hard-copy solution into a PDF file using any scanning
app. Show all working leading to your answers or conclusions. Please be reminded
that copying others’ solutions is not allowed.

Question 1
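Let $X = (X_1 \;\; X_2)^T$ and $Y = (Y_1 \;\; Y_2)^T$. Suppose that
\[
X \sim N_2\left( \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \right)
\]
and
\[
Y \mid X \sim N_2\left( \begin{pmatrix} 2X_1 + 1 \\ X_1 - X_2 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} \right).
\]
(a) Let $V = (X_1 \;\; X_2 \;\; Y_1 \;\; Y_2)^T = \begin{pmatrix} X \\ Y \end{pmatrix}$. What is the distribution of $V$?
(b) What is the distribution of $Y$? Hence determine the conditional distribution of $Y_2 \mid Y_1$.
(c) Let $W = X - Y$. Find a matrix $B$ such that $W = BV$. Hence determine the distribution
of $W$.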
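
Hand-derived answers to this question can be cross-checked numerically. The sketch below assumes the standard result that if $X \sim N(\mu_X, \Sigma_X)$ and $Y \mid X \sim N(AX + b, \Omega)$, then $(X, Y)$ is jointly normal with $\mathrm{Cov}(X, Y) = \Sigma_X A^T$ and $\mathrm{Cov}(Y) = A \Sigma_X A^T + \Omega$. The names `A`, `b` and `Omega` are notation introduced here for illustration and do not appear in the assignment.

```r
# Marginal of X and conditional Y | X ~ N(A X + b, Omega), read off Question 1.
# A, b and Omega are illustrative names, not part of the assignment.
mu_X    <- c(2, 1)
Sigma_X <- matrix(c(2, 1, 1, 2), nrow = 2)
A       <- matrix(c(2,  0,
                    1, -1), nrow = 2, byrow = TRUE)  # E[Y | X] = A X + b
b       <- c(1, 0)
Omega   <- diag(c(1, 2))                             # Cov(Y | X)

# Joint distribution of V = (X, Y): stack the means and assemble the blocks.
mu_Y     <- as.vector(A %*% mu_X + b)
Sigma_XY <- Sigma_X %*% t(A)                         # Cov(X, Y)
Sigma_Y  <- A %*% Sigma_X %*% t(A) + Omega           # Cov(Y)
mu_V     <- c(mu_X, mu_Y)
Sigma_V  <- rbind(cbind(Sigma_X,     Sigma_XY),
                  cbind(t(Sigma_XY), Sigma_Y))

# Conditional Y2 | Y1 = y1 is normal with
# mean cond_intercept + cond_slope * y1 and variance cond_var.
cond_slope     <- Sigma_Y[2, 1] / Sigma_Y[1, 1]
cond_intercept <- mu_Y[2] - cond_slope * mu_Y[1]
cond_var       <- Sigma_Y[2, 2] - Sigma_Y[2, 1]^2 / Sigma_Y[1, 1]

# A linear map W = B V of a normal vector is normal with
# mean B mu_V and covariance B Sigma_V B^T.
B       <- cbind(diag(2), -diag(2))                  # W = X - Y
mu_W    <- as.vector(B %*% mu_V)
Sigma_W <- B %*% Sigma_V %*% t(B)
```

Printing `mu_V`, `Sigma_V`, `mu_W` and `Sigma_W` gives values to compare against the written working; the code is a check, not a substitute for the derivation.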

Question 2
Energy consumption in 2001, by state, from the major sources
$x_1$ = petroleum, $x_2$ = natural gas, $x_3$ = coal,
$x_4$ = hydroelectric power, $x_5$ = nuclear electric power
is recorded in quadrillions ($10^{15}$) of BTUs (Source: Statistical Abstract of the United States
2006).
A random sample of 20 states is selected. The resulting mean vector and covariance matrix are
\[
\bar{x} = \begin{pmatrix} 0.861 \\ 0.532 \\ 0.414 \\ 0.050 \\ 0.137 \end{pmatrix},
\qquad
S = \begin{pmatrix}
 1.365 &  1.116 &  0.346 & -0.006 &  0.095 \\
 1.116 &  0.937 &  0.271 & -0.010 &  0.073 \\
 0.346 &  0.271 &  0.202 & -0.008 &  0.033 \\
-0.006 & -0.010 & -0.008 &  0.015 & -0.001 \\
 0.095 &  0.073 &  0.033 & -0.001 &  0.018
\end{pmatrix}.
\]
(a) Using the summary statistics, determine the sample mean and variance of a state’s total
energy consumption for these major sources.
(b) Determine the sample mean and variance of the excess of petroleum consumption over
natural gas consumption. Also find the sample covariance of this variable with the total
in part (a).
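
Both parts rest on the fact that a linear combination $a^T x$ has sample mean $a^T \bar{x}$ and sample variance $a^T S a$, and that two combinations $a^T x$ and $c^T x$ have sample covariance $a^T S c$. A minimal R sketch of that arithmetic follows; the helper `lin_comb_stats` and the vector names are hypothetical, introduced only for this illustration.

```r
# Summary statistics copied from Question 2.
xbar <- c(0.861, 0.532, 0.414, 0.050, 0.137)
S <- matrix(c( 1.365,  1.116,  0.346, -0.006,  0.095,
               1.116,  0.937,  0.271, -0.010,  0.073,
               0.346,  0.271,  0.202, -0.008,  0.033,
              -0.006, -0.010, -0.008,  0.015, -0.001,
               0.095,  0.073,  0.033, -0.001,  0.018),
            nrow = 5, byrow = TRUE)

# Hypothetical helper: sample mean and variance of the combination a' x.
lin_comb_stats <- function(a, xbar, S) {
  list(mean = sum(a * xbar),
       var  = as.numeric(t(a) %*% S %*% a))
}

a_total  <- rep(1, 5)             # x1 + x2 + x3 + x4 + x5
a_excess <- c(1, -1, 0, 0, 0)     # x1 - x2 (petroleum minus natural gas)

total_stats  <- lin_comb_stats(a_total,  xbar, S)
excess_stats <- lin_comb_stats(a_excess, xbar, S)

# Sample covariance between the two combinations: a_excess' S a_total.
cov_excess_total <- as.numeric(t(a_excess) %*% S %*% a_total)
```

Writing each quantity as a coefficient vector keeps parts (a) and (b) to the same two formulas, so a hand calculation can be checked term by term against `S`.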

Question 3
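Let $X \sim N_2(\mu, \Sigma)$, where $\Sigma$ is unknown. We have an independent and identically distributed
sample of size $n = 60$ providing a sample mean vector $\bar{x} = (0.8 \;\; 0.4)^T$ and a sample
covariance matrix $S = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}$.
(a) Test $H_0: \mu = \begin{pmatrix} 1 \\ 0.5 \end{pmatrix}$ against $H_1: \mu \neq \begin{pmatrix} 1 \\ 0.5 \end{pmatrix}$ at the 5% significance level.
(b) Test $H_0: \Sigma = \Sigma_0$ against $H_1: \Sigma \neq \Sigma_0$, where $\Sigma_0 = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$, at the 5% significance level.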
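
One possible way to check the arithmetic for both tests is sketched below. It rests on assumptions that should be compared with the course notes: for (a), $S$ is taken to be the unbiased (divisor $n - 1$) estimator and Hotelling's $T^2$ is converted through $\frac{n-p}{p(n-1)} T^2 \sim F_{p,\,n-p}$ under $H_0$; for (b), the likelihood ratio statistic $n\,[\mathrm{tr}(\Sigma_0^{-1}\hat\Sigma) - \log|\Sigma_0^{-1}\hat\Sigma| - p]$ is used with the maximum likelihood estimate $\hat\Sigma = \frac{n-1}{n} S$ and an asymptotic $\chi^2_{p(p+1)/2}$ reference distribution. The function names are hypothetical.

```r
# Hotelling T^2 test from summary statistics (assumes S uses divisor n - 1).
hotelling_from_summary <- function(xbar, S, n, mu0, alpha = 0.05) {
  p  <- length(xbar)
  d  <- xbar - mu0
  T2 <- n * sum(d * solve(S, d))            # n (xbar - mu0)' S^{-1} (xbar - mu0)
  F_stat <- (n - p) / (p * (n - 1)) * T2    # ~ F(p, n - p) under H0
  list(T2 = T2, F_stat = F_stat,
       F_crit  = qf(1 - alpha, p, n - p),
       p_value = pf(F_stat, p, n - p, lower.tail = FALSE))
}

# Likelihood ratio test of Sigma = Sigma0 (asymptotic chi-square).
# S_mle must be the maximum likelihood (divisor n) covariance estimate.
cov_lrt <- function(S_mle, Sigma0, n, alpha = 0.05) {
  p    <- nrow(Sigma0)
  M    <- solve(Sigma0, S_mle)              # Sigma0^{-1} S_mle
  stat <- n * (sum(diag(M)) - log(det(M)) - p)
  df   <- p * (p + 1) / 2
  list(stat = stat, df = df,
       crit    = qchisq(1 - alpha, df),
       p_value = pchisq(stat, df, lower.tail = FALSE))
}

# Summary statistics from Question 3.
n      <- 60
xbar   <- c(0.8, 0.4)
S      <- matrix(c(2, -1, -1, 2), nrow = 2)
mu0    <- c(1, 0.5)
Sigma0 <- diag(c(1, 2))

mean_test <- hotelling_from_summary(xbar, S, n, mu0)
cov_test  <- cov_lrt((n - 1) / n * S, Sigma0, n)   # rescale S to the MLE
```

In each case $H_0$ is rejected when the statistic exceeds its critical value (`F_stat > F_crit`, `stat > crit`). If the course treats $S$ as the divisor-$n$ estimator, pass `S` to `cov_lrt` unchanged and adjust the $F$ conversion accordingly.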

Question 4
Use the following R program to generate a data set of 100 pairs of observations from a
bivariate normal distribution.

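library(mvtnorm)                          # provides rmvnorm()
mu1 <- c(3, 1)                            # mean vector
si1 <- matrix(c(4, 2, 2, 9), ncol = 2)    # covariance matrix
set.seed(4250)                            # fix the seed so the draw is reproducible
simunorm <- rmvnorm(100, mean = mu1, sigma = si1)
simunorm <- as.data.frame(simunorm)       # 100 x 2 data frame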

The function “rmvnorm” generates realizations directly from the multivariate normal
distribution. The object "simunorm" is therefore a 100 × 2 data frame containing $n = 100$
independent realizations from a bivariate normal distribution with mean “mu1” and covariance
matrix “si1”.

Reproduce the object "simunorm" using the R code given above. In the following
questions, we shall treat "simunorm" as if it contains the observed data.

(a) Use the functions “apply(simunorm, 2, mean)” and “var” to obtain the sample mean
$\bar{x}$ and the sample covariance matrix $S$, and hence compute
\[
T^2 = n \, (\bar{x} - \mu_0)^T S^{-1} (\bar{x} - \mu_0),
\]
where $\mu_0 = \begin{pmatrix} 3 \\ 1 \end{pmatrix}$.
(b) Test $H_0: \mu = \begin{pmatrix} 3 \\ 1 \end{pmatrix}$ against $H_1: \mu \neq \begin{pmatrix} 3 \\ 1 \end{pmatrix}$ at the 5% significance level.
(c) Find 95% simultaneous confidence intervals for $\mu_i$, $i = 1, 2$.
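
The three parts can be organised around one helper, sketched here under stated assumptions: `var` returns the unbiased (divisor $n - 1$) covariance matrix, the test uses $\frac{n-p}{p(n-1)} T^2 \sim F_{p,\,n-p}$ under $H_0$, and the simultaneous intervals are the $T^2$-based ones, $\bar{x}_i \pm \sqrt{\frac{p(n-1)}{n-p} F_{p,\,n-p}(0.05)}\,\sqrt{s_{ii}/n}$. The course may intend a different construction (for example Bonferroni intervals), so treat this as one option. The function name `hotelling_t2` and the `toy` data are hypothetical, and `toy` is used only so that the block runs without the mvtnorm package.

```r
# Hypothetical helper: Hotelling T^2, F test and T^2-based simultaneous CIs.
hotelling_t2 <- function(x, mu0, alpha = 0.05) {
  x    <- as.matrix(x)
  n    <- nrow(x)
  p    <- ncol(x)
  xbar <- apply(x, 2, mean)                 # sample mean vector
  S    <- var(x)                            # sample covariance (divisor n - 1)
  d    <- xbar - mu0
  T2   <- n * sum(d * solve(S, d))          # n (xbar - mu0)' S^{-1} (xbar - mu0)

  F_stat <- (n - p) / (p * (n - 1)) * T2    # ~ F(p, n - p) under H0
  F_crit <- qf(1 - alpha, p, n - p)

  # Half-widths of the simultaneous intervals for each component mean.
  half <- sqrt(p * (n - 1) / (n - p) * F_crit * diag(S) / n)

  list(xbar = xbar, S = S, T2 = T2,
       F_stat = F_stat, F_crit = F_crit,
       p_value = pf(F_stat, p, n - p, lower.tail = FALSE),
       sim_ci  = cbind(lower = xbar - half, upper = xbar + half))
}

# Toy data (not the assignment data), small enough to verify by hand.
toy     <- data.frame(a = c(1, 2, 3, 4), b = c(2, 1, 4, 3))
toy_res <- hotelling_t2(toy, mu0 = c(2, 2))
```

With the assignment's data, the call would be `hotelling_t2(simunorm, mu0 = c(3, 1))` after running the data-generation code above; $H_0$ is rejected at the 5% level when `F_stat` exceeds `F_crit`.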
