
Assignment1 Solved

This document contains the student Nisarg Mistry's answers to 10 questions regarding multivariate statistical analysis of pastry data. The questions cover calculating mean and scaling vectors, drawing scatter plots before and after preprocessing data, conducting principal component analysis to determine R^2 values, and interpreting loading vectors and component scores.

Uploaded by

Akash Shah
Copyright © All Rights Reserved

NAME: Nisarg Mistry, 400486175

SEP767: Multivariate Statistical Methods for Big Data Analysis and Process Improvement
ASSIGNMENT 1

1. Calculate the mean centering vector (a 5 x 1 vector).


Answer: Take an average of all 50 observations of each variable:
Mean (Oil, Density, Crispy, Fracture, Hardness):
$\bar{x} = [\,17.202,\ 2857.6,\ 11.52,\ 20.86,\ 128.18\,]^{\top}$
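The mean vector above is the mean centering vector: subtracting each variable's mean from
all 50 of its observations centers the data. As a check, the average of the centered
observations of each variable is zero to within floating-point round-off:
Mean of centered data (Oil, Density, Crispy, Fracture, Hardness):
$[\,-1.77636\times10^{-15},\ 9.09\times10^{-14},\ 4.26\times10^{-16},\ 5.68\times10^{-16},\ -6.8\times10^{-15}\,]^{\top}$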
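
The averages above are of order 1E-14 or smaller rather than exactly zero. A short derivation, added here as a sketch, shows why they vanish in exact arithmetic; the symbols $x_i$ (one observation of a variable), $\bar{x}$ (that variable's mean) and $n = 50$ are notation introduced for illustration and do not appear in the original answer.

$$
\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)
= \frac{1}{n}\sum_{i=1}^{n} x_i - \frac{1}{n}\,n\,\bar{x}
= \bar{x} - \bar{x}
= 0
$$

The small non-zero values reported are therefore floating-point round-off, not a property of the pastry data.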

2. Calculate the scaling vector (a 5 x 1 vector).


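Answer: The scaling uses the standard deviation of each variable. Dividing the mean-centered
value of each variable by its respective standard deviation gives:
Scaling Vector (Oil, Density, Crispy, Fracture, Hardness):
$[\,-1.1158\times10^{-15},\ 7.31\times10^{-16},\ 2.4\times10^{-16},\ 1.04\times10^{-16},\ -2.2\times10^{-16}\,]^{\top}$
These values are again zero to within floating-point round-off, as expected for centered and
scaled data.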

3. What steps would you take to apply the centering and scaling vectors to the X matrix?
Answer: Follow the steps below to apply the centering and scaling vectors to the X matrix:
1) First, take the mean of all 50 observations of each variable.
2) Calculate the standard deviation of all 50 observations of each variable (oil, density,
crispy, fracture, hardness).
3) For mean centering, subtract each variable's mean from every observation of that
variable.
4) For scaling, divide each mean-centered observation by the standard deviation of its
variable.
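
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `autoscale` and the 4 x 2 sample data are hypothetical (they are not the pastry table), and the sample standard deviation (n - 1 denominator) is assumed because the assignment does not say which form was used.

```python
from statistics import mean, stdev


def autoscale(rows):
    """Center and scale each column of a row-major table.

    Returns (center, scale, scaled_rows). `stdev` is the sample standard
    deviation (n - 1 denominator); that choice is an assumption.
    """
    columns = list(zip(*rows))
    center = [mean(col) for col in columns]   # steps 1: column means
    scale = [stdev(col) for col in columns]   # step 2: column standard deviations
    scaled_rows = [
        [(x - m) / s for x, m, s in zip(row, center, scale)]  # steps 3 and 4
        for row in rows
    ]
    return center, scale, scaled_rows


# Hypothetical 4 x 2 example (two variables, four observations).
X = [
    [16.5, 2955.0],
    [17.7, 2660.0],
    [16.2, 2870.0],
    [16.7, 2920.0],
]
center, scale, Z = autoscale(X)
```

After this transformation every column of `Z` has mean 0 and standard deviation 1, up to floating-point round-off.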

4. Draw a scatter plot of Crispy vs. Fracture using all 50 observations from the raw data table.
Answer:
5. Draw a scatter plot of Crispy vs. Fracture after you have centered and scaled the data.
What observations can you make comparing the two scatter plots?
Answer:

Comparing the two scatter plots, the values of both variables (Crispy and Fracture) after
preprocessing are centered at zero and on a comparable scale, so they are numerically better
behaved and easier to visualize. The relative positions of the points are unchanged.

6. Use Aspen ProMV (or a software tool of your choice) to construct a PCA model on this data.
What is the R2 for the first and second components? What is the total R2 using 2 components?
Answer:
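R2 for 1st component: 0.606
R2 for 2nd component: 0.259
Total R2 using 2 components: 0.865
The value 0.865 is the cumulative R2 after two components, so the second component alone
contributes 0.865 - 0.606 = 0.259. The two values are not added again, since R2 cannot
exceed 1.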
7. Report the R2 value for each of the 5 variables after (a) one component and (b) two-
component.
Answer: (a) One component
Oil: 0.634545
Density: 0.694746
Crispy: 0.859157
Fracture: 0.771434
Hardness: 0.071332

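(b) Two components
The first value on each line is the R2 contributed by the second component alone. The
cumulative R2 after two components is that value plus the one-component value in (a), since
cumulative R2 cannot decrease when a component is added.
Oil: 0.177803 (cumulative 0.812348)
Density: 0.164905 (cumulative 0.859651)
Crispy: 0.050623 (cumulative 0.909780)
Fracture: 0.063421 (cumulative 0.834855)
Hardness: 0.838953 (cumulative 0.910285)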
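
The per-variable values in (a) and (b) can be cross-checked against the overall R2. The sketch below assumes the data were centered and scaled, so every variable carries equal variance and the overall R2 is the plain average of the per-variable R2 values. It also assumes the (b) figures are the second component's own contributions. The dictionary names are illustrative only.

```python
# Per-variable R2 copied from the answer to question 7.
r2_comp1 = {
    "Oil": 0.634545,
    "Density": 0.694746,
    "Crispy": 0.859157,
    "Fracture": 0.771434,
    "Hardness": 0.071332,
}
r2_comp2 = {
    "Oil": 0.177803,
    "Density": 0.164905,
    "Crispy": 0.050623,
    "Fracture": 0.063421,
    "Hardness": 0.838953,
}

# With equal-variance (scaled) variables, overall R2 is the mean over variables.
overall_comp1 = sum(r2_comp1.values()) / len(r2_comp1)
overall_comp2 = sum(r2_comp2.values()) / len(r2_comp2)

# Cumulative R2 per variable after two components.
cumulative = {name: r2_comp1[name] + r2_comp2[name] for name in r2_comp1}
overall_cumulative = overall_comp1 + overall_comp2

print(f"component 1: {overall_comp1:.3f}")
print(f"component 2: {overall_comp2:.3f}")
print(f"cumulative:  {overall_cumulative:.3f}")
```

The averages agree with 0.606 for the first component and 0.865 for two components, which supports reading 0.865 as a cumulative value.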

8. Write down the values of the p1 loading vector. Also, create a bar plot of these values.
Answer: p1 values:
Oil: 0.457533
Density: -0.47875
Crispy: 0.532388
Fracture: -0.50448
Hardness: 0.153403

P1 loading vector (Oil, Density, Crispy, Fracture, Hardness):
$p_1 = [\,0.457533,\ -0.47875,\ 0.532388,\ -0.50448,\ 0.153403\,]^{\top}$
9. What are the characteristics of pastries with a large negative t1 value?
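Answer: From the signs of the p1 loadings, a pastry has a large negative t1 value when its
preprocessed values are high for the negatively loaded variables (density and fracture) and
low for the positively loaded variables (oil and crispy). Hardness has a small loading
(0.153403) and contributes little. For example, a negative centered oil value multiplied by the
positive oil loading gives a negative term in t1.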

10. Replicate the calculation of t1 for pastry B554. Show each of the 5 terms that make up
this linear combination.
Answer:
P1 loading vector (Oil, Density, Crispy, Fracture, Hardness):
$p_1 = [\,0.457533,\ -0.47875,\ 0.532388,\ -0.50448,\ 0.153403\,]^{\top}$

Preprocessed (centered and scaled) values for pastry B554 (Oil, Density, Crispy, Fracture,
Hardness):
B554: -1.5716, 0.983133, -0.856062, -0.15733, 0.154847

Multiplying each preprocessed value by the corresponding loading, using the formula T = X*P:
T1 = [(-1.5716)*(0.457533)] + [(0.983133)*(-0.47875)] + [(-0.856062)*(0.532388)]
+ [(-0.15733)*(-0.50448)] + [(0.154847)*(0.153403)] = -1.54236

T1 = -1.54236
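
The same linear combination can be reproduced term by term in Python. This is a minimal sketch that uses only the loading values and the B554 row quoted above; the variable names are chosen for illustration.

```python
variables = ["Oil", "Density", "Crispy", "Fracture", "Hardness"]

# p1 loadings and preprocessed B554 values, copied from the answer.
p1 = [0.457533, -0.47875, 0.532388, -0.50448, 0.153403]
x_b554 = [-1.5716, 0.983133, -0.856062, -0.15733, 0.154847]

# Each term is one variable's contribution to the score.
terms = [x * p for x, p in zip(x_b554, p1)]
t1 = sum(terms)

for name, term in zip(variables, terms):
    print(f"{name:>8}: {term: .6f}")
print(f"t1 = {t1:.5f}")
```

Printing the individual terms shows which variables pull the score negative: the Oil, Density and Crispy terms are negative, while the Fracture and Hardness terms are small and positive.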
