R Programming

This homework assignment asks students to: 1) Perform principal component analysis (PCA) and clustering on personality data and interpret the results. 2) Examine factor loadings and eigenvalues to determine the number of factors to retain from the PCA. 3) Compare k-means and hierarchical clustering results to interpret personality clusters. 4) Analyze and interpret the factor and cluster solutions to understand patterns in the personality data.

Uploaded by

whereareyoudear

Introduction to Computational Statistics (DSSH 6301),

Homework 11, Assignments


For all problems, please show all your work. As described in the Homework Guidelines, use RMarkdown to
write up your work as a .Rmd file, knit the result to a PDF file, and submit that PDF file to Blackboard.
(You can also knit to an HTML or Word document and save that as a PDF, as described in the Homework
Guidelines.) Be sure to use R code for all your calculations, and the LaTeX equation format to write up
any math. See the Homework Guidelines in Course Resources on Blackboard for more formatting details.
NOTES:
Include the question number and text in your solutions, and provide solutions in the same order as given in the assignment.
Some problems may have multiple solutions. If you did not have points taken off for a question, we considered your solution correct.
Be concise.
Use the minimum necessary number of significant digits; round() and signif() are your friends.
Follow Google's R Style Guide.
regex sandbox
R: r-project, R-intro, short-intro, refcard, formatR, RStudio, RMarkdown, RevolutionR, swirl
LaTeX: latex-project, wiki, sheet, sandbox, hostmath, knitr::kable, tables generator, detexify, tcolorbox
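As a small illustration of the rounding advice in the notes above, the sketch below contrasts round() and signif(). The value of x is made up for the example and has nothing to do with the assignment data.

```r
x <- 1234.56789   # arbitrary example value

# round() keeps a fixed number of decimal places
round(x, 2)    # 1234.57

# signif() keeps a fixed number of significant digits
signif(x, 3)   # 1230
```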
Please perform a principal component analysis (PCA) and clustering using the bfi data set in the psych package
(25 personality items thought to boil down to a few core personality types) and interpret the results. You
can load the data using data(bfi) after loading the psych package; you may need to clean it a bit first
with na.omit() to remove the observations with NA items, or else impute those missing items. It might also
help to use scale() on your dataset before analysis. scale() takes all your variables (columns) and rescales
them to have a mean of 0 and a standard deviation of 1, so that you can more easily compare all your principal components
or clusters to see which are larger or smaller.
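The preparation steps described above could be sketched as follows. This is one possible approach, not the required one: it assumes the 25 personality items are the first 25 columns of bfi (the remaining columns being demographics such as gender, education, and age), and it drops incomplete rows rather than imputing. The object names are illustrative. Depending on your installation, the data set may instead be provided by the companion psychTools package.

```r
library(psych)   # provides the bfi data set
data(bfi)

# Assumption: columns 1:25 hold the personality items; the remaining
# demographic columns are left out of the analysis.
items <- bfi[, 1:25]

# Drop every respondent with at least one missing item
items_complete <- na.omit(items)

# Rescale each column to mean 0 and standard deviation 1
items_scaled <- scale(items_complete)

# Sanity checks: dimensions, column means (about 0), column sds (1)
dim(items_scaled)
round(colMeans(items_scaled)[1:5], 10)
apply(items_scaled, 2, sd)[1:5]
```

Dropping rows with na.omit() is the simplest option; imputing the missing items instead would keep more respondents but requires an additional modelling choice.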
1. Examine the factor eigenvalues or variances (or the sdev or standard deviations as reported by prcomp or
princomp, which you then need to square to get the variances). Plot these in a scree plot and use the elbow
test to guess how many factors should be retained. What proportion of the total variance does your retained
subset of factors explain?
2. Examine the loadings of the PCs on the variables (sometimes called the rotation in the function
output), i.e., the projection of the principal components on the variables, focusing on just the first one or
two PCs. Sort the variables by their loadings, and try to interpret what the first one or two PCs (factors)
mean. This may require looking more carefully into the dataset to understand exactly what each of the
variables was measuring. You can find more about the data in the psych package using ?psych or by visiting
http://personality-project.org
3. First use k-means and examine the centers of the first two or three clusters. How are they similar to and
different from the factor loadings of the first couple factors?
4. Next use hierarchical clustering. Print the dendrogram, and use that to guide your choice of the number
of clusters. Use cutree to generate a list of which cluster each observation belongs to. Aggregate the data
by cluster and then examine those centers (the aggregate means) as you did in (3). Can you interpret all of
them meaningfully using the methods from (3) to look at the centers?
5. From the factor and cluster analysis, what can you say more generally about what you have learned about
your data?
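For questions 1 and 2 above, a minimal sketch of the mechanics might look like the following. It assumes the items are columns 1:25 of bfi and that incomplete rows are dropped. The value of n_keep is a hypothetical placeholder, not a recommended answer. The interpretation itself is left to you.

```r
library(psych)   # provides the bfi data set
data(bfi)

# Assumption: columns 1:25 are the personality items; incomplete rows are dropped.
items_scaled <- scale(na.omit(bfi[, 1:25]))

pca <- prcomp(items_scaled)

# Question 1: prcomp reports standard deviations; square them to get
# the variances (eigenvalues) of the components.
eigenvalues <- pca$sdev^2

# Scree plot: look for the "elbow" where the curve flattens out.
plot(eigenvalues, type = "b",
     xlab = "Principal component", ylab = "Variance (eigenvalue)",
     main = "Scree plot")

# Proportion of total variance per component, and the running total.
prop_var <- eigenvalues / sum(eigenvalues)
cum_var  <- cumsum(prop_var)

n_keep <- 5   # hypothetical choice; use the elbow you read off your own plot
signif(cum_var[n_keep], 3)

# Question 2: loadings (the "rotation" matrix), one column per component.
# The sign of a component is arbitrary, so read the pattern, not the direction.
loadings_pc1 <- pca$rotation[, "PC1"]
loadings_pc2 <- pca$rotation[, "PC2"]
round(sort(loadings_pc1, decreasing = TRUE), 2)
round(sort(loadings_pc2, decreasing = TRUE), 2)
```

Because the data were standardized, the eigenvalues sum to the number of variables, so each eigenvalue can be read as "how many original variables' worth of variance" a component carries.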
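For questions 3 and 4 above, the clustering steps could be sketched as below. Several choices here are assumptions made only for illustration: three clusters (n_clusters), Euclidean distances, Ward linkage ("ward.D2"), and a fixed random seed. Other choices are equally defensible and may change the picture.

```r
library(psych)   # provides the bfi data set
data(bfi)

# Assumption: columns 1:25 are the personality items; incomplete rows are dropped.
items_scaled <- scale(na.omit(bfi[, 1:25]))

n_clusters <- 3   # hypothetical choice for illustration

# Question 3: k-means. Results depend on the random starting centers,
# so fix the seed and try several starts.
set.seed(1)
km <- kmeans(items_scaled, centers = n_clusters, nstart = 25, iter.max = 50)
km_centers <- t(km$centers)   # variables in rows, clusters in columns
round(km_centers, 2)

# One way to compare centers with loadings: correlate each cluster's
# center profile with the loadings of the first two PCs.
pca <- prcomp(items_scaled)
round(cor(km_centers, pca$rotation[, 1:2]), 2)

# Question 4: hierarchical clustering on Euclidean distances.
hc <- hclust(dist(items_scaled), method = "ward.D2")
plot(hc, labels = FALSE, main = "Dendrogram", xlab = "", sub = "")

# Cut the tree and compute the per-cluster means of every item.
membership <- cutree(hc, k = n_clusters)
table(membership)
hc_means <- aggregate(as.data.frame(items_scaled),
                      by = list(cluster = membership), FUN = mean)
round(t(hc_means[, -1]), 2)   # variables in rows, clusters in columns
```

Transposing the centers puts the 25 items in rows, which makes them easier to read side by side with the sorted loadings.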
