HW 4
Victor Wei, Yutong Wang, Erica Hwang
3/13/2022
Q1
leukemia_data <- read_csv("leukemia_data.csv")
## New names:
## * FCGRT -> FCGRT...2
## * FCGRT -> FCGRT...3
## * PGK1 -> PGK1...6
## * GUSBP11 -> GUSBP11...19
## * VDAC1 -> VDAC1...21
## * ...
1a.
leukemia_data <- leukemia_data %>% mutate(Type=as.factor(Type))
Print the number of patients with each leukemia sub-type by using table().
file:///D:/HW4/HW4.html 1/12
11/7/24, 4:45 PM HW 4
table(leukemia_data$Type)
##
## BCR-ABL E2A-PBX1 Hyperdip50 MLL OTHERS T-ALL TEL-AML1
## 15 27 64 20 79 43 79
Based on the table, “BCR-ABL” is the least frequent leukemia sub-type, with only 15 patients.
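The least frequent sub-type can also be picked out programmatically rather than by eye. The sketch below re-enters the counts printed by table() as a named vector, which is an assumption made only so the snippet runs without leukemia_data.csv; the names type_counts and least_common are illustrative.

```r
# Counts copied from the table() output above; with the real data use
# type_counts <- table(leukemia_data$Type)
type_counts <- c("BCR-ABL" = 15, "E2A-PBX1" = 27, "Hyperdip50" = 64,
                 "MLL" = 20, "OTHERS" = 79, "T-ALL" = 43, "TEL-AML1" = 79)

# which.min() returns the position of the first minimum and keeps its name
least_common <- names(which.min(type_counts))
least_common

# Total number of patients across all sub-types
sum(type_counts)
```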
1b.
Running PCA on our leukemia data gives us:
We concluded that 201 PCs are needed to explain 90% of the total variation in the data.
cumsum(PVE)[200:205]
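For reference, the proportion of variance explained (PVE) and the number of PCs needed to reach a threshold can be computed along the following lines. This is a minimal sketch on simulated data, not the code used for the report: the matrix X, its dimensions, and the choice to center and scale are all assumptions, since the prcomp() call itself is not shown here.

```r
set.seed(1)
# Simulated stand-in for the expression matrix: 50 samples x 20 features
# (hypothetical sizes; the real data has one row per patient)
X <- matrix(rnorm(50 * 20), nrow = 50, ncol = 20)

# PCA on centered and scaled columns (scaling is an assumption)
pca_fit <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each PC, and its running total
pve <- pca_fit$sdev^2 / sum(pca_fit$sdev^2)
cum_pve <- cumsum(pve)

# Smallest number of PCs whose cumulative PVE reaches 90%
n_pcs_90 <- which(cum_pve >= 0.9)[1]
n_pcs_90
```

Inspecting a short window of the cumulative sum around the threshold, as done with cumsum(PVE)[200:205], is a quick way to confirm where the 90% mark is crossed.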
1c.
Generate a scatter plot using plot():
biplot(PCA, scale=0,col=plot_colors,cex=0.5)
Then, we add type labels to the plot according to our leukemia data set.
The plot shows that “T-ALL” is the most clearly separated from the others along the PC2 axis.
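One way to produce a PC1 versus PC2 plot in which each observation is drawn as its type label is sketched below. It runs on simulated data with three hypothetical groups (A, B, C) standing in for the leukemia sub-types; the names palette_by_type and obs_colors are illustrative and are not the plot_colors object used in this report.

```r
set.seed(2)
# Simulated stand-in: 3 hypothetical groups of 10 samples, 5 features each
type <- factor(rep(c("A", "B", "C"), each = 10))
# Adding the group index to every feature of a sample shifts the groups apart
X <- matrix(rnorm(30 * 5), nrow = 30) + as.integer(type)

pca_fit <- prcomp(X, center = TRUE, scale. = TRUE)

# One color per factor level, then one color per observation
palette_by_type <- setNames(rainbow(nlevels(type)), levels(type))
obs_colors <- palette_by_type[as.character(type)]

# Empty canvas first, then draw each observation as its type label
plot(pca_fit$x[, 1], pca_fit$x[, 2], type = "n", xlab = "PC1", ylab = "PC2")
text(pca_fit$x[, 1], pca_fit$x[, 2],
     labels = as.character(type), col = obs_colors, cex = 0.5)
```

Drawing with type = "n" first sets up the axes without plotting symbols, so the labels are not overprinted on points.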
1d.
plot(PCA$x[, 1], PCA$x[, 3], col = plot_colors, cex = 0.5, xlab = "PC1", ylab = "PC3")
# Legend symbols match the open circles (pch = 1) that plot() draws by default
legend(x = "topleft",
       legend = colors_df_u$type,
       col = colors_df_u$color,
       pch = 1)
After plotting the data projected onto the first and third principal components, we concluded that the third PC discriminates between leukemia types better than the second.
1e.
By using the filter() function, we generate a subset of our data set that only includes “T-ALL”, “TEL-AML1”, and “Hyperdip50”.
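The subsetting step can be sketched as follows. To keep the snippet free of package dependencies it uses base-R indexing, which gives the same rows as a dplyr filter() on Type; the data frame toy and its gene1 column are hypothetical stand-ins for leukemia_data.

```r
# Toy stand-in for leukemia_data (hypothetical values; only Type matters here)
toy <- data.frame(
  Type  = factor(c("T-ALL", "MLL", "TEL-AML1", "Hyperdip50", "OTHERS", "T-ALL")),
  gene1 = c(0.2, 1.1, -0.4, 0.9, 0.0, 0.5)
)
keep_types <- c("T-ALL", "TEL-AML1", "Hyperdip50")

# Base-R equivalent of dplyr's filter(toy, Type %in% keep_types)
toy_sub <- toy[toy$Type %in% keep_types, ]

# Drop the now-empty factor levels so later tables and legends list only 3 types
toy_sub$Type <- droplevels(toy_sub$Type)
table(toy_sub$Type)
```

Without droplevels(), the removed sub-types would still appear with a count of zero in table() and in any legend built from the factor levels.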
Same plot, but with all the branches and labels colored to show 5 different groups.
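The function used to color the dendrogram is not shown here, so the following is only a base-R sketch of the same idea: cut a hierarchical clustering into 5 groups and mark them on the tree. It swaps in rect.hclust(), which boxes the groups instead of coloring branches and labels, and it runs on simulated data; the complete linkage and Euclidean distance are assumptions.

```r
set.seed(3)
# Simulated stand-in: 30 samples x 4 features (hypothetical)
X <- matrix(rnorm(30 * 4), nrow = 30)
rownames(X) <- paste0("s", seq_len(30))

# Hierarchical clustering on Euclidean distances with complete linkage
hc <- hclust(dist(X), method = "complete")

# Cut the tree into 5 groups; one integer label per sample
groups <- cutree(hc, k = 5)
table(groups)

# Draw the dendrogram and box the 5 groups in different colors
plot(hc, cex = 0.6, main = "", xlab = "", sub = "")
rect.hclust(hc, k = 5, border = 2:6)
```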