11/7/24, 4:45 PM HW 4

HW 4
Victor Wei, Yutong Wang, Erica Hwang
3/13/2022

Q1
library(tidyverse)  # provides read_csv(), %>%, mutate(), filter(), arrange()
leukemia_data <- read_csv("leukemia_data.csv")

## New names:
## * FCGRT -> FCGRT...2
## * FCGRT -> FCGRT...3
## * PGK1 -> PGK1...6
## * GUSBP11 -> GUSBP11...19
## * VDAC1 -> VDAC1...21
## * ...

## Rows: 327 Columns: 3142

## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): Type
## dbl (3141): FCGRT...2, FCGRT...3, 31444_s_at, TMSB10, PGK1...6, EIF3K, 31503...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.

1a.
leukemia_data <- leukemia_data %>% mutate(Type=as.factor(Type))

Print the number of patients with each leukemia sub-type by using table().

file:///D:/HW4/HW4.html 1/12

table(leukemia_data$Type)

##
##    BCR-ABL   E2A-PBX1 Hyperdip50        MLL     OTHERS      T-ALL   TEL-AML1
##         15         27         64         20         79         43         79

Based on the table, the leukemia sub-type “BCR-ABL” occurs least often, with only 15 of the 327 patients.
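As a minimal sketch, the rarest sub-type can also be read off programmatically rather than by eye. The counts below are copied by hand from the table() output above, and `type_counts` is an illustrative name that does not appear in the original analysis.

```r
# Counts copied from the table() output above; `type_counts` is an illustrative name.
type_counts <- c("BCR-ABL" = 15, "E2A-PBX1" = 27, "Hyperdip50" = 64, "MLL" = 20,
                 "OTHERS" = 79, "T-ALL" = 43, "TEL-AML1" = 79)

# Name of the sub-type with the fewest patients
rarest <- names(which.min(type_counts))

# Total number of patients; this should match the row count reported by read_csv()
total <- sum(type_counts)

# Share of patients in each sub-type, as a percentage rounded to one decimal place
share <- round(100 * type_counts / total, 1)
```

With the real data, `names(which.min(table(leukemia_data$Type)))` would do the same without retyping the counts.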

1b.
We run PCA on the gene expression columns (dropping the Type column), centering and scaling each gene:

PCA <- prcomp(leukemia_data[,-c(1)], center=TRUE, scale.=TRUE)  # scale. is the full argument name

Plot the PVE of each PC and the cumulative PVE.

PVE <- PCA$sdev^2 / sum(PCA$sdev^2)

plot(PVE, xlab="Principal Component", ylab="Proportion of Variance Explained", type='b')


plot(cumsum(PVE), xlab="Principal Component",
     ylab="Cumulative Proportion of Variance Explained", ylim=c(0,1), type='b')


From the output below, the cumulative PVE first exceeds 0.90 at the 201st component, so we need 201 PCs to explain 90% of the total variation in the data.

cumsum(PVE)[200:205]

## [1] 0.8990880 0.9002490 0.9013977 0.9025446 0.9036898 0.9048285
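Instead of scanning a slice of cumsum(PVE) by eye, the threshold crossing can be located directly. The following is a sketch only: it uses the built-in USArrests data as a stand-in for the leukemia data, and `n_pcs_for`, `pca_demo`, and `pve_demo` are hypothetical names introduced for illustration.

```r
# Illustration on the built-in USArrests data (an assumption: it stands in for
# the leukemia data, which is not bundled with R).
pca_demo <- prcomp(USArrests, center = TRUE, scale. = TRUE)
pve_demo <- pca_demo$sdev^2 / sum(pca_demo$sdev^2)

# Index of the first component at which the cumulative PVE reaches the threshold
n_pcs_for <- function(pve, threshold = 0.90) {
  which(cumsum(pve) >= threshold)[1]
}

n_demo <- n_pcs_for(pve_demo)
```

Applied to the homework's own vector, `n_pcs_for(PVE)` should agree with the printed slice above, where the 200th value is still below 0.90 and the 201st is the first above it.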

1c.
Generate a scatter plot using plot():


colors <- rainbow(7)

plot_colors <- colors[leukemia_data$Type]
colors_df <- data.frame(color=plot_colors,type=leukemia_data$Type)
plot(PCA$x[,1],PCA$x[,2],col=plot_colors,cex=0.5,xlab="PC1",ylab="PC2")

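# Note: biplot() uses only the first two entries of col (one for observations, one for variables)
biplot(PCA, scale=0, col=plot_colors, cex=0.5)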


Then, we add a legend of type labels to the plot according to our leukemia data set.

colors_df_u <- unique(colors_df)

plot(PCA$x[,1],PCA$x[,2],col=plot_colors,cex=0.5,xlab="PC1",ylab="PC2")
legend(x = "topleft", legend = colors_df_u$type,
       col = colors_df_u$color, pch = 1)  # pch = 1 matches the open circles drawn by plot()
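The color assignment above relies on the fact that indexing a vector by a factor uses the factor's integer codes, so the i-th color always belongs to the i-th level. A small sketch with made-up labels (the names `types`, `palette_demo`, `point_cols`, and `legend_map` are hypothetical) shows the mapping explicitly:

```r
# Toy labels standing in for leukemia_data$Type
types <- factor(c("T-ALL", "MLL", "T-ALL", "BCR-ABL"))

# One color per factor level, in level order
palette_demo <- rainbow(nlevels(types))

# Indexing by a factor uses its integer codes, so each point gets its level's color
point_cols <- palette_demo[types]

# Named lookup from level to color, suitable for building a legend
legend_map <- setNames(palette_demo, levels(types))
```

A legend built from `names(legend_map)` and `legend_map` lists the levels in sorted order, whereas unique() on the per-point data frame lists them in order of first appearance; both pair each type with the same color.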


From this plot, “T-ALL” is the sub-type most clearly separated from the others along the PC2 axis.

To find the genes with the highest absolute loadings for PC1:

pc1_l <- as.data.frame(PCA$rotation[,1])

names(pc1_l) <- c("loading")
pc1_l$gene <- row.names(pc1_l)
pc1_l$abs_loading <- abs(pc1_l$loading)
pc1_l %>% arrange(desc(abs_loading)) %>% head()


##            loading   gene abs_loading
## SEMA3F -0.04517148 SEMA3F  0.04517148
## CCT2    0.04323818   CCT2  0.04323818
## LDHB    0.04231619   LDHB  0.04231619
## COX6C   0.04183480  COX6C  0.04183480
## SNRPD2  0.04179822 SNRPD2  0.04179822
## ELK3   -0.04155821   ELK3  0.04155821
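The same ranking can be done in base R without building a data frame. The sketch below uses the built-in USArrests data as a stand-in (an assumption, since the leukemia data is not bundled with R), and `pc1` and `top` are illustrative names. It also checks that a prcomp() loading vector has unit length, which gives a scale for reading the values above: with 3141 genes, a perfectly even spread would put every loading near 1/sqrt(3141), roughly 0.018, so loadings around 0.045 are about 2.5 times that baseline.

```r
# Stand-in PCA on a built-in dataset (assumption: replaces the leukemia PCA)
pca_demo <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Named vector of PC1 loadings, one entry per variable
pc1 <- pca_demo$rotation[, 1]

# Sort by absolute value while keeping the sign of each loading
top <- pc1[order(abs(pc1), decreasing = TRUE)]

# Each column of the rotation matrix has unit Euclidean length
pc1_norm <- sqrt(sum(pc1^2))
```

On the homework object, `head(PCA$rotation[order(abs(PCA$rotation[, 1]), decreasing = TRUE), 1])` should list the same six genes as the table above.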

1d.
plot(PCA$x[,1],PCA$x[,3],col=plot_colors,cex=0.5,xlab="PC1",ylab="PC3")
legend(x = "topleft", legend = colors_df_u$type,
       col = colors_df_u$color, pch = 1)  # pch = 1 matches the open circles drawn by plot()


Yes. Plotting the data projected onto the first and third principal components, rather than the first and second, shows that the third PC discriminates between leukemia types better than the second.
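One way to make such a comparison less dependent on reading a scatter plot is to compute, for each component, the share of score variance explained by the class labels (the R-squared of a one-way ANOVA). This is not part of the original homework; the sketch uses the built-in iris data as a stand-in, and `group_r2` is a hypothetical helper name.

```r
# Stand-in PCA on the four numeric iris measurements (assumption: replaces the leukemia PCA)
pca_iris <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# For each column of PC scores, fit score ~ group and return the R-squared,
# i.e. the fraction of that component's variance explained by group membership
group_r2 <- function(scores, groups) {
  apply(scores, 2, function(s) summary(lm(s ~ groups))$r.squared)
}

r2 <- group_r2(pca_iris$x, iris$Species)
```

On the homework data the analogous call would be `group_r2(PCA$x[, 1:3], leukemia_data$Type)`; a higher value for PC3 than for PC2 would support the conclusion drawn from the plot.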

1e.
Using the filter() function, we generate a subset of our data set that includes only the “T-ALL”, “TEL-AML1”, and “Hyperdip50” sub-types.

subsetl <- leukemia_data %>% filter(Type %in% c("T-ALL","TEL-AML1","Hyperdip50"))

We then standardize the gene columns, compute a Euclidean distance matrix from the subset, and fit complete-linkage hierarchical clustering.


scaledl <- scale(subsetl[,-1])

distancel <- dist(scaledl)
fit.complete <- hclust(distancel, method="complete")
plot(fit.complete, hang=-1, cex=0.8, main="Complete Linkage Clustering")
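A dendrogram alone does not say how well the clusters line up with the known sub-types. As a sketch under stated assumptions, the tree can be cut into a fixed number of groups with cutree() and cross-tabulated against the labels. The built-in iris data stands in for the leukemia subset here, and `scaled_iris`, `fit_demo`, `clusters`, and `confusion` are illustrative names.

```r
# Stand-in data: scale the numeric columns, as done for the leukemia subset
scaled_iris <- scale(iris[, 1:4])

# Complete-linkage hierarchical clustering on Euclidean distances
fit_demo <- hclust(dist(scaled_iris), method = "complete")

# Cut the tree into 3 clusters and cross-tabulate against the known labels
clusters <- cutree(fit_demo, k = 3)
confusion <- table(cluster = clusters, species = iris$Species)
```

For the homework objects the analogous call would be `table(cutree(fit.complete, k = 3), droplevels(subsetl$Type))`; droplevels() is needed because filter() keeps the four unused factor levels.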

Create a dendrogram from the hierarchical clustering result, coloring the branches and labels into 3 groups:


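library(dendextend)  # provides set() for dendrogram objects
dendrogram <- scaledl %>%
  dist %>%
  hclust %>%
  as.dendrogram %>%
  set("labels_col", value = c("skyblue", "orange", "grey"), k = 3) %>%
  set("branches_k_color", value = c("skyblue", "orange", "grey"), k = 3) %>%
  set("labels_cex", 0.3)
# Plot separately so that `dendrogram` holds the tree rather than plot()'s return value
plot(dendrogram, horiz=TRUE, axes=FALSE)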

The same plot, but with the branches and labels colored into 5 groups:


dendrogram <- scaledl %>%
  dist %>%
  hclust %>%
  as.dendrogram %>%
  set("labels_col", k = 5) %>%
  set("branches_k_color", k = 5) %>%
  set("labels_cex", 0.3)
# Plot separately so that `dendrogram` holds the tree rather than plot()'s return value
plot(dendrogram, horiz=TRUE, axes=FALSE)
