Lecture4 Expression - Analysis 2019

ChIP-seq and related techniques like ChIP-exo and CUT&RUN are described for studying protein-DNA interactions. RNA-seq is summarized as a method for quantifying transcript abundance and discovering novel transcripts and isoforms. Key steps in RNA-seq like library preparation, mapping reads, and analyzing the data including challenges in reconstructing the transcriptome and quantifying expression are covered at a high level.

Uploaded by

Charlie Hou

Genetics 211 - 2019

Lecture 4

Functional Genomics
Gavin Sherlock
January 29th 2019
ChIP-Seq

• Sonicate DNA to produce sheared, soluble chromatin
• Immunoprecipitate and purify immunocomplexes
• Reverse cross-links, and purify DNA
• Sequence
ChIP-Seq Data
Peak Calling
ChIP-exo
• ChIP-exo improves on the resolution of ChIP-Seq
• ChIP DNA is treated with an exonuclease, to digest away unprotected sequences

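[Figure: ChIP-exo schematic showing ChIP DNA, with 5’ and 3’ strand ends labeled, being digested by the exonuclease (Exo)]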

Rhee HS, Pugh BF (2011). Cell 147(6):1408-19.

ChIP-exo

Rhee HS, Pugh BF (2011). Cell 147(6):1408-19.

Cleavage Under Targets and Release
Using Nuclease (CUT & RUN)

Skene and Henikoff, eLife, 2017

CUT&RUN compared to ChIP-Seq
RNA-seq

• Detect transcript abundance by counting fragments of transcripts
• No prior knowledge needed of which parts of the genome are expressed
• Allows splice site discovery
• 5’ and 3’ UTR mapping
• Novel transcript discovery
• View RNA modifications (editing, other enzymatic changes)
• With longer reads, can “phase” splice sites
• Possibly discover many novel isoforms
Dynamic Range

Mortazavi A, et al. (2008) Nat Methods 5(7):621-8.

How do we sequence mRNA?

Total RNA → DNase treatment → Oligo-dT beads → PolyA purified RNA

First Strand cDNA synthesis
• mRNA: 5’ cap structure ... AAA(A)n 3’ poly A tail
• Anneal oligo (dT)12-18 primer to the poly A tail
• Extend with dNTPs and reverse transcriptase
• Product: cDNA:mRNA hybrid
Second Strand Synthesis
• Start from the cDNA:mRNA hybrid
• Add dNTPs, RNase H and E. coli polymerase I
• Remnants of mRNA serve as primers for synthesis of the second strand of cDNA
• Bacteriophage T4 DNA ligase
• Product: double stranded cDNA

Library construction, similar to genomic DNA, using forked adapters

Shatter RNA, Prime with Random Hexamers

• mRNA: 5’ cap structure ... AAA(A)n 3’ poly A tail
• Fragment RNA
• Prime 1st strand synthesis with random hexamers
• Product: double stranded cDNA
• Library construction, using forked adapters

Random Hexamer Induced Sequence Errors

van Gurp TP, McIntyre LM, Verhoeven KJ. (2013). Consistent errors in first
strand cDNA due to random hexamer mispriming. PLoS One 8(12):e85583.
Using dUTP to retain strand specificity

• Fragment mRNA
• 1st strand synthesis with random hexamers and normal dNTPs
• 2nd strand synthesis with dTTP -> dUTP
• Forked adapter ligation

Creating Strand Specificity
• UNG treatment degrades the dUTP-containing 2nd strand, so only the 1st strand (with Ad #2 and Ad #1 at its ends) is amplified
• Pre-amplification and sequencing

Mapping of Reads
• Map reads to both the genome, and the predicted
spliced genome.
• Un-mappable reads may span unknown exon-exon
junctions from novel transcripts or exons.
• Need to be able to accommodate mismatches.
Exonic Read Density
• To measure abundance when sequencing entire
transcripts, you must normalize the data for the
transcript length.
• Exonic Read Density = Reads per kb of gene exon per million mapped reads (RPKM)
– Developed by the Wold lab, but makes intuitive sense.
• Implies single end data; people now often use FPKM (fragments rather than reads), which works for paired end data too.
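The normalization described above can be written out as a short calculation. This is a minimal sketch; the function name `rpkm` and the example read counts, exon lengths and library size are hypothetical values chosen for illustration.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon per million mapped reads."""
    kilobases = exon_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions)


# Hypothetical example: two genes with the same number of reads,
# but the second gene has five times as much exonic sequence.
short_gene = rpkm(read_count=500, exon_length_bp=1_000, total_mapped_reads=10_000_000)
long_gene = rpkm(read_count=500, exon_length_bp=5_000, total_mapped_reads=10_000_000)
print(short_gene, long_gene)
```

With equal read counts, the longer gene gets the lower value, which is the point of normalizing for transcript length.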
Why Exonic Read Density?
[Figure panels: what we observe in mapped reads; what was sequenced; what was present in RNA. The example transcripts are labeled 1 RPKM, 3 RPKM and 1 RPKM.]

Analysis Considerations
• Read Mapping
– Unspliced Aligners
– Spliced Aligners
• Transcriptome Reconstruction
– Genome guided
– Genome independent reconstruction
• Expression quantification
– Gene quantification
– Isoform quantification
• Differential Expression
Unspliced Aligners
• Limited to identifying known exons and
junctions
• Requires a good reference transcriptome
• BWT (e.g. bowtie2, bwa) based aligners are
fast, and have been typically used
• Pseudoaligners (kallisto and sailfish) are much faster, and probably as accurate
Spliced Aligners
Align reads to the whole genome, allowing large gaps for intron-spanning reads
• Exon first (MapSplice, SpliceMap, HISAT2)
– Two step process
• Use unspliced alignment
• Take unmapped reads, split, and look for possible spliced connections
– Typically faster
• Seed-extend (GSNAP, QPALMA)
– Break reads into short seeds and place on genome, then examine
with more sensitive methods
– Find more splice junctions, though not *yet* clear if they tend to
be false positives
Garber et al, 2011, Nature Methods
Transcriptome Reconstruction
• Challenging because
– Transcript abundance spans several orders of
magnitude
– Reads will originate from mature mRNA, as
well as incompletely spliced precursor RNA
– Reads are short, and genes can have many
isoforms, making it challenging to determine
which isoform produced which read
Two Approaches
• Genome Guided
– Relies on reference genome
– Uses spliced reads to reconstruct the transcriptome
– E.g. cufflinks (identifies minimal set of isoforms),
scripture (identifies maximal set of isoforms)
• Genome Independent Approach
– Tries to de novo assemble transcripts
– TransAbyss, Velvet, Trinity
– Sensitive to sequencing errors
– Usually requires more computational resources
Two isoforms of the same gene:
Determining differential
Expression
• A number of packages are available
– Cuffdiff, DESeq, edgeR, etc.
• These require replicates for each condition, so they can compare within vs. between sample variance
• Differential expression is easier to detect for more abundant transcripts
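The role of replicates can be illustrated with a deliberately simple stand-in. The sketch below computes a Welch t-statistic on hypothetical log2 expression values; it is not the count-based model that Cuffdiff, DESeq or edgeR use, and the function name `welch_t` and all numbers are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Difference in means scaled by the within-group (replicate) variance."""
    standard_error = sqrt(variance(group_a) / len(group_a)
                          + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical log2 expression values, three replicates per condition.
# Tight replicates: the difference between conditions stands out.
t_clear = welch_t([5.0, 5.2, 4.9], [7.1, 6.8, 7.0])
# Noisy replicates with similar means: the same kind of shift is much
# harder to distinguish from replicate-to-replicate variation.
t_noisy = welch_t([5.0, 7.5, 3.1], [7.1, 4.0, 8.9])
print(t_clear, t_noisy)
```

Without replicates the within-group variance cannot be estimated at all, which is why these packages ask for them.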
Current trends
• No perfect solution
• Kallisto is now widely used, and is
incredibly fast, with low memory
requirements
– Speed allows bootstrapping to determine uncertainty in abundance estimates
• Sleuth takes advantage of those bootstraps
to identify differential expression
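The bootstrapping idea can be illustrated with a toy resampling sketch. This is not how kallisto or sleuth are implemented; the read labels, the function `bootstrap_fraction` and the numbers are hypothetical, and the "abundance" here is just the fraction of reads assigned to one transcript.

```python
import random
from statistics import mean, stdev


def bootstrap_fraction(read_labels, target, n_boot=200, seed=0):
    """Resample reads with replacement and re-estimate the target's fraction."""
    rng = random.Random(seed)
    n = len(read_labels)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(read_labels, k=n)  # one bootstrap replicate
        estimates.append(sample.count(target) / n)
    return mean(estimates), stdev(estimates)


# Hypothetical data: 100 reads, 30 assigned to tx1 and 70 to tx2.
reads = ["tx1"] * 30 + ["tx2"] * 70
estimate, uncertainty = bootstrap_fraction(reads, "tx1")
print(estimate, uncertainty)
```

The spread of the bootstrap estimates gives a sense of how much the abundance estimate would vary from sampling alone.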
Better Assaying Isoforms
• To better understand a biological system, we really want to
understand all transcripts
– Alternative splicing first seen in viruses in the 1970s
• Splicing generates complexity
– Humans have only ~2X more genes than Drosophila
– More than “one gene, one protein”
– >38,000 Dscam isoforms!
– Alternative splicing can be altered in disease
• With relatively short reads, even with paired end sequencing,
it’s not clear which exons end up with which other exons in
mature isoforms
• Long-Read RNA-Seq results in better isoform determination.
Long-Read RNA-Seq

Sharon D, Tilgner H, Grubert F, Snyder M. (2013). A single-molecule long-read survey of the human transcriptome.
Nat Biotechnol 31(11):1009-14
TIF-Seq
• Transcript Isoform Sequencing
• Does not capture exonic structure
• Instead captures 5’ and 3’ ends of
transcripts
• From only ~6,000 genes in yeast, almost 2
million unique transcript isoforms identified
• 371,087 major TIFs identified genome-wide
Pelechano V, Wei W, Steinmetz LM. (2013). Extensive transcriptional
heterogeneity revealed by isoform profiling. Nature 497(7447):127-31.
TIF-Seq
TIF-Seq
Analysis and visualization of
expression data
Visualizing Data
[Figure: expression values for yeast genes (MAK16, ACH1, HSP26, HSP30, NHP2 and others) plotted across samples taken at OD 0.26, 0.46, 0.80, 1.80, 3.70, 6.90 and 7.30]
Extracting Data
Experiments

RNA-Seq data

Genes
Cy3     Cy5     Cy5/Cy3   log2(Cy5/Cy3)
200     10000   50.00      5.64
4800    4800     1.00      0.00
9000    300      0.03     -4.91
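The log ratios in the table can be recomputed directly. A minimal sketch, assuming the two columns are the Cy3 and Cy5 intensities shown on the slide; the function name `log_ratio` is introduced here for illustration.

```python
from math import log2


def log_ratio(cy3, cy5):
    """log2 of the Cy5/Cy3 intensity ratio."""
    return log2(cy5 / cy3)


# The three (Cy3, Cy5) pairs from the table above.
for cy3, cy5 in [(200, 10000), (4800, 4800), (9000, 300)]:
    print(cy3, cy5, round(cy5 / cy3, 2), round(log_ratio(cy3, cy5), 2))
```

Taking logs makes induction and repression symmetric around zero: a two-fold increase gives +1 and a two-fold decrease gives -1.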
Visualizing Data (cont.)
Expression During Sporulation

[Figure: line plot of Log Ratio (y axis, -4 to 5) against Time in hours (x axis, 0 to 10), with 51 series]
Organizing Data
In expression studies, we often use clustering algorithms to help us identify patterns in complex data.

For example, we can randomize the data used to represent this painting and see if clustering will help us visualize the pattern.
Clustering algorithms

First, we represent the painting in black and white.

Clustering algorithms

The painting is “sliced” into rows which are then randomized.

Clustering algorithms

Rows ordered by hierarchical clustering with nodes flipped to optimize ordering
Clustering algorithms

Rows ordered by using a Self-Organizing Map (SOM)

Random vs. Biological Data

From Eisen MB, et al, PNAS 1998 95(25):14863-8

Types of Clustering
• Agglomerative
– Bottom up approach
– Different variants of hierarchical clustering
– This is the typical clustering you see
• Partitioning / Divisive
– Top down approach
– K-means Clustering
– Self-Organizing Maps
• All require the ability to compare expression
patterns to each other.
How do we compare expression
profiles?

• Treat expression data for a gene as a multidimensional vector.
• Use a distance/correlation metric to compare the vectors.
Expression Vectors
• Each gene is represented by a vector where coordinates are its values - log(ratio) - in each experiment
• x = log(ratio) in expt1
• y = log(ratio) in expt2
• z = log(ratio) in expt3
• etc.
[Figure: gene vectors plotted on x, y and z axes, with a group of nearby vectors labeled “Similar expression”]
Distance metrics
• Distances or correlations are measured “between” expression vectors
• Many different ways to measure distance:
  • Euclidean distance
  • Pearson correlation coefficient(s)
  • Spearman’s Rank Correlation
  • Manhattan distance
  • Mutual information
  • Kendall’s Tau
  • etc.
• Each has different properties and can reveal different features of the data
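Several of the metrics listed above fit in a few lines each. This is a minimal sketch using only the standard library; the function names and the two example vectors are hypothetical, and the rank helper assumes there are no tied values.

```python
from math import sqrt


def euclidean(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def pearson(u, v):
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    spread_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (spread_u * spread_v)


def ranks(values):
    """Rank of each value, 1 = smallest. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(u, v):
    # Spearman's rank correlation is the Pearson correlation of the ranks.
    return pearson(ranks(u), ranks(v))


# Hypothetical profiles: gene_y is exactly twice gene_x in every experiment.
gene_x = [0.5, 1.0, 2.0, 4.0]
gene_y = [1.0, 2.0, 4.0, 8.0]
print(euclidean(gene_x, gene_y), manhattan(gene_x, gene_y))
print(pearson(gene_x, gene_y), spearman(gene_x, gene_y))
```

The two profiles are far apart by the distance measures yet perfectly correlated, which is one way the metrics "reveal different features of the data".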
Euclidean distance
• Euclidean distance metrics detect similar vectors by identifying those that are closest in space. In this example, Gene A and Gene C are closest.
[Figure: scatter plot of Gene A, Gene B and Gene C; axes EXPERIMENT 1 and EXPERIMENT 2, each from 0 to 2.5]
Pearson correlation
• The Pearson correlation disregards the magnitude of the vectors but instead compares their directions. In this example, Gene A and Gene B have the same slope, so would be most similar to each other.
[Figure: scatter plot of Gene A, Gene B and Gene C; axes EXPERIMENT 1 and EXPERIMENT 2, each from 0 to 2.5]
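The contrast between the two metrics can be reproduced with made-up numbers. In this sketch the three profiles are hypothetical four-experiment vectors (not the coordinates from the slide's plot): Gene B is Gene A at half the magnitude, while Gene C sits close to Gene A in space but has a different shape.

```python
from math import dist, sqrt


def pearson(u, v):
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    spread_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (spread_u * spread_v)


# Hypothetical expression profiles over four experiments.
gene_a = [2.0, 1.0, 2.0, 1.0]
gene_b = [1.0, 0.5, 1.0, 0.5]   # same pattern as A, half the magnitude
gene_c = [1.8, 1.3, 1.2, 1.9]   # near A in space, different pattern

# Euclidean distance (math.dist) says A is closer to C than to B ...
print(dist(gene_a, gene_b), dist(gene_a, gene_c))
# ... while Pearson correlation says A is most similar to B.
print(pearson(gene_a, gene_b), pearson(gene_a, gene_c))
```

Which answer is preferable depends on whether the magnitude of the response or only its pattern matters for the question being asked.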
Agglomerative Hierarchical
Clustering
1. Compare all expression patterns to each other.
2. Join patterns that are the most similar out of all
patterns.
3. Compare all joined and unjoined patterns.
4. Go to step 2, and repeat until all patterns are
joined.

Need a rule to decide how to compare clusters to each other
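The four steps above can be sketched as a naive loop. This is an illustrative implementation, not an efficient one; the function names, the choice of Euclidean distance with average linkage as the cluster comparison rule, and the example profiles are all assumptions.

```python
from itertools import combinations
from math import dist


def average_linkage(c1, c2, data):
    """Mean pairwise distance between members of two clusters (index tuples)."""
    total = sum(dist(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(data, linkage=average_linkage):
    """Return the list of merges, in the order they were made."""
    clusters = [(i,) for i in range(len(data))]  # every pattern starts alone
    merges = []
    while len(clusters) > 1:
        # Steps 1 and 3: compare all joined and unjoined patterns.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: linkage(pair[0], pair[1], data))
        # Step 2: join the most similar pair.
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b))
    return merges


# Hypothetical two-experiment profiles for five genes.
profiles = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 7.0], [20.0, 0.0]]
merges = agglomerate(profiles)
for a, b in merges:
    print(a, "+", b)
```

The order of the merges is what a dendrogram draws: early merges are the short branches, and the last merge is the root.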

Visualization of Hierarchical Clustering

[Figure: genes G1 to G6 shown as points and as the leaves of the resulting tree]
Single linkage Clustering

Nearest Neighbor
This method produces long chains which form straggly clusters.
Complete Linkage Clustering
Uses the Furthest Neighbor
This method tends to produce very tight clusters of similar patterns
Average Linkage Clustering

Average (only shown for two cases)
The red and blue ‘+’ signs mark the centroids of the two clusters.
Centroid Linkage Clustering

Centroid
The red and blue ‘+’ signs mark the centroids of the two clusters.
And we get a cluster:
[Figure panels: clusters obtained with Single, Complete, Average and Centroid linkage]
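The four rules for comparing clusters differ only in how they summarize the distances between members. A minimal sketch, assuming Euclidean distance and clusters given as lists of points; the function names and the two example clusters are hypothetical.

```python
from math import dist


def single_linkage(c1, c2):
    """Nearest neighbor: smallest distance between any two members."""
    return min(dist(p, q) for p in c1 for q in c2)


def complete_linkage(c1, c2):
    """Furthest neighbor: largest distance between any two members."""
    return max(dist(p, q) for p in c1 for q in c2)


def average_linkage(c1, c2):
    """Mean of all pairwise distances between members."""
    return sum(dist(p, q) for p in c1 for q in c2) / (len(c1) * len(c2))


def centroid(cluster):
    return [sum(column) / len(cluster) for column in zip(*cluster)]


def centroid_linkage(c1, c2):
    """Distance between the two cluster centroids."""
    return dist(centroid(c1), centroid(c2))


# Hypothetical clusters: two vertical pairs of points, 3 units apart.
left = [[0.0, 0.0], [0.0, 2.0]]
right = [[3.0, 0.0], [3.0, 2.0]]
for rule in (single_linkage, complete_linkage, average_linkage, centroid_linkage):
    print(rule.__name__, rule(left, right))
```

Single linkage always reports the smallest value and complete linkage the largest, which fits the chaining versus tight-cluster behavior described on the slides.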
Two-way clustering
• Just as gene vectors are clustered,
experiment vectors can be clustered.
• All the data points for an experiment can be
used to construct a vector and the vectors of
multiple experiments can be compared.
Two-way Clustering
Two-way clustering can help show which
samples are most similar, as well as which
genes.
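Switching from gene vectors to experiment vectors amounts to transposing the data matrix. The sketch below uses a hypothetical 3-gene by 3-experiment matrix and a helper, `closest_pair`, introduced only for illustration; a full two-way clustering would run the same clustering procedure on both sets of vectors.

```python
from itertools import combinations
from math import dist


def closest_pair(vectors):
    """Indices of the two most similar vectors (Euclidean distance)."""
    return min(combinations(range(len(vectors)), 2),
               key=lambda pair: dist(vectors[pair[0]], vectors[pair[1]]))


# Hypothetical log ratios: rows are genes, columns are experiments.
expression = [
    [1.0, 1.1, -2.0],
    [0.9, 1.0, -1.8],
    [-1.5, -1.4, 2.2],
]

gene_vectors = expression
# Transposing turns each column into the vector for one experiment.
experiment_vectors = [list(column) for column in zip(*expression)]

print("most similar genes:", closest_pair(gene_vectors))
print("most similar experiments:", closest_pair(experiment_vectors))
```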
Agglomerative Hierarchical
Clustering
Advantages:
• Simple
• Easy to implement
• Easy to visualize

Disadvantages:
• Can lead to artifacts
• Discarding of subtleties in 2-way clustering
Partitioning Methods
• Split data up into smaller, more homogeneous
sets
• Should avoid artifacts associated with
incorrectly joining dissimilar vectors
• Can cluster each partition independently of
others, by genes and arrays
• K-means clustering and Self-Organizing
Maps are two possible partitioning methods
K-means Clustering
• Split data into ‘n’ partitions, each with
an associated vector.
• Assign each gene to the partition with the
most similar vector, and recalculate the vector
associated with each partition as the centroid
of its associated genes.
• Repeat until the solution converges, or for
a fixed number of iterations.
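The loop described above can be sketched in a few lines. This is a minimal illustration that assumes Euclidean distance and that the starting vectors are supplied by the caller (here, two of the data points); the function name `kmeans` and the example data are hypothetical.

```python
from math import dist


def kmeans(data, centroids, max_iter=100):
    """Alternate between assigning points and recomputing centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(max_iter):
        # Assign each gene to the partition with the nearest vector.
        assignment = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                      for p in data]
        # Recalculate each partition's vector as the centroid of its genes.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(data, assignment) if a == k]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[k])  # keep an empty partition's vector
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return assignment, centroids


# Hypothetical profiles forming two obvious groups.
data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [8.0, 8.0], [9.0, 8.0], [8.0, 9.0]]
assignment, centroids = kmeans(data, centroids=[data[0], data[3]])
print(assignment)
print(centroids)
```

The result depends on the number of partitions and on the starting vectors, so in practice it is common to try several starts.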
Self Organizing Maps
• Create a ‘Map’ of ‘n’ partitions, that is modeled on the expression data, where each partition in the map has an associated vector.
• Genes’ expression vectors are assigned to the partition with the most similar associated vector.
• Neighboring partitions are more similar to each other than they are to distant partitions.
[Figure: The Map Is Disorganized → Repeat 100,000 times → The Map Is Organized]
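A self-organizing map can be sketched as a row of nodes that are repeatedly nudged toward randomly chosen expression vectors. Everything specific below is an assumption made for illustration: the one-dimensional map, the Gaussian neighborhood, the linearly decaying learning rate and radius, the 2,000 iterations (rather than the 100,000 on the slide) and the example data.

```python
import random
from math import dist, exp


def train_som(data, n_nodes, n_iter=2000, seed=0):
    """Train a 1-D map; returns one vector per node."""
    rng = random.Random(seed)
    dim = len(data[0])
    # The map starts disorganized: random vectors for every node.
    nodes = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_nodes)]
    for t in range(n_iter):
        frac = 1.0 - t / n_iter                # shrinks from 1 toward 0
        rate = 0.5 * frac                      # learning rate
        radius = max(n_nodes / 2 * frac, 0.5)  # neighborhood width
        x = rng.choice(data)
        winner = min(range(n_nodes), key=lambda k: dist(x, nodes[k]))
        for k in range(n_nodes):
            # Nodes near the winner on the map are pulled along with it.
            influence = exp(-((k - winner) ** 2) / (2 * radius ** 2))
            nodes[k] = [w + rate * influence * (xi - w) for w, xi in zip(nodes[k], x)]
    return nodes


def assign(x, nodes):
    """Index of the node whose vector is most similar to x."""
    return min(range(len(nodes)), key=lambda k: dist(x, nodes[k]))


# Hypothetical profiles: two repressed genes and two induced genes.
data = [[-1.0, -1.0], [-0.9, -1.1], [1.0, 1.0], [1.1, 0.9]]
nodes = train_som(data, n_nodes=4)
print([assign(x, nodes) for x in data])
```

Because neighbors are updated together, adjacent nodes end up with similar vectors, which is what distinguishes this from running k-means with the same number of partitions.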

Dimensional Reduction
• It is hard to get a sense of whether there are
clear clusters when clustering data; the
structure of the tree can be hard to discern
• People have turned to Principal
Components Analysis to project the
data into 2 (or more) dimensions
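The projection step can be sketched without any libraries by finding the first principal component with power iteration. This is an illustrative sketch only: it finds a single component, it assumes the starting vector is not orthogonal to that component, and the function name and example data are hypothetical.

```python
from math import sqrt


def first_principal_component(data, n_iter=200):
    """Return the leading direction of variance and each row's score on it."""
    n, dim = len(data), len(data[0])
    means = [sum(col) / n for col in zip(*data)]
    centered = [[x - m for x, m in zip(row, means)] for row in data]
    # Sample covariance matrix of the columns (experiments).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    v = [1.0] * dim  # starting guess
    for _ in range(n_iter):
        # Power iteration: repeatedly multiply by cov and renormalize.
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in centered]
    return v, scores


# Hypothetical data: four genes, three experiments. The first two
# experiments vary together; the third barely varies at all.
data = [[2.0, 1.9, 0.1], [1.0, 1.1, 0.0], [-1.0, -0.9, 0.1], [-2.0, -2.1, 0.0]]
direction, scores = first_principal_component(data)
print(direction)
print(scores)
```

Plotting the scores for the first two components against each other gives the familiar 2-dimensional PCA view; a second component would require an extra step (for example deflation) that is left out here.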
Principal Components Analysis
tSNE
• Similar in principle to PCA
• However, uses information in higher dimensions to better separate clusters in 2 dimensions
• Parameters make a difference, so beware
Using the Gene Ontology to
assess list of genes
• Many experiments result in a list of
interesting genes
• Typically biologists can make up a story
about any random list
• So, look at all GO annotations for the
genes in a list, and see if the number of
annotations for any GO node is significant
The Categories of GO
(The Gene Ontology)
• Biological Process = goal or objective (Why)
(e.g. DNA replication, Cell Cycle Control, Cell adhesion)
• Molecular Function = elemental activity/task (What)
(e.g. Transcription factor, polymerase, protein kinase)
• Cellular Component = location or complex (Where)
(e.g. pre-replication complex, kinetochore, membrane)

Each Category is a structured, controlled vocabulary

Parent-Child Relationships

Nucleus
• Nucleoplasm
• Nuclear envelope
• Nucleolus
• Chromosome
• Perinuclear space

A child is a subset of a parent’s elements.
The cell component term Nucleus has 5 children.
Determining P-values for GO
annotation for a list of genes
We can calculate the probability of having x of n
genes having an annotation to a GO node, given
that in the genome, M of N genes have that
annotation, using the hypergeometric
distribution, as:

$$ p = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}} $$
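The hypergeometric probability can be evaluated exactly with integer binomial coefficients. A minimal sketch; the function name `hypergeom_pmf` is introduced here, and the argument names follow the symbols x, n, M and N used above.

```python
from math import comb


def hypergeom_pmf(x, n, M, N):
    """Probability that exactly x of n listed genes carry an annotation
    held by M of the N genes in the genome."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


# Small hypothetical case: genome of 5 genes, 2 annotated, list of 2 genes.
print(hypergeom_pmf(2, 2, 2, 5))

# Sanity check: the probabilities over all possible x add up to 1.
print(sum(hypergeom_pmf(x, 5, 10, 50) for x in range(0, 6)))
```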
Determining GO significance
To calculate a P-value, we calculate the
probability of having at least x of n annotations:

$$ \text{P-value} = 1 - \sum_{i=0}^{x-1} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}} $$

Then do multiple hypothesis correction on the p-values
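The P-value and a correction step can be sketched as follows. The tail is summed directly from x upward, which equals one minus the sum from 0 to x-1 but avoids subtracting two nearly equal numbers when the P-value is tiny. The function names are introduced here, Bonferroni is used as one simple choice of correction (the slides do not say which was applied), and the count of 40 tested GO nodes is a made-up number.

```python
from math import comb


def go_p_value(x, n, M, N):
    """Probability of at least x of n genes having the annotation,
    when M of N genes in the genome have it."""
    upper = min(n, M)
    tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(x, upper + 1))
    return tail / comb(N, n)


def bonferroni(p_value, n_tests):
    """Multiply by the number of hypotheses tested, capped at 1."""
    return min(1.0, p_value * n_tests)


# Counts of the form shown on the GO Annotations slide:
# 12 of 18 genes in the list versus 66 of 6608 genes in the genome.
raw = go_p_value(12, 18, 66, 6608)
print(raw, bonferroni(raw, n_tests=40))  # n_tests is hypothetical
```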

Methionine Cluster
[Figure: the methionine cluster. Row labels: ICY2 (YPL250C), MET11, MXR1 (YER042W), MET17* (YLR302C), SAM3 (YPL274W), MET28, STR3 (YGL184C), MMP1 (YLL061W), MET1, SER33 (YIL074C), MHT1 (YLL062C), MET14, MET16, MET3, MET10, ECM17, MET2* (YNL276C), MUP1, MET17, MET6]
GO Annotations
• sulfur metabolic process : 2.43e-19 (12/18 vs 66/6608)
• methionine metabolic process : 1.40e-14 (10/18 vs 24/6608)
Recommended Reading
ChIP-Seq
• Valouev, A., Johnson, D.S., Sundquist, A., Medina, C., Anton, E., Batzoglou, S., Myers, R.M. and Sidow, A. (2008). Genome-
wide analysis of transcription factor binding sites based on ChIP-Seq data. Nat Methods 5(9):829-34. QuEST.
• Zhang, Y., Liu, T., Meyer, C.A., Eeckhoute, J., Johnson, D.S., Bernstein, B.E., Nusbaum, C., Myers, R.M., Brown, M., Li, W.
and Liu, X.S. (2008). Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9(9):R137.
• Rozowsky, J., Euskirchen, G., Auerbach, R.K., Zhang, Z.D., Gibson, T., Bjornson, R., Carriero, N., Snyder, M. and Gerstein,
M.B. (2009). PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat. Biotechnol. 27:66-75.
• Rhee, H.S. & Pugh, B.F. (2011). Comprehensive genome-wide protein-DNA interactions detected at single-nucleotide
resolution. Cell 147, 1408–1419.
• Nawy, T. (2012). High-resolution chromatin immunoprecipitation. Nature Methods 9, 130.
• Skene, P.J., Henikoff, S. (2015). A simple method for generating high-resolution maps of genome-wide protein binding. Elife
4:e09225.
• Skene, P.J., Henikoff, S. (2017). An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites. Elife
6 pii: e21856. doi: 10.7554/eLife.21856.
Recommended Reading
RNA-Seq
• Parkhomchuk, D., Borodina, T., Amstislavskiy, V., Banaru, M., Hallen, L., Krobitsch, S., Lehrach, H., and Soldatov, A. (2009).
Transcriptome analysis by strand-specific sequencing of complementary DNA. Nucleic Acids Res. 37(18):e123.
• Borodina, T., Adjaye, J., Sultan, M. (2011). A strand-specific library preparation protocol for RNA sequencing. Methods
Enzymol. 500:79-98.
• Grabherr, M.G., Haas, B.J., et al. (2011). Full-length transcriptome assembly from RNA-Seq data without a reference genome.
Nat Biotechnol. 29(7):644-52. Trinity
• van Gurp, T.P., McIntyre, L.M., Verhoeven, K.J. (2013). Consistent errors in first strand cDNA due to random hexamer
mispriming. PLoS One 8(12):e85583.
• Sharon, D., Tilgner, H., Grubert, F. and Snyder, M. (2013). A single-molecule long-read survey of the human transcriptome.
Nat Biotechnol. 31(11):1009-14.
• Pelechano, V., Wei, W. and Steinmetz, L.M. (2013). Extensive transcriptional heterogeneity revealed by isoform profiling.
Nature 497(7447):127-31.
• Kim D, Langmead B, Salzberg SL (2015). HISAT: a fast spliced aligner with low memory requirements. Nat Methods.
12(4):357-60.
• Frazee, A.C., Pertea, G., Jaffe, A.E., Langmead, B., Salzberg, S.L., Leek, J.T. (2015). Ballgown bridges the gap between
transcriptome assembly and expression analysis. Nat Biotechnol. 33(3):243-6
• Pertea, M., Pertea, G.M., Antonescu, C.M., Chang, T.C., Mendell, J.T. and Salzberg, S.L. (2015) StringTie enables improved
reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 33(3):290-5.
• Pertea, M., Kim D, Pertea, G.M., Leek, J.T., Salzberg, S.L. (2016). Transcript-level expression analysis of RNA-seq experiments
with HISAT, StringTie and Ballgown. Nat Protoc. 11(9):1650-67.
• Patro, R., Mount, S.M., Kingsford, C. (2014). Sailfish enables alignment-free isoform quantification from RNA-seq reads using
lightweight algorithms. Nat Biotechnol. 32(5):462-4.
• Bray NL, Pimentel H, Melsted P, Pachter L. (2016). Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol.
34(5):525-7 Kallisto
Recommended Reading
Clustering/Expression Data analysis:

• Eisen, M.B., Spellman, P.T., Brown, P.O., Botstein, D. (1998). Cluster analysis and display of
genome-wide expression patterns. Proc Natl Acad Sci U S A 95(25):14863-8.
• Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Kitareewan, S., Dmitrovsky, E., Lander, E.S.,
Golub, T.R. (1999). Interpreting patterns of gene expression with self-organizing maps:
methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA
96(6):2907.
• Tavazoie, S., Hughes, J.D., Campbell, M.J., Cho, R.J., Church, G.M. (1999). Systematic
determination of genetic network architecture. Nat Genet. 22(3):281-5.
• Tusher, V.G., Tibshirani, R., Chu, G. (2001). Significance analysis of microarrays applied to
the ionizing radiation response. Proc Natl Acad Sci USA 98(9):5116-21
• Slonim, D.K. (2002). From patterns to pathways: gene expression data analysis comes of age.
Nat Genet. 32 Suppl:502-8.
• McShane, L.M., Radmacher, M.D., Freidlin, B., Yu, R., Li, M.C., Simon, R. (2002). Methods
for assessing reproducibility of clustering patterns observed in analyses of microarray data.
Bioinformatics 18(11):1462-9.
• Bryan, J. (2004). Problems in gene clustering based on gene expression data. Journal of
Multivariate Analysis 90, 44–66.
• Chipman, H. and Tibshirani, R. (2006). Hybrid Hierarchical Clustering with Applications to
Microarray Data. Biostatistics, 7(2):286-301.
