3 RNAseq Background
RNA sequencing
• Samples of interest: Condition 1 (normal colon), Condition 2 (colon tumor)
• Isolate RNAs
• Generate cDNA, fragment, size select, add linkers
• Sequence ends
• Map to genome, transcriptome, and predicted exon junctions
Why sequence RNA (versus DNA)?
• Functional studies
• Genome may be constant but an experimental condition has a
pronounced effect on gene expression
• e.g. Drug treated vs. untreated cell line
• e.g. Wild type versus knock out mice
• Interpreting mutations that do not have an obvious effect on protein
sequence
• ‘Regulatory’ mutations that affect what mRNA isoform is expressed and how much
• e.g. splice sites, promoters, exonic/intronic splicing motifs, etc.
Introduction to RNA-seq
https://www.youtube.com/watch?v=tlf6wYJrwKY&t=414s
Main steps in RNA-seq
Challenges
• Sample
• Purity?, quantity?, quality?
• RNAs consist of small exons that may be separated by large introns
• Mapping reads to genome is challenging
• The relative abundance of RNAs varies widely
• 10^5 – 10^7 fold range (5 to 7 orders of magnitude)
• Since RNA sequencing works by random sampling, a small fraction of highly expressed genes may
consume the majority of reads
• Ribosomal and mitochondrial genes
• RNAs come in a wide range of sizes
• Small RNAs must be captured separately
• PolyA selection of large RNAs may result in 3’ end bias
• RNA is fragile compared to DNA (easily degraded)
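To make the random-sampling point above concrete, here is a minimal sketch with made-up numbers. The gene names, abundances, and sequencing depth are hypothetical, not data from these slides, and transcript length is ignored for simplicity. It computes the expected share of reads per gene when reads are drawn in proportion to abundance.

```python
# Hypothetical relative abundances (arbitrary units); not real measurements.
abundance = {
    "rRNA_like": 1_000_000,
    "mito_like": 200_000,
    "housekeeping": 10_000,
    "typical_gene": 100,
    "rare_gene": 1,
}


def expected_read_share(abundance):
    """Expected fraction of reads per gene if sampling is proportional to abundance."""
    total = sum(abundance.values())
    return {gene: count / total for gene, count in abundance.items()}


share = expected_read_share(abundance)
total_reads = 10_000_000  # assumed sequencing depth

# Print genes from most to least abundant with their expected read counts.
for gene, frac in sorted(share.items(), key=lambda kv: -kv[1]):
    print(f"{gene:>14}: {frac:8.4%} of reads, ~{frac * total_reads:,.0f} reads")
```

With these assumed abundances, the two most abundant entries take over 99% of the reads and the rarest gene is expected to receive fewer than ten reads, which is one motivation for depleting ribosomal RNA before sequencing.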
Replicates
• Technical Replicate
• Multiple instances of sequence generation
• Flow Cells, Lanes, Indexes
• Biological Replicate
• Multiple isolations of cells showing the same phenotype, stage or other experimental condition
• Some example concerns/challenges:
• Environmental Factors, Growth Conditions, Time
• Correlation Coefficient 0.92-0.98
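Agreement between replicates is often summarized with a correlation coefficient. The sketch below computes a Pearson correlation on log-transformed counts for two replicates. The counts are invented for illustration, and the choice of log2(count + 1) is an assumption, not something specified on the slide.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical read counts for the same eight genes in two replicates.
rep1 = [5, 120, 980, 15, 3300, 48, 0, 610]
rep2 = [9, 100, 1100, 11, 2900, 70, 2, 540]

# Log-transform so that a few highly expressed genes do not dominate the result.
log_rep1 = [math.log2(c + 1) for c in rep1]
log_rep2 = [math.log2(c + 1) for c in rep2]

r = pearson(log_rep1, log_rep2)
print(f"Pearson r on log2(count + 1): {r:.3f}")
```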
Which read aligner should I use?
https://www.ebi.ac.uk/~nf/hts_mappers/
Features comparison of aligners
https://www.ebi.ac.uk/~nf/hts_mappers/
https://cole-trapnell-lab.github.io/team/cole-trapnell/
Spliced mappers
• Exon-first
• Exon-first methods map reads first to the genome using an unspliced approach to find read-clusters; unmapped reads are then used to find connections between these read-clusters.
• Include:
• TopHat (Trapnell et al. 2009),
• MapSplice (Wang et al. 2010a),
• SpliceMap (Au et al. 2010),
• HMMsplicer (Dimon et al. 2010),
• SOAPsplice (Huang et al. 2011),
• PASSion (Zhang et al. 2012),
• TrueSight (Li et al. 2012b),
• and GEM (Marco-Sola et al. 2012).
• Seed-and-extend
• Seed-and-extend methods generally start by mapping part of the reads as kmers or substrings; candidate matches are then extended using different algorithms and potential splice-sites are located.
• Include:
• MapNext (Bao et al. 2009),
• PALMapper (Jean et al. 2010),
• SplitSeek (Ameur et al. 2010),
• GSNAP (Wu et al. 2010),
• Supersplat (Bryant et al. 2010),
• SeqSaw (Wang et al. 2011),
• and STAR (Dobin et al. 2012).
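The seed-and-extend idea can be illustrated with a toy sketch. This is not how any of the aligners listed above is implemented: it uses only the first k-mer of the read as a seed, extends without gaps, and does not look for splice sites. The function names and the example sequences are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def seed_and_extend(read, reference, index, k, max_mismatches=1):
    """Seed with the read's first k-mer, then extend ungapped and count mismatches."""
    hits = []
    for pos in index.get(read[:k], []):
        candidate = reference[pos:pos + len(read)]
        if len(candidate) < len(read):
            continue  # the read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, candidate))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


reference = "ACGTTGCAAGGCTTACGGATCCATGCA"
index = build_kmer_index(reference, k=4)

exact_hits = seed_and_extend("GGCTTACGG", reference, index, k=4)
mismatch_hits = seed_and_extend("GGCTTACGC", reference, index, k=4)
print(exact_hits)     # list of (position, mismatches) for the exact read
print(mismatch_hits)  # same read with its last base changed
```

A real spliced mapper would additionally allow the extension to jump across an intron when the remainder of the read matches further downstream.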
Expectation–maximization algorithm
https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm
Expectation-maximization
https://www.youtube.com/watch?v=REypj2sy_5U
Expectation-maximization
Aim: estimate Gaussian means and variances (prior: uniform)
https://www.youtube.com/watch?v=iQoXFmbXRJA
EM algorithm
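As a rough illustration of the stated aim (Gaussian means and variances), here is a minimal EM sketch for a two-component one-dimensional Gaussian mixture. It reads "prior: uniform" as fixed, equal mixing weights, which is an assumption. The data, starting values, and function names are made up for the example.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, means, variances, n_iter=50):
    """EM for a two-component 1D Gaussian mixture with fixed, equal mixing weights."""
    means, variances = list(means), list(variances)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            likes = [gaussian_pdf(x, m, v) for m, v in zip(means, variances)]
            total = sum(likes)
            resp.append([like / total for like in likes])
        # M-step: responsibility-weighted means and variances.
        for k in range(2):
            weight = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / weight
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / weight
            variances[k] = max(var, 1e-6)  # guard against a collapsing component
    return means, variances


# Hypothetical observations drawn around two separate centers.
data = [0.8, 1.1, 0.9, 1.3, 1.0, 4.7, 5.2, 5.0, 4.9, 5.3]
means, variances = em_two_gaussians(data, means=[0.0, 6.0], variances=[1.0, 1.0])
print("means:", means)
print("variances:", variances)
```

In RNA-seq quantification the same alternation applies with different quantities: the E-step assigns ambiguous reads fractionally to transcripts, and the M-step re-estimates transcript abundances from those assignments.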
Mortazavi, A. et al. (2008). Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods, 5(7):621-628.
RPKM example
http://yourgene.pixnet.net/blog/post/69572975-rpkm-%E7%B0%A1%E4%BB%8B
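A small worked example of RPKM (reads per kilobase of transcript per million mapped reads), following the definition in Mortazavi et al. (2008). The read count, gene length, and library size below are invented for illustration, and the function name is hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count * 10^9 / (gene_length_bp * total_mapped_reads)
    The 10^9 combines the per-kilobase (10^3) and per-million (10^6) factors.
    """
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 1,000 reads on a 2,000 bp transcript, 20 million mapped reads.
value = rpkm(read_count=1000, gene_length_bp=2000, total_mapped_reads=20_000_000)
print(value)  # 25.0
```

FPKM uses the same formula with fragments (read pairs) counted instead of individual reads.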
TPM
https://www.youtube.com/watch?v=TTUrtCY2k-w&t=3s
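A minimal sketch of the TPM (transcripts per million) calculation: counts are first divided by transcript length, and the resulting rates are then scaled so that they sum to one million. The gene names, counts, and lengths are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from raw counts and transcript lengths in bp."""
    # Step 1: length-normalize each count (reads per kilobase).
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    # Step 2: scale so that all rates in the sample sum to one million.
    scale = sum(rates.values()) / 1e6
    return {gene: rate / scale for gene, rate in rates.items()}


# Hypothetical counts and transcript lengths for three genes.
counts = {"geneA": 10, "geneB": 20, "geneC": 30}
lengths_bp = {"geneA": 2000, "geneB": 4000, "geneC": 1000}

values = tpm(counts, lengths_bp)
for gene, v in values.items():
    print(f"{gene}: {v:,.0f} TPM")
```

By construction, the TPM values within a sample always sum to one million.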
Expression profile
PCA
https://www.youtube.com/watch?v=HMOI_lkzW08
https://www.youtube.com/watch?v=FgakZw6K1QQ
MDS & PCoA
https://www.youtube.com/watch?v=GEn-_dAyYME
Homework
• What’s TPM?
• What’s FPKM?
• What’s the difference between TPM and FPKM?
Any questions?