
3 RNAseq Background

The document provides an overview of RNA sequencing (RNA-seq), detailing its methodology, including RNA isolation, cDNA conversion, sequencing, and downstream analysis. It discusses the significance of RNA-seq in functional studies, challenges faced during the process, and different mapping strategies for read alignment. Additionally, it introduces key metrics like RPKM and TPM for quantifying gene expression levels.

Uploaded by

johngeralt

Computational Biology Associate professor: Tingwen Chen ( 陳亭妏 )

Lab Office: Room 420 bioICT building


Email: [email protected]
What is RNASeq
General analysis flowchart
RPKM, TPM
DEGs
Functional analysis
scRNA Seq
Demo data we will use
Several slides are adapted from 2013 Canadian bioinformatics workshops
Gene expression

RNA sequencing
• Samples of interest: Condition 1 (normal colon) and Condition 2 (colon tumor)
• Isolate RNAs
• Generate cDNA, fragment, size select, add linkers
• Sequence ends
• 100s of millions of paired reads; 10s of billions of bases of sequence
• Map to genome, transcriptome, and predicted exon junctions
• Downstream analysis

Why sequence RNA (versus DNA)?
• Functional studies
  • Genome may be constant but an experimental condition has a pronounced effect on gene expression
  • e.g. Drug treated vs. untreated cell line
  • e.g. Wild type versus knock out mice
• Some molecular features can only be observed at the RNA level
  • Alternative isoforms, fusion transcripts, RNA editing
• Predicting transcript sequence from genome sequence is difficult
  • Alternative splicing, RNA editing, etc.

Why sequence RNA (versus DNA)?
• Interpreting mutations that do not have an obvious effect on protein sequence
  • ‘Regulatory’ mutations that affect what mRNA isoform is expressed and how much
  • e.g. splice sites, promoters, exonic/intronic splicing motifs, etc.
• Prioritizing protein coding somatic mutations (often heterozygous)
  • If the gene is not expressed, a mutation in that gene would be less interesting
  • If the gene is expressed but only from the wild type allele, this might suggest loss-of-function (haploinsufficiency)
  • If the mutant allele itself is expressed, this might suggest a candidate drug target

Introduction to RNA-seq
https://www.youtube.com/watch?v=tlf6wYJrwKY&t=414s
Main steps in RNA-seq

1. RNA is isolated from a sample,
2. RNA is converted to cDNA fragments via reverse-transcription and fragmentation,
3. a high-throughput sequencer is used to generate millions of reads from the cDNA fragments,
4. …

Challenges
• Sample
  • Purity? Quantity? Quality?
• RNAs consist of small exons that may be separated by large introns
  • Mapping reads to the genome is challenging
• The relative abundance of RNAs varies wildly
  • A range of 10^5 to 10^7 (5 to 7 orders of magnitude)
  • Since RNA sequencing works by random sampling, a small fraction of highly expressed genes may consume the majority of reads
  • Ribosomal and mitochondrial genes
• RNAs come in a wide range of sizes
  • Small RNAs must be captured separately
  • PolyA selection of large RNAs may result in 3’ end bias
• RNA is fragile compared to DNA (easily degraded)
Replicates
• Technical Replicate
  • Multiple instances of sequence generation
  • Flow Cells, Lanes, Indexes
• Biological Replicate
  • Multiple isolations of cells showing the same phenotype, stage or other experimental condition
  • Some example concerns/challenges:
    • Environmental Factors, Growth Conditions, Time
  • Correlation Coefficient 0.92-0.98
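
The correlation coefficient quoted for replicates can be computed directly from per-gene expression values. The sketch below is only an illustration: the read counts are hypothetical, and the log2 transform with a pseudocount of 1 is an assumption made here so that a few highly expressed genes do not dominate the result.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical read counts for five genes in two replicates.
rep1 = [120, 15, 980, 0, 43]
rep2 = [131, 12, 1010, 2, 39]

# Log-transform with a pseudocount (an assumption of this sketch).
log_rep1 = [math.log2(c + 1) for c in rep1]
log_rep2 = [math.log2(c + 1) for c in rep2]

r = pearson(log_rep1, log_rep2)
print(f"Pearson r between replicates: {r:.3f}")
```

A value close to 1 indicates that the two replicates rank and scale genes similarly; whether raw or log-transformed values are compared changes the number, so the choice should be reported.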

Main steps in RNA-seq

1. RNA is isolated from a sample,
2. RNA is converted to cDNA fragments via reverse-transcription and fragmentation,
3. a high-throughput sequencer is used to generate millions of reads from the cDNA fragments,
4. reads are mapped to a reference genome or transcript set with an alignment tool,
5. transcriptome reconstruction,
6. and counts of reads mapped to each gene are used to estimate expression levels.
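
Step 6 can be pictured with a very small counting sketch. Everything in it is hypothetical (gene names, coordinates, read positions), and it assigns a read to a gene by its start position only. Real counting tools also deal with strand, spliced reads, overlapping genes, and multi-mapping reads, none of which is modelled here.

```python
# Hypothetical annotation: gene name -> (start, end), 0-based, end-exclusive.
genes = {"geneA": (0, 1000), "geneB": (1500, 2500)}

# Hypothetical start positions of mapped reads on the same chromosome.
read_starts = [10, 250, 999, 1200, 1500, 1501, 2499, 3000]

def count_reads(genes, read_starts):
    """Count reads whose start position falls inside each gene interval."""
    counts = {name: 0 for name in genes}
    for pos in read_starts:
        for name, (start, end) in genes.items():
            if start <= pos < end:
                counts[name] += 1
    return counts

counts = count_reads(genes, read_starts)
print(counts)
```

Reads at positions 1200 and 3000 fall outside both intervals and are not counted, which is the toy equivalent of intergenic reads.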
Three RNA-seq mapping strategies

Which read aligner should I use?
https://www.ebi.ac.uk/~nf/hts_mappers/

Features comparison of aligners
https://www.ebi.ac.uk/~nf/hts_mappers/
https://cole-trapnell-lab.github.io/team/cole-trapnell/
Spliced mappers
• Exon-first
  • Exon-first methods map reads first to the genome using an unspliced approach to find read-clusters; unmapped reads are then used to find connections between these read-clusters.
  • Include:
    • TopHat (Trapnell et al. 2009),
    • MapSplice (Wang et al. 2010a),
    • SpliceMap (Au et al. 2010),
    • HMMsplicer (Dimon et al. 2010),
    • SOAPsplice (Huang et al. 2011),
    • PASSion (Zhang et al. 2012),
    • TrueSight (Li et al. 2012b),
    • and GEM (Marco-Sola et al. 2012).
• Seed-and-extend
  • Seed-and-extend methods generally start by mapping part of the reads as kmers or substrings; candidate matches are then extended using different algorithms and potential splice-sites are located.
  • Include:
    • MapNext (Bao et al. 2009),
    • PALMapper (Jean et al. 2010),
    • SplitSeek (Ameur et al. 2010),
    • GSNAP (Wu et al. 2010),
    • Supersplat (Bryant et al. 2010),
    • SeqSaw (Wang et al. 2011),
    • and STAR (Dobin et al. 2012).
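
The seed-and-extend idea can be illustrated with a toy, ungapped matcher. This is a minimal sketch under strong assumptions: the reference string, the read, the k-mer length, and the function names are all made up for illustration, the seed is simply the first k bases of the read, and no splice-site detection is attempted. It is not how any of the listed tools is implemented.

```python
from collections import defaultdict

def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def seed_and_extend(read, reference, index, k, max_mismatches=1):
    """Return (position, mismatches) for ungapped placements of the read.

    The first k bases of the read act as the seed; each seed hit is
    extended across the full read while counting mismatches.
    """
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits

# Hypothetical reference and read.
reference = "ACGTACGTTTGACCAGGTACCA"
index = build_index(reference, k=4)
print(seed_and_extend("TTGACCTG", reference, index, k=4))
```

The k-mer index makes the seed lookup cheap, so the more expensive base-by-base comparison only runs at candidate positions.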

Review of Bowtie/TopHat

Expectation–maximization algorithm
https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm
Expectation-maximization
https://www.youtube.com/watch?v=REypj2sy_5U
Expectation-maximization
Aim: Gaussian means and variances (prior: uniform)

https://www.youtube.com/watch?v=iQoXFmbXRJA
EM algorithm
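
A minimal sketch of EM for the stated aim (estimating Gaussian means and variances) is given below. It assumes a one-dimensional mixture of exactly two Gaussians with fixed, equal mixing weights, which is one reading of "prior: uniform". The data points, starting values, and function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, means, variances, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture with fixed equal weights."""
    means, variances = list(means), list(variances)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: responsibility-weighted mean and variance per component.
        for k in range(2):
            weight = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / weight
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / weight
            variances[k] = max(var_k, 1e-6)  # floor to avoid a collapsing component
    return means, variances

# Hypothetical observations drawn from two well-separated groups.
data = [0.8, 1.0, 1.1, 1.3, 0.9, 4.7, 5.0, 5.2, 5.1, 4.9]
means, variances = em_two_gaussians(data, means=[0.0, 6.0], variances=[1.0, 1.0])
print(means, variances)
```

In RNA-seq quantification the same alternation is used with a different model: the E-step assigns ambiguous reads fractionally to candidate transcripts, and the M-step re-estimates transcript abundances from those fractional assignments.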
http://yourgene.pixnet.net/blog/post/99023045-%E8%BD%89%E9%8C%84%E9%AB%94%E9%87%8D%E5%BB%BA%E8%88%87%E5%9F%BA%E5%9B%A0%E9%AB%94%E5%BA%8F%E5%88%97%E5%B7%B2%E7%9F%A5%E7%89%A9%E7%A8%AE%E7%9A%84rna%E5%AE%9A%E5%BA%8F
Main steps in RNA-seq

1. RNA is isolated from a sample,
2. RNA is converted to cDNA fragments via reverse-transcription and fragmentation,
3. a high-throughput sequencer is used to generate millions of reads from the cDNA fragments,
4. reads are mapped to a reference genome or transcript set with an alignment tool,
5. transcriptome reconstruction,
6. and counts of reads mapped to each gene are used to estimate expression levels,
7. …
What is FPKM (RPKM)?

Mortazavi, A. et al. (2008). Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods, 5(7):621-628.

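RPKM example
Two genes in a sample with 18 + 2 = 20 million mapped reads:
• Gene 1: total exon reads = 18 million, exon length = 9 kb
• Gene 2: total exon reads = 2 million, exon length = 1 kb
With exon reads expressed in millions:
• Gene 1: 18/(20*9) = 0.1
• Gene 2: 2/(20*1) = 0.1
With raw read counts in the numerator, as in the RPKM definition, both genes have RPKM = 100,000. Either way the two genes are equally expressed after normalization.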
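
The example can be checked with a few lines of code. This is a sketch with illustrative gene names; the function uses the standard definition, reads per kilobase of exon per million mapped reads, with raw read counts. The value 0.1 in the example arises when exon reads are entered in millions; with raw counts the same formula gives 100,000 for both genes.

```python
def rpkm(exon_reads, total_mapped_reads, exon_length_bp):
    """Reads per kilobase of exon per million mapped reads."""
    return exon_reads * 1e9 / (total_mapped_reads * exon_length_bp)

# Numbers from the two-gene example; the gene names are illustrative.
genes = {
    "gene_1": {"reads": 18_000_000, "length_bp": 9_000},
    "gene_2": {"reads": 2_000_000, "length_bp": 1_000},
}
total_mapped = sum(g["reads"] for g in genes.values())  # 20 million

values = {
    name: rpkm(g["reads"], total_mapped, g["length_bp"])
    for name, g in genes.items()
}
for name, value in values.items():
    print(name, value)
```

Gene 1 has nine times as many reads but is also nine times as long, so both genes end up with the same RPKM.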

http://yourgene.pixnet.net/blog/post/69572975-rpkm-%E7%B0%A1%E4%BB%8B
TPM
https://www.youtube.com/watch?v=TTUrtCY2k-w&t=3s
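
As a rough sketch of how TPM is usually computed: read counts are first divided by gene length, and the resulting rates are then scaled so that they sum to one million within the sample. The counts and lengths below are hypothetical, and the function name is illustrative.

```python
def tpm(reads, lengths_bp):
    """Transcripts per million for parallel lists of read counts and lengths."""
    # Step 1: reads per kilobase (length normalization comes first).
    rpk = [r / (length / 1000) for r, length in zip(reads, lengths_bp)]
    # Step 2: scale so that the values of one sample sum to one million.
    scale = sum(rpk) / 1e6
    return [x / scale for x in rpk]

# Hypothetical counts and gene lengths for three genes in one sample.
reads = [100, 200, 300]
lengths_bp = [1000, 4000, 2000]

values = tpm(reads, lengths_bp)
print(values, sum(values))
```

Because every sample sums to the same total, a TPM value can be read as a share of the sample's transcripts, which is not guaranteed for RPKM or FPKM.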
Expression profile
PCA
https://www.youtube.com/watch?v=HMOI_lkzW08

https://www.youtube.com/watch?v=FgakZw6K1QQ
MDS & PCoA

https://www.youtube.com/watch?v=GEn-_dAyYME
Homework
• What’s TPM?
• What’s FPKM?
• What’s the difference between TPM and FPKM?
Any questions?
