From RNA-seq Reads To Gene Expression
Hoang Thanh Hai
Introduction
[Figure: a bunch of normal cells vs. a bunch of mutated cells. Labels: active genes, mRNA level, non-active gene.]
2. Reads Mapping
Output: mapped reads (SAM/BAM files)
3. Expression Quantification
Tools:
• FastQC: checks read information
– total reads, sequence length
– Per base sequence quality
– Overrepresented sequences
– GC content
– Duplication level
First column -> gene names
Remaining columns -> read counts for each sample
Could we state which genes are up-regulated or down-regulated based
on the raw number of counts alone? -> No
Normalize data before comparison
Sample #1 has 635 reads assigned to it.
Sample #2 has 1270 reads assigned to it, twice as many reads as Sample #1.
However, the raw read counts make it look like the genes in
Sample #2 were transcribed twice as much as in Sample #1.
Adjust the read counts per gene to reflect differences in how many reads
were assigned to each sample.
There are many sophisticated ways -> The simplest method is to just divide the
read counts per gene by the total number of reads mapped to each sample, then
multiply by one million (counts per million, CPM).
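As a rough illustration of this normalization, here is a minimal Python sketch. The function name `counts_per_million` and the per-gene counts are hypothetical; the counts were only chosen so that the sample totals match the 635 and 1270 reads from the example above.

```python
def counts_per_million(counts_by_sample):
    """Convert raw read counts to counts per million (CPM), sample by sample.

    counts_by_sample maps sample name -> {gene name: raw read count}.
    """
    cpm = {}
    for sample, gene_counts in counts_by_sample.items():
        total = sum(gene_counts.values())  # total reads assigned to this sample
        if total == 0:
            raise ValueError(f"sample {sample!r} has no assigned reads")
        cpm[sample] = {
            gene: count / total * 1_000_000
            for gene, count in gene_counts.items()
        }
    return cpm


# Hypothetical counts: sample1 totals 635 reads, sample2 totals 1270 reads,
# and every gene in sample2 has exactly twice the raw count of sample1.
raw_counts = {
    "sample1": {"geneA": 30, "geneB": 24, "geneC": 563, "geneD": 18},
    "sample2": {"geneA": 60, "geneB": 48, "geneC": 1126, "geneD": 36},
}

normalized = counts_per_million(raw_counts)
for gene in raw_counts["sample1"]:
    print(gene, normalized["sample1"][gene], normalized["sample2"][gene])
```

After normalization each gene has the same CPM in both samples, so the apparent two-fold difference in the raw counts disappears: it came from sequencing depth, not from transcription.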
Step 4: Differential expression testing
The first step in any DE testing is always the same:
Plot the data
PCA reduces the number of axes you need to display the important
aspects of the data.
[Figure: PCA plot. The wild-type samples make a nice cluster on the left side; the mutated samples make a nice cluster on the right side. An annotation on the plot reads "Exclude".]
• PC1 captures the most important differences -> when the WT and MT samples
separate along PC1, the biggest differences in the data are between the WT and the MT samples.
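To make the idea of PC1 concrete, here is a small pure-Python sketch that computes PC1 scores by power iteration. It is only an illustration under stated assumptions: the function `pc1_scores` and the expression values are hypothetical, and in practice PCA is usually run with a statistics package rather than hand-written code.

```python
import math


def pc1_scores(matrix, iterations=200):
    """Return the PC1 score of each row (sample) of a samples x genes matrix.

    Uses power iteration on the sample-by-sample Gram matrix of the
    gene-centered data. The overall sign of the scores is arbitrary.
    """
    n_samples = len(matrix)
    n_genes = len(matrix[0])
    # Center each gene (column) so PCA measures variation around the mean.
    means = [sum(row[j] for row in matrix) / n_samples for j in range(n_genes)]
    centered = [[row[j] - means[j] for j in range(n_genes)] for row in matrix]
    # Gram matrix: dot product between every pair of centered samples.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centered]
            for r1 in centered]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    vec = [float(i + 1) for i in range(n_samples)]
    for _ in range(iterations):
        new = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0:
            raise ValueError("data has no variation")
        vec = [x / norm for x in new]
    # Rayleigh quotient gives the eigenvalue; scores are eigenvector * sqrt(eigenvalue).
    eigenvalue = sum(v * sum(g * w for g, w in zip(row, vec))
                     for v, row in zip(vec, gram))
    return [v * math.sqrt(eigenvalue) for v in vec]


# Hypothetical log-scale expression values: 3 wild-type (WT) and 3 mutant (MT)
# samples, 4 genes. Genes 1 and 2 differ between groups; genes 3 and 4 do not.
labels = ["WT1", "WT2", "WT3", "MT1", "MT2", "MT3"]
samples = [
    [5.0, 8.1, 2.0, 6.0],
    [5.2, 7.9, 2.1, 6.1],
    [4.9, 8.0, 1.9, 5.9],
    [9.0, 3.1, 2.0, 6.0],
    [9.1, 2.9, 2.2, 6.2],
    [8.8, 3.0, 1.8, 5.8],
]

scores = pc1_scores(samples)
for label, score in zip(labels, scores):
    print(label, round(score, 2))
```

With this toy data the WT samples all land on one side of zero along PC1 and the MT samples on the other, which is the same left/right clustering pattern described for the plot.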
An Example
In summary, plotting the data…
• Tells us if we can expect to find interesting
differences.
• Tells us if we should exclude some samples
from any downstream analysis.
Identify differentially expressed genes between
the “normal” and “mutant” samples.
• This is typically done using R with either edgeR or DESeq2,
and the results are generally displayed using this sort of graph
A red dot is a gene that is differentially expressed between “normal” and “mutant” samples.
Black dots are genes that show no significant difference.
The X-axis tells you how much each gene is transcribed.
The Y-axis tells you how big the relative difference is
between “normal” and “mutant”.
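Plots of this kind are often called MA plots. The sketch below shows one common way to compute the two coordinates for a single gene. The log2 scale, the pseudocount of 1, and the function name `ma_point` are assumptions made for illustration; edgeR and DESeq2 compute their own, more refined estimates.

```python
import math


def ma_point(normal_mean, mutant_mean, pseudocount=1.0):
    """Return (x, y) for one gene from its mean normalized expression per group.

    x: average log2 expression (how much the gene is transcribed overall)
    y: log2 fold change, mutant relative to normal
    The pseudocount avoids taking the logarithm of zero.
    """
    log_normal = math.log2(normal_mean + pseudocount)
    log_mutant = math.log2(mutant_mean + pseudocount)
    x = (log_normal + log_mutant) / 2
    y = log_mutant - log_normal
    return x, y


# Hypothetical genes: one strongly up in the mutant, one unchanged.
changed_gene = ma_point(normal_mean=7, mutant_mean=31)
unchanged_gene = ma_point(normal_mean=15, mutant_mean=15)
print(changed_gene, unchanged_gene)
```

A gene with no difference sits on the line y = 0, while genes that are up- or down-regulated in the mutant sit above or below it.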
We’ve identified interesting genes, now what?
• If you know what you’re looking for, you can see if the experiment
validated your hypothesis.
• If you don’t know what you’re looking for, you can see if certain
pathways are enriched in either the normal or mutant gene sets.
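One common way to ask whether a pathway is enriched is an over-representation test based on the hypergeometric distribution. The slides do not say which test is used, so treat the sketch below as an assumption; the function name `enrichment_p_value` and all the numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes, pathway_genes, de_genes, overlap):
    """Probability of seeing at least `overlap` pathway genes among the
    differentially expressed (DE) genes if DE genes were drawn at random.

    total_genes:   number of genes tested
    pathway_genes: number of tested genes that belong to the pathway
    de_genes:      number of genes called differentially expressed
    overlap:       number of DE genes that belong to the pathway
    """
    max_overlap = min(pathway_genes, de_genes)
    denominator = comb(total_genes, de_genes)
    numerator = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, de_genes - k)
        for k in range(overlap, max_overlap + 1)
    )
    return numerator / denominator


# Hypothetical experiment: 20 genes tested, 5 belong to the pathway,
# 4 genes are DE, and 3 of those DE genes belong to the pathway.
p_value = enrichment_p_value(total_genes=20, pathway_genes=5, de_genes=4, overlap=3)
print(p_value)
```

A small p-value suggests the pathway contains more differentially expressed genes than expected by chance. When many pathways are tested, the p-values also need a multiple-testing correction.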