From RNA-seq Reads To Gene Expression

The document summarizes the process of analyzing RNA-seq data from gene expression to identify differentially expressed genes between normal and mutated cell samples. Key steps include: 1) Mapping RNA sequencing reads to a reference genome; 2) Counting reads mapped to each gene; 3) Normalizing read counts to account for differences in sequencing depth; 4) Using statistical tools like edgeR or DESeq2 to identify genes that are differentially expressed between normal and mutated samples based on normalized read counts. The output is a list of genes identified as differentially expressed which can then be further analyzed to validate hypotheses or identify enriched biological pathways.

From RNA-seq Reads to Gene Expression
Hoang Thanh Hai
Introduction
[Figure: a bunch of normal cells and a bunch of mutated cells]

The mutated cells behave differently than the normal cells

What genetic mechanism is causing the difference?

Answer: look at differences in gene expression


[Figure: normal cells and mutated cells; each contains a bunch of chromosomes, which carry a bunch of genes]


[Figure: a bunch of cells, showing active genes with their mRNA levels and a non-active gene]
RNA-seq measures gene expression in normal cells and in mutated cells, and the two measurements are then compared.

High-throughput sequencing tells us which genes are active, and how much they are transcribed.
Comparing normal and mutated cells, for example:

1. No difference in gene 1 between normal and mutated cells
2. Gene 2 is up-regulated in mutated cells
3. Gene 3 is suppressed in mutated cells
Three main steps for RNA-Seq
Transcriptome profiling using NGS
Step 1: Library preparation
Step 2: Sequencing
Step 3: Data analysis
Step 1: Preparing an RNA-seq library
Step 3: Data analysis
From reads to differential expression
1. Raw-data quality
• Raw sequence data: FASTQ files
• QC by FastQC/R
2. Reads mapping
• Unspliced mapping: BWA, Bowtie
• Spliced mapping: TopHat, MapSplice
• Output: mapped reads (SAM/BAM files), QC by RNA-SeQC
3. Expression quantification
• Summarize read counts
• FPKM/RPKM: Cufflinks
4. DE testing
• DESeq, edgeR, etc.
• Cuffdiff
• Output: list of DE genes
5. Functional interpretation
• Function enrichment
• Infer networks
• Integrate with other data
• Output: biological insights & hypothesis


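Raw data for a sample: FASTQ files
Each read is stored as four lines:
Line 1: Sequence identifier (starts with "@")
Line 2: Raw sequence
Line 3: Separator line (starts with "+", optionally followed by the identifier again)
Line 4: Quality values for the sequence, one encoded character per base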
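The four-line layout above can be illustrated with a minimal Python sketch. The record below is made up for illustration, and the Phred+33 quality encoding (quality score = ASCII code minus 33) is an assumption; it is the common encoding in current FASTQ files, but older files may use a different offset.

```python
# Hypothetical FASTQ record, made up for illustration.
record = """@read_1
GATTACA
+
IIIII#!
"""


def parse_fastq_record(lines):
    """Parse one four-line FASTQ record into (name, sequence, quality scores)."""
    identifier, sequence, separator, quality = [line.rstrip("\n") for line in lines]
    if not identifier.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a valid FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lines differ in length")
    # Assumption: Phred+33 encoding, so the score is the ASCII code minus 33.
    scores = [ord(char) - 33 for char in quality]
    return identifier[1:], sequence, scores


name, sequence, scores = parse_fastq_record(record.splitlines())
print(name, sequence, scores)
```

A low score at a position (here the last two bases) means the base call there is unreliable, which is what the quality check in the next step looks for.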
Step 1: Low-quality reads check

Tools:
• FastQC: reports quality information
– Total reads, sequence length
– Per base sequence quality
– Overrepresented sequences
– GC content
– Duplication level

• MultiQC: summarizes FastQC results

[Figure: per base sequence quality]

Command: fastqc -o FastQC_Report *fastq.gz


Step 2: Mapping RNA-seq reads to the genome
Using STAR to align

Visualize mapping results with Artemis


Step 3: Count the number of reads
• Count the reads per gene -> matrix of numbers

First column -> gene names. Remaining columns -> number of counts for each sample.
Could we state which genes are up-regulated or down-regulated based on the raw counts alone? -> No
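As a rough sketch of how such a count matrix is built, the Python example below assigns each mapped read to the gene whose interval contains it. The gene coordinates, read positions, and sample names are hypothetical, and each read is reduced to a single position; dedicated counting tools also handle full alignments, strand, and reads that overlap several genes.

```python
# Hypothetical annotation: gene name -> (chromosome, start, end), 1-based inclusive.
genes = {
    "gene1": ("chr1", 100, 500),
    "gene2": ("chr1", 800, 1200),
    "gene3": ("chr2", 50, 400),
}

# Hypothetical mapped reads per sample, simplified to (chromosome, position).
mapped_reads = {
    "normal": [("chr1", 150), ("chr1", 300), ("chr1", 900), ("chr2", 60)],
    "mutated": [("chr1", 120), ("chr1", 850), ("chr1", 1000),
                ("chr1", 1100), ("chr1", 600)],
}


def count_reads_per_gene(reads, annotation):
    """Count how many reads fall inside each annotated gene interval."""
    counts = {name: 0 for name in annotation}
    for chrom, position in reads:
        for name, (gene_chrom, start, end) in annotation.items():
            if chrom == gene_chrom and start <= position <= end:
                counts[name] += 1
    return counts


per_sample = {sample: count_reads_per_gene(reads, genes)
              for sample, reads in mapped_reads.items()}

# First column: gene name. Remaining columns: counts for each sample.
matrix = [[gene] + [per_sample[sample][gene] for sample in mapped_reads]
          for gene in genes]
for row in matrix:
    print(*row, sep="\t")
```

Reads that fall outside every gene (the read at chr1:600 here) are simply not counted.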
Normalize data before comparison

Sample #1 has 635 reads assigned to it. Sample #2 has 1270 reads assigned to it, twice as many as Sample #1.
As a result, the raw read counts make it look like the genes in Sample #2 were transcribed twice as much as in Sample #1.

Adjust the read counts per gene to reflect differences in how many reads were assigned to each sample.

There are many sophisticated ways to do this. The simplest method is to divide the read counts per gene by the total number of reads mapped to each sample and scale to one million (counts per million, CPM).
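The CPM calculation can be sketched in a few lines of Python. The per-gene counts below are hypothetical; they were chosen only so that the sample totals match the 635 and 1270 reads used in the example above.

```python
def counts_per_million(counts):
    """Scale raw per-gene counts by the sample total, expressed per million reads."""
    total = sum(counts.values())
    return {gene: count / total * 1_000_000 for gene, count in counts.items()}


# Hypothetical per-gene counts; totals are 635 and 1270 as in the example.
sample_1 = {"gene1": 100, "gene2": 200, "gene3": 335}
sample_2 = {"gene1": 200, "gene2": 400, "gene3": 670}

cpm_1 = counts_per_million(sample_1)
cpm_2 = counts_per_million(sample_2)

for gene in sample_1:
    print(gene, round(cpm_1[gene]), round(cpm_2[gene]))
```

Every raw count in Sample #2 is twice the count in Sample #1, but after dividing by the sample totals the two samples have the same CPM for each gene, so the apparent doubling was only a difference in sequencing depth.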
Step 4: Differential expression testing
The first thing to do in any DE testing is always the same:
Plot the data

The data is a huge matrix…


Plot the sample data: PCA plot

But we have thousands of genes…

So we would need a graph with thousands of axes, one per gene, to plot the raw data…

PCA reduces the number of axes you need to display the important aspects of the data.
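To make the idea concrete, here is a minimal pure-Python sketch that finds the first principal component (PC1) by power iteration on the gene covariance matrix. The sample names and expression values are hypothetical, and power iteration is just one simple way to obtain PC1; in practice a statistics package would normally be used.

```python
def center_columns(rows):
    """Subtract each column (gene) mean so every gene is centered at zero."""
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]


def first_principal_component(rows, iterations=100):
    """Return (PC1 direction, PC1 score per sample) using power iteration."""
    x = center_columns(rows)
    n_samples, n_genes = len(x), len(x[0])
    # Gene-by-gene covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in x) / (n_samples - 1)
            for j in range(n_genes)] for i in range(n_genes)]
    # Start from an arbitrary non-zero vector and repeatedly apply the
    # covariance matrix; the vector converges to the dominant eigenvector.
    direction = [1.0 / (i + 1) for i in range(n_genes)]
    for _ in range(iterations):
        product = [sum(cov[i][j] * direction[j] for j in range(n_genes))
                   for i in range(n_genes)]
        norm = sum(c * c for c in product) ** 0.5
        direction = [c / norm for c in product]
    # Project every centered sample onto PC1 to get one coordinate per sample.
    scores = [sum(a * b for a, b in zip(row, direction)) for row in x]
    return direction, scores


# Hypothetical expression values: rows are samples, columns are three genes.
samples = ["WT1", "WT2", "MT1", "MT2"]
expression = [
    [10.0, 2.0, 5.0],
    [11.0, 1.0, 6.0],
    [2.0, 9.0, 5.0],
    [1.0, 12.0, 6.0],
]

direction, scores = first_principal_component(expression)
pc1 = dict(zip(samples, scores))
for sample in samples:
    print(sample, round(pc1[sample], 2))
```

Each sample is reduced from three gene axes to a single PC1 coordinate, and in this made-up data the wild-type and mutated samples end up on opposite sides of zero.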
[Figure: an example PCA plot; one outlying sample is labeled "Exclude"]

The wild-type samples make a nice cluster on the left side. The mutated samples make a nice cluster on the right side.

• PC1 captures the most important differences, so this separation along PC1 means the biggest differences are between the WT and the MT samples.
In summary, plotting the data…
• Tells us if we can expect to find interesting differences.
• Tells us if we should exclude some samples from any downstream analysis.
Identify differentially expressed genes between the “normal” and “mutant” samples.
• This is typically done using R with either edgeR or DESeq2, and the results are generally displayed using this sort of graph:
– A red dot is a gene that is differentially expressed between “normal” and “mutant” samples.
– Black dots are genes that are not differentially expressed.
– The X-axis tells you how much each gene is transcribed.
– The Y-axis tells you how big the relative difference is between “normal” and “mutant”.
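The two axes of such a graph can be sketched in Python as follows. This only computes the plot coordinates; deciding which dots are red requires the statistical test performed by edgeR or DESeq2, which is not reproduced here. The CPM values are hypothetical, and the pseudocount of 1 (added to avoid taking the logarithm of zero) is an assumption of this sketch.

```python
import math


def ma_coordinates(cpm_normal, cpm_mutant, pseudocount=1.0):
    """Return gene -> (mean log2 expression, log2 fold change mutant vs normal)."""
    coordinates = {}
    for gene in cpm_normal:
        log_normal = math.log2(cpm_normal[gene] + pseudocount)
        log_mutant = math.log2(cpm_mutant[gene] + pseudocount)
        mean_expression = (log_normal + log_mutant) / 2  # X-axis
        log_fold_change = log_mutant - log_normal        # Y-axis
        coordinates[gene] = (mean_expression, log_fold_change)
    return coordinates


# Hypothetical normalized expression values (CPM) for three genes.
cpm_normal = {"gene1": 500.0, "gene2": 100.0, "gene3": 800.0}
cpm_mutant = {"gene1": 500.0, "gene2": 400.0, "gene3": 50.0}

coords = ma_coordinates(cpm_normal, cpm_mutant)
for gene, (x, y) in coords.items():
    print(gene, round(x, 2), round(y, 2))
```

With these made-up values, gene1 sits on the zero line (no difference), gene2 lies above it (up-regulated in the mutant), and gene3 lies below it (suppressed in the mutant).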
We’ve identified interesting genes, now what?

• If you know what you’re looking for, you can see if the experiment
validated your hypothesis.
• If you don’t know what you’re looking for, you can see if certain
pathways are enriched in either the normal or mutant gene sets.
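One common way to check whether a pathway is enriched in a gene list is an over-representation test based on the hypergeometric distribution. The slides do not say which test is used, so the sketch below is an assumption, and all of the gene numbers in it are hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes, pathway_genes, de_genes, overlap):
    """Probability of seeing at least `overlap` pathway genes among the DE genes
    if the DE genes were drawn at random (hypergeometric upper tail)."""
    max_overlap = min(pathway_genes, de_genes)
    favorable = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, de_genes - k)
        for k in range(overlap, max_overlap + 1)
    )
    return favorable / comb(total_genes, de_genes)


# Hypothetical numbers: 1000 genes tested, 50 belong to the pathway,
# 100 are differentially expressed, and 15 of those are in the pathway.
p_value = enrichment_p_value(total_genes=1000, pathway_genes=50,
                             de_genes=100, overlap=15)
print(p_value)
```

By chance alone about 5 of the 100 DE genes would be expected in this pathway, so an overlap of 15 gives a small p-value. When many pathways are tested, the p-values should also be corrected for multiple testing.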
