Tutorial RNA-Seq Analysis Part 1
CLC bio
Finlandsgade 10-12 8200 Aarhus N Denmark
Telephone: +45 70 22 55 09 Fax: +45 70 22 55 19
www.clcbio.com
Tutorial: RNA-Seq analysis part I: Getting started
Subset of the full data set: This file can be imported using the standard import. It contains a
subset of the full data set together with a region of chromosome 16 for use as the reference.
To create the subset, we ran the full data set and extracted all the reads that matched the genes
in this part of chromosome 16. Download and import this data set (using the normal import) for
use in these tutorials.
Experiments with the full data set: Later on, we will work with experiments generated from the full
data set. Download and import this data set (using the normal import) for use in these
tutorials.
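How the subset was produced is not described beyond the sentence about extracting reads that matched the genes of chromosome 16. As a rough illustration only, the Python sketch below shows the kind of interval-overlap test such an extraction involves: keep a read if its mapped interval overlaps any gene in the region. The `Interval` type, the 0-based half-open coordinates and the one-interval-per-read simplification are our assumptions, not part of the Workbench.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A genomic interval: 0-based start (inclusive), end (exclusive)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if both intervals are on the same chromosome and share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def reads_in_genes(read_hits: dict[str, Interval], genes: list[Interval]) -> list[str]:
    """Return the IDs of reads whose mapped interval overlaps any gene, in input order."""
    return [read_id for read_id, hit in read_hits.items()
            if any(overlaps(hit, gene) for gene in genes)]


# Hypothetical gene and read coordinates, for illustration only.
genes = [Interval("chr16", 1000, 5000), Interval("chr16", 8000, 9000)]
read_hits = {
    "r1": Interval("chr16", 990, 1030),   # overlaps the first gene
    "r2": Interval("chr16", 6000, 6040),  # falls between the genes
    "r3": Interval("chr2", 1000, 1040),   # different chromosome
    "r4": Interval("chr16", 8950, 8990),  # inside the second gene
}
print(reads_in_genes(read_hits, genes))  # ['r1', 'r4']
```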
Once downloaded and imported, you should have the following folders and data in the Navigation
Area (see figure 1).
Figure 1: The subset of the full data set has been imported together with the experiments generated
from the full data set.
Click Next when the data is listed on the right-hand side of the dialog.
You are now presented with the dialog shown in figure 3.
Since we are using (part of) the RefSeq-annotated mouse genome, choose Use reference with
annotations. Click ( ) to select the reference sequence NC_000082 subset.
Click Next to set the parameters for the mapping. Leave these at their default values; we will
return to them later on. (You can restore the default parameters by clicking the button ( ) at the
bottom of the dialog, but you will then have to define the reference sequence again.)
The choice between Prokaryote and Eukaryote essentially tells the Workbench whether your
reference contains introns. To select Eukaryote, your reference sequences must have annotations
of the type mRNA (this is how the Workbench expects exons to be defined). The reference
sequence provided with this tutorial includes mRNA annotations (shown in green), so select
Eukaryote in this wizard.
Further down in the dialog, you can specify settings for discovering novel exons. We will
investigate these in detail later on.
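The Eukaryote option relies on mRNA annotations to define exons. To make that role concrete, here is a minimal Python sketch (our own illustration, not the Workbench's mapping algorithm) of what exon coordinates make possible: building the spliced transcript and the sequences that span each exon-exon junction. A read taken from a spliced mRNA lies contiguously on such a junction sequence even though it is split by an intron on the genome. The function names, the toy sequence and the 0-based, half-open, forward-strand coordinates are assumptions.

```python
def spliced_transcript(genome: str, exons: list[tuple[int, int]]) -> str:
    """Concatenate the exon sequences (0-based, half-open) in genomic order,
    i.e. drop the introns between them."""
    return "".join(genome[start:end] for start, end in sorted(exons))


def junction_sequences(genome: str, exons: list[tuple[int, int]], flank: int) -> list[str]:
    """For each pair of consecutive exons, join up to `flank` bases from the end of
    the upstream exon to up to `flank` bases from the start of the downstream exon."""
    ordered = sorted(exons)
    junctions = []
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        left = genome[max(s1, e1 - flank):e1]
        right = genome[s2:min(e2, s2 + flank)]
        junctions.append(left + right)
    return junctions


# Toy gene: exon AAA, intron CCC, exon GGG, intron TTT, exon ACG.
genome = "AAACCCGGGTTTACG"
exons = [(0, 3), (6, 9), (12, 15)]
print(spliced_transcript(genome, exons))     # AAAGGGACG
print(junction_sequences(genome, exons, 2))  # ['AAGG', 'GGAC']
```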
Clicking Next will allow you to specify the output options as shown in figure 5.
Uncheck Create list of un-mapped reads, Create report and Make log, and click Finish.
The standard output is a table showing mapping statistics for each gene.
By default, the Expression values column is based on RPKM [Mortazavi et al., 2008]. Change the
measure to Total exon reads by clicking at the bottom of the view (expression measures are
covered in more detail in part II). Now sort the table on the new expression value by clicking the
column header twice. Find the Ahsg gene (4th from the top of the list) and double-click it.
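Switching from RPKM to Total exon reads can change the ranking of genes. RPKM, as defined by Mortazavi et al. (2008), divides a gene's exon read count by its exon length in kilobases and by the total number of mapped reads in millions. The Python sketch below uses made-up counts and lengths for two hypothetical genes and assumes a single exon length per gene; it is not how the Workbench computes its table.

```python
def rpkm(exon_reads: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of exon model per Million mapped reads:
    exon_reads / (exon_length_bp / 1e3) / (total_mapped_reads / 1e6)."""
    return exon_reads * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical genes: (total exon reads, total exon length in bp).
genes = {"geneA": (500, 2000), "geneB": (300, 500)}
total_mapped = 1_000_000

# Sort descending, as after clicking the column header twice.
by_exon_reads = sorted(genes, key=lambda g: genes[g][0], reverse=True)
by_rpkm = sorted(genes, key=lambda g: rpkm(*genes[g], total_mapped), reverse=True)

print(by_exon_reads)  # ['geneA', 'geneB']
print(by_rpkm)        # ['geneB', 'geneA']  (geneA: 250.0, geneB: 600.0)
```

Here geneB has fewer reads but a much shorter exon model, so it moves to the top when expression is normalized by length.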
When the mapping is open, make a few customizations to make the view better suited for
interpretation. In the Side Panel, under Text format, set the font size to small or tiny. To
save these customizations so that they take effect the next time you open a mapping, click the
Save/Restore Settings button ( ) at the top of the Side Panel and click Save Settings. Give
your settings a name and make sure the Always apply these settings check box is checked.
Double-click the tab of the view (or press Ctrl + M) to maximize the view, and click Fit Width ( )
in the toolbar to zoom out and see the full gene. You should now have a view similar to figure 7.
You can now see distinct peaks of coverage below the exons, which are marked in green. Slowly
drag the scroll bar on the right-hand side of the view downwards. You will begin to see reads that
have been mapped across exon-exon boundaries.
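The coverage peaks arise because a read adds depth only to the bases it is aligned to. In the minimal Python sketch below (an illustration, not the Workbench's implementation), each read is assumed to be a list of aligned blocks in 0-based, half-open coordinates. A read mapped across an exon-exon boundary has two blocks, so it adds coverage to both exons and none to the intron between them.

```python
def coverage(region_length: int, reads: list[list[tuple[int, int]]]) -> list[int]:
    """Per-base read depth over positions 0..region_length-1.

    Each read is a list of aligned blocks (start inclusive, end exclusive).
    Positions skipped between blocks, such as an intron, get no coverage."""
    depth = [0] * region_length
    for blocks in reads:
        for start, end in blocks:
            for pos in range(max(start, 0), min(end, region_length)):
                depth[pos] += 1
    return depth


# Toy region: exon at positions 0-2, intron at 3-5, exon at 6-8.
reads = [
    [(0, 3)],           # read inside the first exon
    [(1, 3), (6, 8)],   # read spanning the exon-exon boundary
]
print(coverage(10, reads))  # [1, 2, 2, 0, 0, 0, 1, 1, 0, 0]
```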
Click Zoom in ( ) and click-and-drag a rectangle around one of the exons. In this way you can
zoom in to see more details of a particular exon. If you zoom all the way in, you will be able to
see the nucleotide level and the alignment of the reads.
Close the view and go back to the RNA-seq sample. The 'Transcripts' column shows that the
Ahsg gene has only one annotated transcript. Use the Advanced filter ( ), at the upper right-hand
part of the RNA-seq sample table view, to identify genes with more than one annotated transcript
(set the filter to Transcripts > 1 and press Apply, as shown in figure 8).
Figure 8: Using the advanced filter to only show genes with more than one annotated transcript.
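The Transcripts > 1 condition keeps only rows whose transcript count is strictly greater than one. A minimal Python sketch of the same logic, assuming the table is a list of dictionaries with hypothetical `name` and `transcripts` keys; only the Ahsg and Fetub counts come from this tutorial, and the third row is made up.

```python
def filter_rows(rows: list[dict], column: str, threshold: int) -> list[dict]:
    """Keep rows whose value in `column` is strictly greater than `threshold`,
    mirroring a filter condition such as 'Transcripts > 1'."""
    return [row for row in rows if row[column] > threshold]


rows = [
    {"name": "Ahsg", "transcripts": 1},
    {"name": "Fetub", "transcripts": 3},
    {"name": "geneX", "transcripts": 2},  # hypothetical row
]
print([row["name"] for row in filter_rows(rows, "transcripts", 1)])  # ['Fetub', 'geneX']
```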
The Fetub gene has three annotated transcripts. Open the mapping for this gene and click Fit
Width ( ) to zoom out completely and get an overview of the mapping to this gene.
One of the three transcripts annotated for Fetub uses a different first exon from the other two
transcripts. There is no coverage in this exon at all, and thus no evidence that this alternative
first-exon isoform is expressed. The other two transcripts share the same first exon, but one of
them skips the second exon of the other. You can see both reads that span from exon 2 to exon 3
and reads that span from exon 2 to exon 4. Thus, there is evidence for both of these splice
variants (see figure 9).
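The evidence for each splice variant comes from reads whose alignment jumps from the end of one exon to the start of another. The Python sketch below counts such reads per junction. It assumes each read is a list of aligned blocks (0-based, half-open) and that a junction must match annotated exon boundaries exactly; the exon coordinates and reads are hypothetical, and this is not the Workbench's own algorithm.

```python
from collections import Counter


def junction_support(reads: list[list[tuple[int, int]]],
                     exons: dict[str, tuple[int, int]]) -> Counter:
    """Count reads supporting each exon-exon junction.

    A junction (a, b) is counted when one aligned block ends exactly at the end
    of exon a and the next block starts exactly at the start of exon b."""
    exon_by_end = {end: name for name, (start, end) in exons.items()}
    exon_by_start = {start: name for name, (start, end) in exons.items()}
    support = Counter()
    for blocks in reads:
        for (_, end1), (start2, _) in zip(blocks, blocks[1:]):
            upstream = exon_by_end.get(end1)
            downstream = exon_by_start.get(start2)
            if upstream is not None and downstream is not None:
                support[(upstream, downstream)] += 1
    return support


# Hypothetical exon coordinates and reads.
exons = {"exon2": (100, 200), "exon3": (300, 400), "exon4": (500, 600)}
reads = [
    [(150, 200), (300, 350)],  # exon 2 -> exon 3
    [(170, 200), (500, 530)],  # exon 2 -> exon 4 (exon 3 skipped)
    [(180, 200), (300, 330)],  # exon 2 -> exon 3
    [(120, 170)],              # unspliced read inside exon 2
]
support = junction_support(reads, exons)
print(support[("exon2", "exon3")], support[("exon2", "exon4")])  # 2 1
```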
Close the view and you are ready for part II: Non-specific matches and expression values.
Bibliography
[Mortazavi et al., 2008] Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L., and Wold,
B. (2008). Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods,
5(7):621–628.