Alternative Splicing
Prasoon K Thakur, Institute of Molecular Genetics of the ASCR, Prague, Czech Republic
Hukam C Rawal, ICAR – National Research Centre on Plant Biotechnology, New Delhi, India
Mina Obuca, Institute of Molecular Genetics of the ASCR, Prague, Czech Republic
Sandeep Kaushik, European Institute of Excellence on Tissue Engineering and Regenerative Medicine, Guimaraes, Portugal and
University of Minho, Braga, Portugal
© 2019 Elsevier Inc. All rights reserved.
Introduction
Genes
Genes are genomic regions that contain the information necessary for the expression of a protein or another molecule that ultimately
contributes to the survival, reproduction and function of the organism. The information stored in the DNA sequence of a gene is first
transcribed into an RNA molecule by RNA polymerase. For protein-coding genes, this transcript is called a messenger RNA (mRNA), which
is then translated into protein. The nascent RNA (pre-mRNA) transcripts contain intervening sequences, known as introns, that do not
become part of the final mRNA (Gilbert, 1978). The regions of the pre-mRNA that are retained and ligated for translation are known as
exons. Introns are removed from the nascent RNA by a mechanism known as splicing. The spliced mRNA molecule forms a continuous
protein-coding region ready to be translated into a protein. Nearly all eukaryotes have introns and the RNA splicing mechanism.
Introns play a major role in facilitating the recombination of RNA sequences and help to generate protein diversity (Rogozin et al., 2012).
To better understand the bioinformatics tools and approaches used to study alternative splicing, it helps to have a brief overview of
the elements and mechanisms involved in splicing.
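To make the exon/intron terminology concrete, the following minimal Python sketch removes the introns from a toy pre-mRNA by concatenating its exon segments. The sequence, the exon coordinates and the `splice` helper are hypothetical illustrations; in the cell this step is carried out by the spliceosome, not by coordinate arithmetic.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments of a pre-mRNA, discarding the introns between them.

    Exons are 0-based, half-open (start, end) intervals on the pre-mRNA;
    any sequence not covered by an exon is treated as intron.
    """
    ordered = sorted(exons)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            raise ValueError("exons must not overlap")
    return "".join(pre_mrna[start:end] for start, end in ordered)


# Hypothetical pre-mRNA: exon 1 + intron (GU...AG) + exon 2
pre = "AUGGCC" + "GUAAGUCUUUCAG" + "UUUGA"
mature = splice(pre, [(0, 6), (19, 24)])
print(mature)  # AUGGCCUUUGA
```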
Fig. 1 Alternative splicing mechanism and events. Splicing factors recognize different sequences along the pre-mRNA. Conserved recognition sequences
include the intron boundaries, named the 5′ splice site (5′ ss) and the 3′ splice site (3′ ss). These elements interact with the spliceosome, recruiting
snRNPs (U1, U2, U4, U5 and U6). Other regulatory elements are embedded in the exon (ESE, ESS) or in the intron (ISE) and can be recognized by
proteins of the SR and/or hnRNP families.
The splicing reaction is a coordinated series of RNA–RNA, RNA–protein and protein–protein interactions (Trowitzsch et al., 2009;
Hoskins and Moore, 2012). First, U1 binds to the 5′ splice site at the exon–intron boundary, and U2 binds near the branch point sequence
(BPS) just upstream of the 3′ splice site preceding the adjacent exon (Peled-Zehavi et al., 2001). U1 and U2 binding defines the intron
and is therefore important for the accuracy of the splicing reaction. Subsequently, a tri-snRNP complex composed of U4/U6 and U5 joins
and leads to the formation of an active complex that catalyzes splicing. Once splicing is complete, the spliceosome disassembles and
all of its components are recycled for future splicing reactions (Hnilicova and Stanek, 2011). Each splicing alternative is regulated
through an interplay between constitutive splicing motifs, such as 5′ splice sites, branch points, polypyrimidine tracts and 3′ splice
sites, and components of the core splicing machinery. Exons and introns often also contain cis-regulatory sequences, exonic or intronic
enhancers or silencers, that affect splicing positively or negatively. These splicing regulatory sequences are usually 6–10 base pairs
(bp) long and recruit specific regulatory proteins such as serine/arginine-rich (SR) proteins or heterogeneous nuclear
ribonucleoproteins (hnRNPs) (Black, 2003; Wang and Burge, 2008; Chasin, 2007; Han et al., 2010).
Types of Splicing
Splicing of a multi-exon pre-mRNA transcript can yield two or more distinct mRNA products (Black, 2003; Wang and Burge, 2008;
Nilsen and Graveley, 2010; Calarco et al., 2011). Based on the complexity of the events, two types of splicing have been
described: constitutive splicing and alternative splicing (van den Hoogenhof et al., 2016). Constitutive splicing (CS) is the process
in which the introns are removed from the pre-mRNA and the exons are joined together to form a mature mRNA. Alternative splicing (AS),
on the other hand, is the process in which exons can be included or excluded in different combinations to create a diverse range
of mRNA transcripts from a single pre-mRNA (Nilsen and Graveley, 2010). Computational and experimental studies agree that splicing
signals at AS sites are weaker, that AS exons are shorter than CS exons, that skipped exons preserve the reading frame more often than
CS exons, and that orthologous AS exons are more conserved than orthologous CS exons (Zheng et al., 2005; Clark and Thanaraj, 2002;
Garg and Green, 2007). Because of the growing interest in AS in recent years, this article focuses on it.
In exon skipping, cassette exons can either be included in or excluded from the mature mRNA (Zhang et al., 2016). In intron retention,
the excision of an intron is suppressed, which results in the retention of the entire intron. Exons can also be extended or shortened
through the use of alternative 5′ or 3′ splice sites. In contrast, alternative promoters and alternative poly-A sites reflect the
selection of alternative transcription start sites or poly-A sites and are not due to alternative splicing. Incompletely spliced
transcripts may contain intron fragments that can be misidentified as intron retention, and genuine retention events are hard to
distinguish from such experimental artifacts. In general, intron retention is the most difficult AS event to detect. Many genes have
multiple alternative splicing events with complex combinations of exons and produce diverse transcript isoforms. A striking example is
the Drosophila melanogaster Down syndrome cell adhesion molecule (Dscam) gene, which contains 20 constitutive and 95 alternative exons.
Through combinations of these 95 alternative exons, Dscam can potentially express 38,016 different mature mRNAs (Graveley et al., 2004;
Missler and Sudhof, 1998).
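The figure of 38,016 follows from how the Dscam alternative exons are organized. As reported in the Dscam literature (this detail is not stated in the text above), they form four clusters of mutually exclusive exons, the exon 4, 6, 9 and 17 clusters, containing 12, 48, 33 and 2 alternatives, respectively. Because exactly one exon is chosen from each cluster, the number of possible mature mRNAs is the product of the cluster sizes:

```latex
\begin{aligned}
\text{alternative exons} &= 12 + 48 + 33 + 2 = 95,\\
\text{possible mature mRNAs} &= 12 \times 48 \times 33 \times 2 = 38{,}016.
\end{aligned}
```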
Tissue-Specific AS in Humans
AS is considered an important mechanism for the tissue-specific expression of transcript isoforms in humans. EST-based analyses have
shown that the human brain contains the highest fraction of alternatively spliced genes, followed by the liver and testis, whereas human
muscle, uterus, breast, stomach and pancreas have the lowest levels of genes undergoing AS (Lee et al., 2003; Xu et al., 2002;
Stamm et al., 2000). Human tissues also differ in their usage of alternative 5′ splice site exons (A5Es) and alternative 3′ splice site
exons (A3Es): the fraction of genes containing A3Es and A5Es is significantly higher in the liver than in any other human tissue
(Yeo et al., 2004). The brain has the second highest level of alternative usage of both 5′ and 3′ splice sites, whereas muscle, uterus,
breast, pancreas and stomach show the lowest levels of A5Es and A3Es (Yeo et al., 2004). Variation in AS events across human tissues
is primarily regulated by trans-acting factors that bind exonic and intronic cis-acting RNA elements (CAEs). A computational study
using a publicly available Affymetrix GeneChip Human Exon Array dataset identified 652 CAEs across 11 human tissues (Wang et al., 2009).
Approximately one third of all predicted CAEs matched entries in exonic splicing regulator databases, and the vast majority of predicted
CAEs were found in intronic regulatory regions. Most of these CAEs contribute to AS differences between two tissues, while some are
important in multiple tissues. Overall, the analysis suggests that genome-wide AS patterns are regulated by a combination of
tissue-specific cis-acting elements and "general elements" whose functional activities are important but differ across multiple
tissues (Wang et al., 2009).
Until recently, systematic analysis of AS relied on expressed sequence tags (ESTs) (Gupta et al., 2004; Sorek et al., 2004;
Xie et al., 2002) or splicing-specific microarrays (Castle et al., 2008; Clark et al., 2002; Pan et al., 2008). In EST-based analyses,
AS is identified bioinformatically by aligning the EST sequences with mRNA and genomic sequences using sequence alignment tools
(Kim and Lee, 2008). The detection of alternative splicing events has since been significantly improved by next-generation sequencing
(NGS). The genomic sequence acts as the main reference for the detection and validation of AS events, and with the availability of
high-throughput techniques, many genomes have been sequenced and it has become feasible to study AS on a genomic scale (Pan et al.,
2008; Wang et al., 2008). As a result, a large number of alternative transcripts have been discovered, and distinctive features of
alternatively spliced exons have been extracted using bioinformatics (Roy et al., 2013). With the
224 Bioinformatics Approaches for Studying Alternative Splicing
advances in tools used to study AS, it has also become easier to uncover the splicing dysregulation that leads to disease.
Dysregulation of alternative splicing has been observed in various human conditions, including cancers (Ghigna et al., 2010;
Yap and Makeyev, 2013; Mills and Janitz, 2012; Poulos et al., 2011; Mittendorf et al., 2012; Manetti et al., 2011; Endo-Umeda et al.,
2012; Medina and Krauss, 2013; Bogdanov, 2006; Lara-Pezzi et al., 2012; Cooper, 2005; Miura et al., 2011; Sampath and Pelus,
2007; Hagen and Ladomery, 2012; Yi and Tang, 2011; Omenn et al., 2010). Databases such as dbSNP and ssSNPTarget (see “Relevant
Websites section”) help users search for splice site SNPs located in genes of interest and identify any association with disease
(Tang et al., 2013). Such resources can help in the development of new splicing-targeted drugs or therapeutics.
AS detection usually involves three distinct stages. In the first step, transcript and genomic sequence data are processed to
eliminate repetitive or ambiguous sequences. Poly-A tails are also removed during this preprocessing, even though some poly-A
stretches may be genuine genomic sequence. Second, the transcript sequences are aligned to the genomic sequence and “gene models” are
deduced. The alignment to the genomic sequence can be performed using tools such as BLAT, GMAP or SPA (Kent, 2002; Wu and Watanabe,
2005; van Nimwegen et al., 2006), with or without SIM4 (Florea et al., 1998). When a sequence has several possible alignments, only
the best hit, as measured by percent identity and alignment coverage, is kept. Unspliced alignments can also be removed if the analysis
focuses strictly on AS detection. Third, the individual alternative splicing events are identified and the various alternative “gene
models” are constructed. To exclude artifacts, each individual AS event consists of only a pair of genomically non-overlapping
splicing events. From there, liberal methods generate all possible combinations of the splicing events (splicing graphs), whereas more
conservative methods seek only a minimal set of isoforms. Below, we present an overview of the bioinformatics approaches and tools
used to detect alternative splicing, along with their current relevance and the challenges involved.
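The best-hit filtering of the second stage can be sketched as follows. This is a minimal Python illustration, not the procedure of any specific tool: the `Alignment` record, the identity and coverage cut-offs (95% and 90%), the locus labels and the rule of ranking hits by identity first and coverage second are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """One candidate alignment of a transcript (EST or mRNA) to the genome."""
    query_id: str
    target: str        # hypothetical locus label, e.g. "chr1:100-900"
    identity: float    # percent identity (0-100)
    coverage: float    # percent of the query covered by the alignment (0-100)
    n_blocks: int      # number of aligned exon blocks; 1 means unspliced


def best_hits(alignments, min_identity=95.0, min_coverage=90.0, drop_unspliced=True):
    """Keep one best alignment per query, then optionally drop unspliced ones.

    Hits are ranked by identity, with coverage as the tie-breaker (an assumed rule).
    """
    best = {}
    for aln in alignments:
        if aln.identity < min_identity or aln.coverage < min_coverage:
            continue  # fails the quality cut-offs
        current = best.get(aln.query_id)
        if current is None or (aln.identity, aln.coverage) > (current.identity, current.coverage):
            best[aln.query_id] = aln
    if drop_unspliced:
        # Unspliced best hits carry no splice-junction information.
        best = {q: a for q, a in best.items() if a.n_blocks > 1}
    return best


alns = [
    Alignment("est1", "chr1:100-900", 99.0, 95.0, 3),
    Alignment("est1", "chr5:200-950", 97.0, 99.0, 3),  # weaker second hit
    Alignment("est2", "chr2:300-700", 98.0, 97.0, 1),  # unspliced
    Alignment("est3", "chr3:10-500", 80.0, 99.0, 2),   # low identity
]
print(sorted(best_hits(alns)))  # ['est1']
```

Filtering unspliced hits after choosing the best hit follows the order given in the text: a transcript whose best alignment is unspliced is dropped rather than rescued by a weaker spliced alignment.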
AS detection using sequence alignment: We can identify alternative splicing events by aligning ESTs with genomic and mRNA
sequences using alignment tools such as BLAST, BLAT (BLAST-Like Alignment Tool) or ClustalW. The alignment between ESTs and genomic
sequences reveals the locations of exons and introns, and comparing these exon–intron structures then identifies the alternative
splicing events. Alignment tools such as GMAP or SPA can further be used to correct the genome alignments and generate valid
alignments. Once the alternative splicing events have been identified, full-length alternatively spliced transcripts can be
constructed using graphical display tools such as the “Splice graph”. Although these alignment-based methods are rapid, their
limitations include high sequencing error rates, contamination, misalignments and the low sequence coverage of ESTs.
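A minimal Python sketch of the structure comparison described above: introns are inferred as the gaps between consecutive aligned exon blocks, and the intron sets of two transcripts are compared. The coordinates are hypothetical, and both transcripts are assumed to be aligned to the same strand of one locus with 1-based inclusive coordinates.

```python
def introns(exon_blocks):
    """Introns implied by the aligned exon blocks of one transcript.

    Blocks are 1-based inclusive (start, end) genomic coordinates; each intron
    spans the gap between two consecutive blocks.
    """
    blocks = sorted(exon_blocks)
    return {(up_end + 1, down_start - 1)
            for (_, up_end), (down_start, _) in zip(blocks, blocks[1:])}


def compare_structures(tx_a, tx_b):
    """Introns shared by two aligned transcripts and those unique to each."""
    ia, ib = introns(tx_a), introns(tx_b)
    return {"shared": sorted(ia & ib), "only_a": sorted(ia - ib), "only_b": sorted(ib - ia)}


est_a = [(100, 200), (300, 400), (500, 600), (700, 800)]
est_b = [(100, 200), (500, 600), (700, 800)]
print(compare_structures(est_a, est_b))
```

In this toy example the intron unique to `est_b` (201–499) spans the exon at 300–400 of `est_a`, which is the signature of an exon-skipping event.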
AS detection using sequence conservation: Many AS events are conserved among different organisms, e.g., human and mouse, which
suggests that they are of biological importance. Comparative genomics provides different means to predict conserved alternatively
spliced exons and thus evolutionarily conserved alternative splicing events. Because the flanking intronic regions of alternative exons
are more highly conserved than those of constitutive exons, this conservation can serve as a good indicator of alternative splicing
(Chen, 2011).
AS detection using microarray data analysis: The two approaches above, sequence alignment and comparative genomics, reveal only the
existence of an alternative splicing event, but neither its degree nor its regulation. Microarray studies can quantify an alternative
splicing event in a particular stage, condition or tissue. Affymetrix exon arrays, for example, have been used to identify
tissue-specific exons (Clark et al., 2007) and AS events differentially expressed between different human cell types (Yeo et al., 2007).
Widely used tools and methods for alternative splicing microarray data analysis include the splicing index calculation, analysis of
splice variation (ANOSVA), finding isoforms using robust multichip analysis (FIRMA), a gene structure-based splice variant
deconvolution method (DECONV), splicing prediction and concentration estimation (SPACE), and the generative model for the
alternative splicing array platform (GenASAP). These methods, along with their pros and cons, are described in detail by Chen (2011).
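For exon arrays, the splicing index is commonly defined as the log2 ratio of an exon's gene-normalized intensity between two conditions. The Python sketch below assumes probe-set intensities that have already been summarized and background-corrected; the input values are hypothetical.

```python
import math


def splicing_index(exon_1, gene_1, exon_2, gene_2):
    """log2 ratio of gene-normalized exon intensities, condition 1 vs condition 2.

    Positive values suggest relatively higher exon inclusion in condition 1.
    All inputs are assumed to be positive, summarized intensities.
    """
    ni_1 = exon_1 / gene_1  # normalized intensity of the exon in condition 1
    ni_2 = exon_2 / gene_2  # normalized intensity of the exon in condition 2
    return math.log2(ni_1 / ni_2)


# Hypothetical intensities: the exon is about 4-fold more included in condition 1.
print(splicing_index(exon_1=400.0, gene_1=1000.0, exon_2=100.0, gene_2=1000.0))
```

Normalizing by the gene-level signal is what separates a change in exon inclusion from a change in overall gene expression.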
AS detection using RNA-seq data analysis: Alternative splicing is considered a main mechanism governing protein diversity and
gene regulation. With the advancement of RNA-seq technology, it has become possible to analyze the global impact and regulation of
this biological process. A growing number of studies show that the selection of wrong splice sites causes human disease.
Therefore, the identification and quantification of differentially spliced transcripts are crucial parts of RNA-seq analysis.
Splicing Efficiency
Splicing efficiency (SE) (also called splicing score or splicing index) is used to quantify alternative splicing. SE is calculated as
the ratio between the amounts of (spliced) mRNA and (nascent) pre-mRNA. The conventional approach of determining SE by real-time
quantitative PCR (RT-qPCR) with primers spanning exon–intron and exon–exon junctions (Hao and Baltimore, 2013) is feasible for only
a limited number of genes. However, with powerful computational hardware and bioinformatics tools that can analyze large amounts of
RNA-seq data, it is now possible to determine SE genome-wide. Several approaches calculate SE from RNA-seq read counts over intronic,
exonic or exon–exon junction regions (Herzel and Neugebauer, 2015; Volanakis et al., 2013). The main approaches for calculating SE
(or splicing scores) are:
1) Exon-centered splicing score (ECSS): ECSS is calculated using the splicing frequency around a given exon, by subtracting the
read coverage over 2 kb of the upstream intron from the read coverage over 2 kb of the 5′ end of the downstream intron
(Pandya-Jones, 2011; Tilgner et al., 2012).
2) Intron-centered splicing score (ICSS): ICSS is calculated for each intron as a ratio of reads around the 3′ splice site: the read
coverage over the last 25 bp of a given intron is divided by the read coverage over the first 25 bp of the downstream exon
(Carrillo Oesterreich et al., 2010).
3) Gene-based splicing score (GBSS): GBSS estimates the splicing frequency of each gene by dividing the read coverage over exons by
the read coverage over the whole locus (Bhatt et al., 2012).
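The ICSS and GBSS definitions above translate directly into code. The following Python sketch assumes a per-base read-coverage array for a single locus (0-based, half-open coordinates) and uses mean coverage within each 25 bp window; the toy coverage profile and the helper names are invented for illustration.

```python
def mean_coverage(cov, start, end):
    """Mean per-base coverage over the 0-based, half-open window [start, end)."""
    window = cov[start:end]
    return sum(window) / len(window) if window else 0.0


def icss(cov, exon_start, window=25):
    """Coverage of the last `window` bp of an intron divided by coverage of the
    first `window` bp of the downstream exon, which starts at index `exon_start`."""
    intronic = mean_coverage(cov, exon_start - window, exon_start)
    exonic = mean_coverage(cov, exon_start, exon_start + window)
    return intronic / exonic if exonic else float("nan")


def gbss(cov, exons):
    """Read coverage summed over exons divided by coverage over the whole locus."""
    exonic = sum(sum(cov[start:end]) for start, end in exons)
    total = sum(cov)
    return exonic / total if total else float("nan")


# Toy locus: exon (0-100), intron (100-300), exon (300-400); coverage values are invented.
cov = [50] * 100 + [5] * 200 + [50] * 100
print(icss(cov, exon_start=300), gbss(cov, [(0, 100), (300, 400)]))
```

With this profile ICSS is 0.1 and GBSS is about 0.91; a lower ICSS and a higher GBSS both indicate that less intronic RNA remains, i.e., more efficient splicing.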
ECSS is optimal for analyzing alternative cassette exon usage. However, it is not suitable for first and terminal exons, which have to
be analyzed differently. ICSS allows first and terminal exons to be included in the analysis, but it handles short exons poorly. In
GBSS, noise is reduced because many reads contribute to the score. However, this approach provides neither SE for individual introns
nor any information about AS events per gene. Further discussion of calculating splicing efficiency by bioinformatics analysis is
provided by Brugiolo et al. (2013).
Several software tools have been developed for the identification of alternatively spliced exons and isoforms during the last decade.
Here, we highlight the basic characteristics of some of the commonly used tools:
AStalavista (2007): AStalavista (Alternative Splicing transcriptional landscape visualization tool) is the first tool (a web server)
for the dynamic and exhaustive extraction of complex AS events from annotated genes, allowing different types of AS and their
distributions to be compared (Foissac and Sammeth, 2007). It is a Java-based tool available as a web server (see “Relevant Websites
section”), and the latest version can be downloaded from the website provided in “Relevant Websites section”. For any given set of
transcripts annotated with known exon–intron structures, AStalavista first performs an exhaustive pairwise comparison between all
transcripts in a locus. To ensure exhaustive detection of AS events, it pools all transcripts of a single transcriptional locus that
overlap on the same strand of the genome sequence. It then dynamically assigns an AS code to the observed splicing events based on
the relative positions of their splice sites. This code enables the automatic identification and exhaustive extraction of variations
in exon–intron structure (Sammeth et al., 2008). AS events of the same type receive an identical code and are classified in the same
structural group, whereas each distinct splicing structure is assigned a new, concise and unambiguous AS code. This generic protocol
is applicable to any genome, with or without annotation. AStalavista therefore has an advantage over methods that require a
predefined splice form (a reference transcript) for comparison. It detected more than 24,000 AS events in humans that involve more
than two alternatives and whose mechanisms have not yet been elucidated (Sammeth, 2009). From this analysis, AStalavista generates a
ranked list of the detected event types with their unique codes (relative-position notation) and builds an AS landscape (in the form
of a pie chart) for the given set of transcripts. This landscape of AS events can be used to investigate transcriptome diversity
across genes, chromosomes and species.
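The idea of a relative-position code can be illustrated with a simplified Python sketch that handles one event between two plus-strand transcripts. It is not AStalavista's implementation: it ignores the restriction to a common flanking region, strand handling and events involving more than two transcripts, and the helper names are hypothetical. Splice sites used by both transcripts are dropped, the remaining sites are ranked by genomic position, and donors and acceptors are written as `^` and `-`.

```python
from typing import List, Set, Tuple

Exon = Tuple[int, int]   # (start, end), 1-based inclusive, plus strand
Site = Tuple[int, str]   # (position, "^" for a donor or "-" for an acceptor)


def splice_sites(exons: List[Exon]) -> Set[Site]:
    """Donor and acceptor sites used by one plus-strand transcript."""
    ordered = sorted(exons)
    sites = set()
    for (_, up_end), (down_start, _) in zip(ordered, ordered[1:]):
        sites.add((up_end, "^"))      # donor: last base of the upstream exon
        sites.add((down_start, "-"))  # acceptor: first base of the downstream exon
    return sites


def event_code(tx_a: List[Exon], tx_b: List[Exon]) -> str:
    """Relative-position code for the splice sites that differ between two transcripts."""
    a, b = splice_sites(tx_a), splice_sites(tx_b)
    variable = sorted(a ^ b)  # sites used by only one of the two transcripts
    rank = {site: i + 1 for i, site in enumerate(variable)}

    def code(own_sites: Set[Site]) -> str:
        used = [s for s in variable if s in own_sites]
        return "".join(f"{rank[s]}{s[1]}" for s in used) or "0"

    return ",".join(sorted([code(a), code(b)]))


inclusion = [(100, 200), (300, 400), (500, 600)]
skipping = [(100, 200), (500, 600)]
print(event_code(inclusion, skipping))  # 0,1-2^
```

Because only ranks and site types enter the code, events of the same type receive the same code regardless of their coordinates: an alternative acceptor pair such as `[(100, 200), (300, 400)]` versus `[(100, 200), (350, 400)]` yields `1-,2-`, and a retained intron yields `0,1^2-`.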
AltAnalyze (2010): AltAnalyze (see “Relevant Websites section”) is a comprehensive tool for analyzing alternative splicing data from
Affymetrix Exon and Gene Arrays and predicting their functional consequences at the protein and domain level (Emig et al., 2010). It
requires neither programming knowledge nor expertise in exon-array analysis. It was designed for
computing alternative exon statistics using widely used, rigorous statistical methods such as FIRMA and MiDAS, together with
‘detection above background’ (DABG) P-value thresholds and other alternative exon analysis parameters, for any number of raw
Affymetrix CEL files. AltAnalyze provides outputs in tab-delimited text format, which can be opened with spreadsheet programs such as
Microsoft Excel or visualized with DomainGraph (see “Relevant Websites section”) for further analysis.
GPSeq (2010): GPSeq is a tool that analyzes RNA-seq data to estimate gene and exon expression and to identify differentially expressed
genes (DEGs) and differentially spliced exons (DSEs) through log-likelihood ratio approaches (Srivastava and Chen, 2010). It is
based on a two-parameter (θ and λ) generalized Poisson (GP) model fitted to the position-level read counts across all positions of a
gene or exon. Here, the estimated parameter θ represents the transcript amount for the gene and λ represents the average bias
introduced during sample preparation and sequencing. After normalization of the mapped reads, a likelihood ratio test is used to
identify DEGs or DSEs by treating the λ estimates as true values. A simulation strategy can also be adopted to better estimate the
P-values. By addressing the fundamental problem of modeling the distribution of position-level read counts, the GP model fits the
data much better than the traditional Poisson model, separating true signals from sequencing bias. This improves the estimation of
gene and exon expression, allows a more reasonable normalization across samples, and hence improves the identification of DEGs as
well as DSEs. Importantly, it can also deal with multiple RNA-seq data sets. The code for the GP model was written in C, and ‘GPseq’ is
an R package implementing these methods that was distributed through the website provided in “Relevant Websites section”. This tool
is no longer available.
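The generalized Poisson distribution, in its standard parameterization, has probability mass P(k) = θ(θ + kλ)^(k−1) e^(−θ−kλ) / k! and mean θ/(1 − λ); setting λ = 0 recovers the ordinary Poisson. The Python sketch below only evaluates this likelihood for a vector of position-level counts, assuming GPseq uses this standard form; how GPseq fits θ and λ, normalizes samples and builds its likelihood ratio tests is not reproduced, and the counts are hypothetical.

```python
import math


def gp_logpmf(k, theta, lam):
    """Log P(X = k) for a generalized Poisson GP(theta, lam).

    P(k) = theta * (theta + k*lam)**(k - 1) * exp(-theta - k*lam) / k!
    for theta > 0 and 0 <= lam < 1; lam = 0 gives the ordinary Poisson.
    """
    return (math.log(theta) + (k - 1) * math.log(theta + k * lam)
            - theta - k * lam - math.lgamma(k + 1))


def gp_loglik(counts, theta, lam):
    """Log-likelihood of position-level read counts under a single GP(theta, lam)."""
    return sum(gp_logpmf(k, theta, lam) for k in counts)


counts = [0, 3, 1, 12, 2, 0, 7]  # hypothetical per-position read counts
m = sum(counts) / len(counts)
# Compare a Poisson and an over-dispersed GP with the same mean theta / (1 - lam).
for lam in (0.0, 0.5):
    print(lam, round(gp_loglik(counts, theta=m * (1 - lam), lam=lam), 2))
```

The extra parameter λ lets the variance, θ/(1 − λ)³, exceed the mean, which is how a GP model can absorb the uneven, over-dispersed read coverage that a plain Poisson model cannot.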
MISO (2010): Mixture of Isoforms (MISO) is a statistical model that quantifies the expression level of alternatively spliced genes
from RNA-seq data and identifies differentially regulated isoforms or exons across samples (Katz et al., 2010). The MISO model uses
Bayesian inference, modeling the generative process by which RNA-seq reads arise from isoforms, to calculate the probability that a
read originated from a particular isoform. It treats the expression level of a set of isoforms as a random variable and estimates a
distribution over the values of this variable using a sampling-based algorithm known as Markov chain Monte Carlo (MCMC). The
uncertainty of these estimates is summarized by confidence intervals. MISO not only estimates the expression level of a single
alternatively spliced exon (“exon-centric”) or of each transcript belonging to a gene (“isoform-centric”), but can also quantify the
levels of multiple isoforms produced by several nearby alternative splicing events. MISO is available as a Python package at the
website provided in “Relevant Websites section”. It outputs the exon or isoform expression levels in each sample along with confidence
intervals, which can be visualized alongside the RNA-seq data with sashimi plots.
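To illustrate the MCMC idea behind MISO, the following Python sketch estimates Ψ for a single skipped exon with a random-walk Metropolis–Hastings sampler. It is a deliberately simplified model, not MISO's: only reads unique to the inclusion or skipping isoform are used, a uniform prior is assumed, and the effective isoform lengths (2 and 1, as when only junction reads are counted) and read counts are hypothetical.

```python
import math
import random


def log_posterior(psi, n_inc, n_exc, len_inc, len_exc):
    """Log posterior of psi under a uniform prior (up to an additive constant)."""
    if not 0.0 < psi < 1.0:
        return -math.inf
    # Longer isoforms yield more reads, so weight each isoform by its effective length.
    p_inc = psi * len_inc / (psi * len_inc + (1.0 - psi) * len_exc)
    if not 0.0 < p_inc < 1.0:
        return -math.inf  # guard against floating-point rounding at the boundaries
    return n_inc * math.log(p_inc) + n_exc * math.log(1.0 - p_inc)


def sample_psi(n_inc, n_exc, len_inc, len_exc, n_iter=20000, burn_in=2000, step=0.05, seed=0):
    """Random-walk Metropolis-Hastings samples of psi, discarding burn-in."""
    rng = random.Random(seed)
    psi = 0.5
    current = log_posterior(psi, n_inc, n_exc, len_inc, len_exc)
    samples = []
    for i in range(n_iter):
        proposal = psi + rng.gauss(0.0, step)
        proposed = log_posterior(proposal, n_inc, n_exc, len_inc, len_exc)
        # Symmetric proposal: accept with probability min(1, posterior ratio).
        if math.log(1.0 - rng.random()) < proposed - current:
            psi, current = proposal, proposed
        if i >= burn_in:
            samples.append(psi)
    return samples


samples = sorted(sample_psi(n_inc=60, n_exc=20, len_inc=2, len_exc=1))
mean = sum(samples) / len(samples)
low = samples[int(0.025 * len(samples))]
high = samples[int(0.975 * len(samples)) - 1]
print(f"psi ~ {mean:.2f} (95% interval {low:.2f}-{high:.2f})")
```

The spread of the sampled values gives an interval estimate for Ψ, analogous in spirit to the confidence intervals MISO reports; with these counts the length-normalized point estimate is (60/2) / (60/2 + 20) = 0.6.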
JuncBASE (2011): JuncBASE is used to identify and classify alternative splicing events from RNA-seq data (Brooks et al., 2010).
JuncBASE is available from a GitHub repository (see “Relevant Websites section”). Alternative splicing events are identified from
splice junction reads in RNA-seq read alignments together with annotated exon coordinates. JuncBASE also uses read counts to quantify
the relative expression of each isoform and identifies splice events that are significantly differentially expressed across two or
more samples.
SpliceTrap (2011): SpliceTrap is a method to quantify local exon inclusion levels by estimating the expression level of each exon
using paired-end RNA-seq data (Wu et al., 2011). SpliceTrap generates alternative splicing profiles for different splicing patterns,
such as exon skipping, alternative 5′ or 3′ splice sites, and intron retention. It utilizes a comprehensive human exon database called
TXdb (see section “Bioinformatics Approaches for AS Detection”) and estimates the expression level of every exon as an independent
Bayesian inference problem. Unlike microarray-based methods, SpliceTrap can identify the major classes of alternative splicing events
and determine the inclusion level of every exon within a single cellular condition, without requiring a background set of reads to
estimate relative splicing changes. Compared to Cufflinks and Scripture, it was shown to improve the accuracy, robustness and
reliability of quantifying a large fraction of AS activity. SpliceTrap is useful for studying changes at the single-exon level and can
help in the discovery of nearby cis-regulatory elements in diverse applications. It can also be run online through the CSH Galaxy
server (see “Relevant Websites section”) and is available for download and installation from the website provided in “Relevant
Websites section”.
MATS (2012): MATS is a computational tool to detect differential AS events from RNA-seq data (Shen et al., 2012). The statistical
model of MATS calculates the P-value and false discovery rate for the hypothesis that the difference in the isoform ratio of a gene
between two conditions exceeds a given user-defined threshold. From the RNA-seq data, MATS can automatically detect and analyze AS
events corresponding to all major types of AS patterns. MATS was designed to compare two RNA-seq samples; replicate RNA-seq data from
paired and unpaired study designs are handled by its extension, rMATS.
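The quantity MATS tests, a difference in exon inclusion level between two conditions, can be computed from junction read counts as sketched below in Python. The normalization (dividing inclusion-junction reads by 2 because the inclusion isoform has two junctions) is a common convention assumed here, the counts and the 0.1 cutoff are hypothetical, and MATS's statistical model for P-values and FDR is not reproduced.

```python
def junction_psi(inc_reads, skip_reads, inc_len=2.0, skip_len=1.0):
    """Exon inclusion level (PSI) from junction read counts.

    Counts are divided by the number of junctions each isoform contributes
    (by default 2 for the inclusion isoform and 1 for the skipping isoform).
    """
    inc = inc_reads / inc_len
    skip = skip_reads / skip_len
    total = inc + skip
    return inc / total if total else float("nan")


def delta_psi(sample_1, sample_2):
    """Difference in inclusion level between two (inclusion, skipping) count pairs."""
    return junction_psi(*sample_1) - junction_psi(*sample_2)


cond_a = (80, 10)  # hypothetical inclusion-junction and skipping-junction reads
cond_b = (20, 40)
dpsi = delta_psi(cond_a, cond_b)
cutoff = 0.1  # hypothetical user-defined threshold
print(f"delta PSI = {dpsi:.2f}; exceeds cutoff: {abs(dpsi) > cutoff}")
```

A point estimate such as this says nothing about uncertainty; the value of a statistical model like MATS lies in asking whether the observed counts support a difference larger than the cutoff.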
SpliceSeq (2012): SpliceSeq is a Java-based application, freely available at the website provided in “Relevant Websites Section”,
for visualizing and quantifying RNA-seq reads for alternative splicing and for identifying potential functional changes resulting
from splice variation (Ryan et al., 2012). It aligns RNA-seq reads to gene splice graphs for accurate analysis of large and complex
transcript variants. First, an alignment database is generated from the imported RNA-seq data using Bowtie, and then SpliceSeq aligns
reads to the splice graphs or gene models. It then evaluates the predicted transcript splicing patterns for AS events and classifies
these events into different types, including exon skips, cassette exons and alternate promoters. The resulting alternative protein
sequences are predicted by a weighted traversal of each gene's splice graph.
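Because each path through a splice graph corresponds to one candidate exon chain, isoforms can be enumerated by a depth-first traversal, as in this minimal Python sketch. The graph is hypothetical, and the traversal is unweighted, unlike the weighted traversal SpliceSeq uses; the number of paths can also grow exponentially with the number of alternative exons, which is one reason real tools weight or prune paths.

```python
from typing import Dict, List


def enumerate_isoforms(graph: Dict[str, List[str]], source: str, sink: str) -> List[List[str]]:
    """Return every source-to-sink path of an acyclic splice graph.

    Nodes are exons; an edge u -> v is a splice junction joining exon u to exon v.
    Each path is one candidate isoform (exon chain).
    """
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == sink:
            paths.append(path)
            return
        for nxt in graph.get(node, []):
            walk(nxt, path + [nxt])

    walk(source, [source])
    return paths


# Hypothetical gene in which E2 and E3 can each be skipped.
graph = {"E1": ["E2", "E3", "E4"], "E2": ["E3", "E4"], "E3": ["E4"], "E4": []}
for isoform in enumerate_isoforms(graph, "E1", "E4"):
    print("-".join(isoform))
```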
SplicingViewer (2012): SplicingViewer is an integrated tool, freely accessible at the website provided in “Relevant Websites
section”, for detecting splice junctions from known gene models or RNA-seq data, annotating alternative splicing patterns using these
junctions, and visualizing the patterns (Liu et al., 2012). First, the RNA-seq short reads are mapped to a provided or selected
reference genome using aligners such as BWA, Bowtie or SOAP. The mapped data are then converted to SAM/BAM format with SAMtools and
used as input to SplicingViewer, which analyzes and displays the alternative splicing patterns and the RNA-seq mapping results in a
user-friendly interface, quickly and with efficient memory use.
ASprofile (2013): ASprofile is a computational framework for identifying AS events in different RNA-seq samples (Florea et al., 2013).
It comprises three main programs for extracting (extract-as), quantifying (extract-as-fpkm) and comparing (collect-fpkm) AS events
from transcripts assembled from RNA-seq data in multiple conditions. First, the extract-as program takes a GTF transcript file as
input and compares all pairs of transcripts within a gene to determine the exon–intron structure differences that indicate an AS
event. Second, extract-as-fpkm calculates the FPKM of each AS event from those of the transcripts. Finally, collect-fpkm collects the
event FPKM values for all RNA-seq samples and calculates and compares splicing ratios across samples.
ASprofile reports the following types of AS events: exon skipping (SKIP), multi-exon (cassette) skipping (MSKIP), alternative
transcript start and termination (TSS, TTS), retention of single or multiple introns (IR, MIR), and alternative exon ends (AE).
ASprofile has been applied to RNA-seq data from Illumina’s Human Body Map project, and the authors have provided a global view of
alternative splicing events in 16 different human tissues. The catalog of all AS events and the ASprofile software are freely
available and can be accessed through their website (see “Relevant Websites section”).
DiffSplice (2013): DiffSplice is a tool for the identification and quantification of alternative splicing events (Hu et al., 2013).
The software is available at the website provided in “Relevant Websites section”. The approach starts by identifying alternative
splicing modules (ASMs) in a splice graph constructed directly from the exons and introns predicted from RNA-seq read alignments. The
abundance of the alternative splicing isoforms within each ASM is estimated for each sample and compared across sample groups.
DiffSplice takes SAM files as input, and its results are summarized as a decomposition of the genome that can be visualized in the
UCSC Genome Browser. The approach does not depend on transcript or gene annotations, and it avoids the need for full transcript
inference and quantification, which is usually difficult because of short read lengths and various sampling biases.
DSGseq (2013): DSGseq is a method to identify differentially spliced genes between two groups of samples by comparing read counts on
all exons (Wang et al., 2013). The DSGseq software is available at the website provided in “Relevant Websites section”. DSGseq uses a
negative binomial (NB) distribution to model sequencing reads on exons and proposes an NB statistic to detect differentially spliced
genes between the two groups of RNA-seq samples. It produces tabular output of the differences in the relative abundance of the
isoforms of each gene between the two groups and of the relative abundance of each exon within each gene, and it ranks the exons with
the most significant differences. This exon-based approach needs neither isoform composition information nor isoform expression
estimation. Experiments on simulated and real RNA-seq data show that the method performs well and is broadly applicable. DSGseq can
identify the exons that contribute most to the differential splicing, and it can also identify previously unknown alternative
splicing events.
GLiMMPS (2013): GLiMMPS (Generalized Linear Mixed Model Prediction of sQTL) is a robust statistical method for detecting splicing
quantitative trait loci (sQTLs) from RNA-seq data (Zhao et al., 2013). It works at a low false positive rate and characterizes the
genetic variation of alternative splicing, taking into account individual variation in sequencing coverage and the noise prevalent in
RNA-seq data. As input, it requires a set of identified AS events and RNA-seq reads mapped to the detected splice junctions, from which
exon inclusion levels are estimated. GLiMMPS uses the read information from both the exon inclusion and skipping isoforms to model the
estimation uncertainty of exon inclusion levels. The source code used to be available from the website provided in “Relevant Websites
section”, but the tool currently appears to be discontinued.
SpliceR (2014): spliceR is an R package for the classification of alternative splicing and the prediction of coding potential
(Vitting-Seerup et al., 2014). It is built on standard Bioconductor classes and is freely available from the Bioconductor repository
(see “Relevant Websites section”). spliceR uses the full-length transcripts output by RNA-seq assemblers (such as Cufflinks) to detect
single or multiple exon skipping, alternative donor and acceptor sites, intron retention, alternative first or last exon usage, and
mutually exclusive exon events. For each gene, spliceR constructs a hypothetical pre-mRNA from the exon information of all
transcripts originating from that gene. All transcripts are then compared to this hypothetical pre-mRNA in a pairwise manner,
and the AS events are classified and annotated. The package is flexible and can easily be integrated into existing pipelines or
workflows. It predicts the coding potential of transcripts and calculates untranslated region (UTR) and open reading frame (ORF)
lengths. It also predicts whether transcripts are sensitive to nonsense-mediated decay (NMD), based on compatible annotated start
codon positions and their downstream ORFs.
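NMD prediction of this kind usually rests on a position-based rule. Below is a minimal sketch of the widely used "50-nucleotide rule" heuristic. It assumes that exon lengths are given in transcript order and that the stop codon position is already known from the ORF. The function names and the default 50-nt threshold are illustrative choices, not spliceR's code.

```python
def last_junction_position(exon_lengths):
    """Transcript coordinate of the last exon-exon junction (None if single-exon)."""
    if len(exon_lengths) < 2:
        return None
    return sum(exon_lengths[:-1])


def is_nmd_sensitive(exon_lengths, stop_codon_end, threshold=50):
    """Flag a transcript with the '50-nucleotide rule' heuristic.

    exon_lengths: exon lengths in transcript order (5' to 3').
    stop_codon_end: 0-based transcript coordinate just past the stop codon.
    The transcript is flagged when the stop codon ends more than `threshold`
    nucleotides upstream of the last exon-exon junction.
    """
    junction = last_junction_position(exon_lengths)
    if junction is None:
        return False  # no downstream junction, so the rule does not apply
    return junction - stop_codon_end > threshold


exons = [200, 150, 300, 400]  # last junction at transcript position 650
print(is_nmd_sensitive(exons, stop_codon_end=300))  # prints True (350 nt upstream)
print(is_nmd_sensitive(exons, stop_codon_end=620))  # prints False (30 nt upstream)
```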
GESS (2014): The graph-based exon-skipping scanner (GESS) is a computational method that detects de novo exon-skipping events
directly from raw RNA-seq data, without prior knowledge of gene annotation (Wang et al., 2017). It can also identify the dominant
isoform among the skipping and inclusion isoforms generated at these event sites. This gives a more accurate and comprehensive
picture of the skipping events associated with a particular physiological condition in a cell. The GESS tool is available at the
website provided in “Relevant Websites section”. First, a splice-site-link graph is created from splicing-aware aligned reads using a
greedy algorithm. The sub-graphs are then navigated iteratively with a walking strategy to reveal patterns corresponding to
exon-skipping events. Finally, the MISO model is applied to obtain the ratio of the skipping isoform to the inclusion isoform, and the
dominant isoform is determined. The input must be a sorted BAM file. The exon-skipping sites output by GESS can be used by MISO
to calculate Ψ values (percent spliced in, PSI).
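The idea of scanning a junction graph for skipping patterns fits in a few lines. This is a simplified illustration and not GESS's greedy graph construction or walking strategy. It assumes that junctions are (donor, acceptor) coordinate pairs on one strand, taken from spliced alignments. A skipping event is reported when a long junction bypasses an inner exon that is itself joined to both flanking exons by inclusion junctions.

```python
from collections import defaultdict


def find_exon_skipping(junctions):
    """junctions: iterable of (donor, acceptor) coordinates with donor < acceptor.

    Returns (skip_junction, middle_exon) tuples, where middle_exon is
    (exon_start, exon_end): it is bypassed by skip_junction but linked to
    both flanking exons by inclusion junctions.
    """
    junctions = set(junctions)
    by_donor = defaultdict(set)
    by_acceptor = defaultdict(set)
    for donor, acceptor in junctions:
        by_donor[donor].add(acceptor)
        by_acceptor[acceptor].add(donor)

    events = []
    for donor, acceptor in sorted(junctions):
        # Step forward from the upstream donor and backward from the downstream
        # acceptor; an inner exon between them closes the skipping "bubble".
        for exon_start in by_donor[donor]:
            for exon_end in by_acceptor[acceptor]:
                if donor < exon_start <= exon_end < acceptor:
                    events.append(((donor, acceptor), (exon_start, exon_end)))
    return events


junctions = {(100, 200), (300, 400), (100, 400), (500, 600)}
print(find_exon_skipping(junctions))  # prints [((100, 400), (200, 300))]
```

Quantifying which isoform dominates at each reported site would be a separate step, which GESS delegates to MISO.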
rMATS (2014): As described previously, MATS detects differential AS between two RNA-seq samples. rMATS extends this to
replicate RNA-seq data. It uses a hierarchical model that accounts for the estimation uncertainty in individual replicates and the
variability among replicates (Shen et al., 2014). rMATS can flexibly test whether the splicing difference exceeds any user-defined
threshold. Because it is a general method for analyzing mRNA isoform ratios from read counts, rMATS can also be applied to
sequencing-based analyses of other types of mRNA isoform variation, including alternative polyadenylation and RNA editing.
The rMATS source code is freely available at the website provided in “Relevant Websites section”. It takes raw RNA-seq reads,
a genome sequence file, and a transcript annotation file as input. It then identifies the alternative splicing events corresponding to
all major types of alternative splicing patterns and calculates the P value and FDR for differential splicing.
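A common downstream step is to combine an FDR cutoff with a minimum splicing difference. A minimal sketch follows. P values from many events are converted to FDR with the Benjamini-Hochberg procedure, a standard multiple-testing correction (this sketch does not claim to mirror rMATS internals). Events that pass both an FDR cutoff and a minimum |ΔPSI| are kept. The cutoffs (FDR 0.05, |ΔPSI| 0.1), the event IDs and the tuple layout are illustrative assumptions and do not reflect rMATS's output format.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_events(events, fdr_cutoff=0.05, min_delta_psi=0.1):
    """Keep events passing both an FDR cutoff and a minimum |delta PSI|.

    events: list of (event_id, mean_psi_group1, mean_psi_group2, p_value).
    Returns (event_id, delta_psi, fdr) tuples.
    """
    fdrs = benjamini_hochberg([event[3] for event in events])
    kept = []
    for (event_id, psi1, psi2, _), fdr in zip(events, fdrs):
        delta = psi1 - psi2
        if fdr <= fdr_cutoff and abs(delta) >= min_delta_psi:
            kept.append((event_id, round(delta, 3), fdr))
    return kept


events = [
    ("SE_1", 0.80, 0.30, 0.001),
    ("SE_2", 0.55, 0.50, 0.002),  # significant, but |delta PSI| is too small
    ("SE_3", 0.40, 0.90, 0.030),
    ("SE_4", 0.60, 0.20, 0.200),  # large change, but not significant
]
for event_id, delta, fdr in filter_events(events):
    print(event_id, delta, round(fdr, 3))
```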
FineSplice (2014): FineSplice is a Python-wrapped splice junction detection algorithm, used in combination with TopHat2, for the
reliable identification of expressed exon junctions from RNA-seq data (Gatto et al., 2014). The software is freely available at the
website provided in “Relevant Websites section”. In the first step, transcriptome alignment with de novo splice junction discovery is
performed using TopHat2, supplied with the available annotations of known transcript isoforms. The resulting binary alignment map
(BAM) file is used as input by FineSplice. Next, for each junction, it computes the set of split-read overhangs from uniquely mapping
reads. It labels the splice junctions that have no matching overhang, defining a subset of potential false positives. It then
constructs a feature vector for each junction based on the log2 deviation. Using the class labels and feature vectors, it fits an
L1-regularized logistic regression model over the whole set of junctions. In this way, potential false positives arising from spurious
alignments are filtered out by a semi-supervised anomaly detection strategy. Multi-mapping reads that have a unique location after
filtering are rescued and reassigned to the most reliable candidate location. FineSplice outputs a confident set of expressed splice
junctions with the corresponding read counts. By coupling an efficient mapping solution with semi-supervised anomaly detection, it
allows reliable estimation of expressed junctions from the alignment output.
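The "label, featurize, fit an L1-regularized logistic regression" pattern can be sketched as follows. FineSplice's own feature vector and labelling rule are not reproduced here. The two log2-scaled features and the rule that a junction supported by a single distinct overhang length is a potential false positive are hypothetical choices made only to keep the example concrete. The model is fitted by proximal gradient descent: an ordinary gradient step followed by soft-thresholding, which is the proximal operator of the L1 penalty.

```python
import math
from collections import defaultdict


def junction_features(split_reads):
    """split_reads: (junction_id, overhang_length) pairs from uniquely mapping reads.

    Returns hypothetical per-junction features
    [log2(reads + 1), log2(distinct overhangs + 1)] and a provisional label
    (1 = potential false positive).
    """
    overhangs = defaultdict(list)
    for junction_id, overhang in split_reads:
        overhangs[junction_id].append(overhang)
    features, labels = {}, {}
    for junction_id, values in overhangs.items():
        n_distinct = len(set(values))
        features[junction_id] = [math.log2(len(values) + 1), math.log2(n_distinct + 1)]
        # Hypothetical labelling rule: every supporting read has the same overhang.
        labels[junction_id] = 1 if n_distinct == 1 else 0
    return features, labels


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_l1_logistic(X, y, lam=0.01, lr=0.1, n_iter=2000):
    """L1-regularized logistic regression fitted by proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is not penalized
        for j in range(d):
            step = w[j] - lr * grad_w[j]
            w[j] = math.copysign(max(abs(step) - lr * lam, 0.0), step)  # soft-threshold
    return w, b


def score(w, b, x):
    """Model probability that a junction belongs to the potential-false-positive class."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


reads = [("j1", 10), ("j1", 22), ("j1", 35), ("j1", 41), ("j1", 17),
         ("j2", 25), ("j2", 25), ("j2", 25),
         ("j3", 12), ("j3", 30)]
feats, labels = junction_features(reads)
ids = sorted(feats)
w, b = fit_l1_logistic([feats[j] for j in ids], [labels[j] for j in ids])
for j in ids:
    print(j, round(score(w, b, feats[j]), 2))
```

The L1 penalty drives uninformative feature weights toward exactly zero. This is one reason the regularization suits a feature vector built from many junction-level statistics.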
Mutations in the splicing signals (5′ splice site, 3′ splice site and branch point) and in splicing silencer and enhancer motifs can
affect the proper binding of the spliceosome and other RNA-binding proteins, thereby changing AS patterns. Such changes can lead
to altered gene products and, as a consequence, to disease. It is therefore important to be able to predict splicing codes that
potentially affect AS. Several online bioinformatics tools are available for predicting the splicing code. They can help to
understand human variation in splicing and its consequences in human disease. These predictions take advantage of experimental
binding data for RNA-binding proteins, such as SR proteins, as well as experimentally validated splicing signal sequences. The tools
take a primary sequence as input and predict splicing regulatory binding sites and properties of exons and introns, such as splice
site strength, branch points and other regulatory binding motifs.
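Most of these predictors start from simple sequence signals. The function below is a minimal sketch, not the algorithm of any specific tool. It checks the canonical GT...AG intron boundaries and searches for the degenerate human branch-point consensus yUnAy (written with DNA letters as [CT]T[ACGT]A[CT]). The search is restricted to an assumed window of 18-40 nt upstream of the 3′ splice site. Real predictors replace such exact motif matches with scoring models.

```python
import re

# Branch-point consensus yUnAy in DNA letters; the lookahead allows overlapping
# hits, and the branch adenosine is at offset 3 within each match.
BRANCH_POINT = re.compile(r"(?=([CT]T[ACGT]A[CT]))")


def scan_intron(intron, window=(18, 40)):
    """Report basic splicing signals in an intron sequence (5'->3', DNA letters).

    window: assumed range of distances (nt) between the branch A and the
    3' end of the intron.
    """
    intron = intron.upper()
    result = {
        "canonical_donor": intron.startswith("GT"),
        "canonical_acceptor": intron.endswith("AG"),
        "branch_points": [],
    }
    lo, hi = window
    for match in BRANCH_POINT.finditer(intron):
        branch_a = match.start() + 3
        distance = len(intron) - branch_a  # nt from the branch A to the intron end
        if lo <= distance <= hi:
            result["branch_points"].append((branch_a, match.group(1)))
    return result


# Toy intron: donor, spacer, branch-point motif, polypyrimidine tract ending in AG.
intron = "GTAAGT" + "G" * 10 + "CTAAC" + "TTTTCCTTTTCCCTTTTCAG"
print(scan_intron(intron))
```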
Tools for Splice Code Prediction: Table 1 summarizes some of these online bioinformatics tools, such as Human Splicing Finder
(HSF), RegRNA, ESEfinder, Alternative Splice Site Predictor (ASSP), SplicePort, EX-SKIP, RESCUE-ESE, MaxEntScan and
SROOGLE (Desmet et al., 2009; Huang et al., 2006; Cartegni et al., 2003; Wang and Marin, 2006; Dogan et al., 2007; Raponi
et al., 2011; Fairbrother et al., 2004; Eng et al., 2004; Schwartz et al., 2009). MaxEntScan (Yeo and Burge, 2004), SplicePort
(Dogan et al., 2007), HSF (Desmet et al., 2009) and SFmap (Paz et al., 2010) can be used to predict the basic cis-acting elements
(5′ splice site, 3′ splice site and branch point). All these programs give a useful indication of whether the donor, acceptor and
branch point are well defined compared with ideal consensus sequences. ESEfinder and RESCUE-ESE are dedicated to detecting the
disruption or creation of splicing regulatory elements (SREs) (Cartegni et al., 2003; Fairbrother et al., 2004). In addition, the
SFmap server is useful for searching for splicing factor binding motifs in a sequence of interest (Paz et al., 2010). The RegRNA and
SROOGLE servers can be used to search for both basic splicing signals and SREs (Huang et al., 2006; Schwartz et al., 2009).
SROOGLE, one of the most used tools, provides a graphic output that displays the four core splicing signals with scores based on
nine different algorithms. It also highlights sequences belonging to 13 different groups of SREs. The EX-SKIP application compares
the ESE/ESS profiles of a wild-type and a mutated allele to quickly determine which exonic variant has the highest chance of
causing skipping of the exon (Raponi et al., 2011).
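An allele comparison of the EX-SKIP kind can be illustrated with a toy version. The motif sets below are small hypothetical placeholders, not EX-SKIP's motif collections. The assumption that a higher ESS/ESE ratio signals a higher skipping risk is the simplification being illustrated. In the example, a single C>T substitution creates two silencer-like motifs while leaving the enhancer-like motif intact.

```python
# Hypothetical placeholder motif sets (illustration only, not EX-SKIP's data).
HYPOTHETICAL_ESE_MOTIFS = {"GAAGAA", "GGAGGA"}
HYPOTHETICAL_ESS_MOTIFS = {"TAGGGA", "TTAGGG"}


def count_motifs(seq, motifs):
    """Count (possibly overlapping) occurrences of all motifs in seq."""
    seq = seq.upper()
    total = 0
    for motif in motifs:
        k = len(motif)
        total += sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)
    return total


def ess_ese_profile(seq):
    """Return ESE and ESS counts and their ratio for one exonic sequence."""
    ese = count_motifs(seq, HYPOTHETICAL_ESE_MOTIFS)
    ess = count_motifs(seq, HYPOTHETICAL_ESS_MOTIFS)
    ratio = ess / ese if ese else float("inf")
    return {"ESE": ese, "ESS": ess, "ESS/ESE": ratio}


wild_type = "ATGAAGAACTCAGGGATC"
mutant = "ATGAAGAACTTAGGGATC"  # single C>T substitution at position 10
for name, seq in (("wild type", wild_type), ("mutant", mutant)):
    print(name, ess_ese_profile(seq))
```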
Performance of Splice Code Predictors: A key question that naturally arises concerns the performance of these programs in
identifying possible splicing mutations. In humans, splice sites tend to be reasonably well conserved (Burset et al., 2001). Programs
that evaluate their relative strengths seem to be more successful than those that target the much more loosely conserved SREs.
MaxEntScan takes nucleotide dependencies within donor site sequences into account. It has been shown to be the best predictor of
cryptic splice site activation in disease-causing mutations (Houdayer et al., 2008). Currently, SROOGLE is among the most used
tools that provide information on splice site or enhancer/silencer disruption in a single interface (Schwartz et al., 2009). Some
programs, such as ESEfinder, have worked well in some contexts (Zatkova et al., 2004; Kralovicova and Vorechovsky, 2007; Wang
et al., 2005) but have led to scientific debate in others (Cartegni and Krainer, 2002; Cartegni et al., 2006; Fairbrother et al., 2004;
Pfarr et al., 2005; Deburgrave et al., 2007). Combining all these resources still offers the best chance of “predicting” putative
splicing mutations, whether in conserved or less conserved regions (Houdayer et al., 2008; Soukarieh et al., 2016). The future trend
in developing such prediction algorithms and software therefore appears to be the integration of all analyses on a single platform.
This would make it easy to obtain in-depth details on global splicing signals in one place. Owing to the progress in this field and the
improvement in bioinformatics predictions, methodologies for splicing diagnostics are becoming increasingly attractive. In such
methodologies, the information obtained from in silico bioinformatics approaches is translated to wet-lab (in vitro and in vivo)
systems to evaluate splicing efficiencies (Baralle and Buratti, 2017).
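Combining predictors is often done by consensus. A minimal sketch follows. Each tool scores the wild-type and mutant sequence. A tool "votes" for splice-site disruption when its score drops by more than a tool-specific percentage, and a variant is flagged when a majority of tools agree. The tool names, scores and percentage thresholds are hypothetical. Published thresholds differ between studies and tools and are not asserted here.

```python
def percent_change(wt_score, mut_score):
    """Relative score change (%) from the wild-type to the mutant sequence."""
    if wt_score == 0:
        raise ValueError("wild-type score must be non-zero")
    return 100.0 * (mut_score - wt_score) / abs(wt_score)


def consensus_call(scores, thresholds, min_votes=None):
    """scores: {tool: (wt_score, mut_score)}; thresholds: {tool: % drop}.

    Returns (flagged, votes); by default a simple majority of tools is needed.
    """
    votes = {
        tool: percent_change(wt, mut) <= -thresholds[tool]
        for tool, (wt, mut) in scores.items()
    }
    needed = min_votes if min_votes is not None else len(votes) // 2 + 1
    return sum(votes.values()) >= needed, votes


# Hypothetical scores from three unnamed predictors for one variant.
scores = {"tool_A": (9.5, 4.1), "tool_B": (0.92, 0.85), "tool_C": (85.0, 60.0)}
thresholds = {"tool_A": 15, "tool_B": 15, "tool_C": 10}
flagged, votes = consensus_call(scores, thresholds)
print(flagged, votes)
```

Requiring agreement trades some sensitivity for specificity. The `min_votes` parameter makes that trade-off explicit.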
Table 1 Online bioinformatics tools for splice code prediction
Tool | Input | Prediction | Website
Human Splicing Finder (HSF) | mRNA sequence | Consensus values of potential splice sites and search for branch points | http://www.umd.be/HSF3/
RegRNA | mRNA sequence | Motifs in the mRNA 5′-UTR and 3′-UTR, motifs involved in mRNA splicing, motifs involved in transcriptional regulation, other motifs in mRNA such as riboswitches, prediction of splice sites such as splicing donor/acceptor sites, RNA structural features such as inverted repeats, and miRNA target sites | http://regrna.mbc.nctu.edu.tw/html/prediction.html
ESEfinder | Exonic sequence | Presence of exonic splicing enhancer elements | http://exon.cshl.edu/ESE/
Alternative Splice Site Predictor (ASSP) | Raw DNA/RNA sequence | Putative alternative exon isoform, cryptic, and constitutive splice sites of internal (coding) exons | http://wangcomputing.com/assp/index.html
SplicePort | Single or multiple sequences | Splice-site predictions; the user can also browse features associated with each prediction | http://spliceport.cbcb.umd.edu
EX-SKIP | Two exonic sequences (up to 4000 bp) | Total number of ESSs, ESEs and their ratio | http://ex-skip.img.cas.cz/
RESCUE-ESE | RNA or DNA sequence (4 k) | Exonic splicing enhancer (ESE) prediction | http://genes.mit.edu/burgelab/rescue-ese/
MaxEntScan | Sequences, all of the same length | Distributions built over short sequence motifs | http://genes.mit.edu/burgelab/maxent/Xmaxent.html
MaxEntScan | 5′ splice site sequence [3 bases in the exon + 6 bases in the intron] | 5′ splice site score | http://genes.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html
MaxEntScan | 3′ splice site sequence [20 bases in the intron + 3 bases in the exon] | 3′ splice site score | http://genes.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq_acc.html
SROOGLE | Exon along with the introns | Graphic display of splicing-related data on DNA segments | http://sroogle.tau.ac.il/
SFmap | Human genomic sequence or a list of sequences in FASTA format | Splicing factor binding motifs | http://sfmap.technion.ac.il/
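As listed in Table 1, MaxEntScan scores fixed-length windows. These are 9 nt for a 5′ splice site (3 exonic + 6 intronic bases) and 23 nt for a 3′ splice site (20 intronic + 3 exonic bases). Below is a minimal helper for cutting these windows out of a genomic sequence. It assumes a plus-strand sequence and 0-based exon coordinates; minus-strand genes would need the reverse complement first. The function names are hypothetical, and the scoring itself is left to MaxEntScan.

```python
def donor_window(seq, exon_end):
    """9-mer for a 5' splice site: 3 exonic + 6 intronic bases.

    exon_end: 0-based index of the last exonic base (plus strand assumed).
    """
    start, stop = exon_end - 2, exon_end + 7
    if start < 0 or stop > len(seq):
        raise ValueError("window runs past the sequence ends")
    return seq[start:stop].upper()


def acceptor_window(seq, exon_start):
    """23-mer for a 3' splice site: 20 intronic + 3 exonic bases.

    exon_start: 0-based index of the first exonic base (plus strand assumed).
    """
    start, stop = exon_start - 20, exon_start + 3
    if start < 0 or stop > len(seq):
        raise ValueError("window runs past the sequence ends")
    return seq[start:stop].upper()


exon1, exon2 = "ATGCAG", "GTCC"
intron = "GTAAGT" + "ACGT" + "T" * 16 + "CCAG"
seq = exon1 + intron + exon2
print(donor_window(seq, exon_end=len(exon1) - 1))                # prints CAGGTAAGT
print(acceptor_window(seq, exon_start=len(exon1) + len(intron)))  # prints TTTTTTTTTTTTTTTTCCAGGTC
```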
Concluding Remarks
The field of alternative splicing has become very active and attractive in recent years, largely owing to the development of
high-throughput technologies. Consequently, a large number of bioinformatics tools are available for various analyses of AS. It is
hard to list every tool developed so far. Nonetheless, we have reviewed bioinformatics software and algorithms that are used to detect
splicing events from RNA-seq or other data, and we have set out the advantages and drawbacks of each approach. Some popular online
tools and software that can be used to study intronic and exonic mutations leading to splicing defects have also been discussed.
Acknowledgment
PKT was supported by the Czech Science Foundation (P305/12/G034) and the institutional support (RVO68378050).
See also: Exome Sequencing Data Analysis. Functional Enrichment Analysis. Genome Annotation. Genome Annotation: Perspective From
Bacterial Genomes. Genome Databases and Browsers. Genome Informatics. Integrative Analysis of Multi-Omics Data. Metabolome Analysis.
Natural Language Processing Approaches in Bioinformatics. Next Generation Sequencing Data Analysis. Prediction of Coding and Non-Coding
RNA. Quantitative Immunology by Data Analysis Using Mathematical Models. Sequence Analysis. Whole Genome Sequencing Analysis
References
Baralle, D., Buratti, E., 2017. RNA splicing in human disease and in the clinic. Clin. Sci. (Lond.) 131 (5), 355–368.
Barann, M., Zimmer, R., Birzele, F., 2017. Manananggal – A novel viewer for alternative splicing events. BMC Bioinform. 18 (1), 120.
Belfort, M., 1990. Phage T4 introns: Self-splicing and mobility. Annu. Rev. Genet. 24 (1), 363–385.
Berget, S.M., Moore, C., Sharp, P.A., 1977. Spliced segments at the 5′ terminus of adenovirus 2 late mRNA. Proc. Natl. Acad. Sci. 74 (8), 3171–3175.
Bhatt, D.M., et al., 2012. Transcript dynamics of proinflammatory genes revealed by sequence analysis of subcellular RNA fractions. Cell 150 (2), 279–290.
Black, D.L., 2003. Mechanisms of alternative pre-messenger RNA splicing. Annu. Rev. Biochem. 72, 291–336.
Bogdanov, V.Y., 2006. Blood coagulation and alternative pre-mRNA splicing: An overview. Curr. Mol. Med. 6 (8), 859–869.
Bradley, T., Cook, M.E., Blanchette, M., 2015. SR proteins control a complex network of RNA-processing events. RNA 21 (1), 75–92.
Brett, D., et al., 2002. Alternative splicing and genome complexity. Nat. Genet. 30 (1), 29–30.
Brooks, A.N., et al., 2010. Conservation of an RNA regulatory map between Drosophila and mammals. Genome Res.
Brugiolo, M., Herzel, L., Neugebauer, K.M., 2013. Counting on co-transcriptional splicing. F1000Prime Rep. 5, 9.
Burset, M., Seledtsov, I.A., Solovyev, V.V., 2001. SpliceDB: Database of canonical and non-canonical mammalian splice sites. Nucleic Acids Res. 29 (1), 255–259.
Calarco, J.A., Zhen, M., Blencowe, B.J., 2011. Networking in a global world: Establishing functional connections between neural splicing regulators and their target transcripts.
RNA 17 (5), 775–791.
Carrillo Oesterreich, F., Preibisch, S., Neugebauer, K.M., 2010. Global analysis of nascent RNA reveals transcriptional pausing in terminal exons. Mol. Cell 40 (4), 571–581.
Cartegni, L., et al., 2003. ESEfinder: A web resource to identify exonic splicing enhancers. Nucleic Acids Res. 31 (13), 3568–3571.
Cartegni, L., et al., 2006. Determinants of exon 7 splicing in the spinal muscular atrophy genes, SMN1 and SMN2. Am. J. Hum. Genet. 78 (1), 63–77.
Cartegni, L., Krainer, A.R., 2002. Disruption of an SF2/ASF-dependent exonic splicing enhancer in SMN2 causes spinal muscular atrophy in the absence of SMN1. Nat. Genet.
30 (4), 377–384.
Castle, J.C., et al., 2008. Expression of 24,426 human alternative splicing events and predicted cis regulation in 48 tissues and cell lines. Nat. Genet. 40 (12), 1416–1425.
Chasin, L.A., 2007. Searching for splicing motifs. Adv. Exp. Med. Biol. 623, 85–106.
Chen, L., 2011. Statistical and computational studies on alternative splicing. In: Lu, H.H.-S., Schölkopf, B., Zhao, H. (Eds.), Handbook of Statistical Bioinformatics. Berlin,
Heidelberg: Springer Berlin Heidelberg, pp. 31–53.
Chow, L.T., et al., 1977. An amazing sequence arrangement at the 5′ ends of adenovirus 2 messenger RNA. Cell 12 (1), 1–8.
Clark, F., Thanaraj, T.A., 2002. Categorization and characterization of transcript-confirmed constitutively and alternatively spliced introns and exons from human. Hum. Mol.
Genet. 11 (4), 451–464.
Clark, T.A., et al., 2007. Discovery of tissue-specific exons using comprehensive human exon microarrays. Genome Biol. 8 (4), R64.
Clark, T.A., Sugnet, C.W., Ares Jr., M., 2002. Genomewide analysis of mRNA processing in yeast using splicing-specific microarrays. Science 296 (5569), 907–910.
Cooper, T.A., 2005. Alternative splicing regulation impacts heart development. Cell 120 (1), 1–2.
Dai, C., et al., 2012. Integrating many co-splicing networks to reconstruct splicing regulatory modules. BMC Syst. Biol. 6 (1), S17.
Deburgrave, N., et al., 2007. Protein- and mRNA-based phenotype-genotype correlations in DMD/BMD with point mutations and molecular basis for BMD with nonsense and
frameshift mutations in the DMD gene. Hum. Mutat. 28 (2), 183–195.
Desmet, F.O., et al., 2009. Human Splicing Finder: An online bioinformatics tool to predict splicing signals. Nucleic Acids Res. 37 (9), e67.
Dogan, R.I., et al., 2007. SplicePort – An interactive splice-site analysis tool. Nucleic Acids Res. 35 (Web Server issue), W285–W291.
Early, P., et al., 1980. Two mRNAs can be produced from a single immunoglobulin mu gene by alternative RNA processing pathways. Cell 20 (2), 313–319.
Emig, D., et al., 2010. AltAnalyze and DomainGraph: Analyzing and visualizing exon expression data. Nucleic Acids Res. 38 (Web Server issue), W755–W762.
Endo-Umeda, K., et al., 2012. Differential expression and function of alternative splicing variants of human liver X receptor alpha. Mol. Pharmacol. 81 (6), 800–810.
Eng, L., et al., 2004. Nonclassical splicing mutations in the coding and noncoding regions of the ATM Gene: Maximum entropy estimates of splice junction strengths. Hum.
Mutat. 23 (1), 67–76.
Fairbrother, W.G., et al., 2004. RESCUE-ESE identifies candidate exonic splicing enhancers in vertebrate exons. Nucleic Acids Res. 32 (Web Server issue), W187–W190.
Fairbrother, W.G., et al., 2004. Single nucleotide polymorphism-based validation of exonic splicing enhancers. PLOS Biol. 2 (9), E268.
Feng, H., Qin, Z., Zhang, X., 2013. Opportunities and methods for studying alternative splicing in cancer with RNA-Seq. Cancer Lett. 340 (2), 179–191.
Florea, L., et al., 1998. A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Res. 8 (9), 967–974.
Florea, L., Song, L., Salzberg, S.L., 2013. Thousands of exon skipping events differentiate among splicing patterns in sixteen human tissues. F1000Res 2, 188.
Foissac, S., Sammeth, M., 2007. ASTALAVISTA: Dynamic and flexible analysis of alternative splicing events in custom gene datasets. Nucleic Acids Res. 35 (Web Server
issue), W297–W299.
Garcia-Blanco, M.A., Baraniak, A.P., Lasda, E.L., 2004. Alternative splicing in disease and therapy. Nat. Biotechnol. 22, 535.
Garg, K., Green, P., 2007. Differing patterns of selection in alternative and constitutive splice sites. Genome Res. 17 (7), 1015–1022.
Gatto, A., et al., 2014. FineSplice, enhanced splice junction detection and quantification: A novel pipeline based on the assessment of diverse RNA-Seq alignment solutions.
Nucleic Acids Res. 42 (8), e71.
Geuens, T., Bouhy, D., Timmerman, V., 2016. The hnRNP family: Insights into their role in health and disease. Hum. Genet. 135, 851–867.
Ghigna, C., et al., 2010. Pro-metastatic splicing of Ron proto-oncogene mRNA can be reversed: Therapeutic potential of bifunctional oligonucleotides and indole derivatives.
RNA Biol. 7 (4), 495–503.
Gilbert, W., 1978. Why genes in pieces? Nature 271 (5645), 501.
Graveley, B.R., et al., 2004. The organization and evolution of the dipteran and hymenopteran Down syndrome cell adhesion molecule (Dscam) genes. RNA 10 (10),
1499–1506.
Gupta, S., et al., 2004. Strengths and weaknesses of EST-based prediction of tissue-specific alternative splicing. BMC Genom. 5, 72.
Hagen, R.M., Ladomery, M.R., 2012. Role of splice variants in the metastatic progression of prostate cancer. Biochem. Soc. Trans. 40 (4), 870–874.
Han, S.P., et al., 2010. Functional implications of the emergence of alternative splicing in hnRNP A/B transcripts. RNA 16 (9), 1760–1768.
Hao, S., Baltimore, D., 2013. RNA splicing regulates the temporal order of TNF-induced gene expression. Proc. Natl. Acad. Sci. USA 110 (29), 11934–11939.
Herzel, L., Neugebauer, K.M., 2015. Quantification of co-transcriptional splicing from RNA-Seq data. Methods 85, 36–43.
Hnilicova, J., Stanek, D., 2011. Where splicing joins chromatin. Nucleus 2 (3), 182–188.
Hoskins, A.A., et al., 2011. Ordered and dynamic assembly of single spliceosomes. Science 331 (6022), 1289–1295.
Hoskins, A.A., Moore, M.J., 2012. The spliceosome: A flexible, reversible macromolecular machine. Trends Biochem. Sci. 37 (5), 179–188.
Houdayer, C., et al., 2008. Evaluation of in silico splice tools for decision-making in molecular diagnosis. Hum. Mutat. 29 (7), 975–982.
Hu, Y., et al., 2013. DiffSplice: The genome-wide detection of differential splicing events with RNA-seq. Nucleic Acids Res. 41 (2), e39.
Huang, H.Y., et al., 2006. RegRNA: An integrated web server for identifying regulatory RNA motifs and elements. Nucleic Acids Res. 34 (Web Server issue), W429–W434.
Jurica, M.S., Moore, M.J., 2003. Pre-mRNA splicing: Awash in a sea of proteins. Mol. Cell 12 (1), 5–14.
Kalari, K.R., et al., 2012. Deep sequence analysis of non-small cell lung cancer: integrated analysis of gene expression, alternative splicing, and single nucleotide variations in
lung adenocarcinomas with and without oncogenic KRAS mutations. Front. Oncol. 2, 12.
Katz, Y., et al., 2010. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods 7 (12), 1009–1015.
Kent, W.J., 2002. BLAT – The BLAST-like alignment tool. Genome Res. 12 (4), 656–664.
Keren, H., Lev-Maor, G., Ast, G., 2010. Alternative splicing and evolution: Diversification, exon definition and function. Nat. Rev. Genet. 11 (5), 345–355.
Kim, N., Lee, C., 2008. Bioinformatics detection of alternative splicing. In: Keith, J.M. (Ed.), Bioinformatics: Data, Sequence Analysis and Evolution. Totowa, NJ: Humana Press,
pp. 179–197.
Kralovicova, J., Vorechovsky, I., 2007. Global control of aberrant splice-site activation by auxiliary splicing sequences: Evidence for a gradient in exon and intron definition.
Nucleic Acids Res. 35 (19), 6399–6413.
Lara-Pezzi, E., Dopazo, A., Manzanares, M., 2012. Understanding cardiovascular disease: A journey through the genome (and what we found there). Dis. Model Mech. 5 (4),
434–443.
Lee, C., et al., 2003. ASAP: The alternative splicing annotation project. Nucleic Acids Res. 31 (1), 101–105.
Li, J.J., et al., 2011. Sparse linear modeling of next-generation mRNA sequencing (RNA-Seq) data for isoform discovery and abundance estimation. Proc. Natl. Acad. Sci. USA
108 (50), 19867–19872.
Liu, Q., et al., 2012. Detection, annotation and visualization of alternative splicing from RNA-Seq data with SplicingViewer. Genomics 99 (3), 178–182.
Li, W., et al., 2012. Algorithm to identify frequent coupled modules from two-layered network series: Application to study transcription and splicing coupling. J. Comput. Biol.
19 (6), 710–730.
Long, J.C., Caceres, J.F., 2009. The SR protein family of splicing factors: Master regulators of gene expression. Biochem. J. 417 (1), 15–27.
Lynch, K.W., 2015. Thoughts on NGS, alternative splicing and what we still need to know. RNA 21 (4), 683–684.
Manetti, M., et al., 2011. Impaired angiogenesis in systemic sclerosis: The emerging role of the antiangiogenic VEGF(165)b splice variant. Trends Cardiovasc. Med. 21 (7),
204–210.
Martinez-Contreras, R., et al., 2007. hnRNP proteins and splicing control. Adv. Exp. Med. Biol. 623, 123–147.
Medina, M.W., Krauss, R.M., 2013. Alternative splicing in the regulation of cholesterol homeostasis. Curr. Opin. Lipidol. 24 (2), 147–152.
Mezlini, A.M., et al., 2013. iReckon: Simultaneous isoform discovery and abundance estimation from RNA-seq data. Genome Res. 23 (3), 519–529.
Mills, J.D., Janitz, M., 2012. Alternative splicing of mRNA in the molecular pathology of neurodegenerative diseases. Neurobiol. Aging 33 (5), 1012. e11-24.
Missler, M., Sudhof, T.C., 1998. Neurexins: Three genes and 1001 products. Trends Genet. 14 (1), 20–26.
Mittendorf, K.F., et al., 2012. Tailoring of membrane proteins by alternative splicing of pre-mRNA. Biochemistry 51 (28), 5541–5556.
Miura, K., Fujibuchi, W., Sasaki, I., 2011. Alternative pre-mRNA splicing in digestive tract malignancy. Cancer Sci. 102 (2), 309–316.
Nilsen, T.W., Graveley, B.R., 2010. Expansion of the eukaryotic proteome by alternative splicing. Nature 463, 457–463.
Omenn, G.S., Yocum, A.K., Menon, R., 2010. Alternative splice variants, a new class of protein cancer biomarker candidates: Findings in pancreatic cancer and breast cancer
with systems biology implications. Dis. Markers 28 (4), 241–251.
Pan, Q., et al., 2008. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 40, 1413–1415.
Pandya-Jones, A., 2011. Pre-mRNA splicing during transcription in the mammalian system. Wiley Interdiscip. Rev. RNA 2 (5), 700–717.
Patel, A.A., Steitz, J.A., 2003. Splicing double: Insights from the second spliceosome. Nat. Rev. Mol. Cell Biol. 4, 960.
Paz, I., et al., 2010. SFmap: A web server for motif analysis and prediction of splicing factor binding sites. Nucleic Acids Res. 38 (Web Server issue), W281–W285.
Peled-Zehavi, H., et al., 2001. Recognition of RNA branch point sequences by the KH domain of splicing factor 1 (mammalian branch point binding protein) in a splicing factor
complex. Mol. Cell. Biol. 21 (15), 5232–5241.
Pfarr, N., et al., 2005. Linking C5 deficiency to an exonic splicing enhancer mutation. J. Immunol. 174 (7), 4172–4177.
Poulos, M.G., et al., 2011. Developments in RNA splicing and disease. Cold Spring Harb. Perspect. Biol. 3 (1), a000778.
Raponi, M., et al., 2011. Prediction of single-nucleotide substitutions that result in exon skipping: Identification of a splicing silencer in BRCA1 exon 6. Hum. Mutat. 32 (4),
436–444.
Ren, S., et al., 2012. RNA-seq analysis of prostate cancer in the Chinese population identifies recurrent gene fusions, cancer-associated long noncoding RNAs and aberrant
alternative splicings. Cell Res. 22 (5), 806–821.
Rogers, M.F., et al., 2012. SpliceGrapher: Detecting patterns of alternative splicing from RNA-Seq data in the context of gene models and EST data. Genome Biol. 13 (1), R4.
Rogozin, I.B., et al., 2012. Origin and evolution of spliceosomal introns. Biol. Direct 7, 11.
Rosenfeld, M.G., et al., 1982. Calcitonin mRNA polymorphism: Peptide switching associated with alternative RNA splicing events. Proc. Natl. Acad. Sci. 79 (6), 1717–1721.
Roy, S.W., Gilbert, W., 2006. The evolution of spliceosomal introns: Patterns, puzzles and progress. Nat. Rev. Genet. 7 (3), 211–221.
Roy, B., Haupt, L.M., Griffiths, L.R., 2013. Review: Alternative splicing (AS) of genes as an approach for generating protein complexity. Curr. Genom. 14 (3), 182–194.
Ryan, M.C., et al., 2012. SpliceSeq: A resource for analysis and visualization of RNA-Seq data on alternative splicing and its functional impacts. Bioinformatics 28 (18),
2385–2387.
Sammeth, M., 2009. Complete alternative splicing events are bubbles in splicing graphs. J. Comput. Biol. 16 (8), 1117–1140.
Sammeth, M., Foissac, S., Guigó, R., 2008. A general definition and nomenclature for alternative splicing events. PLOS Comput. Biol. 4 (8), e1000147.
Sampath, J., Pelus, L.M., 2007. Alternative splice variants of survivin as potential targets in cancer. Curr. Drug Discov. Technol. 4 (3), 174–191.
Schmidt, F.J., 1985. RNA splicing in prokaryotes: Bacteriophage T4 leads the way. Cell 41 (2), 339–340.
Schreiber, K., et al., 2015. Alternative splicing in next generation sequencing data of saccharomyces cerevisiae. PLOS ONE 10 (10), e0140487.
Schwartz, S., Hall, E., Ast, G., 2009. SROOGLE: Webserver for integrative, user-friendly visualization of splicing signals. Nucleic Acids Res. 37 (Web Server issue),
W189–W192.
Shapiro, I.M., et al., 2011. An EMT – Driven alternative splicing program occurs in human breast cancer and modulates cellular phenotype. PLOS Genet. 7 (8), e1002218.
Shen, S., et al., 2012. MATS: A Bayesian framework for flexible detection of differential alternative splicing from RNA-Seq data. Nucleic Acids Res. 40 (8), e61.
Shen, S., et al., 2014. rMATS: Robust and flexible detection of differential alternative splicing from replicate RNA-Seq data. Proc. Natl. Acad. Sci. USA 111 (51), E5593–E5601.
Song, L., Sabunciyan, S., Florea, L., 2016. CLASS2: Accurate and efficient splice variant annotation from RNA-seq reads. Nucleic Acids Res. 44 (10), e98.
Sorek, R., Shamir, R., Ast, G., 2004. How prevalent is functional alternative splicing in the human genome? Trends Genet. 20 (2), 68–71.
Soukarieh, O., et al., 2016. Exonic splicing mutations are more prevalent than currently estimated and can be predicted by using in silico tools. PLOS Genet. 12 (1), e1005756.
Srivastava, S., Chen, L., 2010. A two-parameter generalized Poisson model to improve the analysis of RNA-seq data. Nucleic Acids Res. 38 (17), e170.
Staley, J.P., Guthrie, C., 1998. Mechanical devices of the spliceosome: Motors, clocks, springs, and things. Cell 92 (3), 315–326.
Stamm, S., et al., 2000. An alternative-exon database and its statistical analysis. DNA Cell Biol. 19 (12), 739–756.
Sultan, M., et al., 2008. A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science 321 (5891), 956–960.
Tang, J.Y., et al., 2013. Alternative splicing for diseases, cancers, drugs, and databases. Sci. World J. 2013, 703568.
Tilgner, H., et al., 2012. Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs.
Genome Res. 22 (9), 1616–1625.
Trowitzsch, S., et al., 2009. Crystal structure of the Pml1p subunit of the yeast precursor mRNA retention and splicing complex. J. Mol. Biol. 385 (2), 531–541.
van den Hoogenhof, M.M., Pinto, Y.M., Creemers, E.E., 2016. RNA splicing: Regulation and dysregulation in the heart. Circ. Res. 118 (3), 454–468.
van Nimwegen, E., et al., 2006. SPA: A probabilistic algorithm for spliced alignment. PLOS Genet. 2 (4), e24.
Vitting-Seerup, K., et al., 2014. spliceR: An R package for classification of alternative splicing and prediction of coding potential from RNA-seq data. BMC Bioinform. 15, 81.
Volanakis, A., et al., 2013. Spliceosome-mediated decay (SMD) regulates expression of nonintronic genes in budding yeast. Genes Dev. 27 (18), 2025–2038.
Wang, E.T., et al., 2008. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476.
Wang, J., et al., 2017. Computational methods and correlation of exon-skipping events with splicing, transcription, and epigenetic factors. Methods Mol. Biol. (Clifton, N.J.)
1513, 163–170.
Wang, J., et al., 2005. Distribution of SR protein exonic splicing enhancer motifs in human protein-coding genes. Nucleic Acids Res. 33 (16), 5053–5062.
Wang, M., Marin, A., 2006. Characterization and prediction of alternative splice sites. Gene 366 (2), 219–227.
Wang, W., et al., 2013. Identifying differentially spliced genes from two groups of RNA-seq samples. Gene 518 (1), 164–170.
Wang, X., et al., 2009. Genome-wide prediction of cis-acting RNA elements regulating tissue-specific pre-mRNA alternative splicing. BMC Genom. 10 (1), S4.
Wang, Z., Burge, C.B., 2008. Splicing regulation: From a parts list of regulatory elements to an integrated splicing code. RNA 14 (5), 802–813.
Will, C.L., Lührmann, R., 2011. Spliceosome structure and function. Cold Spring Harb. Perspect. Biol. 3 (7).
Woodson, S.A., 1998. Ironing out the kinks: Splicing and translation in bacteria. Genes Dev. 12 (9), 1243–1247.
Wu, J., et al., 2011. SpliceTrap: A method to quantify alternative splicing under single cellular conditions. Bioinformatics 27 (21), 3010–3016.
Wu, T.D., Watanabe, C.K., 2005. GMAP: A genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21 (9), 1859–1875.
Xie, H., et al., 2002. Computational analysis of alternative splicing using EST tissue information. Genomics 80 (3), 326–330.
Xu, Q., Modrek, B., Lee, C., 2002. Genome‐wide detection of tissue‐specific alternative splicing in the human transcriptome. Nucleic Acids Res. 30 (17), 3754–3766.
Yap, K., Makeyev, E.V., 2013. Regulation of gene expression in mammalian nervous system through alternative pre-mRNA splicing coupled with RNA quality control
mechanisms. Mol. Cell. Neurosci. 56, 420–428.
Yeo, G., et al., 2004. Variation in alternative splicing across human tissues. Genome Biol. 5 (10), R74.
Yeo, G.W., et al., 2007. Alternative splicing events identified in human embryonic stem cells and neural progenitors. PLOS Comput. Biol. 3 (10), 1951–1967.
Yeo, G., Burge, C.B., 2004. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comput. Biol. 11 (2–3), 377–394.
Yi, Q., Tang, L., 2011. Alternative spliced variants as biomarkers of colorectal cancer. Curr. Drug Metab. 12 (10), 966–974.
Zahler, A.M., Tuttle, J.D., Chisholm, A.D., 2004. Genetic suppression of intronic +1G mutations by compensatory U1 snRNA changes in Caenorhabditis elegans. Genetics 167
(4), 1689–1696.
Zatkova, A., et al., 2004. Disruption of exonic splicing enhancer elements is the principal cause of exon skipping associated with seven nonsense or missense alleles of NF1.
Hum. Mutat. 24 (6), 491–501.
Zhang, X., et al., 2016. Recognition of alternatively spliced cassette exons based on a hybrid model. Biochem. Biophys. Res. Commun. 471 (3), 368–372.
Zhao, K., et al., 2013. GLiMMPS: Robust statistical model for regulatory variation of alternative splicing using RNA-seq data. Genome Biol. 14 (7), R74.
Zheng, C.L., Fu, X.-D., Gribskov, M., 2005. Characteristics and regulatory elements defining constitutive splicing and different modes of alternative splicing in human and
mouse. RNA 11 (12), 1777–1787.
Relevant Websites
http://www.altanalyze.org
AltAnalyze.
http://ccb.jhu.edu/software/ASprofile
ASprofile - CCB.
http://genome.imim.es/astalavista
AStalavista.
http://www.domaingraph.de
DomainGraph.
http://sammeth.net/confluence/display/ASTA/2+-+Download
2 - Download - AStalavista - Confluence.
http://www.netlab.uky.edu/p/bioinfo/DiffSplice
DiffSplice.
http://bioinfo.au.tsinghua.edu.cn/software/DSGseq
DSGseq.
https://sourceforge.net/p/finesplice/
FineSplice.
https://github.com/anbrooks/juncBASE
GitHub - anbrooks/juncBASE.
http://compbio.uthscsa.edu/GESS_Web/
GESS-RNA.
rnaseq-mats.sourceforge.net/
Multivariate Analysis of Transcript Splicing.
http://www.ncbi.nlm.nih.gov/snp
NCBI - NIH.
http://www-rcf.usc.edu/~liangche/software.html
GPSeq.
https://pypi.python.org/pypi/misopy/
Python Package Index.
http://variome.kobic.re.kr/ssSNPTarget/
ssSNPTarget.
http://cancan.cshl.edu/splicetrap
SpliceTrap.
http://rulai.cshl.edu/splicetrap/
SpliceTrap.
http://bioinformatics.mdanderson.org/main/SpliceSeq:Overview
SpliceSeq:Overview.
http://bioinformatics.zj.cn/splicingviewer
SplicingViewer.
https://codeload.github.com/Xinglab/GLiMMPS/zip/master
GLiMMPS.
http://www.bioconductor.org/packages/2.13/bioc/html/spliceR.html
SpliceR - Bioconductor.
Biographical Sketch
Prasoon Kumar Thakur is presently working as a bioinformatician in the Laboratory of RNA Biology at the Institute
of Molecular Genetics, Prague. He is pursuing a PhD at Charles University, Prague, Czech Republic. He holds
a bachelor’s degree in Bioinformatics from the Institute of Advance Studies in Education, Sardarshahr (IASES),
India and an M.Sc. in Bioinformatics from Jamia Millia Islamia, New Delhi, India. His current research focuses
on RNA splicing and high-throughput data analysis.
Hukam C. Rawal is presently working as a Research Associate at NRCPB, New Delhi, India. He has over 10 years’
experience in different aspects of computational biology, with exposure to a wide range of computational
approaches for analyses related to human disease, pathogen infection and crop sciences. He has expertise in
advanced and widely used genomics approaches and has worked on different aspects of comparative genomics. He is
experienced in analyzing high-throughput sequencing data. He has several publications in the fields of comparative
genomics, transcriptome assembly, genome-level analysis, and chloroplast and mitochondrial genomes in reputed
journals such as Genome Biology & Evolution, Genes and Scientific Reports.
Mina Obuca completed her bachelor’s and master’s studies at the Faculty of Sciences, University of Novi Sad,
Serbia. She is currently a PhD student at Charles University in Prague, Czech Republic, carrying out her
doctoral research in the Laboratory of RNA Biology at the Institute of Molecular Genetics. She is interested in
understanding why mutations in splicing proteins cause retinitis pigmentosa.
Sandeep Kaushik is a passionate computational biologist with wide exposure to computational approaches.
He is presently carrying out his research as an Assistant Researcher (equivalent to Assistant Professor) at
the 3B’s Research Group, University of Minho, Portugal. He has a PhD in Bioinformatics from the National Institute of
Immunology, New Delhi, India. He has wide experience in analyzing scientific data ranging from
transcriptomic data on mycobacteria and human samples to genomic data from wheat. His research experience
and expertise span molecular dynamics simulations, RNA-sequencing data analysis, de novo genome assembly,
protein prediction and annotation, database mining, and agent-based modeling and simulation. He has more than
10 years of cumulative programming experience (Perl, R and NetLogo). He has published his
research in reputed international journals such as Molecular Cell, Biomaterials and Biophysical Journal.