Bioinformatics Tools and Methods To Analyze Single-Cell RNA Sequencing Data
ISSN No:-2456-2165
Abstract:- With the rapid development of next-generation sequencing (NGS) tools and technologies over the past few years, valuable insights have been gained into complex and diverse biological systems, ranging from microbial communities to cancer genomes. NGS-based technologies for genomics, epigenomics and transcriptomics are now largely concentrated on the characterization of individual cells. As an example, ScRNA sequencing can resolve complex and rare cellular populations, reveal gene regulatory relationships between and within cells, and trace the progression of individual cell lineages during development. Single-nucleus and single-cell sequencing experiments have revealed previously undiscovered molecular details. In recent years, the number of analytical approaches and sequencing methods has increased rapidly. In this review, our prime focus is on the challenges related to single cell isolation (SCI), library preparation, and the analysis of ScRNA-seq data by computational pipelines. Together with the availability of bioinformatics tools, these sequencing technologies will greatly facilitate progress in molecular biology and bioinformatics.

Keywords:- Single-cell RNA sequencing, alternative splicing, Next generation sequencing.

I. INTRODUCTION

In a typical single-cell sequencing workflow, tissue and cells are prepared, cells are captured and a library is prepared, cells are sequenced, raw data are processed, and visualization and downstream analyses are performed. Since each tissue and cell type to be isolated may require a different protocol for preparing single-cell suspensions, there are multiple ways to prepare them. Genomics and transcriptome analysis are both powerful tools for tackling the long-standing problem of mapping genotypes to phenotypes in biology and medicine. Analysis of entire transcriptomes at the single-cell level was pioneered by Iscove and colleagues (Brady et al., 1990) and by James Eberwine (Eberwine et al., 1992), who expanded the complementary DNA (cDNA) of individual cells and cell lines through linear amplification by in vitro transcription and PCR-based exponential amplification. These technologies were initially developed for detection on DNA microarray chips and have since been adapted for single-cell RNA (ScRNA) sequencing (Tietjen et al., 2003). The first report on single-cell transcriptome (Sc-trans) analysis by sequencing described the characterization of cells at early developmental stages (Tang et al., 2009). Recent technical developments, together with bioinformatics tools and techniques, have advanced researchers' ability to evaluate in silico the diversified populations of immunological cells in health and disease (Shalek et al., 2013). ScRNA sequencing is also used to visualize cell lineage relationships in early development, such as the differentiation of myoblasts and the determination of lymphocyte fates (Stubbington et al., 2016). However, some analyses, such as the detection of alternative splicing, the exploration of allelic expression, and the identification of edited RNA, apply only to ScRNA-seq protocols that generate whole-transcript data.

A number of sequencing techniques and platforms have been developed, and the choice among them can affect the sequencing data. As a result, choosing an appropriate analytical approach is essential to deal efficiently with highly variable ScRNA sequencing data (Bacher and Kendziorski, 2016). The search tag cloud on SEQanswers shows that RNA-Seq is one of the top subjects in NGS (www.seqanswers.com/forums). Studies based on serial analysis of gene expression (SAGE) and microarrays are being replaced by RNA sequencing methodologies for quantifying gene expression. Because of its great sequencing depth, RNA sequencing can yield a repertoire of all the transcripts present in a tissue at a specific point in time, including rare transcripts. It thereby produces an almost complete portrait of the transcriptomic events occurring in a living cell. The resulting data can be used to measure gene expression levels and to characterize genes, revealing novel transcripts, alternative splicing, single-nucleotide polymorphisms and structural variations (Novales et al., 2008; Dissanayake et al., 2009; Brautigan et al., 2011; Alagna et al., 2009).

RNA sequencing is particularly useful for non-model species, for which genetic tools and transcriptomic sequencing data may be scarce. Furthermore, because RNA sequencing covers only the transcribed regions rather than the whole genome, it remains practical when resources are limited. RNA sequencing offers a few advantages over whole-genome assembly, including the lack of repetitive sequences and a higher GC content. Various transcriptomic approaches have been used in plant species, including algae, mosses and other non-model plant species (Trick et al., 2009; Franssen et al., 2011).
The genomics of individual cells
By sequencing the genomes of individual cells we can identify chromosomal variations such as copy number variations and single-nucleotide variations. This approach has been used to study tumor evolution, gamete generation, and the genomic heterogeneity and mutations within a population of cells. However, the amount of DNA in a single human cell is inadequate for direct sequencing. For example, the genomic DNA of one cell weighs just 6 pg, and a normal cell contains only two copies of each chromosome, which is not enough for current NGS applications. Applying a traditional PCR method to single cells causes difficulties such as allelic dropout across the genome, so unbiased amplification of DNA is crucial for genome sequencing. There are different types of PCR-based amplification, such as linker-adapter PCR and primer extension pre-amplification. Degenerate oligonucleotide-primed PCR was described by Telenius et al. (1992).

Due to its simplicity and high accuracy, multiple displacement amplification (MDA) is one of the most popular procedures used in genomic analysis. DNA can be amplified isothermally at 30˚C using random hexamer primers and phi29 DNA polymerase. During strand synthesis, phi29 DNA polymerase can displace downstream strands thanks to its powerful strand displacement ability (Dean et al., 2002).

An additional method, multiple annealing and looping-based amplification cycles (MALBAC), is used to detect copy number variations accurately and allows efficient, uniform genotyping of individual cells (Zong et al., 2012). MALBAC thus provides a significant advantage over other methods, since it reduces amplification errors and biases by copying amplicons only from the original template to produce the starting material for exponential amplification (Wu et al., 2014).

Technologies Available for ScRNA-Seq
Single-cell transcriptomic research has incorporated a variety of ScRNA-seq technologies (Table 2). The first sequencing-based approach was developed and published by Tang et al. (2009), and many other ScRNA-seq approaches followed. ScRNA-seq technologies differ in several aspects, such as cell lysis, reverse transcription, amplification, the use of unique molecular identifiers, and transcript coverage. The major difference between ScRNA-seq methods is that some capture and sequence full-length transcripts, while others capture and sequence only the 3′ end (Rosenberg et al., 2018).
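As an illustration of how unique molecular identifiers can be used to count molecules rather than reads, the following is a minimal Python sketch. It assumes each read has already been assigned a cell barcode, a gene, and a UMI sequence; the function name `count_molecules` and the example records are hypothetical and not taken from any specific pipeline. Reads sharing the same cell, gene and UMI are treated as PCR duplicates of one original molecule. Real tools typically also correct sequencing errors in UMIs, which this sketch does not attempt.

```python
from collections import defaultdict


def count_molecules(reads):
    """Collapse reads to unique (cell, gene, UMI) triples.

    `reads` is an iterable of (cell_barcode, gene, umi) tuples.
    Returns a dict mapping (cell_barcode, gene) to the number of
    distinct UMIs observed, used here as a proxy for molecule counts.
    """
    seen = set()
    counts = defaultdict(int)
    for cell, gene, umi in reads:
        key = (cell, gene, umi)
        if key not in seen:  # first time this molecule is observed
            seen.add(key)
            counts[(cell, gene)] += 1
    return dict(counts)


# Hypothetical example records (cell barcode, gene, UMI).
reads = [
    ("cellA", "GAPDH", "AACG"),
    ("cellA", "GAPDH", "AACG"),  # PCR duplicate of the read above
    ("cellA", "GAPDH", "TTGC"),
    ("cellB", "GAPDH", "AACG"),  # same UMI, different cell: separate molecule
    ("cellB", "ACTB", "CCAT"),
    ("cellB", "ACTB", "CCAT"),   # PCR duplicate
]
umi_counts = count_molecules(reads)
```

In this example six reads collapse to four molecules, which shows why UMI counts are less affected by amplification bias than raw read counts.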
There are many different single-cell RNA-seq procedures, each with its own strengths and weaknesses (Kolodziejczyk et al., 2015; Haque et al., 2017; Picelli, 2017; Ziegenhain et al., 2017). One study showed that Smart-seq2 can detect a greater number of expressed genes than CEL-seq, MARS-seq, Smart-seq and Drop-seq (Hashimshony et al., 2016; Jaitin et al., 2014; Ramskold et al., 2012; Ziegenhain et al., 2017). ScRNA-seq has higher technical variability than traditional bulk RNA-seq. Spike-ins such as the External RNA Control Consortium (ERCC) controls can be used to estimate technical variance between cells (External, 2005). RNA spike-ins are RNA transcripts of known sequence and quantity that are used to calibrate measurements in hybridization assays and in RNA-seq, while unique molecular identifiers can be used to estimate absolute molecular counts. Spike-ins are used by some protocols but not by droplet-based methods, whereas unique molecular identifiers are typically employed by 3′-end sequencing technologies such as Drop-seq (Macosko et al., 2015).

Uses of Bioinformatics tools
Next-generation sequencing (NGS) is a high-throughput method that enables the identification of nucleotide sequences within DNA and RNA molecules. Mathematical and statistical methods, implemented in various programming paradigms and dedicated software tools, can analyze and explain biological, molecular, cellular and genomic information (Metzker, 2010). Commonly used databases include the following:
KEGG: genes and proteins are organized into ortholog groups and stored in the KEGG Orthology catalogue (Kanehisa et al., 2000).
Greengenes and SILVA: ribosomal RNA gene databases for taxonomic annotation (McDonald et al., 2012).
GenBank: built by the National Center for Biotechnology Information (NCBI), it contains sequences from over 250,000 species, and the data can be accessed through NCBI's retrieval system, Entrez. Annotated features include coding and untranslated regions, promoters, terminators, exons, introns and repeats (Benson et al., 2002).

Study of RNA editing and alternative splicing techniques in ScRNA-Seq data
Five modes of alternative splicing (AS) are commonly recognized: exon skipping, mutually exclusive exons, alternative donor site, alternative acceptor site and intron retention. These mechanisms play an important role in a variety of biological processes, and abnormal alternative splicing has been associated with cancer (Sven et al., 2016). Because of the limited precision of ScRNA-seq data, splicing quantification methods developed for bulk RNA sequencing are not suitable for ScRNA-seq data. Isoform expression dynamics are considered to play an important role in cell characterization. It is therefore promising to study AS at single-cell resolution to gain insights into isoform usage at the cellular level.

Data arranging and quantifying for ScRNA-Seq
The read mapping ratio remains an important indicator of the overall quality of single-cell sequencing data. Sequencing technologies sequence transcripts into reads and generate raw RNA-seq data in FASTQ format. For read alignment there is no difference between bulk RNA-seq and ScRNA-seq, so the mapping tools developed for bulk RNA-seq can also be used with ScRNA-seq data (Li and Homer, 2010; Chen et al., 2011). There are basically two classes of read mapping algorithms: those based on spaced-seed indexing and those based on the Burrows-Wheeler transform (Li and Homer, 2010). Aligners such as TopHat2 (Kim et al., 2013) are commonly used. Suffix-array-based methods for read mapping and expression quantification align cDNA sequencing data more efficiently, but they require considerable memory. There are currently a few tools for analyzing isoform usage in ScRNA-seq data, e.g., SingleSplice, Expedition, BRIE, and Census. SingleSplice uses a statistical model to detect genes with significant isoform usage. Using a linear model, Census assigns a Dirichlet-multinomial distribution to the isoform counts in each gene. BRIE performs differential isoform quantification based on a hierarchical model. Expedition contains a suite of algorithms for identifying AS, assigning splicing modes and visualizing modality changes (Table 3) (Welch et al., 2016; Qui et al., 2017; Huang and Sanguinetti, 2017; Song et al., 2017). Detection of RNA editing in individual cells has so far been difficult because of the limitations of ScRNA-seq data. Developing both single-cell editing detection algorithms and ScRNA-seq technologies will make it possible to explore editing dynamics among single cells (Gott et al., 2000).
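To make the Burrows-Wheeler transform based class of read mappers mentioned in this section more concrete, the following is a minimal Python sketch of exact matching by backward search. It is an illustration under simplifying assumptions, not the implementation of any particular aligner: the index is built by naive suffix sorting (suitable only for short toy references), occurrence counts are stored uncompressed, and mismatches, gaps and spliced alignment are ignored. The function names `bwt_index` and `find_exact` and the reference string are hypothetical.

```python
def bwt_index(text):
    """Build a toy Burrows-Wheeler index for `text`.

    Returns (suffix_array, first_col_start, occ), where
    first_col_start[c] is the number of characters smaller than c and
    occ[c][i] is the number of occurrences of c in bwt[:i].
    """
    text += "$"  # sentinel, sorts before A/C/G/T in ASCII
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix
    alphabet = sorted(set(text))
    first_col_start = {}
    total = 0
    for c in alphabet:
        first_col_start[c] = total
        total += text.count(c)
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, first_col_start, occ


def find_exact(pattern, index):
    """Return sorted 0-based positions where `pattern` occurs exactly."""
    sa, first_col_start, occ = index
    lo, hi = 0, len(sa)  # half-open interval of matching suffixes
    for ch in reversed(pattern):  # extend the match one character leftwards
        if ch not in first_col_start:
            return []
        lo = first_col_start[ch] + occ[ch][lo]
        hi = first_col_start[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


reference = "ACGTACGTGA"  # hypothetical toy reference
index = bwt_index(reference)
hits = find_exact("ACG", index)
```

Each pattern character narrows the interval of matching suffixes, so the search time depends on the read length rather than on the reference length, which is one reason this class of index is attractive for mapping large numbers of short reads.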
II. CONCLUSION

In conclusion, through the development and application of ScRNA-seq, a wealth of information on the variability and dynamics of gene expression in cells has been gathered. Such single-cell analysis is positioned to provide a deeper understanding of biological complexity in disorders as well as in normal development. The issues mentioned above are likely to be solved in the near future through rapid progress in single cell isolation, and new technologies will emerge as a powerful way to answer these long-standing questions in biological research as well as in clinical studies. Single-cell sequencing methods for frozen and fixed samples have also been proposed, which will benefit studies of heterogeneous clinical samples. Another benefit would come from developing protocols that capture not only poly(A)+ but also poly(A)- RNAs, which would allow researchers to obtain comprehensive pictures of both protein-coding and non-coding gene expression in real time at single-cell resolution.

Abbreviations
Next-generation sequencing (NGS), single-cell RNA (scRNA), single-cell suspensions (scS), serial analysis of gene expression (SAGE), single-nucleotide polymorphisms (SNPs), fluorescence-activated cell sorting (FACS), laser capture microdissection (LCM), linker-adapter PCR (LA-PCR), interspersed repetitive sequence amplification PCR (IRS-PCR), multiple displacement amplification (MDA), multiple annealing and looping-based amplification cycles (MALBAC), transcripts per million reads (TPM)