Computational approaches for improving the accuracy and efficiency of RNA-seq analysis

Author:

Sarkar, Hirak

Description:

The past decade has seen tremendous growth in high-throughput sequencing technology, which has simultaneously improved the biological resolution and the subsequent processing of publicly available sequencing datasets. This enormous amount of data also calls for better algorithms to process, extract and filter useful knowledge from the data. In this thesis, I concentrate on the challenges and solutions related to the processing of bulk RNA-seq data. An RNA-seq dataset consists of raw nucleotide sequences, drawn from the expressed mixture of transcripts in one or more samples. One of the most common uses of RNA-seq is obtaining transcript- or gene-level abundance information from the raw nucleotide read sequences and then using these abundances for downstream analyses such as differential expression. A typical computational pipeline for such processing broadly involves two steps: assigning reads to the reference sequence through alignment or mapping algorithms, and subsequently quantifying such assignments to obtain the expression of the reference transcripts or genes. In practice, this two-step process poses a multitude of challenges, ranging from the presence of noise and experimental artifacts in the raw sequences to the disambiguation of multi-mapped read sequences. In this thesis, I describe these problems and demonstrate efficient, state-of-the-art solutions to a number of them. The thesis explores multiple uses of an alternative representation of an RNA-seq experiment, encoded as equivalence classes and their associated counts. In this representation, instead of treating each read fragment individually, multiple fragments are simultaneously assigned to a set of transcripts depending on the underlying characteristics of the read-to-transcript mapping. I used the equivalence classes for a number of applications in both single-cell and bulk RNA-seq technologies.
By employing equivalence classes at cellular resolution, I have developed a droplet-based single-cell RNA-seq sequence simulator ...
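The equivalence-class representation described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `build_equivalence_classes`, the read identifiers, and the transcript labels are all hypothetical, and the sketch assumes the simplest definition, in which two fragments fall into the same class when they map to exactly the same set of transcripts.

```python
from collections import Counter


def build_equivalence_classes(read_mappings):
    """Collapse per-read mappings into equivalence classes with counts.

    read_mappings: dict of read id -> iterable of transcript ids the read
    maps to (hypothetical input format, assumed for illustration).
    Returns a Counter keyed by frozenset of transcript ids.
    """
    counts = Counter()
    for transcripts in read_mappings.values():
        label = frozenset(transcripts)
        if label:  # unmapped reads belong to no class
            counts[label] += 1
    return counts


# Toy input: three reads are ambiguous between t1 and t2, one maps
# uniquely to t3, and one is unmapped.
read_mappings = {
    "read1": ["t1", "t2"],
    "read2": ["t2", "t1"],
    "read3": ["t3"],
    "read4": ["t1", "t2"],
    "read5": [],
}

classes = build_equivalence_classes(read_mappings)
for label, count in classes.items():
    print(sorted(label), count)
```

Downstream steps can then operate on the class labels and their counts rather than on individual fragments, which is why the representation is compact: the number of classes is typically far smaller than the number of reads.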

Contributors:

Patro, Robert ; Digital Repository at the University of Maryland ; University of Maryland (College Park, Md.) ; Computer Science

Year of Publication:

2020

Document Type:

Dissertation ; [Doctoral and postdoctoral thesis]

Language:

Subjects:

Computer science ; Bioinformatics ; Conservation biology ; Assembly ; Clustering ; Equivalence Classes ; Quantification ; RNA-seq ; Transcription

DDC:

004 Data processing & computer science (computed)

Relations:

http://hdl.handle.net/1903/26454

URL:

http://hdl.handle.net/1903/26454
https://doi.org/10.13016/tuy4-kjcc

Content Provider:

University of Maryland: Digital Repository (DRUM)

URL: https://drum.lib.umd.edu/
Continent: North America
Country: us
Latitude / Longitude: 38.986918 / -76.942554
Number of documents: 32,385
Open Access: 204 (1%)
Type: Academic publications
System: DSpace
Content provider indexed in BASE since: 2005-03-05
BASE URL: https://www.base-search.net/Search/Results?q=coll:ftunivmaryland
