Author:
Description:
The past decade has seen tremendous growth in high-throughput sequencing technology, which has improved both the biological resolution and the subsequent processing of publicly available sequencing datasets. This enormous amount of data also calls for better algorithms to process, extract, and filter useful knowledge from it. In this thesis, I concentrate on the challenges and solutions related to the processing of bulk RNA-seq data. An RNA-seq dataset consists of raw nucleotide sequences drawn from the expressed mixture of transcripts in one or more samples. One of the most common uses of RNA-seq is to obtain transcript- or gene-level abundance information from the raw nucleotide read sequences and then use these abundances for downstream analyses such as differential expression. A typical computational pipeline for such processing broadly involves two steps: assigning reads to the reference sequence through alignment or mapping algorithms, and subsequently quantifying these assignments to obtain the expression of the reference transcripts or genes. In practice, this two-step process poses many challenges, ranging from noise and experimental artifacts in the raw sequences to the disambiguation of multi-mapped read sequences. In this thesis, I describe these problems and demonstrate efficient state-of-the-art solutions to a number of them. The thesis explores multiple uses for an alternate representation of an RNA-seq experiment, encoded as equivalence classes and their associated counts. In this representation, instead of treating each read fragment individually, multiple fragments are simultaneously assigned to a set of transcripts depending on the underlying characteristics of the read-to-transcript mapping. I use equivalence classes for a number of applications in both single-cell and bulk RNA-seq technologies.
By employing equivalence classes at cellular resolution, I have developed a droplet-based single-cell RNA-seq sequence simulator ...
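The equivalence-class representation described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation; it only assumes the simplest possible definition, in which two fragments belong to the same class when they map to exactly the same set of transcripts. The function name `build_equivalence_classes`, the input format, and the toy read and transcript IDs are all hypothetical.

```python
from collections import Counter


def build_equivalence_classes(read_mappings):
    """Collapse per-read mappings into equivalence classes with counts.

    read_mappings: dict from a read ID to the transcript IDs that read
    maps to (a hypothetical input format chosen for illustration).
    Returns a Counter keyed by a frozenset of transcript IDs, so reads
    that map to the same transcript set share a single entry.
    """
    classes = Counter()
    for transcripts in read_mappings.values():
        label = frozenset(transcripts)
        if label:  # unmapped reads carry no transcript set; skip them
            classes[label] += 1
    return classes


# Toy data (made up): five reads, three transcripts.
toy_mappings = {
    "read1": ["tA", "tB"],
    "read2": ["tB", "tA"],  # same set as read1, so same class
    "read3": ["tA"],
    "read4": ["tB", "tC"],
    "read5": [],            # unmapped
}

eq_classes = build_equivalence_classes(toy_mappings)
for label, count in eq_classes.items():
    print(sorted(label), count)
```

Downstream steps then work on the class labels and counts rather than on individual reads, which is what makes the representation compact.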
Contributors:
Patro, Robert ; Digital Repository at the University of Maryland ; University of Maryland (College Park, Md.) ; Computer Science
Year of Publication:
2020
Document Type:
Dissertation ; [Doctoral and postdoctoral thesis]
Language:
en
Subjects:
Computer science ; Bioinformatics ; Conservation biology ; Assembly ; Clustering ; Equivalence Classes ; Quantification ; RNA-seq ; Transcription
DDC:
004 Data processing & computer science (computed)
Relations:
URL:
Content Provider:
University of Maryland: Digital Repository (DRUM)
- URL: https://drum.lib.umd.edu/
- Continent: North America
- Country: us
- Latitude / Longitude: 38.986918 / -76.942554
- Number of documents: 32,385
- Open Access: 204 (1%)
- Type: Academic publications
- System: DSpace
- Content provider indexed in BASE since:
- BASE URL: https://www.base-search.net/Search/Results?q=coll:ftunivmaryland