PDF, English - main document (18MB)
Abstract
The cell-type composition of bulk samples serves as key evidence for examining disease progression, phenotypic characterisation and treatment responses. Cell-type deconvolution has therefore attracted attention as a computational approach to estimating cell-type composition. DNA methylation (DNAm) has been broadly used as an epigenetic mark for cell-type deconvolution because it carries cell-type-specific signals at CpG sites in mammalian genomes. In particular, sequencing-based DNAm data provide broader genomic coverage and capture rare cell-type signals better than array-based data. Despite these advantages, array-based data have so far been the primary target of cell-type deconvolution methods. Hence, we introduce a new sequencing-based cell-type deconvolution method for DNAm data and perform a systematic benchmarking of existing cell-type deconvolution methods. To address the limitations of existing methods revealed by the benchmarking, we developed the deep learning method MethylBERT, based on Bidirectional Encoder Representations from Transformers (BERT). The proposed method is specifically designed for tumour purity estimation. MethylBERT classifies DNAm patterns into tumour and normal cell types and infers the proportion of the tumour cell type via maximum likelihood estimation. The evaluation demonstrates that the proposed method performs well at both DNAm pattern classification and tumour purity estimation. In addition, we show that MethylBERT can detect rare tumour signals, yielding accurate tumour purity estimates for bulk samples with a very low tumour percentage (<1%), which demonstrates its potential for non-invasive early cancer diagnostics via blood tests.
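The abstract states that tumour purity is inferred by maximum likelihood estimation from read-level tumour/normal classifications. As a hedged illustration only, the sketch below shows one generic way such an estimate can be computed; it is not MethylBERT's implementation. It assumes each read comes with a calibrated tumour probability p_i from a classifier trained with equal class priors, and models the bulk sample as a two-component mixture with tumour fraction α, so that up to a constant the log-likelihood is log L(α) = Σ_i log(α p_i + (1 − α)(1 − p_i)). The function name `estimate_tumour_purity` and its parameters are hypothetical.

```python
from typing import Sequence


def estimate_tumour_purity(
    tumour_probs: Sequence[float],
    eps: float = 1e-6,
    tol: float = 1e-10,
) -> float:
    """Hypothetical MLE of the tumour fraction alpha in a two-component mixture.

    Assumption: each p_i is a calibrated posterior P(tumour | read) from a
    classifier trained with equal class priors, so the read likelihood ratio
    f_T(r_i) / f_N(r_i) equals p_i / (1 - p_i). Up to a constant,
        log L(alpha) = sum_i log(alpha * p_i + (1 - alpha) * (1 - p_i)).
    """
    if not tumour_probs:
        raise ValueError("need at least one read")
    # Clip probabilities away from 0 and 1 to avoid division by zero.
    p = [min(max(x, eps), 1.0 - eps) for x in tumour_probs]

    def score(alpha: float) -> float:
        # First derivative of log L(alpha). It is non-increasing in alpha
        # because the second derivative is -sum (2p-1)^2 / denom^2 <= 0.
        return sum(
            (2.0 * pi - 1.0) / (alpha * pi + (1.0 - alpha) * (1.0 - pi))
            for pi in p
        )

    # log L is concave, so if it already decreases at 0 (or still
    # increases at 1) the maximum lies on the boundary of [0, 1].
    if score(0.0) <= 0.0:
        return 0.0
    if score(1.0) >= 0.0:
        return 1.0

    # Otherwise bisect on the score to find its root inside (0, 1).
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical read posteriors: two tumour-like reads among ten.
probs = [0.95, 0.9] + [0.05] * 8
print(round(estimate_tumour_purity(probs), 3))
```

Because the mixture log-likelihood is concave in α, a simple bisection on its derivative is enough to locate the maximum; a real pipeline would also need to account for classifier calibration and the class priors used in training, which this sketch simply assumes away.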
Document type: | Dissertation |
---|---|
Supervisor: | Rohr, PD Dr. Karl |
Place of Publication: | Heidelberg |
Date of thesis defense: | 14 March 2024 |
Date Deposited: | 25 Mar 2024 13:28 |
Date: | 2024 |
Faculties / Institutes: | The Faculty of Mathematics and Computer Science > Department of Computer Science |
DDC-classification: | 004 Data processing Computer science |
Controlled Keywords: | Transformers, Bioinformatics, Epigenetics, Machine learning |