Abstract
The past decade has seen unprecedented progress in the survival chances of cancer patients as a consequence of new treatments targeting tumor-specific cellular processes, which have been uncovered by molecular genetic analyses. From a data analysis perspective, the main challenge is the high dimensionality and multimodality of the genetic data relative to the small sample sizes (numbers of patients). From a computational perspective, the analysis of high volumes of data (about 100 GB of sequencing data for an individual tumor genome) currently requires high-powered computational resources and remains challenging within the very short time frames desired so that treatment can start immediately. We discuss two avenues of progress. First, we present methods that are able to extract most of the genetic variants from a sequenced tumor genome but require only 2% to 5% of the computational resources of current state-of-the-art procedures. Second, we discuss a versatile unified statistical model for distinguishing true variants from technical artifacts of the DNA sequencing process. Using analyses of paired samples from primary and relapse neuroblastoma tumors, we are able to extract patterns of tumor evolution that are correlated with cancer progression and the escape of tumors from therapeutic intervention. As a result, a novel risk classification of neuroblastoma has been established based on genomic and mutational data.