Title: Analyzing Phenotypes in High-content Screening with Machine Learning

Author: Zamparo, Lee
Advisor: Zhang, Zhaolei
Department: Computer Science
Issue Date: Nov-2015
Abstract (summary): High-content screening (HCS) uses computational analysis of large collections of unlabeled biological image data to make discoveries in cell biology. While biological images have historically been analyzed by inspection, advances in the automation of sample preparation and delivery, coupled with advances in microscopy and data storage, have resulted in a massive increase in both the number and resolution of images produced per study. These advances have facilitated genome-scale imaging studies, which are increasingly frequent. Although the sheer volume of data involved strongly favours computational analysis, many assays continue to be scored by eye. As a scoring method, visual inspection limits the rate at which data may be analyzed, increases cost, and decreases reproducibility. In this thesis, we propose computational methods for analyzing HCS data. We begin with feature data derived from confocal fluorescence microscopy images of yeast cell populations. We use machine learning methods trained on a small labeled subset of that feature data to robustly score each population with respect to a DNA damage focus phenotype. We then introduce a method for dimensionality reduction that uses deep autoencoders trained with a label-free objective. This allows us to model the non-linear relations between features in high-dimensional data. The computational complexity of our approach scales linearly with the number of examples, allowing us to train on a much larger number of samples. Finally, we propose an outlier detection method that applies nonparametric Bayesian clustering to the low-dimensional data to discover populations whose distributions of cellular phenotypes differ significantly from wild-type. We evaluate our methods against comparable alternatives and show that they either meet or exceed the level of top performers.
Content Type: Thesis

Permanent link: https://hdl.handle.net/1807/71416

Items in TSpace are protected by copyright, with all rights reserved, unless otherwise indicated.