Mutthiman 2

This paper discusses the use of unsupervised feature learning and machine learning techniques for cancer detection and classification based on gene expression data. The proposed method allows for the integration of data from various cancer types, enhancing the detection and diagnosis process compared to traditional approaches. The results indicate improved performance in cancer classification, promising a more generalized and scalable solution for medical analysis.

Uploaded by

alucard98081

Abstract

Using automated computational tools, and machine learning in particular, to facilitate and
enhance medical analysis and diagnosis is a promising and important area. In this paper, we
show how unsupervised feature learning can be used for cancer detection and cancer
type analysis from gene expression data. The main advantage of the proposed method over
previous cancer detection approaches is that data from various types of cancer can be
used to automatically form features that help to enhance the detection and diagnosis
of a specific one. The technique is applied here to the detection and classification of cancer
types based on gene expression data. In this domain we show that the method performs
better than previous methods, and therefore promises a more comprehensive
and generic approach to cancer detection and diagnosis.

Introduction
Studying the correlation between gene expression profiles and disease states or stages of
cells plays an important role in biological and clinical applications (Tan & Gilbert, 2003). The
gene expression profiles can here be obtained from multiple tissue samples and by
comparing the genes expressed in normal tissue with the ones in diseased tissue, one can
obtain better insight into the disease pathology (Tan & Gilbert, 2003). One of the challenges
that has been addressed in this way is to determine the difference between cancerous gene
expression in tumor cells and the gene expression in normal, non-cancerous tissues. To
address this, quite a number of machine learning classification techniques have been used
to classify tissue into cancerous and normal. However, due to the high dimensionality of
gene expression data (i.e. the high dimensionality of the feature space) and the availability
of only a few hundred samples for a given tumor, this application requires a number of
specific considerations.

The first challenge here is how to reduce the dimensionality of the feature space in a way that
ensures that the resulting feature space still contains sufficient information to perform accurate
classification. In addition, small sample sets (i.e. a small number of training examples) make the
problem much harder to solve and increase the risk of overfitting. Over the years, many solutions have
been proposed to address the cancer detection problem, most of which reduce the feature space
by deriving compact feature sets, selecting and constructing features either manually or
in supervised ways. As a consequence, these methods are mostly
not scalable and cannot be generalized to new cancer types without designing new features.
In addition, these techniques cannot take effective advantage of tissue samples from other cancers:
when, for example, breast cancer detection is to be learned, they are effectively restricted to data
from breast cancer and normal tissue when building the classifier.

To deal with this problem and to facilitate and develop more generalized versions of cancer
classifiers, we propose in this paper a more general way of learning features by applying
unsupervised feature learning and deep learning methods. We use a sparse autoencoder
method to learn a concise feature representation from unlabeled data. In contrast to
previous methods, where the data has to come strictly from the cancer type to be detected in order
to provide the appropriate labels for supervised learning, the unlabeled data can here be
obtained by combining data from different tumor cells, provided that they are generated using the
same microarray platform (i.e. given that they contain the same gene expression information). For
example, for the feature learning that forms the basis for prostate cancer classification we can use
samples from breast cancer, lung cancer, and many other cancers which are available in that
platform. The resulting features from all these sets are then used as a basis for the construction of
the classifier.
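
As an illustration of the idea described above, the following is a minimal sketch of a sparse autoencoder in NumPy. It is not the implementation used in this paper: the single hidden layer, the sigmoid units, the KL-divergence sparsity penalty, the hyperparameter values, and all function and variable names (train_sparse_autoencoder, encode, pooled_unlabeled, target_samples) are assumptions made for the example, and random numbers stand in for real microarray measurements. The sketch learns features from pooled, unlabeled samples and then encodes samples of the target cancer type with the learned encoder.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_sparse_autoencoder(X, n_hidden=8, rho=0.05, beta=0.1, lam=1e-4,
                             lr=0.1, n_epochs=300, seed=0):
    """Train a one-hidden-layer sparse autoencoder by batch gradient descent.

    X is an (n_samples, n_genes) array scaled to [0, 1]. rho is the target
    mean activation of each hidden unit, beta weights the KL sparsity
    penalty, and lam is the weight-decay coefficient (all assumed values).
    Returns the encoder parameters (W1, b1) and the loss per epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(n_epochs):
        # Forward pass: encode, then reconstruct the input.
        H = sigmoid(X @ W1 + b1)
        X_hat = sigmoid(H @ W2 + b2)
        rho_hat = np.clip(H.mean(axis=0), 1e-6, 1.0 - 1e-6)

        # Loss = reconstruction error + sparsity penalty + weight decay.
        recon = 0.5 * np.sum((X_hat - X) ** 2) / n
        kl = np.sum(rho * np.log(rho / rho_hat)
                    + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))
        decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
        losses.append(recon + beta * kl + decay)

        # Backward pass: gradients with respect to the pre-activations.
        d_out = (X_hat - X) * X_hat * (1.0 - X_hat) / n
        d_sparse = beta * (-rho / rho_hat + (1.0 - rho) / (1.0 - rho_hat)) / n
        d_hid = (d_out @ W2.T + d_sparse) * H * (1.0 - H)

        # Gradient descent updates.
        W2 -= lr * (H.T @ d_out + lam * W2)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hid + lam * W1)
        b1 -= lr * d_hid.sum(axis=0)
    return (W1, b1), losses


def encode(X, encoder):
    """Map samples to the learned feature representation."""
    W1, b1 = encoder
    return sigmoid(X @ W1 + b1)


# Synthetic stand-ins (assumption): rows are tissue samples and columns are
# genes measured on the same microarray platform, scaled to [0, 1].
rng = np.random.default_rng(42)
n_genes = 20
pooled_unlabeled = rng.random((90, n_genes))  # e.g. breast, lung, other tumors
target_samples = rng.random((10, n_genes))    # e.g. prostate samples to classify

encoder, losses = train_sparse_autoencoder(pooled_unlabeled)
features = encode(target_samples, encoder)
print(features.shape, losses[0], losses[-1])
```

The encoded features would then serve as input to any standard supervised classifier trained on the labeled samples of the target cancer type; that step is omitted here. Real gene expression data has tens of thousands of genes, so the toy dimensions above are chosen only to keep the example fast.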

Gene Expression
Gene expression data measures the level of activity of genes within a given tissue and thus provides
information regarding the complex activities within the corresponding cells. This data is generally
obtained by measuring the amount of messenger ribonucleic acid (mRNA) produced during
transcription which, in turn, is a measure of how active or functional the corresponding gene is
(Aluru, 2005). As cancer is associated with multiple genetic and regulatory aberrations in the cell,
these should be reflected in the gene expression data. To capture these abnormalities, microarrays, which
permit the simultaneous measurement of expression levels of tens of thousands of genes, have been
increasingly utilized to characterize the global gene-expression profiles of tumor
