Breast Cancer Using Image Processing

This document discusses using machine learning algorithms to classify breast cancer as benign or malignant using histopathology images. It analyzes features to predict cancer type and selects the best model. Methods like logistic regression, random forest, KNN, and SVM are applied and evaluated on over 5000 labeled breast tissue image patches. Neural networks and mammograms are also discussed for cancer detection applications. The goal is to accurately identify invasive ductal carcinoma and help radiologists diagnose breast cancer.

Uploaded by

Rishabh Khosla

Breast Cancer Detection

Objective and Scope


This analysis aims to identify which features are most helpful in predicting whether a cancer is
malignant or benign, and to observe general trends that may aid model selection and hyperparameter
selection. The goal is to classify whether the IDC breast cancer is benign or malignant. To achieve
this, machine learning classification methods are used to fit a function that can predict the
discrete class of a new input.

Process Description
Breast cancer is the most common form of cancer in women, and invasive ductal carcinoma (IDC) is
the most common form of breast cancer. Accurately identifying and categorizing breast cancer
subtypes is an important clinical task, and automated methods can be used to save time and reduce
error.

The goal of this script is to identify IDC when it is present in otherwise unlabelled histopathology
images. The dataset consists of approximately five thousand 50x50 pixel RGB digital images of H&E-
stained breast histopathology samples that are labelled as either IDC or non-IDC. These numpy
arrays are small patches that were extracted from digital images of breast tissue samples. The breast
tissue contains many cells but only some of them are cancerous. Patches that are labelled "1"
contain cells that are characteristic of invasive ductal carcinoma.
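
As a minimal sketch of how such patches might be prepared for the classifiers, the snippet below builds a stand-in array with the shape described above and flattens each patch into one feature vector. The random pixel data, the sample count, and the names `patches`, `labels` and `flatten_patches` are assumptions made for illustration; the real dataset would be loaded from disk instead.

```python
import numpy as np

def flatten_patches(patches):
    """Turn (n, 50, 50, 3) uint8 patches into (n, 7500) floats in [0, 1]."""
    n_samples = patches.shape[0]
    # Each patch becomes one row of 50 * 50 * 3 = 7500 pixel intensities.
    return patches.reshape(n_samples, -1).astype(np.float64) / 255.0

# Stand-in data (assumption): random pixels instead of real H&E patches.
rng = np.random.default_rng(0)
patches = rng.integers(0, 256, size=(8, 50, 50, 3), dtype=np.uint8)
labels = rng.integers(0, 2, size=8)  # 1 = IDC, 0 = non-IDC

features = flatten_patches(patches)
print(features.shape)  # (8, 7500)
```

Scaling the pixel values to the range 0 to 1 is a common choice for distance-based and gradient-based classifiers, but it is not stated in the original description.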

The methodology involves the use of classification techniques such as Logistic Regression, Random
Forest Classifier, K Nearest Neighbour, Support Vector Machine, Linear SVC, Gaussian NB, and Decision
Tree Classifier.
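
One way the listed techniques could be compared is sketched below with scikit-learn, scoring each model by cross-validated accuracy. The choice of scikit-learn, the synthetic data from `make_classification`, the 5-fold setting and all hyperparameter values are assumptions for illustration, not the configuration used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the flattened image patches (assumption).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest Classifier": RandomForestClassifier(n_estimators=50, random_state=0),
    "K Nearest Neighbour": KNeighborsClassifier(n_neighbors=5),
    "Support Vector Machine": SVC(kernel="rbf"),
    "Linear SVC": LinearSVC(max_iter=5000),
    "Gaussian NB": GaussianNB(),
    "Decision Tree Classifier": DecisionTreeClassifier(random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate model.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")

best_model = max(scores, key=scores.get)
```

The model with the highest mean score would then be the candidate carried forward, although in practice the ranking depends heavily on the data and on hyperparameter tuning.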

Neural Networks

A neural network is a series of algorithms that endeavours to recognize underlying relationships in a
set of data through a process that mimics the way the human brain operates. In this sense, neural
networks refer to systems of neurons, either organic or artificial in nature. Neural networks can
adapt to changing input; the network generates the best possible result without needing to redesign
the output criteria.

Mammogram

Mammography is one of the most effective methods used in hospitals and clinics for the early detection
of breast cancer. It has been shown to reduce mortality by as much as 30%. The main objective of
screening mammography is to detect a cancerous tumour early and remove it before metastases are
established. The early signs of breast cancer are masses and microcalcifications, but abnormalities
and normal breast tissue are often difficult to differentiate because of their subtle appearance and
ambiguous margins. Only about 3% of the required information is revealed during a mammogram, where
part of the suspicious region is covered by vessels and normal tissue. This can make it difficult for
radiologists to identify a cancerous tumour. Computer-aided diagnosis (CAD) has therefore been
developed to overcome the limitations of mammography and to help radiologists read mammograms more
accurately. Artificial neural network (ANN) models are the most commonly used in CAD for mammography
interpretation and biopsy decision making.
Classification Techniques:
Logistic Regression

Logistic Regression uses an equation similar to that of Linear Regression, but its outcome is a
categorical variable, whereas other regression models predict a continuous numeric value.
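
To illustrate the link between the two, the sketch below computes the same weighted sum as linear regression, passes it through the sigmoid function, and thresholds the result to obtain a class. The weights, the bias and the 0.5 threshold are hypothetical values chosen by hand, not parameters learned from the dataset.

```python
import math

def predict_probability(features, weights, bias):
    """Probability of the positive class under a logistic regression model."""
    # Linear part: the same weighted sum used by linear regression.
    z = sum(w * x for w, x in zip(weights, features)) + bias
    # The sigmoid squashes the linear score into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(features, weights, bias, threshold=0.5):
    """Map the probability to a discrete class label (1 or 0)."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0

# Hypothetical weights, chosen by hand rather than learned from data.
weights, bias = [2.0, -1.0], 0.0
print(predict_class([1.0, 0.5], weights, bias))  # z = 1.5, so class 1
print(predict_class([0.0, 1.0], weights, bias))  # z = -1.0, so class 0
```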

K-Nearest Neighbour

K-Nearest Neighbour is a supervised machine learning algorithm, as the data given to it is labelled.
It is a non-parametric method: a test data point is classified according to its nearest training data
points rather than by a fixed set of learned model parameters.
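
A minimal sketch of this idea follows, using Euclidean distance and a majority vote among the k closest training points. The toy two-feature data, the value k = 3 and the function name `knn_predict` are assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every labelled training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the labels of the k closest points and return the most common one.
    nearest_labels = [label for _, label in sorted(distances)[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy two-feature data (assumption): label 1 = IDC, label 0 = non-IDC.
train_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_points, train_labels, (0.15, 0.15)))  # 0
print(knn_predict(train_points, train_labels, (0.95, 1.0)))   # 1
```

Note that no model is fitted in advance: all the work happens at prediction time, when the stored training points are searched.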

Support Vector Machines

Support Vector Machine is a supervised machine learning algorithm that performs well in pattern
recognition problems and is used to learn classification and regression rules from data. SVM is
typically applied when the number of features and the number of instances are high.

Random Forest Classifier

Random forest, as its name implies, consists of a large number of individual decision trees that
operate as an ensemble. Each individual tree in the random forest outputs a class prediction, and
the class with the most votes becomes the model's prediction.
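
The voting step can be sketched as follows. The three one-rule functions stand in for trained decision trees, and the feature names and thresholds are hypothetical; a real forest would learn its trees from bootstrapped samples of the training data.

```python
from collections import Counter

def forest_predict(trees, sample):
    """Return the class that receives the most votes across all trees."""
    votes = [tree(sample) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stand-ins for trained decision trees: each is a one-rule
# function that inspects a single feature of the sample.
trees = [
    lambda s: 1 if s["nucleus_size"] > 0.6 else 0,
    lambda s: 1 if s["stain_intensity"] > 0.5 else 0,
    lambda s: 1 if s["cell_density"] > 0.7 else 0,
]

sample = {"nucleus_size": 0.8, "stain_intensity": 0.4, "cell_density": 0.9}
print(forest_predict(trees, sample))  # votes are [1, 0, 1], so class 1
```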

| Machine Learning Algorithm | Benefits | Assumptions and/or Limitations |
|---|---|---|
| Decision Tree | Easy to understand. Efficient training algorithm. Order of instances has no effect on training. | Classes must be mutually exclusive. Decision tree depends upon order of attribute selection. |
| Naive Bayes | Based on statistical modelling. Easy to understand. Efficient training algorithm. Order of instances has no effect on training. Useful across multiple domains. | Assumes attributes to be statistically independent. Assumes normal distribution on numeric attributes. Classes must be mutually exclusive. Redundant attributes mislead classification. |
| Neural Network | Used for classification or regression. Able to represent Boolean functions. Tolerates noisy inputs. | Difficult to understand structure of algorithm. Too many attributes can result in overfitting. |
| Support Vector Machine | Models non-linear class boundaries. Easy to control complexity of decision rule. | Training is slow compared to Bayes & Decision tree. Difficult to understand structure of algorithm. |

Conclusion
The goal of the project is to present an application of different machine learning algorithms,
including Neural Networks, to the diagnosis of Invasive Ductal Carcinoma (IDC). The project is divided
into two parts that are executed sequentially. The first is the model selection phase, in which
various ML models are tested on the data to find the one best suited to the project. The second is
the model evaluation phase, in which the selected model is used to predict IDC in breast tissue.
