School of Studies Engineering and Technology

Project

Uploaded by

xchandrabhan11
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

SCHOOL OF STUDIES ENGINEERING AND TECHNOLOGY


GURU GHASIDAS VISHWAVIDYALAYA BILASPUR, CHHATTISGARH
( A CENTRAL UNIVERSITY )
MINOR PROJECT
ON
“BREAST CANCER DETECTION”
Under the Guidance of:
Dr. Alok Kumar Singh (HOD, CSE Department)

Submitted by:
Ankit Kumar (19103007)
Chandrabhan Chauhan (19103309)
Praduman Kumar (19103345)
(III Year, VI Sem)
INTRODUCTION
• Breast cancer: a cancer that forms in the cells of the breast.
• Breast cancer can occur in women and, rarely, in men.
• Symptoms of breast cancer include a lump in the breast, bloody discharge from the nipple, and changes in the shape or texture of the nipple or breast.
• Its treatment depends on the stage of the cancer. It may consist of chemotherapy, radiation, hormone therapy and surgery.
• Breast cancer is a disease in which cells in the breast grow out of control.
• There are different kinds of breast cancer. The kind depends on which cells in the breast turn into cancer.
• Breast cancer can begin in different parts of the breast.
• Early diagnosis of breast cancer can significantly improve the prognosis and chance of survival, as it allows timely clinical treatment of patients.
• Breast cancer is one of the most common cancers among women worldwide. According to global statistics it accounts for a large share of new cancer cases and cancer-related deaths among women, making it a significant public health problem in today's society.
• Tumors can be benign (noncancerous) or malignant (cancerous).
• Benign tumors tend to grow slowly and do not spread.
• Malignant tumors can grow rapidly, invade and destroy nearby normal tissues, and spread throughout the body.
PROBLEM STATEMENT
• Early diagnosis of breast cancer can significantly improve the prognosis and chance of survival, as it allows timely clinical treatment of patients.
• The lack of strong prognosis models makes it difficult for doctors to prepare a treatment plan that may prolong patient survival time.
• The need of the hour is to develop a technique that minimizes error and thereby increases accuracy.
OUR AIM
• Our aim is to develop a model that predicts, on the basis of features extracted from mammography images, whether a tumor is malignant or benign.
• The model should give minimum error so as to increase accuracy.
• For this we take the Kaggle Breast Cancer mini-MIAS dataset and predict whether the cancer is benign or malignant.
• The data is collected at https://www.kaggle.com/kmader/miasmammography/download
CONVOLUTIONAL NEURAL NETWORK (CNN) :-

• A Convolutional Neural Network (ConvNet/CNN) is a deep learning algorithm which can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and differentiate one from the other. The pre-processing required by a ConvNet is much lower than for other classification algorithms.
• CNNs are mainly used for image classification, segmentation and other related tasks. A CNN can predict the objects inside an image just by looking at it, much as humans do.
• We use a CNN for this project because it is particularly well suited to analyzing medical images such as MRI results or X-rays.
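As a minimal sketch of the operation a convolutional layer applies, the following pure-Python example slides a 3×3 kernel over a tiny image. The patch values and the hand-picked edge kernel are illustrative assumptions; in a trained CNN the kernel weights are learned, and the project itself relies on library layers rather than code like this.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x5 grayscale patch: dark on the left, bright on the right.
patch = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]

# Hand-picked vertical-edge kernel; in a CNN these weights are learned.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d_valid(patch, vertical_edge)
print(feature_map)
```

The response is zero over the flat dark region and large where the window crosses the dark-to-bright boundary, which is the sense in which a filter "assigns importance" to a feature of the image.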
PROPOSED METHODOLOGY

1. Breast Cancer Dataset Preparation
2. Data Pre-processing and Augmentation
3. Breast Cancer Detection Model Design
4. Train Model
5. Model Prediction
DATA SET PREPARATION

• The mammography dataset was downloaded from an open data repository for diagnosing breast cancer from mammograms (i.e. from the Kaggle website). Content-based visual information retrieval is a technique for retrieving input images from a database with minimum time consumption. In our project this step is not needed, because the dataset is well structured and contains only the necessary data.
• The data is collected at https://www.kaggle.com/kmader/miasmammography/download
ALGORITHM USED

We tried several architectures for this project to achieve maximum accuracy and minimum loss: ResNet-101, MobileNetV2 and VGG-16. Of these, VGG-16 gave the highest accuracy and the lowest loss.

VGG-16:

VGG Net addressed a major shortcoming of AlexNet, its large number of hyper-parameters, by replacing the large kernel-sized filters (11 and 5 in the first and second convolution layers, respectively) with multiple 3×3 kernel-sized filters one after another.
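Why stacking small filters helps can be checked with a little arithmetic. The sketch below compares the weight count and receptive field of stacked 3×3 layers against a single large kernel; the channel width of 64 is an illustrative assumption, biases are ignored, and every layer is assumed to have stride 1 with the same number of input and output channels.

```python
def conv_weights(kernel_size, channels):
    """Weights in one conv layer with `channels` inputs and outputs (biases ignored)."""
    return kernel_size * kernel_size * channels * channels


def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


channels = 64  # illustrative channel width, not a value from the slides

# Two 3x3 layers see the same 5x5 region as one 5x5 layer.
two_3x3 = 2 * conv_weights(3, channels)
one_5x5 = conv_weights(5, channels)

# Five 3x3 layers see the same 11x11 region as one 11x11 layer.
five_3x3 = 5 * conv_weights(3, channels)
one_11x11 = conv_weights(11, channels)

print(stacked_receptive_field(3, 2), two_3x3, one_5x5)
print(stacked_receptive_field(3, 5), five_3x3, one_11x11)
```

Under these assumptions the stacked version covers the same input region with fewer weights, and it also applies a non-linearity after every layer.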

The architecture consists of 3×3 convolutional filters with 'same' padding to preserve the spatial dimensions, and 2×2 max pooling layers with a stride of 2. In total there are 16 weight layers in the network. The input image is in RGB format with dimensions 224×224×3 and passes through five convolutional blocks (filters: 64, 128, 256, 512, 512), each followed by max pooling. The output of these layers is fed into three fully connected layers with a softmax function in the output layer. In total there are about 138 million parameters in VGG Net.
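The layer and parameter counts quoted above can be reproduced by hand. The sketch below assumes the standard VGG-16 configuration (2, 2, 3, 3 and 3 convolutions in the five blocks, each block ending in a 2×2 max pooling with stride 2, and a 4096-4096-1000 fully connected head); those per-block counts and head sizes are not stated in the slides.

```python
# (number of 3x3 conv layers, output filters) for the five VGG-16 blocks
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def vgg16_conv_base(height=224, width=224, channels=3):
    """Return (output shape, parameter count) of the VGG-16 convolutional base."""
    params = 0
    for num_convs, filters in VGG16_BLOCKS:
        for _ in range(num_convs):
            # 3x3 kernel with 'same' padding: spatial size is unchanged
            params += 3 * 3 * channels * filters + filters
            channels = filters
        # 2x2 max pooling with stride 2 halves height and width
        height, width = height // 2, width // 2
    return (height, width, channels), params


def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


shape, base_params = vgg16_conv_base()
flat = shape[0] * shape[1] * shape[2]
head_params = (
    dense_params(flat, 4096) + dense_params(4096, 4096) + dense_params(4096, 1000)
)
weight_layers = sum(n for n, _ in VGG16_BLOCKS) + 3

print(shape, base_params, base_params + head_params, weight_layers)
```

The 7×7×512 output of the convolutional base and its 14,714,688 parameters match the first row of the model summary reported later in this document.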

Advantages :-
It is a very good architecture for benchmarking on a particular task.

Also, pre-trained VGG networks are freely available on the internet, so it is commonly used out of the box for various applications.

Disadvantages:-
It is painfully slow to train.
The network weights themselves are quite large (in terms of disk space and bandwidth).
ResNet-101:
ResNet-101 is a convolutional neural network
that is 101 layers deep. You can load a pretrained
version of the network trained on more than a
million images from the ImageNet database. The
pretrained network can classify images into 1000
object categories, such as keyboard, mouse, pencil,
and many animals. As a result, the network has
learned rich feature representations for a wide
range of images. The network has an image input
size of 224x224 pixels.
Advantages :-
Networks with a large number of layers (even thousands) can be trained easily without increasing the training error percentage.

ResNet helps tackle the vanishing gradient problem by using identity mappings (skip connections).
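The identity mapping can be illustrated with a toy example. This is a minimal sketch in plain Python, not ResNet-101 itself: `residual_block`, `zero_branch` and the three-element feature vector are hypothetical names and values chosen for illustration.

```python
def residual_block(x, residual_fn):
    """Return F(x) + x, where the '+ x' term is the identity shortcut."""
    fx = residual_fn(x)
    return [f + xi for f, xi in zip(fx, x)]


def zero_branch(x):
    """Stand-in for a learned branch whose weights contribute nothing."""
    return [0.0 for _ in x]


def stack(x, depth, residual_fn):
    """Apply `depth` residual blocks one after another."""
    for _ in range(depth):
        x = residual_block(x, residual_fn)
    return x


features = [0.5, -1.0, 2.0]
out = stack(features, depth=101, residual_fn=zero_branch)
print(out)
```

Even when every learned branch contributes nothing, the signal passes through all 101 blocks unchanged. Because each block computes F(x) + x, its derivative with respect to x contains a constant 1 from the shortcut, which gives gradients a direct path back through a deep stack.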

Disadvantages:-
Although ResNet has proven powerful in many applications, one major drawback is that a deeper network can take weeks to train, which can make it impractical in some real-world applications.
MobileNetV2:
MobileNetV2 is a convolutional neural network
architecture that seeks to perform well on mobile
devices. It is based on an inverted residual
structure where the residual connections are
between the bottleneck layers. The intermediate
expansion layer uses lightweight depthwise
convolutions to filter features as a source of non-
linearity. As a whole,
the architecture of MobileNetV2 contains the
initial fully convolution layer with 32 filters,
followed by 19 residual bottleneck layers.
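The saving from depthwise convolutions can be seen by counting weights. The sketch below compares a standard 3×3 convolution with a depthwise 3×3 convolution followed by a 1×1 pointwise convolution; the channel counts (32 in, 64 out) are illustrative assumptions, biases are ignored, and the full MobileNetV2 block additionally contains a 1×1 expansion layer that is not modelled here.

```python
def standard_conv_weights(kernel, c_in, c_out):
    """Every output channel has its own k x k filter over all input channels."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_weights(kernel, c_in, c_out):
    depthwise = kernel * kernel * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes channels
    return depthwise + pointwise


c_in, c_out = 32, 64  # illustrative channel counts

standard = standard_conv_weights(3, c_in, c_out)
separable = depthwise_separable_weights(3, c_in, c_out)

print(standard, separable, round(standard / separable, 2))
```

For these channel counts the separable form needs roughly an eighth of the weights, which is the kind of reduction that makes the architecture attractive on mobile devices.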
Advantages :-
Reduced network size and a reduced number of parameters.

Faster performance, which is useful for mobile applications: a small, low-latency convolutional neural network.

Disadvantages:-
The trade-off with MobileNet is accuracy. Even though MobileNet has a reduced size, fewer parameters and performs faster, it is less accurate than other state-of-the-art networks, although the reduction in accuracy is only slight.
Algorithm      Accuracy   Area Under Curve (AUC)   Loss
ResNet-101     0.974      0.996                    0.071
MobileNetV2    0.955      0.992                    0.112
VGG-16         0.977      0.998                    0.059
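The comparison can be read off programmatically. The short sketch below simply re-enters the three reported rows (accuracy, area under the curve, loss) as a dictionary and picks the best model for each metric; the dictionary layout and variable names are illustrative choices, not part of the project code.

```python
# Reported results, copied from the table above.
results = {
    "ResNet-101": {"accuracy": 0.974, "auc": 0.996, "loss": 0.071},
    "MobileNetV2": {"accuracy": 0.955, "auc": 0.992, "loss": 0.112},
    "VGG-16": {"accuracy": 0.977, "auc": 0.998, "loss": 0.059},
}

# Higher is better for accuracy and AUC; lower is better for loss.
best_accuracy = max(results, key=lambda name: results[name]["accuracy"])
best_auc = max(results, key=lambda name: results[name]["auc"])
lowest_loss = min(results, key=lambda name: results[name]["loss"])

print(best_accuracy, best_auc, lowest_loss)
```

All three criteria select the same architecture, VGG-16.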


MODEL SUMMARY OF VGG 16

Model: "sequential_1"
_________________________________________________________________
Layer (type)                                 Output Shape         Param #
=================================================================
vgg16 (Functional)                           (None, 7, 7, 512)    14714688
flatten_1 (Flatten)                          (None, 25088)        0
batch_normalization_2 (BatchNormalization)   (None, 25088)        100352
dense_2 (Dense)                              (None, 256)          6422784
batch_normalization_3 (BatchNormalization)   (None, 256)          1024
activation_1 (Activation)                    (None, 256)          0
dropout_1 (Dropout)                          (None, 256)          0
dense_3 (Dense)                              (None, 2)            514
=================================================================
Total params: 21,239,362
Trainable params: 6,473,986
Non-trainable params: 14,765,376
_________________________________________________________________
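The figures in the summary can be cross-checked with a few lines of arithmetic. The sketch below assumes that the VGG-16 base is frozen and that each batch normalization layer holds two trainable vectors (scale and shift) and two non-trainable ones (moving mean and variance). The slides do not state these assumptions, but the reported counts are consistent with them.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


def batchnorm_params(features):
    """Scale and shift are trainable; moving mean and variance are not."""
    return {"trainable": 2 * features, "frozen": 2 * features}


VGG16_BASE = 14_714_688  # parameter count of the vgg16 row in the summary
flat = 7 * 7 * 512       # size of the Flatten output

bn1 = batchnorm_params(flat)
dense1 = dense_params(flat, 256)
bn2 = batchnorm_params(256)
dense2 = dense_params(256, 2)

trainable = bn1["trainable"] + dense1 + bn2["trainable"] + dense2
non_trainable = VGG16_BASE + bn1["frozen"] + bn2["frozen"]
total = trainable + non_trainable

print(total, trainable, non_trainable)
```

Only the small classification head is trained under these assumptions, which is why the trainable count is less than a third of the total.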
LIBRARIES USED:-
In this project, standard libraries for data analysis and model creation are used. The following are the main libraries used in this project.

1. NumPy:-
NumPy is the core library for scientific computing in Python. It is a general-purpose array-processing package that provides powerful tools for working with multi-dimensional arrays.

NumPy's main purpose is to handle homogeneous multi-dimensional arrays, with tools ranging from array creation to manipulation. An array of any shape can be created with a single call such as np.zeros(), and its contents handled with routines such as where, arange, random, save and load. NumPy also supports array processing through methods like sum, mean, std, max, min and all.

Arrays created with NumPy also behave differently from ordinary Python lists when operators such as +, -, *, / are applied: the operation is carried out element by element. These qualities make NumPy arrays highly suitable for our purpose of handling data, since the array manipulations performed while predicting outputs must be both correct and fast.
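A short illustration of the points above, assuming NumPy is installed; the array shapes and values are arbitrary examples and are not taken from the project data.

```python
import numpy as np

batch = np.zeros((2, 3))           # 2x3 array filled with 0.0
ramp = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

# Arithmetic operators act element by element on NumPy arrays...
scaled = (ramp + 1) * 2            # [[2, 4, 6], [8, 10, 12]]

# ...whereas `+` on plain Python lists concatenates them.
joined = [0, 1, 2] + [3, 4, 5]     # [0, 1, 2, 3, 4, 5]

# Reductions over the whole array.
total = ramp.sum()
average = ramp.mean()
largest = ramp.max()

print(batch.shape, scaled.tolist(), joined, total, average, largest)
```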
2. pandas:-
pandas is one of the most popular Python libraries for data analysis. It provides highly optimized performance, with back-end source code written in C or Python.
pandas offers two main data structures for analysing data:

* Series
* DataFrames
A Series is a one-dimensional array defined in pandas that can store any data type. A DataFrame is a two-dimensional data structure that stores data in rows and columns.

pandas DataFrames are used extensively in this project to hold the datasets required for training and testing the algorithms. DataFrames make it easier to work with attributes and results. Several built-in functions, such as replace, were used in our project for data manipulation and preprocessing.

3. sklearn:-
Sklearn (scikit-learn) is an open-source Python library which implements a wide range of machine learning, pre-processing, cross-validation and visualization algorithms. It features simple and efficient tools for data mining and data processing, including various classification, regression and clustering algorithms. In this project we used sklearn to take advantage of its built-in modules, such as metrics, model selection, support vector machines and decomposition.
4. Matplotlib:- Matplotlib is used for plotting a wide variety of graphs, from histograms to line plots to heat plots. You can use the Pylab feature in the IPython notebook (ipython notebook --pylab=inline) to use these plotting features inline. If you omit the inline option, pylab turns the IPython environment into one very similar to Matlab. You can also use LaTeX commands to add math to your plot.

5. TensorFlow:-TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can
be used across a range of tasks but has a particular focus on training and inference of deep neural networks.

Conclusion And Result:-


We have demonstrated how to classify whether a given mammogram contains cancerous tissue or not. Our model places a small custom classification head on top of a VGG-16 base, and of the three architectures we compared it performed best. In training, our VGG-16 model reached an accuracy of 0.977 and a loss of 0.059.
THANK YOU
