Report
Is a bona fide work carried out by them under the supervision of Prof. Kirti Walke
and it is approved for the partial fulfillment of the requirement of Savitribai Phule
Pune University, for the award of the Degree of Bachelor of Engineering (Information
Technology).
The project work has not previously been submitted to any other institute or
university for the award of a degree or diploma.
Seal
Acknowledgement
We express our gratitude to our project guide, Prof. Kirti Walke, for valuable
guidance at every step of this project and for contributing to the solution of
problems at each stage.
We are thankful to Prof. Dr. P. D. Halle, Head, Department of Information Technology,
to all the staff members, and to the project coordinator, Prof. A. T. Sonawane, who
supported the preparatory steps of this project. We are also very thankful to our
respected Principal, Dr. M. S. Rohakale, for his support and for providing all the
facilities for the project.
Finally, we thank all our friends for their support and suggestions. Last but not
least, we thank our families for giving us support and confidence at each and every
stage of this project.
Abstract
This paper focuses on the data-driven diagnosis of polycystic ovary syndrome (PCOS)
in women. Machine learning algorithms are applied to a dataset freely available in
the Kaggle repository. The dataset records many attributes of 541 women, of whom 177
are PCOS patients. First, a univariate feature selection algorithm is applied to find
the features that best predict PCOS. The attributes are ranked, and the most important
attribute is found to be the ratio of follicle-stimulating hormone (FSH) and
luteinizing hormone (LH). Next, holdout and cross-validation methods are applied to
separate the training and testing data, and a number of classifiers, including SVM,
are trained on the dataset. The results show that the 10 highest-ranked attributes
are sufficient to predict PCOS, and that SVM achieves the best testing accuracy,
91.01%, when applied to these 10 features. Hence, SVM is suitable for reliably
classifying PCOS patients.
The work also addresses breast cancer, one of the world's deadliest and most
prevalent cancers. Its progression can be halted if it is treated early. Currently,
this malignancy is diagnosed through manual clinical procedures, which increases the
risk of human error and delays treatment. We therefore propose a Convolutional Neural
Network (CNN) model used in conjunction with a transfer learning strategy. Cancer
cells in breast tissue are detected using a histopathology image collection. We
compare the accuracy of several transfer learning models while varying the optimizer
(Adam, SGDM, and RMSProp) for each model. The results show that the CNN with the
Adam optimizer achieves the highest accuracy.
Keywords: PCOS, breast cancer, SVM, CNN, image processing, machine learning.
Contents
Acknowledgement
Abstract
Contents
Nomenclature
List of Figures
List of Tables
1 Introduction
1.1 Overview
1.2 Motivation
1.3 Objectives
2 Literature Survey
3 Problem Statement
3.1 Justification of Problem
3.2 Need for the New System
3.3 Existing System
6.1 DFD
6.1.1 Level-0 DFD
6.1.2 Level-1 DFD
6.2 UML
6.2.1 Class Diagram
6.2.2 Activity Diagram
6.2.3 Sequence Diagram
6.2.4 Deployment Diagram
6.2.5 Component Diagram
7 System Implementation
7.1 Code Documentation
7.2 Algorithm
7.3 Methodologies
7.4 Protocols Used
8 Working Modules
8.1 GUI of Working Module
8.2 Snapshots
8.3 Experimental Results
9 Testing
9.1 Test Strategy
9.1.1 Unit Testing
9.1.2 System Testing
9.1.3 Integration Testing
9.1.4 Validation Testing
9.2 Test Cases
9.3 Test Results
10 Conclusion
10.1 Conclusion
Bibliography
Appendices
Nomenclature
Shamir, Adleman
List of Figures
List of Tables
Chapter 1
Introduction
1.1 Overview
Technology is changing every aspect of our lives and is bringing remarkable
transformations to the healthcare industry, where technology and humans now work hand
in hand. For example, robots performing surgery once seemed like fiction, but they now
perform critical and complex operations in hospitals. Machine learning is a subfield
of artificial intelligence. It helps a system learn, identify patterns in datasets,
make logical decisions, and analyse digital information including words, numbers,
images and clicks. Applications of machine learning include image recognition, data
prediction, and medical diagnosis in health care and clinical care. Many advances are
now taking place in systems for detecting PCOS and breast cancer.
Polycystic ovary syndrome (PCOS), also known as polycystic ovarian syndrome, is a
hormonal (endocrine) disorder among women of reproductive age. Over five million
women of reproductive age worldwide suffer from PCOS. The most common symptoms
include missed, irregular, or very light periods. The ovaries may become enlarged or
contain many cysts. PCOS can also cause excess body hair (hirsutism), including on
the chest and stomach, weight gain, especially around the abdomen, and acne or oily
skin. The exact pathophysiology of PCOS is not yet known. This heterogeneous disorder
mainly affects the ovaries. PCOS is a multifactorial and polygenic condition. Machine
learning is capable of "learning" features from very large amounts of data gathered
through clinical practice to diagnose this disorder. This paper puts forward a
solution to this problem which helps in early detection and prediction of PCOS
treatment from an optimal and minimal
Detecting Stein-Leventhal Syndrome And Ductal Carcinoma
set of parameters which have been statistically analyzed. The solution is built using
machine learning algorithms such as the Support Vector Classifier. Breast cancer is
one of the deadliest and most common cancers in the world. Early treatment of this
cancer can help to nip it in the bud. In the present medical setting, this cancer is
identified by manual clinical procedures, which can lead to human error and further
delay treatment. We therefore propose a Convolutional Neural Network (CNN) model
combined with a transfer learning approach. A histopathological image dataset is used
to detect cancer cells in breast tissue. We compare the accuracy of different transfer
learning models while varying the optimizer (Adam, SGDM and RMSProp) for each model.
The results show that the CNN with the Adam optimizer outperforms the other
configurations in terms of accuracy.
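The three optimizers named above differ only in how they turn a gradient into a parameter update. The following is a minimal sketch in plain Python that applies each update rule to a toy one-dimensional loss. The loss function, the learning rates and the other hyperparameters are assumptions chosen for illustration; they are not the settings used in this project, and the sketch says nothing about which optimizer is best on real image data.

```python
import math


def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3.
    return 2.0 * (w - 3.0)


def sgdm(w, steps=1000, lr=0.1, momentum=0.9):
    """Stochastic gradient descent with momentum (SGDM)."""
    velocity = 0.0
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad(w)
        w += velocity
    return w


def rmsprop(w, steps=1000, lr=0.01, decay=0.9, eps=1e-8):
    """RMSProp: scale each step by a running average of squared gradients."""
    sq_avg = 0.0
    for _ in range(steps):
        g = grad(w)
        sq_avg = decay * sq_avg + (1.0 - decay) * g * g
        w -= lr * g / (math.sqrt(sq_avg) + eps)
    return w


def adam(w, steps=1000, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: momentum plus RMSProp-style scaling, with bias correction."""
    m = 0.0
    v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


if __name__ == "__main__":
    for name, optimizer in (("SGDM", sgdm), ("RMSProp", rmsprop), ("Adam", adam)):
        print(name, optimizer(0.0))
```

All three rules move the parameter from 0 towards the minimum at 3. In a deep learning framework the same choice is usually made by selecting an optimizer object rather than writing the update by hand.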
1.2 Motivation
In recent years, many cancer investigators have become increasingly aware that cancer
may be a systemic disease almost from its inception. It is also becoming apparent
that most cancer, by the time the patient brings it to the attention of a physician,
is already at a late stage. The obvious lesion may be removed, but the seeds are at a
sufficiently advanced stage of development, or the body's resistance to them has
sufficiently deteriorated, that the patient is not cured.
1.3 Objectives
• To build a model that detects whether or not a person has cancer.
• To use technologies such as machine learning and image processing to
detect breast cancer.
• To detect the tumor and classify the tissue of the tumor area using preprocessing,
segmentation, feature extraction, optimization, and classification.
Chapter 2
Literature Survey
Chapter 3
Problem Statement
Chapter 4
IDE: Spyder
4.1.1 Performance Requirements
Every function and module must perform well. The overall performance of the software
will enable users to work efficiently. Encryption of data should be fast, and the
virtual environment should be provided quickly.
Safety Requirement
• The application is designed in modules, so errors can be detected and fixed
easily. This makes it easier to install and update new functionality if required.
Availability: The software is freely and easily available to all users.
Maintainability: If an error occurs after the project is deployed, it can be
easily corrected by the software developer.
Reliability: The good performance of the software increases its reliability.
User Friendliness: Since the software is a GUI application, its output is
user friendly.
Integrity: Integrity refers to the extent to which access to software or data by
unauthorized persons can be controlled.
Security: Users are authenticated through several security phases, so reliable
security is provided.
6.1 DFD
The Data Flow Diagram (DFD) shows the flow of data in our system. DFD 0 is the base
diagram, in which rectangles represent the input and output and the circle represents
our system. DFD 1 shows the actual input and output of the system: the input is text
or an image, and the output is the detection result. Likewise, DFD 2 presents the
operations of the user as well as the admin.
6.2 UML
The Unified Modeling Language (UML) is a standard language for writing software
blueprints. The UML may be used to visualize, specify, construct and document the
artifacts of a software-intensive system. UML is process independent, although
optimally it should be used in a process that is use-case driven,
architecture-centric, iterative, and incremental. A number of UML diagram types are
available.
Chapter 7
System Implementation
7.2 Algorithm
A Convolutional Neural Network (CNN) is a type of deep learning algorithm specifically
designed for image processing and recognition tasks. Compared to alternative
classification models, CNNs require less preprocessing as they can automatically learn
hierarchical feature representations from raw input images. They excel at assigning
importance to various objects and features within the images through convolutional
layers, which apply filters to detect local patterns.
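To make the idea of a filter detecting a local pattern concrete, the sketch below slides a small kernel over a toy image in plain Python. The 4x4 image and the hand-written vertical-edge kernel are assumptions made for illustration; in a trained CNN the kernel weights are learned from data, and a deep learning library performs this operation far more efficiently.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image without padding and return the feature map.

    As in most CNN libraries, the kernel is not flipped, so this is strictly a
    cross-correlation.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            total = 0.0
            for di in range(k_rows):
                for dj in range(k_cols):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical filter that responds where brightness increases from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, kernel)

if __name__ == "__main__":
    for row in feature_map:
        print(row)
```

The feature map is non-zero only in the column where the dark-to-bright edge lies, which is the sense in which a convolutional filter detects a local pattern.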
The connectivity pattern in CNNs is inspired by the visual cortex in the human brain,
where neurons respond to specific regions, or receptive fields, in the visual space.
This architecture enables CNNs to effectively capture spatial relationships and
patterns in images.
The support vector machine (SVM) algorithm is, at its core, linear. What makes SVM
stand out compared to other algorithms is that it can handle classification problems
using an SVM classifier and regression problems using an SVM regressor. However, the
SVM classifier is the backbone of the support vector machine concept and, in general,
is the variant best suited to classification problems.
Being linear at its core, SVM can be thought of almost like linear or logistic
regression. For example, an SVM classifier creates a line (or a plane or hyperplane,
depending on the dimensionality of the data) in an N-dimensional space to classify
data points that belong to two separate classes. The original SVM classifier had this
objective and was designed to solve binary classification problems. The way it finds
the boundary differs, however. Linear regression uses the concept of a line of best
fit, the predictive line that gives the minimum sum of squared errors (if using OLS
regression), and logistic regression uses maximum likelihood estimation to find the
best-fitting sigmoid curve. Support vector machines instead use the concept of
margins to come up with predictions.
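The margin idea can be illustrated with a small linear SVM trained by subgradient descent on the regularized hinge loss. This is a minimal sketch in plain Python under stated assumptions: the six two-dimensional points, the regularization strength `lam`, the learning rate and the number of epochs are all made up for illustration, and the code is not the classifier used in this project. In practice a library implementation, for example scikit-learn's `SVC`, would normally be used.

```python
import math


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))) by subgradient descent."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # point is inside the margin or misclassified
                for j in range(n_features):
                    grad_w[j] -= yi * xi[j] / n_samples
                grad_b -= yi / n_samples
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def margin_width(w):
    """Distance between the two margin hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))


# Toy, linearly separable data: labels are +1 and -1.
X = [[2, 2], [3, 3], [2, 3], [0, 0], [-1, 0], [0, -1]]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)

if __name__ == "__main__":
    print("weights:", w, "bias:", b)
    print("margin width:", margin_width(w))
    print("predictions:", [predict(w, b, xi) for xi in X])
```

Only points that fall inside the margin contribute to the update, which is why the resulting boundary is determined by the points closest to it rather than by a best fit through all points.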
7.3 Methodologies
Chapter 8
Working Modules
8.2 Experimental Results
Figure 8.2.1: Main GUI Page
Figure 8.2.3: Registration Page
Chapter 9
Testing
1. White-box testing
In white-box testing, an internal perspective of the system, together with
programming skills, is used to design test cases.
2. Black-box testing
Black-box testing treats the software as a "black box", examining functionality
without any knowledge of the internal implementation. The testers are only
aware of what the software is supposed to do, not how it does it.
3. Grey-box testing
Grey-box testing involves having knowledge of internal data structures and algorithms
for the purpose of designing tests, while executing those tests at the user, or
black-box, level. The tester is not required to have full access to the software.
9.1.1 Unit Testing
Unit testing involves designing test cases to validate that program inputs produce
valid outputs. All decision branches and internal code flow should be validated. This
is structural testing that relies on knowledge of the unit's construction and is
invasive. Unit tests perform basic tests at the component level and test a specific
business process, application, and/or system configuration. Unit tests ensure that
each unique path of a business process performs accurately to the documented
specifications and contains clearly defined inputs and expected results.
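A unit test of this kind can be sketched with Python's standard `unittest` module. The helper `fsh_lh_ratio` below is hypothetical: it is not taken from the project code, and the choice of dividing FSH by LH is an assumption made only so that there is a small unit with a valid path and an invalid path to test.

```python
import unittest


def fsh_lh_ratio(fsh, lh):
    """Hypothetical preprocessing helper: FSH divided by LH, rejecting non-positive LH."""
    if lh <= 0:
        raise ValueError("LH must be positive")
    return fsh / lh


class FshLhRatioTest(unittest.TestCase):
    def test_valid_input_produces_expected_output(self):
        # Clearly defined input and expected result.
        self.assertAlmostEqual(fsh_lh_ratio(6.0, 3.0), 2.0)

    def test_zero_lh_is_rejected(self):
        # The error branch is exercised as well as the normal path.
        with self.assertRaises(ValueError):
            fsh_lh_ratio(6.0, 0.0)


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FshLhRatioTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Each test method covers one decision branch of the unit, which mirrors the requirement that all branches be validated with defined inputs and expected results.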
9.2 Test Cases
A test case is a commonly used term for a specific test. It consists of information
such as test steps, verification steps, outputs, and test environment, that is, a set
of inputs, execution preconditions and expected outcomes. Test cases are executed
against software products, and the expected outcomes are then compared with the
actual results to determine whether the test has passed or failed. The table shows
the test cases and results. The test cases to be verified for the project are listed
below:
1) Input prime number and points on x and y axis.
2) Calculating elliptic curve points.
3) Input generator (Base) point for communication.
4) Entering Private keys.
5) Calculating Public keys.
6) Send secret message.
7) Encryption and Decryption.
Test Results
Figure 9.1.2: Login test cases
Conclusion
10.1 Conclusion
This project is implemented in Python. In it, we build a breast cancer prediction
model on an image dataset and use its results to detect breast cancer. We observed
that a good dataset provides better accuracy. The system aims to detect cancer at an
early stage and give an appropriate response to the user. Polycystic ovary syndrome
(PCOS) is a medical condition which causes hormonal disorder in women in their
childbearing years. The hormonal imbalance leads to a delayed or even absent
menstrual cycle. Women with PCOS mainly suffer from excessive weight gain, facial
hair growth, acne, hair loss, skin darkening and irregular periods, leading to
infertility in rare cases. Our proposed system helps in early detection and
prediction of PCOS treatment from an optimal and minimal set of parameters which have
been statistically analysed using the chi-square method. The SVM classifier was found
to be the most reliable and most accurate when compared with four other classifiers,
with an accuracy of 90.9%.
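The chi-square method mentioned above scores how strongly a categorical attribute is associated with the class label, so that attributes can be ranked and the top ones kept. The following is a minimal sketch in plain Python using the classic contingency-table statistic. The two binary attributes and the eight labels are invented for illustration and are not drawn from the PCOS dataset; library routines for feature selection may compute the statistic differently.

```python
from collections import Counter


def chi_square(feature, labels):
    """Chi-square statistic of independence between a categorical feature and the labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    statistic = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            expected = f_count * l_count / n
            statistic += (observed[(f_value, l_value)] - expected) ** 2 / expected
    return statistic


def rank_features(features, labels):
    """Return feature names sorted from most to least associated with the labels."""
    scores = {name: chi_square(values, labels) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: 1 = condition present, 0 = absent.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "attribute_a": [1, 1, 1, 1, 0, 0, 0, 0],  # hypothetical, follows the label exactly
    "attribute_b": [1, 0, 1, 0, 1, 0, 1, 0],  # hypothetical, unrelated to the label
}

if __name__ == "__main__":
    for name in rank_features(features, labels):
        print(name, chi_square(features[name], labels))
```

An attribute that is independent of the label scores zero, while one that tracks the label closely scores high, so keeping the highest-ranked attributes yields a small feature set for the classifier.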
B. Base Paper(s)
C. Tools Used
Anaconda Navigator
I. List of Publications
II. Certificates