Breast Cancer Detection Using K-Nearest Neighbor Machine Learning Algorithm
Abstract— Breast cancer is very common among women all over the world. However, detecting this cancer in its early stages helps save lives. Radiologists are able to predict whether mammography images show cancer, but they may miss about 15% of cases. In this paper, we propose a new method to detect breast cancer with high accuracy. The method consists of two main parts. In the first part, image processing techniques are used to prepare the mammography images for the feature and pattern extraction process. In the second part, the extracted features are used as input to two types of supervised learning models, a Back Propagation Neural Network (BPNN) model and a Logistic Regression (LR) model, and the results and accuracy of the two models are compared.

Keywords— Breast Cancer, Image Processing, Mammography, Machine Learning.

I. INTRODUCTION

Breast cancer is the second most dangerous cancer after lung cancer. Early detection can save lives, because the cancer is easier to treat at an early stage and the tumor can be prevented from spreading. A tumor is an abnormal growth of cells.

For many years, X-ray imaging was the only method used to detect breast cancer [1, 2]. However, many other methods that are more efficient than the X-ray procedure have since been proposed for the detection process, such as neural networks [3], artificial intelligence, and data mining.

Every woman can perform a monthly self-examination, using her hand to check for any abnormal growth. Another option is to visit a specialist doctor for a mammography test. Mammography is “the process of using low-dose X-rays to examine the human breast and is used as a diagnostic as well as a screening tool” [4].

Image processing techniques are used to convert an image from one format to another and to extract features from images, which helps produce a more useful data set. A large number of applications related to human activities use image processing, ranging from remote sensing to biomedical image interpretation [5].

Artificial neural networks (ANNs) are among the most common techniques in machine learning. An ANN simulates the neural network of the human body and consists of many neurons arranged in many layers. Groups of neurons in the network perform separate functions at the same time [6]. A neural network has a learning stage, in which the weights are adjusted to produce the desired output, and a testing stage, in which the network is tested to measure its accuracy in the detection process. Generally, there are three types of learning: supervised learning, which needs a teacher; unsupervised learning, which works without a teacher; and hybrid learning, which lies between supervised and unsupervised learning [7].

Another supervised learning algorithm is Logistic Regression (LR), a type of statistical classification model. This model is used to predict the probability of each possible outcome. The logistic function is given by the following equation:

$$ g(z) = \frac{1}{1 + e^{-z}} $$

Fig. 1. Logistic Function Curve

By using the Logistic Regression (LR) model, the logistic function is estimated to obtain a value between zero and one, which makes it appealing for estimating a risk factor, that is, the risk of disease.

The first section of this paper is the introduction; then the literature review of breast cancer is presented, and the next section explains the
can observe that the values shown in these plots were distributed across zero, that is, they crossed the zero value.

Fig. 3. Filtered Image

This zero-crossing distribution was used in our model to extract information. After generating a new matrix with our algorithm based on the count of zero crossings, we obtained the input values for the learning model. The output values were 0 for normal images and 1 for images that contain a tumor. The data were normalized by dividing each column by its maximum value.

B. Machine Learning Algorithm

In our work, supervised learning algorithms were used. Specifically, we used two types of supervised machine learning algorithms, Logistic Regression and the Backpropagation Neural Network, and compared the results of both.

In the hypothesis and the cost function, the weights are the parameters of the hypothesis, the x's are the input values, and the y's are the output values. Our goal is to minimize the cost function, which can be achieved by applying equation 4 repeatedly until the cost function reaches the desired value.

Several settings were required to obtain the optimal value, such as the following parameter:
• Value of A = 0.45
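The logistic-regression path described above (zero-crossing counts as features, column-wise normalization by the maximum, and a repeated weight update) can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' code: the zero-crossing definition (adjacent values with strictly opposite signs) is hypothetical, the cost is the standard cross-entropy for logistic regression, the repeated update is assumed to be batch gradient descent (our reading of equation 4), and A = 0.45 is assumed to be the learning rate. All function names are hypothetical.

```python
import math

def count_zero_crossings(values):
    # Hypothetical definition: count adjacent pairs whose signs are strictly opposite.
    return sum(1 for a, b in zip(values, values[1:]) if a * b < 0)

def normalize_columns(rows):
    # Divide each column by its maximum value; an all-zero column is left unchanged.
    maxima = [max(col) for col in zip(*rows)]
    return [[v / m if m != 0 else v for v, m in zip(row, maxima)] for row in rows]

def sigmoid(z):
    # Logistic function g(z) = 1 / (1 + e^(-z)), written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, x):
    # Hypothesis h(x) = g(w0 + w1*x1 + ... + wn*xn); weights[0] is the bias.
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))

def cost(weights, X, y):
    # Cross-entropy cost averaged over the m examples (probabilities clipped away from 0 and 1).
    eps = 1e-12
    total = 0.0
    for x, t in zip(X, y):
        h = min(max(predict(weights, x), eps), 1.0 - eps)
        total -= t * math.log(h) + (1 - t) * math.log(1.0 - h)
    return total / len(X)

def gradient_descent(X, y, alpha=0.45, iterations=2000):
    # Batch gradient descent: repeat w_j := w_j - alpha * (1/m) * sum((h(x) - y) * x_j).
    # ASSUMPTION: alpha stands in for the paper's parameter A = 0.45.
    m, n = len(X), len(X[0])
    weights = [0.0] * (n + 1)
    for _ in range(iterations):
        grads = [0.0] * (n + 1)
        for x, t in zip(X, y):
            err = predict(weights, x) - t
            grads[0] += err
            for j, xj in enumerate(x):
                grads[j + 1] += err * xj
        weights = [w - alpha * g / m for w, g in zip(weights, grads)]
    return weights

if __name__ == "__main__":
    # Toy 1-D "filtered rows" (made up for illustration); label 1 marks a pretend tumor image.
    signals = [[1, -1, 1, -1, 1, -1], [2, -3, 4, -1, 5, -2], [1, 2, 3, 4, 5, 6],
               [-1, -2, 1, 2, 3, 4], [3, -1, 2, -2, 1, 1], [1, 1, 2, 2, 3, 3]]
    labels = [1, 1, 0, 0, 1, 0]
    X = normalize_columns([[count_zero_crossings(s)] for s in signals])
    w = gradient_descent(X, labels)
    preds = [1 if predict(w, x) >= 0.5 else 0 for x in X]
    print("cost:", cost(w, X, labels), "predictions:", preds)
```

A real feature matrix would have one column per extracted feature; the same functions apply unchanged because every step works column by column.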
The second learning algorithm that was used in our work
was the Backpropagation Neural Network (BPNN).
Parameter                                  Value
Number of hidden layers                    1
Number of neurons in the first layer       240
Number of neurons in the second layer      10

Fig. 7. MSE of Neural Network

The data were divided into 70% for training, 20% for testing, and 10% for validation.
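The network configuration in the table can be sketched as a small backpropagation network. This is a minimal pure-Python sketch, not the authors' implementation, and it rests on several assumptions: the 240 neurons in the first layer are taken to be the input features and the 10 neurons in the second layer the single hidden layer; one sigmoid output neuron produces the 0/1 label; weights are updated sample by sample to reduce squared error (in keeping with the MSE plot in Fig. 7). The learning rate, initialization range, and the names `BPNN` and `split_indices` are hypothetical. The demo uses 4 inputs instead of 240 only to keep pure-Python training fast.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic activation.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class BPNN:
    """One hidden layer of sigmoid units and one sigmoid output, trained by backpropagation."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # w1[k][i]: weight from input i to hidden unit k; b1[k]: bias of hidden unit k.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        # w2[k]: weight from hidden unit k to the single output unit.
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(b + sum(w * xi for w, xi in zip(row, x)))
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(self.b2 + sum(w * h for w, h in zip(self.w2, hidden)))
        return hidden, out

    def train_step(self, x, target, lr):
        hidden, out = self.forward(x)
        # Output delta for E = 0.5 * (out - target)^2 with a sigmoid output unit.
        d_out = (out - target) * out * (1 - out)
        # Hidden deltas, computed with the output weights before they are updated.
        d_hid = [d_out * w * h * (1 - h) for w, h in zip(self.w2, hidden)]
        self.w2 = [w - lr * d_out * h for w, h in zip(self.w2, hidden)]
        self.b2 -= lr * d_out
        for k, d in enumerate(d_hid):
            self.w1[k] = [w - lr * d * xi for w, xi in zip(self.w1[k], x)]
            self.b1[k] -= lr * d
        # Squared error of this sample, measured before the update.
        return (out - target) ** 2

    def mse(self, X, y):
        return sum((self.forward(x)[1] - t) ** 2 for x, t in zip(X, y)) / len(X)

def split_indices(n, fractions=(0.7, 0.2, 0.1), seed=0):
    # Shuffle the indices, then cut them into training, testing, and validation parts.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(round(fractions[0] * n))
    n_test = int(round(fractions[1] * n))
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.random() for _ in range(4)] for _ in range(40)]   # synthetic toy features
    y = [1.0 if sum(x) > 2.0 else 0.0 for x in X]              # synthetic toy labels
    train, test, val = split_indices(len(X))
    net = BPNN(n_inputs=4, n_hidden=10)  # the table's setting would presumably be n_inputs=240
    for epoch in range(300):
        for i in train:
            net.train_step(X[i], y[i], lr=0.5)
    print("test MSE:", net.mse([X[i] for i in test], [y[i] for i in test]))
```

Tracking `net.mse` on the training and validation parts after each epoch gives the kind of MSE curve that Fig. 7 reports.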
REFERENCES