Breast Cancer Survival Prediction With Machine Learning

The document discusses predicting breast cancer patient survival using machine learning. It introduces a dataset of over 300 breast cancer patients containing information like age, tumour stage, surgery type, and patient status. The task is to predict whether patients will survive after surgery. The document walks through importing data, checking for null values, and gaining insights about the dataset columns. It then splits the data into training and test sets to build a support vector machine model to predict breast cancer survival.

Uploaded by

kranti29
7/7/23, 9:55 PM Breast Cancer Survival Prediction with Machine Learning | Aman Kharwal

Breast Cancer Survival Prediction with Machine Learning
AMAN KHARWAL / MARCH 8, 2022 / MACHINE LEARNING / 6

Breast cancer is a type of cancer that starts in the breast. It
occurs mostly in women, but men can get breast cancer too. It is
one of the leading causes of cancer death in women. As the use of
data in healthcare is very common today, we can use machine
learning to predict whether a patient will survive a deadly
disease like breast cancer. So if you want to learn how to
predict the survival of a breast cancer patient, this article is for
you. In this article, I will take you through the task of breast
cancer survival prediction with machine learning using Python.

Breast Cancer Survival Prediction with Machine Learning
You have a dataset of over 300 breast cancer patients who
underwent surgery for the treatment of breast cancer. Below is
the information on all the columns in the dataset:

1. Patient_ID: ID of the patient
2. Age: Age of the patient
3. Gender: Gender of the patient
4. Protein1, Protein2, Protein3, Protein4: expression levels
5. Tumour_Stage: Breast cancer stage of the patient
6. Histology: Infiltrating Ductal Carcinoma, Infiltrating Lobular
Carcinoma, Mucinous Carcinoma
7. ER status: Positive/Negative
8. PR status: Positive/Negative
9. HER2 status: Positive/Negative
10. Surgery_type: Lumpectomy, Simple Mastectomy, Modified
Radical Mastectomy, Other
11. Date_of_Surgery: The date of surgery
12. Date_of_Last_Visit: The date of the last visit of the patient
13. Patient_Status: Alive/Dead

https://thecleverprogrammer.com/2022/03/08/breast-cancer-survival-prediction-with-machine-learning/ 1/17

So by using this dataset, our task is to predict whether a breast
cancer patient will survive or not after the surgery.

I hope you have an overview of the dataset we are using for the
task of breast cancer survival prediction. This dataset was
collected from Kaggle. You can download this dataset from here.
Now, in the section below, I will walk you through the task of
predicting breast cancer survival with machine learning using
Python.

Breast Cancer Survival Prediction using Python
I will start the task of breast cancer survival prediction by
importing the necessary Python libraries and the dataset we
need:

import pandas as pd
import numpy as np
import plotly.express as px
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the dataset and preview the first five rows
data = pd.read_csv("BRCA.csv")
print(data.head())

     Patient_ID   Age  Gender  Protein1  Protein2  Protein3  Protein4  \
0  TCGA-D8-A1XD  36.0  FEMALE  0.080353   0.42638   0.54715  0.273680
1  TCGA-EW-A1OX  43.0  FEMALE -0.420320   0.57807   0.61447 -0.031505
2  TCGA-A8-A079  69.0  FEMALE  0.213980   1.31140  -0.32747 -0.234260
3  TCGA-D8-A1XR  56.0  FEMALE  0.345090  -0.21147  -0.19304  0.124270
4  TCGA-BH-A0BF  56.0  FEMALE  0.221550   1.90680   0.52045 -0.311990

  Tumour_Stage                      Histology ER status PR status HER2 status  \
0          III  Infiltrating Ductal Carcinoma  Positive  Positive    Negative
1           II             Mucinous Carcinoma  Positive  Positive    Negative
2          III  Infiltrating Ductal Carcinoma  Positive  Positive    Negative
3           II  Infiltrating Ductal Carcinoma  Positive  Positive    Negative
4           II  Infiltrating Ductal Carcinoma  Positive  Positive    Negative

                  Surgery_type Date_of_Surgery Date_of_Last_Visit  \
0  Modified Radical Mastectomy       15-Jan-17          19-Jun-17
1                   Lumpectomy       26-Apr-17          09-Nov-18
2                        Other       08-Sep-17          09-Jun-18
3  Modified Radical Mastectomy       25-Jan-17          12-Jul-17
4                        Other       06-May-17          27-Jun-19

  Patient_Status
0          Alive
1           Dead
2          Alive
3          Alive
4           Dead

Let’s have a look at whether the columns of this dataset contain
any null values or not:

print(data.isnull().sum())

Patient_ID 7
Age 7
Gender 7
Protein1 7
Protein2 7
Protein3 7
Protein4 7
Tumour_Stage 7
Histology 7
ER status 7
PR status 7
HER2 status 7
Surgery_type 7
Date_of_Surgery 7
Date_of_Last_Visit 24
Patient_Status 20
dtype: int64

So this dataset has some null values in each column. I will drop
every row that contains a null value:

data = data.dropna()
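
As a minimal sketch of what this step does, the example below runs the same two calls on a small made-up frame (the values are illustrative, not taken from BRCA.csv). It assumes one fully empty row and one row that is missing only Patient_Status, which mirrors the pattern in the null counts above.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values: row 2 is entirely empty,
# row 1 is missing only Patient_Status.
toy = pd.DataFrame({
    "Patient_ID": ["P1", "P2", None, "P4"],
    "Age": [36.0, 43.0, np.nan, 56.0],
    "Patient_Status": ["Alive", None, None, "Dead"],
})

null_counts = toy.isnull().sum()   # number of nulls in each column
cleaned = toy.dropna()             # drops every row with at least one null
rows_dropped = len(toy) - len(cleaned)

print(null_counts)
print(f"rows before: {len(toy)}, after: {len(cleaned)}, dropped: {rows_dropped}")
print(list(cleaned.index))         # original row labels are kept
```

Note that dropna() keeps the original row labels, which is why a cleaned frame can report fewer entries than its largest index value suggests.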

Now let’s have a look at the insights about the columns of this
data:

data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 317 entries, 0 to 333
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Patient_ID 317 non-null object
1 Age 317 non-null float64
2 Gender 317 non-null object
3 Protein1 317 non-null float64
4 Protein2 317 non-null float64
5 Protein3 317 non-null float64
6 Protein4 317 non-null float64
7 Tumour_Stage 317 non-null object
8 Histology 317 non-null object
9 ER status 317 non-null object
10 PR status 317 non-null object
11 HER2 status 317 non-null object
12 Surgery_type 317 non-null object
13 Date_of_Surgery 317 non-null object
14 Date_of_Last_Visit 317 non-null object
15 Patient_Status 317 non-null object
dtypes: float64(5), object(11)
memory usage: 42.1+ KB

Breast cancer is mostly found in females, so let’s have a look at
the Gender column to see how many females and males there
are:

print(data.Gender.value_counts())

FEMALE 313
MALE 4
Name: Gender, dtype: int64
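
The same counts can be expressed as proportions. The sketch below rebuilds a Gender column from the two counts printed above (the Series itself is a stand-in, not the real data) and uses the normalize option of value_counts.

```python
import pandas as pd

# Stand-in column built from the counts shown above: 313 FEMALE, 4 MALE.
gender = pd.Series(["FEMALE"] * 313 + ["MALE"] * 4, name="Gender")

shares = gender.value_counts(normalize=True)   # fractions that sum to 1
print((shares * 100).round(2))                 # percentages per category
```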

As expected, females far outnumber males in the Gender
column. Now let’s have a look at the tumour stage of
the patients:

# Tumour Stage
stage = data["Tumour_Stage"].value_counts()
labels = stage.index    # stage names
counts = stage.values   # number of patients in each stage

# Donut chart of the stage distribution
figure = px.pie(data,
                values=counts,
                names=labels, hole=0.5,
                title="Tumour Stages of Patients")
figure.show()

So most of the patients are in the second stage. Now let’s have
a look at the histology of breast cancer patients. (Histology
describes a tumour by how its cells and tissue look under a
microscope; in this dataset it records the type of carcinoma):

# Histology
histology = data["Histology"].value_counts()
labels = histology.index    # histology names
counts = histology.values   # number of patients per histology
figure = px.pie(data,
                values=counts,
                names=labels, hole=0.5,
                title="Histology of Patients")
figure.show()

Now let’s have a look at the values of ER status, PR status, and
HER2 status of the patients:

# ER status
print(data["ER status"].value_counts())
# PR status
print(data["PR status"].value_counts())
# HER2 status
print(data["HER2 status"].value_counts())

Positive 317
Name: ER status, dtype: int64
Positive 317
Name: PR status, dtype: int64
Negative 288
Positive 29
Name: HER2 status, dtype: int64
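
The output above shows that ER status and PR status hold a single value for every remaining patient, so these two columns cannot help a model tell patients apart. A minimal sketch of how such constant columns could be detected is given below; it uses a small made-up frame with the same pattern rather than the real data.

```python
import pandas as pd

# Toy frame (made-up rows) with the same pattern as the output above:
# "ER status" and "PR status" have one value, "HER2 status" has two.
toy = pd.DataFrame({
    "ER status": ["Positive"] * 5,
    "PR status": ["Positive"] * 5,
    "HER2 status": ["Negative", "Negative", "Positive", "Negative", "Negative"],
})

n_unique = toy.nunique()                                  # distinct values per column
constant_cols = n_unique[n_unique == 1].index.tolist()    # columns with one value only
print(constant_cols)
```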

Now let’s have a look at the types of surgery performed on the
patients:

# Surgery_type
surgery = data["Surgery_type"].value_counts()
labels = surgery.index    # surgery type names
counts = surgery.values   # number of patients per surgery type
figure = px.pie(data,
                values=counts,
                names=labels, hole=0.5,
                title="Type of Surgery of Patients")
figure.show()

We have now explored the data, and the dataset has a lot of
categorical features. To use this data to train a machine learning
model, we need to transform the values of all the categorical
columns into numbers. Here is how we can transform the values
of the categorical features:

data["Tumour_Stage"] = data["Tumour_Stage"].map({"
data["Histology"] = data["Histology"].map({"Infilt
                                           "Infilt
data["ER status"] = data["ER status"].map({"Positi
data["PR status"] = data["PR status"].map({"Positi
data["HER2 status"] = data["HER2 status"].map({"Po
data["Gender"] = data["Gender"].map({"MALE": 0, "F
data["Surgery_type"] = data["Surgery_type"].map({"
                                                 "
print(data.head())

     Patient_ID   Age  Gender  Protein1  Protein2  Protein3  Protein4  \
0  TCGA-D8-A1XD  36.0       1  0.080353   0.42638   0.54715  0.273680
1  TCGA-EW-A1OX  43.0       1 -0.420320   0.57807   0.61447 -0.031505
2  TCGA-A8-A079  69.0       1  0.213980   1.31140  -0.32747 -0.234260
3  TCGA-D8-A1XR  56.0       1  0.345090  -0.21147  -0.19304  0.124270
4  TCGA-BH-A0BF  56.0       1  0.221550   1.90680   0.52045 -0.311990

   Tumour_Stage  Histology  ER status  PR status  HER2 status  Surgery_type  \
0             3          1          1          1            2             2
1             2          3          1          1            2             3
2             3          1          1          1            2             1
3             2          1          1          1            2             2
4             2          1          1          1            2             1

  Date_of_Surgery Date_of_Last_Visit Patient_Status
0       15-Jan-17          19-Jun-17          Alive
1       26-Apr-17          09-Nov-18           Dead
2       08-Sep-17          09-Jun-18          Alive
3       25-Jan-17          12-Jul-17          Alive
4       06-May-17          27-Jun-19           Dead
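
The mapping listing above is cut off in this copy, so the exact dictionaries are not visible. The sketch below is one reconstruction that is consistent with the printed output; every entry marked "assumed" does not appear in the rows shown and is a guess, and the two-row frame is a stand-in copied from the first two rows of the original preview.

```python
import pandas as pd

# Mappings inferred from the printed output. Entries marked "assumed"
# are not visible anywhere in this copy and are guesses.
column_maps = {
    "Gender": {"MALE": 0, "FEMALE": 1},
    "Tumour_Stage": {"I": 1, "II": 2, "III": 3},               # "I" assumed
    "Histology": {
        "Infiltrating Ductal Carcinoma": 1,
        "Infiltrating Lobular Carcinoma": 2,                   # assumed
        "Mucinous Carcinoma": 3,
    },
    "ER status": {"Positive": 1, "Negative": 2},               # "Negative" assumed
    "PR status": {"Positive": 1, "Negative": 2},               # "Negative" assumed
    "HER2 status": {"Positive": 1, "Negative": 2},             # "Positive" assumed
    "Surgery_type": {
        "Other": 1,
        "Modified Radical Mastectomy": 2,
        "Lumpectomy": 3,
        "Simple Mastectomy": 4,                                # assumed
    },
}

# Stand-in frame: the first two rows of the original preview.
toy = pd.DataFrame({
    "Gender": ["FEMALE", "FEMALE"],
    "Tumour_Stage": ["III", "II"],
    "Histology": ["Infiltrating Ductal Carcinoma", "Mucinous Carcinoma"],
    "ER status": ["Positive", "Positive"],
    "PR status": ["Positive", "Positive"],
    "HER2 status": ["Negative", "Negative"],
    "Surgery_type": ["Modified Radical Mastectomy", "Lumpectomy"],
})

encoded = toy.copy()
for column, mapping in column_maps.items():
    encoded[column] = encoded[column].map(mapping)

# Series.map returns NaN for any value missing from the dictionary,
# so a null check catches categories the mapping forgot.
unmapped = int(encoded.isnull().sum().sum())
print(encoded)
print("unmapped values:", unmapped)
```

Checking for nulls after mapping is worthwhile because a misspelt or missing category would otherwise pass silently into the model as NaN.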

Breast Cancer Survival Prediction Model
We can now move on to training a machine learning model to
predict the survival of a breast cancer patient. Before training the
model, we need to split the data into training and test sets:

# Splitting data
x = np.array(data[['Age', 'Gender', 'Protein1', 'P
                   'Tumour_Stage', 'Histology', 'E
                   'HER2 status', 'Surgery_type']]
y = np.array(data[['Patient_Status']])
xtrain, xtest, ytrain, ytest = train_test_split(x,
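
Parts of the listing above are cut off in this copy. The sketch below shows one way the split could look in full. The feature list is an assumption pieced together from the visible fragments and the column order, and test_size, random_state and stratify are illustrative choices, not the article's values. A synthetic frame stands in for the encoded dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed feature list (the original line is truncated in this copy).
feature_cols = ["Age", "Gender", "Protein1", "Protein2", "Protein3", "Protein4",
                "Tumour_Stage", "Histology", "ER status", "PR status",
                "HER2 status", "Surgery_type"]

# Synthetic stand-in for the encoded dataset: random numbers, not real patients.
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(100, len(feature_cols))), columns=feature_cols)
toy["Patient_Status"] = ["Alive"] * 80 + ["Dead"] * 20

x = toy[feature_cols].to_numpy()
y = toy["Patient_Status"].to_numpy()   # 1-D target array

# test_size, random_state and stratify are illustrative assumptions.
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y)

print(xtrain.shape, xtest.shape)
```

Selecting the target with single brackets gives a 1-D array, which avoids the column-vector warning scikit-learn raises when y has shape (n, 1). Stratifying keeps the Alive/Dead ratio similar in both sets, which matters when one class is much rarer than the other.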

Now here’s how we can train a machine learning model:

model = SVC()
model.fit(xtrain, ytrain)
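
The test set created earlier can be used to check how well the trained model generalises. The sketch below is not part of the original walkthrough: it uses synthetic, imbalanced data in place of BRCA.csv and compares an SVC with a baseline that always predicts the most frequent class, since plain accuracy can look high on imbalanced data even when the model has learned little.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data with roughly 80% of samples in one class.
X, y = make_classification(n_samples=300, n_features=12,
                           weights=[0.8, 0.2], random_state=0)
xtrain, xtest, ytrain, ytest = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = SVC().fit(xtrain, ytrain)
baseline = DummyClassifier(strategy="most_frequent").fit(xtrain, ytrain)

svc_acc = accuracy_score(ytest, model.predict(xtest))
base_acc = accuracy_score(ytest, baseline.predict(xtest))
print(f"SVC accuracy: {svc_acc:.3f}, majority-class baseline: {base_acc:.3f}")
```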

Now let’s input all the features that we have used to train this
machine learning model and predict whether a patient will
survive breast cancer or not:

# Prediction
# features = [['Age', 'Gender', 'Protein1', 'Prote
features = np.array([[36.0, 1, 0.080353, 0.42638,
print(model.predict(features))

['Alive']
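
The feature row in the listing above is cut off in this copy, but its visible values match the first row of the encoded table printed earlier. The sketch below completes the row from that table, which is an assumption about what the original line contained, and checks that it has one value per feature. The small model trained on random numbers is only a stand-in so that the example runs on its own; its prediction carries no meaning.

```python
import numpy as np
from sklearn.svm import SVC

# Assumed feature order, matching the columns used for training.
feature_names = ["Age", "Gender", "Protein1", "Protein2", "Protein3", "Protein4",
                 "Tumour_Stage", "Histology", "ER status", "PR status",
                 "HER2 status", "Surgery_type"]

# First row of the encoded table printed earlier (patient TCGA-D8-A1XD).
first_patient = [36.0, 1, 0.080353, 0.42638, 0.54715, 0.273680, 3, 1, 1, 1, 2, 2]
features = np.array([first_patient])   # shape (1, n_features): one row per patient

# Stand-in model fitted on random data, only to make predict() runnable here.
rng = np.random.default_rng(0)
x_toy = rng.normal(size=(40, len(feature_names)))
y_toy = np.array(["Alive"] * 30 + ["Dead"] * 10)
model = SVC().fit(x_toy, y_toy)

prediction = model.predict(features)
print(prediction)
```

The input must be two-dimensional with the features in the same order as during training, otherwise the values are silently assigned to the wrong columns.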

Summary
So this is how we can use machine learning for the task of
breast cancer survival prediction. As the use of data in
healthcare is very common today, we can use machine learning
to predict whether a patient will survive a deadly disease like
breast cancer or not. I hope you liked this article on Breast
cancer survival prediction with machine learning using Python.
Feel free to ask valuable questions in the comments section
below.
