Breast Cancer Diagnosis Using Machine Learning Alg

The document describes a breast cancer diagnosis dataset containing patient information like age, tumor characteristics, and genetic markers. It then discusses code to preprocess, analyze, and visualize the data using machine learning algorithms to build models that can accurately diagnose breast cancer.

Uploaded by

Azmeraw Zenaw

ARBAMINCH UNIVERSITY

FACULTY OF COMPUTING AND SOFTWARE ENGINEERING

DEPARTMENT OF INFORMATION TECHNOLOGY

Artificial Intelligence assignment

Group members                      ID No

1. Amanuel Asale .................. NSR/2899/13
2. Abenezer Asefa ................. NSR/101/13
3. Dagim Syum ..................... NSR/694/13
4. Fitsum Eerena .................. NSR/1045/13
5. Ermiyas G/Hiwot ................ NSR/2972/13
6. Abel Tesema .................... NSR/081/13
7. Nihal Mussa .................... NSR/1910/13
8. Amira Neri ..................... NSR/291/13
9. Eden Yazachew .................. NSR/802/13
10. Mulusew Aynalem ............... NSR/1809/13

Breast Cancer Diagnosis using Machine Learning Algorithms

Introduction

Breast cancer remains a formidable health challenge worldwide, constituting a
significant cause of morbidity and mortality, particularly among women. Timely and
accurate diagnosis is paramount for effective treatment planning and improving
patient outcomes. The integration of machine learning (ML) algorithms into breast
cancer diagnosis offers a promising approach to harnessing complex patient data for
enhanced prognostication and therapeutic decision-making.

Problem Definition

The core objective revolves around the development of robust ML models capable of
accurately distinguishing between benign and malignant breast tumors based on
multifaceted clinical and pathological features. By leveraging diverse datasets
encompassing patient demographics, tumor characteristics, histopathological findings,
and genetic markers, the aim is to empower clinicians with sophisticated tools for risk
stratification and treatment guidance.

Dataset Description

Age:

Description: Age of the patient at the time of diagnosis.

Data Type: Continuous numerical.

Race:

Description: Ethnicity or racial background of the patient.

Data Type: Categorical (e.g., White, Black, Asian, Hispanic, etc.).

Marital Status:

Description: Marital status of the patient at the time of diagnosis.

Data Type: Categorical (e.g., Single, Married, Divorced, Widowed, etc.).

T_Stage:

Description: Tumor stage, indicating the size and extent of the primary tumor.

Data Type: Categorical or ordinal (e.g., T1, T2, T3, T4).

N Stage:

Description: Lymph node stage, indicating the extent of regional lymph node
involvement.

Data Type: Categorical or ordinal (e.g., N0, N1, N2, N3).

6th Stage:

Description: Cancer stage according to the 6th edition of the TNM staging system,
which incorporates tumor size (T), lymph node status (N), and metastasis (M).

Data Type: Categorical or ordinal (e.g., Stage I, Stage II, Stage III, Stage IV).

Differentiate:

Description: Histological grade or degree of tumor differentiation, indicating how
closely the tumor resembles normal tissue.

Data Type: Categorical or ordinal (e.g., Well-differentiated, Moderately-differentiated,
Poorly-differentiated).

Grade:

Description: Histological grade of the tumor, reflecting the aggressiveness and
abnormality of tumor cells.

Data Type: Categorical or ordinal (e.g., Grade 1, Grade 2, Grade 3).

A Stage:

Description: Cancer stage according to the American Joint Committee on Cancer
(AJCC) staging system, incorporating tumor size, lymph node status, and metastasis.

Data Type: Categorical or ordinal (e.g., Stage I, Stage II, Stage III, Stage IV).

Tumor Size:

Description: Size of the primary tumor, typically measured in millimeters.

Data Type: Continuous numerical.

Estrogen Status:

Description: Estrogen receptor (ER) status of the tumor, indicating whether the tumor
cells have receptors for estrogen hormone.

Data Type: Categorical (e.g., Positive, Negative, Unknown).

Progesterone Status:

Description: Progesterone receptor (PR) status of the tumor, indicating whether the
tumor cells have receptors for progesterone hormone.

Data Type: Categorical (e.g., Positive, Negative, Unknown).

Regional Node Examined:

Description: Number of regional lymph nodes examined during surgery or biopsy.

Data Type: Discrete numerical (integer count).

Regional Node Positive:

Description: Number of regional lymph nodes positive for cancer cells.

Data Type: Discrete numerical (integer count).

Survival Months:

Description: Duration of survival in months following the diagnosis of breast cancer.

Data Type: Continuous numerical.

Status:

Description: Survival status of the patient at the end of the observation period.

Data Type: Categorical (e.g., Alive, Dead).
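
Several of the features above are described as "categorical or ordinal". As a minimal sketch of what treating a column as ordinal could look like, the example below declares an explicit order for T_Stage using a pandas ordered categorical. The toy values and the names stage_order, t_stage_ordered and t_stage_codes are assumptions made for illustration; they are not taken from the assignment's own code.

```python
import pandas as pd

# Toy values only; the real column contents come from Breast_Cancer.csv.
t_stage = pd.Series(["T2", "T1", "T4", "T3", "T1"], name="T_Stage")

# Declare the category order explicitly so that T1 < T2 < T3 < T4.
stage_order = ["T1", "T2", "T3", "T4"]
t_stage_ordered = pd.Categorical(t_stage, categories=stage_order, ordered=True)

# .codes gives the integer rank of each value within stage_order.
t_stage_codes = pd.Series(t_stage_ordered.codes, name="T_Stage_code")
print(t_stage_codes.tolist())
```

With an explicit order, the integer codes preserve the ranking of the stages, which an alphabetical encoding only does by coincidence.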

Code Explanation

Importing libraries

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
import seaborn as sns
import warnings
warnings.simplefilter("ignore")

The code above imports the libraries used in the rest of the analysis:

numpy: used for numerical computing in Python.

pandas: used for data manipulation and analysis in Python.

matplotlib: used for creating static, animated, and interactive visualizations in
Python.

train_test_split: imported from the model_selection module of scikit-learn; it splits
data into training and testing sets for machine learning models.

seaborn: a data visualization library built on top of Matplotlib that provides a
high-level interface for drawing attractive and informative statistical graphics.

warnings: the standard-library module for handling warning messages. The call
warnings.simplefilter("ignore") suppresses all warnings for the rest of the session.

data_f = pd.read_csv("Breast_Cancer.csv")
data_f.tail(10)

pd.read_csv("Breast_Cancer.csv"): This call reads the CSV file named
"Breast_Cancer.csv" into a pandas DataFrame, the tabular data structure in pandas.
read_csv is part of the pandas library (pd).

data_f.tail(10): Once the CSV file has been read into the DataFrame data_f, the
.tail(10) method returns its last 10 rows. It is a quick way to inspect the end of the
dataset.

Here is the output

data_f['Status'].value_counts()
The code data_f['Status'].value_counts() simply counts the occurrences of each unique
value in the 'Status' column of the DataFrame data_f, providing a summary of the
distribution of different statuses in the dataset.

Here is the output

Status
Alive 3408
Dead 616
Name: count, dtype: int64
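
The counts above show that the two classes are far from balanced. A short sketch of how the same information could be expressed as proportions is given below; it rebuilds a Status column from the reported counts instead of reading the CSV file, and the variable names status and status_share are assumptions made for this illustration.

```python
import pandas as pd

# Rebuild a Status column with the counts reported above (3408 Alive, 616 Dead).
status = pd.Series(["Alive"] * 3408 + ["Dead"] * 616, name="Status")

# normalize=True returns each class as a fraction of all rows.
status_share = status.value_counts(normalize=True)
print((status_share * 100).round(2))
```

Roughly 85% of the rows are labelled Alive, so a model that always predicts Alive would already reach about 85% accuracy. This is why the per-class precision and recall reported later matter in addition to overall accuracy.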

data_f.dtypes
The code data_f.dtypes retrieves the data types of each column in the DataFrame
data_f. It returns a Series where the index contains the column names and the values
contain the corresponding data types of each column. This helps in understanding the
data types of different variables in the dataset, which is essential for data manipulation
and analysis.
Here is the output
Age                        int64
Race                      object
Marital Status            object
T_Stage                   object
N Stage                   object
6th Stage                 object
differentiate             object
Grade                     object
A Stage                   object
Tumor Size                 int64
Estrogen Status           object
Progesterone Status       object
Regional Node Examined     int64
Reginol Node Positive      int64
Survival Months            int64
Status                    object
dtype: object

data_f.info()
The data_f.info() function provides a concise summary of the DataFrame data_f,
including information about the index dtype and column dtypes, non-null values, and
memory usage. This method is useful for quickly understanding the structure of the
DataFrame, the number of non-null values in each column, and the memory usage,
which can be helpful for data cleaning and optimization.

Here is the output

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4024 entries, 0 to 4023
Data columns (total 16 columns):
 #   Column                  Non-Null Count  Dtype
---  ------                  --------------  -----
 0   Age                     4024 non-null   int64
 1   Race                    4024 non-null   object
 2   Marital Status          4024 non-null   object
 3   T_Stage                 4024 non-null   object
 4   N Stage                 4024 non-null   object
 5   6th Stage               4024 non-null   object
 6   differentiate           4024 non-null   object
 7   Grade                   4024 non-null   object
 8   A Stage                 4024 non-null   object
 9   Tumor Size              4024 non-null   int64
 10  Estrogen Status         4024 non-null   object
 11  Progesterone Status     4024 non-null   object
 12  Regional Node Examined  4024 non-null   int64
 13  Reginol Node Positive   4024 non-null   int64
 14  Survival Months         4024 non-null   int64
 15  Status                  4024 non-null   object
dtypes: int64(5), object(11)
memory usage: 503.1+ KB

The code data_f.isnull().sum() quickly calculates the total number of missing values
in each column of the DataFrame data_f.
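
To make the behaviour of isnull().sum() concrete, here is a small sketch on a hypothetical three-row frame with two deliberately missing entries. The frame toy and its values are invented for illustration; in the real dataset, the info() output above reports 4024 non-null values in every column, so the same call would return zero for each column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with two deliberately missing entries.
toy = pd.DataFrame({
    "Age": [45, np.nan, 61],
    "Status": ["Alive", "Dead", None],
    "Tumor Size": [20, 35, 12],
})

# isnull() marks missing cells as True; sum() counts the True values per column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)
```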

In short, data_f is the DataFrame that holds the dataset. Printing or displaying data_f
shows its rows and columns for visual inspection; for a DataFrame of this size (4024
rows), pandas displays a truncated view by default rather than every row.

Visualizing

The above code splits the dataset x and target y into training and testing sets using
train_test_split. Then, it fits a logistic regression model (model1) to the training data
and makes predictions on the test data. The predicted labels are stored in y_pred.
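
The step described above can be sketched as follows. This is a minimal illustration and not the assignment's original code: it uses synthetic data from make_classification in place of the encoded features x and target y, and the values of test_size, stratify, random_state and max_iter are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded features x and the target y.
x, y = make_classification(n_samples=200, n_features=5, random_state=0)

# test_size, stratify and random_state are assumptions, not values from the report.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, stratify=y, random_state=0
)

# Fit logistic regression on the training split and predict the test split.
model1 = LogisticRegression(max_iter=1000)
model1.fit(x_train, y_train)
y_pred = model1.predict(x_test)
print(y_pred.shape)
```

Passing stratify=y keeps the class proportions similar in both splits, which is helpful when one class is much rarer than the other.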

The following code preprocesses a dataset x that contains non-numeric data by:
• Identifying the non-numeric columns.
• Label encoding the non-numeric columns using LabelEncoder.
• Scaling the entire dataset using MinMaxScaler. This prepares the data for use with
machine learning algorithms that require numeric input.

from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# x is assumed to be a pandas DataFrame that may contain non-numeric columns

# Identify non-numeric columns
non_numeric_columns = x.select_dtypes(exclude=['number']).columns

# Label encode each non-numeric column, keeping the fitted encoder per column
label_encoders = {}
for col in non_numeric_columns:
    label_encoders[col] = LabelEncoder()
    x[col] = label_encoders[col].fit_transform(x[col])

# All columns are now numeric, so MinMaxScaler can rescale them to [0, 1]
scaler = MinMaxScaler()
x_encod = scaler.fit_transform(x)
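
To show what the two transformations do to actual values, the sketch below applies them to a hypothetical three-row frame. The column names follow the dataset, but the values are invented, and the variable encoder is introduced only for this example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Hypothetical toy rows; column names follow the dataset, values are made up.
x = pd.DataFrame({
    "Age": [40, 50, 60],
    "Estrogen Status": ["Positive", "Negative", "Positive"],
})

# LabelEncoder assigns integers in sorted order: Negative -> 0, Positive -> 1.
encoder = LabelEncoder()
x["Estrogen Status"] = encoder.fit_transform(x["Estrogen Status"])

# MinMaxScaler maps each column linearly onto the range [0, 1].
x_encod = MinMaxScaler().fit_transform(x)
print(x_encod)
```

Two points are worth keeping in mind. scikit-learn documents LabelEncoder as a tool for target labels and provides OrdinalEncoder for feature columns. Fitting the scaler on the training split only, then applying it to the test split, avoids leaking test-set information into the preprocessing.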

The following code evaluates the performance of a classification model (e.g.,
logistic regression) by calculating and printing various metrics:
• It computes and prints the confusion matrix, summarizing the model's predictions.
• It prints a classification report, including precision, recall, F1-score, and support
for each class.
• It calculates and prints the accuracy of the model.
• It calculates and prints the misclassification error.

from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

# Confusion matrix and per-class precision, recall, F1-score and support
conf = confusion_matrix(y_test, y_pred)
print(classification_report(y_test, y_pred))
print(conf)

# Accuracy as a percentage
accuracy = accuracy_score(y_test, y_pred)
print('accuracy of LR IS: {:.2f}%'.format(accuracy*100))

# Misclassification error is the complement of accuracy
misclassification_error = 1 - accuracy
print('MCE of LR IS: {:.2f}%'.format(misclassification_error*100))
Output:

In summary, the logistic regression model is evaluated in the following steps:
• It computes the confusion matrix, providing a summary of the model's predictions.
• It prints a classification report, including precision, recall, F1-score, and support
for each class.
• It calculates the accuracy of the model.
• It computes the misclassification error as 1 minus the accuracy.
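
The relationship between the confusion matrix, accuracy and misclassification error can be checked by hand on a small example. The labels below are hypothetical values for ten patients, chosen for illustration; they are not results from the assignment.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for ten patients (not results from the report).
y_test = ["Alive"] * 7 + ["Dead"] * 3
y_pred = ["Alive"] * 6 + ["Dead"] + ["Alive", "Dead", "Dead"]

# Rows are true classes, columns are predicted classes, in the order of `labels`.
conf = confusion_matrix(y_test, y_pred, labels=["Alive", "Dead"])
print(conf)

accuracy = accuracy_score(y_test, y_pred)
misclassification_error = 1 - accuracy
print(f"accuracy: {accuracy:.2%}, MCE: {misclassification_error:.2%}")
```

Here the diagonal of the matrix holds 6 + 2 = 8 correct predictions out of 10, so the accuracy is 80% and the misclassification error is 20%.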

from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

# Initialize the SVM classifier with a linear kernel
modelsvm = SVC(kernel='linear', random_state=0)
# Fit the SVM model to the training data
modelsvm.fit(x_train, y_train)
# Predict class labels for the test data
y_pred = modelsvm.predict(x_test)
print(y_pred)

print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

accuracysvm = accuracy_score(y_test, y_pred)
print('accuracy of SVM IS: {:.2f}%'.format(accuracysvm*100))

# Calculate misclassification error
misclassification_error = 1 - accuracysvm
print('MCE of SVM IS: {:.2f}%'.format(misclassification_error*100))

Output:

The Naive Bayes step trains a Gaussian Naive Bayes classifier (NBclassifier1) on the
training data (x_train, y_train) and evaluates its performance on the test data
(x_test, y_test) using the following steps:
• It initializes and trains the Gaussian Naive Bayes classifier (NBclassifier1) using
the training data.
• It predicts the class labels for the test data using the trained classifier and stores
the predictions in y_pred.
• It prints a classification report, which includes precision, recall, F1-score, and
support for each class, based on the actual and predicted labels (y_test, y_pred).
• It prints the confusion matrix, providing a summary of the model's predictions.
• It calculates and prints the accuracy of the Naive Bayes model.
• It calculates and prints the misclassification error.
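
The Naive Bayes steps listed above can be sketched as shown below. This is an illustration under stated assumptions and not the assignment's original code: the data is synthetic (make_classification), the split parameters are chosen arbitrarily, and accuracy_nb is a name introduced here. Only the classifier name NBclassifier1 comes from the text.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data; the report's own features and split are not reproduced here.
x, y = make_classification(n_samples=200, n_features=5, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# Initialize and train the Gaussian Naive Bayes classifier.
NBclassifier1 = GaussianNB()
NBclassifier1.fit(x_train, y_train)

# Predict the test labels and report the same metrics as for the other models.
y_pred = NBclassifier1.predict(x_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

accuracy_nb = accuracy_score(y_test, y_pred)
print('accuracy of NB IS: {:.2f}%'.format(accuracy_nb * 100))
print('MCE of NB IS: {:.2f}%'.format((1 - accuracy_nb) * 100))
```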

The SVM code above trains a Support Vector Machine (SVM) classifier with a linear
kernel on the training data, makes predictions on the test data, and evaluates its
performance using the following steps:
• It initializes and trains the SVM classifier with a linear kernel on the training
data.
• It predicts the class labels for the test data using the trained SVM model.
• It prints the predicted class labels.
• It prints a classification report, confusion matrix, accuracy, and misclassification
error to evaluate the performance of the SVM model on the test data.
