Decision Tree Code Explanation

The document explains the process of implementing a Decision Tree Classifier using the breast cancer dataset. It covers importing necessary libraries, loading the dataset, splitting the data into training and testing sets, training the model, making predictions, calculating accuracy, testing on a single sample, and visualizing the decision tree. The decision tree operates by asking a series of yes/no questions based on tumor characteristics to classify samples as malignant or benign.

Uploaded by

prajwalcg3
Copyright
© All Rights Reserved

1. Importing Libraries

python

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree

What it does: These lines import all the tools we need:

numpy (as np): For working with arrays and numerical data

matplotlib.pyplot (as plt): For creating graphs and visualizations

load_breast_cancer : A built-in dataset about breast cancer cases

train_test_split : Splits data into training and testing portions

DecisionTreeClassifier : The machine learning algorithm we'll use

accuracy_score : Measures how well our model performs

tree : Helps us visualize the decision tree

2. Loading the Dataset

python

data = load_breast_cancer()
X = data.data
y = data.target

What it does:

data = load_breast_cancer() : Loads the breast cancer dataset (569 samples with 30 features each)

X = data.data : Gets the input features (measurements like tumor size, texture, etc.)

y = data.target : Gets the labels (0 = malignant, 1 = benign)
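To make the loading step concrete, the sketch below inspects what was loaded. It uses only attributes that scikit-learn's dataset object provides (`data`, `target`, `target_names`, `feature_names`); the variable name `class_counts` is just an illustrative choice.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target

# Rows are samples, columns are features.
print(X.shape)                 # (569, 30)

# The position of each name matches the label value: index 0 and index 1.
print(data.target_names)
print(data.feature_names[:3])

# Count how many samples carry each label (index = label value).
class_counts = np.bincount(y)
for label, count in enumerate(class_counts):
    print(f"{label} = {data.target_names[label]}: {count} samples")
```

Checking `target_names` this way is a quick guard against mixing up which number means malignant and which means benign.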

3. Splitting the Data


python

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

What it does:

Splits the data into training (80%) and testing (20%) sets

Training data: Used to teach the model

Testing data: Used to evaluate how well the model learned

random_state=42 : Ensures we get the same split every time (reproducibility)
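As a rough check of the 80/20 split and of what random_state buys us, the sketch below prints the sizes of both portions and repeats the split with the same seed. The names `X_train2`, `X_test2` and `same_split` are illustrative and not part of the original code.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# 20% of 569 samples is rounded up for the test set.
print(X_train.shape, X_test.shape)   # (455, 30) (114, 30)

# Repeating the split with the same seed gives the same rows again.
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
same_split = np.array_equal(X_test, X_test2)
print("Same test set both times:", same_split)
```

If the two classes should appear in similar proportions in both portions, train_test_split also accepts stratify=y; that is an optional extra and is not used in the code explained here.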

4. Creating and Training the Model

python

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

What it does:

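clf = DecisionTreeClassifier(random_state=42) : Creates a decision tree classifier object; random_state=42 fixes the randomness used when choosing between equally good splits, so the same tree is built on every run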

clf.fit(X_train, y_train) : Trains the model using the training data

The model learns patterns by choosing threshold questions such as "Is the mean radius ≤ 15?" and arranging them into a tree of decisions
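To see what the model actually learned, one option is to inspect the fitted tree. The sketch below is a minimal example that reads the depth, the number of leaves and the first question from scikit-learn's `tree_` attribute; the variable names are illustrative, and the exact feature and threshold printed depend on the data and the scikit-learn version.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Overall size of the learned tree.
depth = clf.get_depth()
n_leaves = clf.get_n_leaves()
print(f"Depth: {depth}, leaves: {n_leaves}")

# Node 0 is the root: the first question asked of every sample.
root_feature = data.feature_names[clf.tree_.feature[0]]
root_threshold = clf.tree_.threshold[0]
print(f"Root question: is '{root_feature}' <= {root_threshold:.3f}?")
```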

5. Making Predictions

python

y_pred = clf.predict(X_test)

What it does:

Uses the trained model to predict outcomes for the test data

y_pred contains the model's guesses (0 or 1) for each test sample
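A small sketch of what y_pred looks like in practice: it prints the first five guesses next to the true labels. The loop and its formatting are only an illustration, not part of the original code.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

y_pred = clf.predict(X_test)

# One prediction per test sample.
print(y_pred.shape)

# Show the first few guesses beside the true answers.
for predicted, actual in zip(y_pred[:5], y_test[:5]):
    predicted_name = str(data.target_names[predicted])
    actual_name = str(data.target_names[actual])
    print(f"predicted={predicted_name:<10} actual={actual_name}")
```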

6. Calculating Accuracy

python

accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%")

What it does:

Compares the model's predictions (y_pred) with the actual answers (y_test)

Calculates what percentage the model got right

Prints the accuracy as a percentage (e.g., "Model Accuracy: 93.86%")
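Accuracy is simply the fraction of matching predictions, which the sketch below checks by hand. It also prints a confusion matrix using scikit-learn's confusion_matrix, an addition that is not in the original code, to show which kind of mistake the model makes.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)

# The same number computed by hand: share of predictions equal to the truth.
manual_accuracy = np.mean(y_pred == y_test)
print(f"accuracy_score: {accuracy:.4f}, by hand: {manual_accuracy:.4f}")

# Rows are true classes (0, 1); columns are predicted classes (0, 1).
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

The diagonal of the matrix holds the correct predictions, so dividing its sum by the total number of test samples gives the accuracy again.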

7. Testing on a Single Sample

python

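# Wrap the first test sample in a 2-D array: predict() expects one row per sample
new_sample = np.array([X_test[0]])
prediction = clf.predict(new_sample)
# predict() returns an array, so compare its single element, not the array itself
prediction_class = "Benign" if prediction[0] == 1 else "Malignant"
print(f"Predicted Class for the new sample: {prediction_class}")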

What it does:

Takes the first sample from the test set

Makes a prediction for just this one sample


Converts the numerical prediction (0 or 1) to a readable label:
1 = "Benign" (not cancerous)
0 = "Malignant" (cancerous)

Prints the result
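An alternative sketch of the same step: reshape(1, -1) turns one sample into a single-row array, and the readable label is looked up in data.target_names instead of being written out by hand. The variable names are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# One sample must still be 2-D: 1 row, 30 feature columns.
sample = X_test[0].reshape(1, -1)
print(sample.shape)   # (1, 30)

# predict() returns an array; take its only element.
label = clf.predict(sample)[0]

# Look the name up in the dataset so label and name cannot drift apart.
name = str(data.target_names[label])
print(f"Predicted Class for the new sample: {name}")
```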

8. Visualizing the Decision Tree

python

plt.figure(figsize=(12,8))
# list() converts the NumPy arrays of names to plain lists
tree.plot_tree(clf, filled=True, feature_names=list(data.feature_names), class_names=list(data.target_names))
plt.title("Decision Tree - Breast Cancer Dataset")
plt.show()

What it does:

Creates a large figure (12x8 inches)

Draws the decision tree with:


filled=True : Colors the nodes based on the majority class

feature_names : Shows actual feature names instead of numbers

class_names : Shows "malignant" and "benign" instead of 0 and 1

Adds a title and displays the visualization
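When a plot window is not available, or the full tree is too crowded to read, a text view can help. The sketch below uses scikit-learn's export_text, which is not part of the original code, and limits the output to the top levels with max_depth=2, an arbitrary choice.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Print only the top levels of the tree as indented if/else rules.
rules = export_text(clf, feature_names=list(data.feature_names), max_depth=2)
print(rules)
```

Each indented line is one question from the plot, written as feature <= threshold or feature > threshold.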

How the Decision Tree Works


The decision tree makes predictions by asking a series of yes/no questions about the tumor
characteristics. For example:

1. "Is the mean radius ≤ 16.8?"


If yes → follow the left branch

If no → follow the right branch

2. Continue asking questions until reaching a final decision (leaf node)

Each path from top to bottom represents a different rule for classification, making the model
interpretable and easy to understand!
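The walk from the top of the tree to a leaf can be sketched by hand. The helper below, `predict_by_walking`, is a hypothetical function written for illustration. It reads the fitted tree's arrays from scikit-learn's `tree_` attribute (`children_left`, `children_right`, `feature`, `threshold`, `value`) and follows the yes/no answers for each sample. The cast to float32 is an assumption made to mirror how scikit-learn stores inputs internally.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)


def predict_by_walking(clf, sample):
    """Follow the tree's yes/no questions from the root to a leaf."""
    t = clf.tree_
    x = sample.astype(np.float32)
    node = 0  # start at the root
    while t.children_left[node] != -1:  # -1 marks a leaf node
        if x[t.feature[node]] <= t.threshold[node]:
            node = t.children_left[node]    # yes: left branch
        else:
            node = t.children_right[node]   # no: right branch
    # The leaf stores per-class weights; the largest one is the decision.
    return clf.classes_[np.argmax(t.value[node][0])]


walked = np.array([predict_by_walking(clf, s) for s in X_test])
matches = np.array_equal(walked, clf.predict(X_test))
print("Hand-walked predictions match clf.predict:", matches)
```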
