Decision Trees
Process:
1. Splitting the data on feature values
2. Selecting the best split using an impurity measure such as the Gini index (see the short sketch below)
3. Stopping criteria: pure node or max depth reached
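The Gini index scores a node's impurity as 1 minus the sum of squared class probabilities, so a pure node scores 0. A minimal sketch of the measure (the gini_index helper is illustrative only; the implementation below uses entropy instead):

def gini_index(labels):
    """Gini impurity: 1 - sum(p_i^2). Returns 0 for a pure node."""
    total = len(labels)
    if total == 0:
        return 0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1 - sum((count / total) ** 2 for count in counts.values())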
Implementation:
1. Implement a basic Decision Tree Classifier from scratch.
2. Use Entropy (a measure of disorder) and Information Gain (the reduction in
entropy achieved by a split) to determine the best feature to split on.
Worked example: a node holding 30 samples of Class A and 20 of Class B.
Class A: 30
Class B: 20
Total: 50
P(A) = 30/50 = 0.6
P(B) = 20/50 = 0.4
Entropy = -(0.6 * log2(0.6) + 0.4 * log2(0.4)) ≈ 0.971 bits
import math

# Helper functions

def calculate_entropy(data):
    """
    Calculate the entropy of a dataset.
    data: List of target labels
    """
    total = len(data)
    if total == 0:
        return 0
    counts = {}
    for label in data:
        counts[label] = counts.get(label, 0) + 1
    entropy = 0
    for count in counts.values():
        prob = count / total
        entropy -= prob * math.log2(prob)
    return entropy
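A quick check against the worked example above (30 samples of Class A, 20 of Class B):

labels = ['A'] * 30 + ['B'] * 20
print(calculate_entropy(labels))  # ~0.971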
def split_data(dataset, feature_index):
    """
    Split the dataset based on a feature.
    dataset: List of lists where each inner list is a data point
    feature_index: Index of the feature to split on
    """
    splits = {}
    for row in dataset:
        key = row[feature_index]
        if key not in splits:
            splits[key] = []
        splits[key].append(row)
    return splits
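For example, splitting on feature 0 groups rows by that feature's value (toy rows for illustration only):

rows = [['Sunny', 'No'], ['Rainy', 'Yes'], ['Sunny', 'Yes']]
print(split_data(rows, 0))
# {'Sunny': [['Sunny', 'No'], ['Sunny', 'Yes']], 'Rainy': [['Rainy', 'Yes']]}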
The weighted-entropy loop belongs inside an information-gain helper: the gain of a split is the parent's entropy minus the weighted entropy of the child subsets.

def information_gain(dataset, feature_index, target_index):
    """
    Information gain = entropy of the full dataset minus the
    weighted entropy of the subsets produced by the split.
    """
    base_entropy = calculate_entropy([row[target_index] for row in dataset])
    splits = split_data(dataset, feature_index)
    total_samples = len(dataset)
    weighted_entropy = 0
    for subset in splits.values():
        prob = len(subset) / total_samples
        subset_entropy = calculate_entropy([row[target_index] for row in subset])
        weighted_entropy += prob * subset_entropy
    return base_entropy - weighted_entropy
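A perfectly separating split should recover all of the parent's entropy (toy rows for illustration only):

toy = [['Sunny', 'No'], ['Sunny', 'No'], ['Rainy', 'Yes'], ['Rainy', 'Yes']]
print(information_gain(toy, 0, 1))  # 1.0: the split isolates each class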
# Example usage: a weather/play toy dataset.
# Column meanings inferred from the values:
# Outlook, Temperature, Humidity, Play (target)
dataset = [
    ['Sunny', 'Hot', 'High', 'No'],
    ['Sunny', 'Hot', 'High', 'No'],
    ['Overcast', 'Hot', 'High', 'Yes'],
    ['Rainy', 'Mild', 'High', 'Yes'],
    ['Rainy', 'Cool', 'Normal', 'Yes'],
    ['Rainy', 'Cool', 'Normal', 'No'],
    ['Overcast', 'Cool', 'Normal', 'Yes'],
    ['Sunny', 'Mild', 'High', 'No'],
    ['Sunny', 'Cool', 'Normal', 'Yes'],
    ['Rainy', 'Mild', 'Normal', 'Yes']
]
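The example below calls a DecisionTree class the notes never define. Here is one possible minimal sketch, assuming an ID3-style recursive builder on top of the helpers above; the class body (including _build and the majority-vote leaves) is an assumption made to match the fit(dataset, features, target_index) call:

class DecisionTree:
    """Minimal ID3-style classifier for categorical features (assumed sketch)."""
    def __init__(self, max_depth=3):
        self.max_depth = max_depth
        self.tree = None

    def fit(self, dataset, features, target_index):
        self.tree = self._build(dataset, features, target_index, depth=0)

    def _build(self, dataset, features, target_index, depth):
        labels = [row[target_index] for row in dataset]
        # Stop at a pure node or at the maximum depth; return the majority label
        if len(set(labels)) == 1 or depth >= self.max_depth:
            return max(set(labels), key=labels.count)
        # Choose the feature with the highest information gain
        gains = [(information_gain(dataset, i, target_index), i)
                 for i in range(len(features))]
        best_gain, best_index = max(gains)
        if best_gain == 0:
            return max(set(labels), key=labels.count)
        # Recurse into each subset produced by the best split
        node = {features[best_index]: {}}
        for value, subset in split_data(dataset, best_index).items():
            node[features[best_index]][value] = self._build(
                subset, features, target_index, depth + 1)
        return node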
features = ['Outlook', 'Temperature', 'Humidity']  # names assumed, see above
target_index = 3

tree = DecisionTree(max_depth=3)
tree.fit(dataset, features, target_index)
print(tree.tree)  # nested dict, e.g. {'Outlook': {'Overcast': 'Yes', ...}}