
MLA Lab 6:- Implementation of Decision Tree

Name: Tushar Patil
Roll no: A254
Batch: B

Theory:-
Decision trees are a popular machine learning algorithm used for both
classification and regression tasks. They operate by recursively partitioning the
input space into regions, with each partition corresponding to a decision based
on the values of input features. Here's a concise overview:

Splitting Criteria: Decision trees make decisions based on splitting criteria, such as Gini impurity or entropy for classification and mean squared error for regression. These criteria quantify the impurity or uncertainty in a dataset.
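
As a quick illustration, here is a minimal sketch of computing Gini impurity for a list of class labels (the helper name gini_impurity is my own, not part of the lab code):

import numpy as np

def gini_impurity(labels):
    # Gini impurity = 1 - sum of squared class proportions.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity(['acc', 'acc', 'acc']))             # 0.0 (pure node)
print(gini_impurity(['acc', 'unacc', 'acc', 'unacc']))  # 0.5 (maximally mixed binary node)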

Tree Construction: Decision trees are constructed recursively. At each step, the algorithm selects the best feature and corresponding threshold to split the data into two or more subsets. This process continues until a stopping criterion is met, such as reaching a maximum depth or a minimum number of samples in a node.
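
To make this greedy search concrete, here is a simplified sketch of finding the best single split (it assumes numeric numpy arrays X and y and the gini_impurity helper above; real implementations add recursion and stopping rules):

def best_split(X, y):
    # Try every (feature, threshold) pair and keep the one that
    # minimizes the weighted Gini impurity of the two children.
    best_feature, best_threshold, best_score = None, None, float('inf')
    n = len(y)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            score = (len(left) * gini_impurity(left)
                     + len(right) * gini_impurity(right)) / n
            if score < best_score:
                best_feature, best_threshold, best_score = j, t, score
    return best_feature, best_threshold, best_score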

Pruning: Decision trees can suffer from overfitting, especially when they grow
too deep. Pruning techniques help to prevent overfitting by removing nodes
that do not significantly improve the tree's performance on a validation set.
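
In scikit-learn, post-pruning is exposed through cost-complexity pruning; a minimal sketch (the ccp_alpha value is an arbitrary example and would normally be chosen by cross-validation):

from sklearn.tree import DecisionTreeClassifier

# Larger ccp_alpha prunes more aggressively; the default 0.0 disables pruning.
pruned_clf = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)
# DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
# returns the candidate alpha values to cross-validate over.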

Tree Interpretability: One of the main advantages of decision trees is their interpretability. The resulting tree structure can be easily visualized and understood, making it valuable for explaining the decision-making process to stakeholders.

Handling Categorical Features: Decision trees can, in principle, split directly on categorical features, but some implementations (including scikit-learn's) require categorical features to be encoded as numbers first, as is done later in this lab.
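
A minimal sketch of the kind of ordinal encoding used later in this lab (the toy DataFrame is made up for illustration):

import pandas as pd
import category_encoders as ce

toy = pd.DataFrame({'safety': ['low', 'high', 'med', 'high']})
enc = ce.OrdinalEncoder(cols=['safety'])
print(enc.fit_transform(toy))  # each category is mapped to an integer code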

Ensemble Methods: Decision trees can be combined into ensemble methods like Random Forests or Gradient Boosted Trees, which often result in improved performance by aggregating the predictions of multiple trees.
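
For example, a random forest can be swapped in on the same encoded data with a couple of lines (a hypothetical extension; this lab trains single trees):

from sklearn.ensemble import RandomForestClassifier

# 100 trees, each trained on a bootstrap sample with random feature subsets.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
# rf.fit(X_train, y_train); rf.score(X_test, y_test)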

Scalability: While decision trees are efficient for small to medium-sized datasets, training can become slow on very large datasets, since every candidate split of every feature must be evaluated at each node.

Handling Missing Values: Decision trees can handle missing values by either
ignoring them during the splitting process or imputing them based on certain
criteria.
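
The car evaluation dataset used below has no missing values, but if it did, a common approach with scikit-learn trees is to impute before training; a minimal sketch:

from sklearn.impute import SimpleImputer

# Replace missing entries with the most frequent value in each column.
imputer = SimpleImputer(strategy='most_frequent')
# X_train = imputer.fit_transform(X_train); X_test = imputer.transform(X_test)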

Handling Imbalanced Classes: Decision trees can be biased towards the majority class in imbalanced datasets. Techniques such as class weights or resampling can be employed to mitigate this issue.
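
In scikit-learn this is a one-line change (relevant here, since the car evaluation classes are quite skewed):

from sklearn.tree import DecisionTreeClassifier

# 'balanced' reweights samples inversely proportional to class frequency.
clf_weighted = DecisionTreeClassifier(class_weight='balanced', random_state=0)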

Hyperparameter Tuning: Decision trees have hyperparameters that can be tuned to optimize performance, such as maximum depth, minimum samples per leaf, and maximum features considered for splitting.
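
A minimal grid-search sketch over the hyperparameters just mentioned (the grid values are arbitrary examples):

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid = {
    'max_depth': [3, 5, 10, None],
    'min_samples_leaf': [1, 5, 10],
    'max_features': [None, 'sqrt'],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
# search.fit(X_train, y_train); print(search.best_params_)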

Code(Python):-
"""MLA_lAB6_a254ipynb

Automatically generated by Colaboratory.

Original file is located at


https://colab.research.google.com/drive/1rd0tGaEJq0VTrq4QvnCeWS-tpCzqOi9B
"""

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import os
# Kaggle-style input listing; on Colab this directory usually does not exist, so nothing is printed.
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

import warnings

warnings.filterwarnings('ignore')

"""# **8. Import dataset** <a class="anchor" id="8"></a>

[Table of Contents](#0.1)
"""

data = '/content/car_evaluation.csv'

df = pd.read_csv(data, header=None)

"""# **9. Exploratory data analysis** <a class="anchor" id="9"></a>

[Table of Contents](#0.1)

Now, I will explore the data to gain insights about the data.
"""

df.shape

"""We can see that there are 1728 instances and 7 variables in the data
set."""

df.head()

col_names = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']

df.columns = col_names

col_names

# let's again preview the dataset
df.head()

"""We can see that the column names are renamed. Now, the columns have
meaningful names."""

df.info()

for col in col_names:
    print(df[col].value_counts())

"""We can see that the `doors` and `persons` are categorical in nature. So, I
will treat them as categorical variables.

### Explore `class` variable


"""

df['class'].value_counts()

df.isnull().sum()

X = df.drop(['class'], axis=1)

y = df['class']

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

X_train.shape, X_test.shape

X_train.dtypes

X_train.head()

# install and import category encoders
!pip install category_encoders
import category_encoders as ce

encoder = ce.OrdinalEncoder(cols=['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety'])

X_train = encoder.fit_transform(X_train)

X_test = encoder.transform(X_test)

X_train.head()

X_test.head()

# import DecisionTreeClassifier

from sklearn.tree import DecisionTreeClassifier

# instantiate the DecisionTreeClassifier model with criterion gini index

clf_gini = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=0)

# fit the model
clf_gini.fit(X_train, y_train)

y_pred_gini = clf_gini.predict(X_test)

from sklearn.metrics import accuracy_score

print('Model accuracy score with criterion gini index: {0:0.4f}'.format(accuracy_score(y_test, y_pred_gini)))

y_pred_train_gini = clf_gini.predict(X_train)

y_pred_train_gini

print('Training-set accuracy score: {0:0.4f}'.format(accuracy_score(y_train, y_pred_train_gini)))

# print the scores on training and test set

print('Training set score: {:.4f}'.format(clf_gini.score(X_train, y_train)))

print('Test set score: {:.4f}'.format(clf_gini.score(X_test, y_test)))

plt.figure(figsize=(12,8))

from sklearn import tree

# the classifier is already fitted above, so it can be plotted directly
tree.plot_tree(clf_gini)

import graphviz
dot_data = tree.export_graphviz(clf_gini, out_file=None,
                                feature_names=X_train.columns,
                                class_names=clf_gini.classes_,  # class labels, not the raw y_train Series
                                filled=True, rounded=True,
                                special_characters=True)

graph = graphviz.Source(dot_data)

graph

# instantiate the DecisionTreeClassifier model with criterion entropy

clf_en = DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=0)

# fit the model
clf_en.fit(X_train, y_train)

y_pred_en = clf_en.predict(X_test)

from sklearn.metrics import accuracy_score

print('Model accuracy score with criterion entropy: {0:0.4f}'.format(accuracy_score(y_test, y_pred_en)))

y_pred_train_en = clf_en.predict(X_train)

y_pred_train_en

print('Training-set accuracy score: {0:0.4f}'.format(accuracy_score(y_train, y_pred_train_en)))

# print the scores on training and test set

print('Training set score: {:.4f}'.format(clf_en.score(X_train, y_train)))

print('Test set score: {:.4f}'.format(clf_en.score(X_test, y_test)))

plt.figure(figsize=(12,8))

from sklearn import tree

# again, the fitted classifier can be plotted directly
tree.plot_tree(clf_en)

import graphviz
dot_data = tree.export_graphviz(clf_en, out_file=None,
                                feature_names=X_train.columns,
                                class_names=clf_en.classes_,  # class labels, not the raw y_train Series
                                filled=True, rounded=True,
                                special_characters=True)

graph = graphviz.Source(dot_data)

graph

# Print the Confusion Matrix and slice it into four pieces

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred_en)

print('Confusion matrix\n\n', cm)

from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred_en))
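
# Optional extra (my addition, not part of the original lab): seaborn is
# imported above but unused, so the confusion matrix can also be shown
# as a heatmap for easier reading.

plt.figure(figsize=(6, 4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=clf_en.classes_, yticklabels=clf_en.classes_)
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.show()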

OP:-

(Output screenshots omitted: dataset preview, X_train.head() and X_test.head().)
