Class Assignment On Decision Trees

Uploaded by

mohammed.ansari

Class Assignment on Decision Trees

Name: Ansari Mohammed Shanouf Valijan

Class: B.E. Computer Engineering, Semester - VII
UID: 2021300004
Batch: Monday

Aim:
To implement decision trees for regression analysis on a healthcare dataset.

Dataset Description:
The Body Mass Index Detection dataset was used to construct the decision tree.
(https://www.kaggle.com/datasets/sayanroy058/body-mass-index-detection)

The idea was to predict the BMI of a person given his/her age, height, weight, bio-impedance
and gender. The dataset has about 741 records.

Implementation:
The following is a step-by-step implementation of the task at hand.
Link to Notebook -> DecisionTreeRegression

Importing the necessary libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeRegressor, plot_tree
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.preprocessing import LabelEncoder
import seaborn as sns

Importing the dataset

df = pd.read_csv('/content/Body Mass Index.csv')

Dropping irrelevant columns and encoding the categorical columns

df = df.drop(columns=['BmiClass'])
label_encoder = LabelEncoder()

df['Gender_encoded'] = label_encoder.fit_transform(df['Gender'])
df = df.drop(columns=['Gender'])
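
As a hedged illustration of what the encoding step does, the sketch below runs LabelEncoder on a small made-up Series. The values 'Male' and 'Female' are assumptions about how the Gender column is spelled and are not taken from the dataset. LabelEncoder assigns integer codes following the sorted order of the class labels.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample; the real spellings in the Gender column may differ.
gender = pd.Series(['Male', 'Female', 'Female', 'Male'])

encoder = LabelEncoder()
codes = encoder.fit_transform(gender)

# classes_ holds the sorted labels; each label's position is its integer code.
mapping = {label: int(code) for code, label in enumerate(encoder.classes_)}
print(mapping)
print(codes.tolist())
```

Inspecting `classes_` after fitting is a quick way to confirm which gender received which code before reading the correlation matrix.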

Visualizing the various features of the dataset to better understand it

numeric_columns = df.select_dtypes(include=['float64', 'int64']).columns

for col in numeric_columns:
    plt.figure(figsize=(8, 4))
    sns.histplot(df[col], kde=True, bins=30)
    plt.title(f'Distribution of {col}')
    plt.show()

categorical_columns = df.select_dtypes(include=['object']).columns

for col in categorical_columns:
    plt.figure(figsize=(8, 4))
    sns.countplot(data=df, x=col)
    plt.title(f'Count of {col}')
    plt.show()

Viewing the correlation among different features present in the dataset

corr_matrix = df.corr()

plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix')
plt.show()

The heatmap above shows a strong correlation between BMI and weight, which is expected.
Height shows a correlation roughly half as strong as that of weight, which still makes it an
important factor to take into consideration. Age has the weakest positive correlation with BMI.
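
The strong weight correlation is consistent with the standard definition of BMI as weight in kilograms divided by the square of height in metres. A minimal sketch of that formula follows. The function name and the sample values are illustrative, and the units used in the dataset's columns are not stated in this report.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Standard BMI definition: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2

# Illustrative values only (not taken from the dataset).
example_bmi = bmi(70.0, 1.75)
print(round(example_bmi, 2))
```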

Viewing pair-wise plots

sns.pairplot(df, hue='Bmi')
plt.show()

In the above plots, darker hues (purple in colour) depict higher BMI values. As can be
observed, for almost all features, values towards the higher end point towards a high
BMI value. An exception to this is the bio-impedance vs. height plot, where high BMI values
appear scattered.

Splitting the processed and analysed dataset into train and test sets
X = df.drop(columns='Bmi')
y = df['Bmi']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
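
For a sense of the resulting split sizes, the sketch below applies the same call to placeholder arrays with 741 rows, the approximate record count quoted in the dataset description. The placeholder data is an assumption; only the row counts are of interest.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_records = 741  # approximate size quoted in the dataset description
X_demo = np.arange(n_records * 2).reshape(n_records, 2)  # placeholder features
y_demo = np.arange(n_records)                             # placeholder target

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(len(X_tr), len(X_te))
```

With 741 rows this gives 592 training rows and 149 test rows, because scikit-learn rounds the test share up to a whole number of samples.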

Defining the decision tree regressor model and training it (parameters were chosen after
experimenting with different configurations and choosing the ones that avoided overfitting)

regressor = DecisionTreeRegressor(
    max_depth=25,
    min_samples_split=40,
    min_samples_leaf=15,
    max_features='sqrt',
    random_state=10
)
regressor.fit(X_train, y_train)
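
The report does not show how the different configurations were compared. One common way to run such an experiment is a cross-validated grid search with GridSearchCV, which the notebook imports but does not use in the code shown. The sketch below is an assumption about how that could look, run on synthetic data from make_regression. The grid values are illustrative and are not the author's.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the BMI data; sizes and noise level are arbitrary.
X_demo, y_demo = make_regression(
    n_samples=300, n_features=5, noise=10.0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Hypothetical grid around the kinds of values used in the report.
param_grid = {
    'max_depth': [5, 10, 25],
    'min_samples_split': [10, 40],
    'min_samples_leaf': [5, 15],
}

search = GridSearchCV(
    DecisionTreeRegressor(random_state=10),
    param_grid,
    cv=5,          # 5-fold cross-validation on the training split only
    scoring='r2',
)
search.fit(X_tr, y_tr)

print(search.best_params_)
print(search.score(X_te, y_te))  # R^2 of the refitted best model on held-out data
```

Selecting hyperparameters by cross-validation on the training split keeps the test set untouched until the final evaluation, which makes the reported test score a fairer estimate.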

Evaluating the model

y_pred = regressor.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print(f"Mean Absolute Error (MAE): {mae}")

print(f"Mean Squared Error (MSE): {mse}")
print(f"Root Mean Squared Error (RMSE): {rmse}")
print(f"R-squared (R^2): {r2}")
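
For reference, the four metrics can also be computed directly from their definitions. The sketch below does so with NumPy on a small made-up example; the helper name regression_metrics and the sample numbers are illustrative and are not part of the notebook.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R^2 directly from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mae = np.mean(np.abs(residuals))          # mean absolute error
    mse = np.mean(residuals ** 2)             # mean squared error
    rmse = np.sqrt(mse)                       # root mean squared error
    ss_res = np.sum(residuals ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                # coefficient of determination
    return mae, mse, rmse, r2

# Made-up values for illustration.
mae_demo, mse_demo, rmse_demo, r2_demo = regression_metrics(
    [3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]
)
print(mae_demo, mse_demo, rmse_demo, r2_demo)
```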

The following performance metrics were obtained on the training dataset:

Mean Absolute Error (MAE): 1.85
Mean Squared Error (MSE): 10.16
Root Mean Squared Error (RMSE): 3.19
R-squared (R^2): 0.89

The following performance metrics were obtained on the test dataset:

Mean Absolute Error (MAE): 2.1160518106723467
Mean Squared Error (MSE): 10.597756621559329
Root Mean Squared Error (RMSE): 3.255419576883958
R-squared (R^2): 0.8517373327150053
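
The code that produced the training-set figures is not shown in the report. A plausible sketch is to run the same metric calls on both splits through one helper, which makes the gap between training and test scores easy to read off. The helper name evaluate, the model settings and the synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

def evaluate(model, X, y):
    """Return MAE, MSE, RMSE and R^2 of a fitted model on (X, y)."""
    pred = model.predict(X)
    mse = mean_squared_error(y, pred)
    return {
        'mae': mean_absolute_error(y, pred),
        'mse': mse,
        'rmse': float(np.sqrt(mse)),
        'r2': r2_score(y, pred),
    }

# Synthetic stand-in for the BMI data.
X_demo, y_demo = make_regression(
    n_samples=300, n_features=5, noise=10.0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

model = DecisionTreeRegressor(min_samples_leaf=15, random_state=10)
model.fit(X_tr, y_tr)

train_scores = evaluate(model, X_tr, y_tr)
test_scores = evaluate(model, X_te, y_te)
print('train:', train_scores)
print('test: ', test_scores)
```

A training score far above the test score points to overfitting, while two similar scores suggest the tree generalizes at about the level it fits.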

Printing the decision tree as hypothesized

plt.figure(figsize=(20, 10))
plot_tree(
    regressor,
    feature_names=list(X.columns),  # a plain list is accepted across scikit-learn versions
    filled=True,
    rounded=True,
)
plt.title('Decision Tree Visualization')
plt.show()

Decision tree that was hypothesized for the regression task is as follows-
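
A plotted tree can be hard to read once it has many nodes. As a hedged alternative, scikit-learn's export_text gives a plain-text view of the same kind of split rules. The sketch below uses a small tree fitted on synthetic data, with placeholder feature names rather than the dataset's column names.

```python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic stand-in data; feature names are placeholders.
X_demo, y_demo = make_regression(
    n_samples=200, n_features=3, noise=5.0, random_state=0
)
feature_names = ['f0', 'f1', 'f2']

# A shallow tree keeps the printed rules short.
tree = DecisionTreeRegressor(max_depth=2, random_state=10).fit(X_demo, y_demo)

rules = export_text(tree, feature_names=feature_names)
print(rules)
```

Each indented line is one split condition, and each leaf line reports the value the tree predicts for samples that reach it.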

Conclusion:
By implementing the assigned task, I was able to brush up on the basic concepts associated
with building a decision tree. I was able to build, train and test the tree in Python and
arrived at the following inferences:
- For the assigned regression task, the analysis showed, as expected, a heavy dependence
  on weight and height as features for predicting the body mass index of an individual.
- The model trained initially had a test R-squared value of 0.98, which was identified as
  overfitting. The rectified model had a test R-squared value of around 0.8517, while the
  R-squared value on the training data was approximately 0.89.
