ML2

NAME : P KOUSHIK REDDY

ROLL NO : 12212161

1) Step 1: Import Libraries

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.tree import DecisionTreeClassifier

from sklearn.metrics import accuracy_score, confusion_matrix

Step 2: Load the Dataset

# Load the dataset

df = pd.read_csv('your_dataset.csv')

# Check the first few rows of the dataset

print(df.head())

Step 3: Prepare Features and Target Variables

# Define features (X) and target (y)

X = df.drop('target_column', axis=1) # Replace 'target_column' with your actual target column

y = df['target_column'] # Replace 'target_column' with your actual target column

Step 4: Split the Data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Step 5: Build the Decision Tree Model

# Build the decision tree classifier with Gini index and max depth of 4

clf = DecisionTreeClassifier(criterion='gini', max_depth=4, random_state=42)

# Train the model

clf.fit(X_train, y_train)

# Make predictions on the test set

y_pred = clf.predict(X_test)

# Calculate accuracy

accuracy = accuracy_score(y_test, y_pred)

print("Accuracy:", accuracy)

# Confusion matrix

cm = confusion_matrix(y_test, y_pred)

print("Confusion Matrix:\n", cm)

2) CODE TO COMPARE BOTH MODELS:

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.tree import DecisionTreeClassifier

from sklearn.linear_model import LogisticRegression

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_auc_score

# Load the dataset

df = pd.read_csv('your_dataset.csv')

# Define features (X) and target (y)

X = df.drop('target_column', axis=1) # Drop the target column

y = df['target_column'] # Target is CHD (0 or 1)

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Decision Tree Classifier

dt_clf = DecisionTreeClassifier(criterion='gini', max_depth=4, random_state=42)

# Train the model

dt_clf.fit(X_train, y_train)

# Make predictions

dt_pred = dt_clf.predict(X_test)

# Initialize Logistic Regression

lr_clf = LogisticRegression(max_iter=1000)

# Train the model

lr_clf.fit(X_train, y_train)

# Make predictions

lr_pred = lr_clf.predict(X_test)

dt_accuracy = accuracy_score(y_test, dt_pred)

lr_accuracy = accuracy_score(y_test, lr_pred)

print("Decision Tree Accuracy:", dt_accuracy)

print("Logistic Regression Accuracy:", lr_accuracy)

print("Decision Tree Confusion Matrix:")

print(confusion_matrix(y_test, dt_pred))

print("Logistic Regression Confusion Matrix:")

print(confusion_matrix(y_test, lr_pred))

print("Decision Tree Classification Report:")

print(classification_report(y_test, dt_pred))

print("Logistic Regression Classification Report:")

print(classification_report(y_test, lr_pred))

# ROC-AUC is best computed from predicted probabilities rather than hard class labels

dt_roc_auc = roc_auc_score(y_test, dt_clf.predict_proba(X_test)[:, 1])

lr_roc_auc = roc_auc_score(y_test, lr_clf.predict_proba(X_test)[:, 1])

print("Decision Tree ROC-AUC:", dt_roc_auc)

print("Logistic Regression ROC-AUC:", lr_roc_auc)

Summary: The model with the higher ROC-AUC score combined with good precision, recall, and F1-score should be considered better for classifying CHD cases.
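If a single side-by-side number beyond the classification reports is wanted, the F1-scores can also be compared directly. A short sketch, reusing dt_pred and lr_pred from above:

from sklearn.metrics import f1_score

# Compare F1-scores of both models on the same test split
print("Decision Tree F1:", f1_score(y_test, dt_pred))
print("Logistic Regression F1:", f1_score(y_test, lr_pred))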
3) Step 1: Code to Plot the Decision Tree

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.tree import DecisionTreeClassifier, plot_tree

import matplotlib.pyplot as plt

# Assuming you have already trained the decision tree classifier

# dt_clf is your decision tree classifier object

# Plot the decision tree

plt.figure(figsize=(20,10))

plot_tree(dt_clf, feature_names=X.columns, class_names=['No CHD', 'CHD'], filled=True, rounded=True)

plt.show()

Step 2: Identify the Top Node and Splitting Criteria

Once you plot the tree, the top node is the root node of the tree. The feature shown at this node is
the most important feature, i.e., the one that best splits the data according to the Gini index (or
whichever criterion is used).

• Look for the first feature in the plot; this is the top (root) node.

• The threshold at this node is the value that splits the dataset into two branches; both can also be read programmatically, as sketched below.
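As a minimal sketch (assuming dt_clf is the tree fitted earlier and X is the feature DataFrame), the root split can be read directly from scikit-learn's underlying tree structure:

# Read the root node's split from the fitted tree
# (node 0 is always the root in scikit-learn's tree_ arrays)
root_feature = X.columns[dt_clf.tree_.feature[0]]
root_threshold = dt_clf.tree_.threshold[0]
print(f"Root node splits on '{root_feature}' at threshold {root_threshold:.3f}")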

Step 3: Deriving the Gini Index at the Top Node

Step-by-Step Calculation:

1. Calculate class proportions at the top node:

o Count how many samples belong to class 0 (No CHD) and class 1 (CHD).

o Calculate the proportion of each class: p_0 and p_1.

2. Plug the proportions into the Gini formula:

Gini = 1 - (p_0^2 + p_1^2)

3. Interpret the result:

o A Gini index of 0 means the node is pure (all samples belong to one class).

o A higher Gini index indicates a more mixed node.

Summary:
1. The most important splitting criterion at the top node is the feature that gives the best
separation between classes, based on the Gini index.

2. The Gini index is a measure of how impure a node is. The lower the Gini index, the purer the
node.

3. The Gini index is calculated using the class proportions at the node, and it tells you how well the split divides the data into classes (a worked example follows below).
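As a worked illustration with hypothetical counts (not taken from the actual dataset): if the root node held 300 'No CHD' and 100 'CHD' samples, then p_0 = 0.75, p_1 = 0.25, and Gini = 1 - (0.75^2 + 0.25^2) = 0.375. In code:

# Gini index from class counts at a node (counts here are hypothetical)
n_no_chd, n_chd = 300, 100
total = n_no_chd + n_chd
p0, p1 = n_no_chd / total, n_chd / total
gini = 1 - (p0**2 + p1**2)
print(f"Gini index at the node: {gini:.3f}")  # 0.375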

4) CODE FOR OPTIMAL MAX_DEPTH SEARCH:

import pandas as pd

from sklearn.tree import DecisionTreeClassifier

from sklearn.model_selection import GridSearchCV, train_test_split

from sklearn.metrics import roc_auc_score

# Load the dataset

df = pd.read_csv('your_dataset.csv')

# Define features (X) and target (y)

X = df.drop('target_column', axis=1) # Replace 'target_column' with the actual target column for CHD

y = df['target_column'] # Target is CHD (0 or 1)

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize Decision Tree Classifier using Gini index

dt_clf = DecisionTreeClassifier(criterion='gini', random_state=42)

# Define the parameter grid for max_depth

param_grid = {'max_depth': range(3, 11)}

# Set up the GridSearchCV with ROC-AUC as scoring

grid_search = GridSearchCV(estimator=dt_clf, param_grid=param_grid, scoring='roc_auc', cv=5)

# Train the model using grid search


grid_search.fit(X_train, y_train)

# Get the optimal max_depth

best_max_depth = grid_search.best_params_['max_depth']

best_score = grid_search.best_score_

print(f"Optimal max_depth: {best_max_depth}")

print(f"Best ROC-AUC score: {best_score}")

# Train the Decision Tree with the optimal max_depth

optimal_tree = DecisionTreeClassifier(criterion='gini', max_depth=best_max_depth, random_state=42)

optimal_tree.fit(X_train, y_train)

# Predict probabilities for the test set

y_pred_prob = optimal_tree.predict_proba(X_test)[:, 1]

# Calculate ROC-AUC score on the test set

roc_auc = roc_auc_score(y_test, y_pred_prob)

print(f"ROC-AUC Score on Test Set: {roc_auc}")

import matplotlib.pyplot as plt

from sklearn.metrics import roc_curve

# Compute ROC curve

fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)

# Plot the ROC curve

plt.figure()

plt.plot(fpr, tpr, label=f'Decision Tree (AUC = {roc_auc:.2f})')

plt.plot([0, 1], [0, 1], 'k--', label='Random Guess')

plt.xlabel('False Positive Rate')

plt.ylabel('True Positive Rate')

plt.title('ROC Curve')

plt.legend(loc='best')

plt.show()

Interpret the Results

• The optimal max_depth is the one that gives the highest ROC-AUC score during cross-validation (the per-depth scores can be inspected as sketched after this list).

• After finding the optimal max_depth, you evaluate the model on the test set to ensure its performance is consistent.

• A higher ROC-AUC score indicates better model performance in distinguishing between positive and negative classes (CHD and non-CHD cases).
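To see how each candidate depth scored during cross-validation, a quick sketch assuming the grid_search object from the code above:

# Inspect the mean cross-validated ROC-AUC for every candidate max_depth
cv_results = pd.DataFrame(grid_search.cv_results_)
print(cv_results[['param_max_depth', 'mean_test_score']])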
