MACHINE LEARNING

PROJECT ON DECISION TREES

Project Name :

Name : Suraj Gopal Aher

Roll No : 23/IT/164

Abstract : Decision Trees are one of the most intuitive and widely used Machine Learning
algorithms for both classification and regression tasks. This project explains how Decision
Trees work, where they are applied, and how they support accurate and interpretable decisions
based on data. The study builds a Decision Tree classifier for credit card fraud detection and
analyzes its performance on the Kaggle Credit Card Fraud Detection dataset.

Introduction
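Decision Trees are a supervised learning technique used for both classification and regression
problems. They work by recursively splitting the dataset into subsets based on the values of the
input features; at each step the algorithm picks the feature and threshold that best separate the
target, measured by a criterion such as Gini impurity or entropy. The result resembles a tree:
internal nodes represent a test on a feature, branches represent the outcomes of that test, and
leaf nodes hold the final decision (a class label for classification, a numeric value for regression).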
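
The splitting step can be made concrete with a small from-scratch sketch (not scikit-learn's actual implementation). It scores every candidate threshold on one numeric feature by the weighted Gini impurity of the two child nodes; Gini is the default criterion of scikit-learn's DecisionTreeClassifier. The function names gini and best_threshold and the toy data are illustrative only. A full tree would repeat this search over every feature and then recurse into each child node.

```python
import numpy as np


def gini(labels: np.ndarray) -> float:
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return float(1.0 - np.sum(p ** 2))


def best_threshold(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Return (threshold, weighted child impurity) of the best split x <= t."""
    best_t, best_score = float("nan"), float("inf")
    values = np.unique(x)
    # Candidate thresholds: midpoints between consecutive distinct values
    for t in (values[:-1] + values[1:]) / 2:
        left, right = y[x <= t], y[x > t]
        # Impurity of each child, weighted by the share of samples it receives
        score = (left.size * gini(left) + right.size * gini(right)) / y.size
        if score < best_score:
            best_t, best_score = float(t), score
    return best_t, best_score


# Toy data: one numeric feature, binary label
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(gini(y))               # 0.5 (two equally frequent classes)
print(best_threshold(x, y))  # (6.5, 0.0): a perfect split
```

A split whose children are both pure scores 0.0, so the search prefers thresholds that separate the classes cleanly.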

Study Objectives
• To understand the concept and working of Decision Trees.

• To build and evaluate a Decision Tree model using Python and Scikit-learn.

• To explore real-world applications and future trends.

Scope & Limitations

Scope:
• Implementation on publicly available datasets.

• Use of Scikit-learn for building and visualizing Decision Trees.

Limitations:

• Limited to small datasets.

• A single tree tends to overfit, so it may not generalize well to complex, large-scale problems.
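
To make the overfitting limitation concrete, here is a small sketch on synthetic data (not the project dataset). An unconstrained tree memorizes the training set, noise included, while a depth-limited tree shows a much smaller gap between training and test accuracy. The dataset parameters and max_depth=4 are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.2 assigns a random class to about 20% of samples (label noise)
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for depth in [None, 4]:  # None = keep splitting until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    results[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(f"max_depth={depth}: train={results[depth][0]:.3f}, test={results[depth][1]:.3f}")
```

The unconstrained tree reaches perfect training accuracy by fitting the noisy labels, which is exactly the behavior the limitation above warns about.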

Real-World Application
Decision Trees are used in:

• Medical Diagnosis: To predict diseases based on symptoms.

• Credit Risk Analysis: To evaluate loan applicants.

• Customer Segmentation: To classify customers based on purchasing behavior.

• Fraud Detection: To detect unusual transactions.
• Recommendation Systems: For product recommendations.
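
Interpretability is a main reason trees are used in applications such as credit risk analysis: a fitted tree can be printed as plain if/else rules. A minimal sketch using scikit-learn's export_text on a tiny, invented set of loan applicants; the features, values, and labels are hypothetical, not real credit data.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical applicants: [annual income (thousands), existing debt (thousands)]
X = [[25, 20], [30, 25], [45, 5], [60, 10], [80, 40], [90, 5], [35, 30], [70, 2]]
# Hypothetical outcomes: 1 = repaid, 0 = defaulted
y = [0, 0, 1, 1, 0, 1, 0, 1]

model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text renders the learned splits as human-readable rules
rules = export_text(model, feature_names=["income", "debt"])
print(rules)

# Score a new (hypothetical) applicant with income 50 and debt 8
print(model.predict([[50, 8]]))
```

In this toy data a single debt threshold separates the two groups, so the printed rule set is one readable condition that a loan officer could check by hand.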
Challenges and Future Trends
Challenges:

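• Prone to overfitting the training data.

• Unstable: small variations in the training data can produce a very different tree.

• Not ideal for smooth or diagonal relationships in continuous data, which axis-aligned threshold
splits can only approximate with many step-like splits; performance can also degrade on very
high-dimensional data.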
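
Instability can be observed directly. A minimal sketch on synthetic data (parameters arbitrary): two unpruned trees are trained on two different random 90% subsets of the same rows and then compared on a fixed set of evaluation points. Their root splits may use different features, and a noticeable fraction of their predictions typically differs. Averaging many such trees is how Random Forests reduce this variance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=4,
                           flip_y=0.1, random_state=1)
X_eval = X[:200]                  # fixed points on which both trees are compared
X_pool, y_pool = X[200:], y[200:]

rng = np.random.default_rng(0)
trees = []
for _ in range(2):
    # Each tree sees a different random 90% of the remaining rows
    idx = rng.choice(len(X_pool), size=int(0.9 * len(X_pool)), replace=False)
    trees.append(DecisionTreeClassifier(random_state=0).fit(X_pool[idx], y_pool[idx]))

pred_a, pred_b = trees[0].predict(X_eval), trees[1].predict(X_eval)
disagreement = float(np.mean(pred_a != pred_b))
# tree_.feature[0] is the index of the feature tested at the root node
print(f"Root split features: {trees[0].tree_.feature[0]} vs {trees[1].tree_.feature[0]}")
print(f"Evaluation points classified differently: {disagreement:.2%}")
```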

Future Trends:
• Use of Ensemble Methods like Random Forests and Gradient Boosting to improve
accuracy.
• Integration with Explainable AI (XAI) for better interpretability.

• Application in areas like autonomous vehicles and personalized medicine.
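
The ensemble trend can be illustrated with a short comparison on synthetic data (an assumption; this is not the project dataset): 5-fold cross-validated accuracy of a single tree against scikit-learn's RandomForestClassifier and GradientBoostingClassifier. Exact scores depend on the data and settings, but on noisy data like this the ensembles typically outperform the single tree.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=6,
                           flip_y=0.05, random_state=42)

models = {
    "single tree": DecisionTreeClassifier(random_state=42),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "gradient boosting": GradientBoostingClassifier(random_state=42),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 5 cross-validation folds
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:>17}: {scores[name]:.3f}")
```

Random Forests average many trees grown on bootstrap samples, while Gradient Boosting adds shallow trees one at a time to correct the previous errors; both keep trees as the building block.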

Conclusion:
Decision Trees are a powerful and easy-to-understand Machine Learning model. Despite their
limitations, they serve as a fundamental building block for advanced ensemble methods such as
Random Forests and Gradient Boosting. With proper tuning (for example, limiting depth or pruning)
and modern techniques, Decision Trees continue to play a significant role in real-world applications.
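
One way to carry out the "proper tuning" mentioned above is a cross-validated grid search over the tree's complexity controls, including scikit-learn's cost-complexity pruning parameter ccp_alpha. A sketch that uses scikit-learn's bundled breast cancer dataset as a stand-in; the grid values are arbitrary and choosing F1 as the selection metric is an assumption.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

param_grid = {
    "max_depth": [3, 5, 8, None],     # cap on tree depth (None = unlimited)
    "min_samples_leaf": [1, 5, 20],   # minimum samples per leaf
    "ccp_alpha": [0.0, 0.005, 0.01],  # cost-complexity pruning strength
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid,
                      cv=5, scoring="f1")
search.fit(X_train, y_train)  # refits the best combination on all training data

print("Best parameters:", search.best_params_)
print(f"Test F1 of the best tree: {search.score(X_test, y_test):.3f}")
```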

References
1. Scikit-learn documentation: https://scikit-learn.org/
2. “Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow” by Aurélien Géron.
3. Kaggle Datasets: https://www.kaggle.com/
4. Research papers and articles on Decision Trees.

CODE:

# Credit Card Fraud Detection using Decision Trees
# This program builds a Decision Tree classifier to detect credit card fraud
# using the Kaggle Credit Card Fraud Detection dataset.
# Dataset: https://www.kaggle.com/mlg-ulb/creditcardfraud

# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import confusion_matrix, classification_report
import warnings

# Suppress library warnings to keep the console output readable
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

# Load the dataset
print("Loading the Credit Card Fraud dataset...")
try:
    df = pd.read_csv('creditcard.csv')
    print("Dataset successfully loaded.")
except FileNotFoundError:
    print("Dataset file not found. Please download the dataset from Kaggle and place it "
          "in the current directory.")
    print("Download link: https://www.kaggle.com/mlg-ulb/creditcardfraud")
    # Stop here: nothing below can run without the data
    raise SystemExit(1)

# Display basic information about the dataset
print("\n=== Dataset Information ===")
print(f"Shape of the dataset: {df.shape}")
print("\nFirst 5 rows of the dataset:")
print(df.head())
print("\nSummary statistics:")
print(df.describe())

# Check for missing values
print("\nMissing values in each column:")
print(df.isnull().sum())

# Display class distribution (fraud vs non-fraud)
print("\n=== Class Distribution ===")
class_counts = df['Class'].value_counts()
print(class_counts)
print(f"Percentage of fraud transactions: {class_counts[1] / len(df) * 100:.4f}%")

# Visualize class distribution
plt.figure(figsize=(10, 6))
sns.countplot(x='Class', data=df)
plt.title('Class Distribution (0: Normal, 1: Fraud)')
plt.ylabel('Count')
plt.yscale('log')  # Log scale keeps the tiny fraud bar visible despite the class imbalance
plt.savefig('class_distribution.png')
plt.close()

# Split the data into features and target
X = df.drop('Class', axis=1)
y = df['Class']

# Split into training and testing sets (80% training, 20% testing);
# stratify=y keeps the fraud ratio the same in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
print("\n=== Dataset Split ===")
print(f"Training set size: {X_train.shape[0]} samples")
print(f"Testing set size: {X_test.shape[0]} samples")

# Create and train a Decision Tree Classifier
# class_weight='balanced' weights each class inversely to its frequency,
# so the rare fraud class is not ignored during training
print("\n=== Training Decision Tree Classifier ===")
dt_classifier = DecisionTreeClassifier(
    max_depth=10,             # Limit depth to prevent overfitting
    min_samples_split=20,     # Minimum samples required to split an internal node
    min_samples_leaf=5,       # Minimum samples required at a leaf node
    class_weight='balanced',  # Handle class imbalance
    random_state=42,
)

# Fit the model to the training data
dt_classifier.fit(X_train, y_train)
print("Decision Tree training completed.")

# Make predictions on the test set
y_pred = dt_classifier.predict(X_test)

# Evaluate the model
print("\n=== Model Evaluation ===")

# Calculate evaluation metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")

# Generate and display confusion matrix
cm = confusion_matrix(y_test, y_pred)
print("\nConfusion Matrix:")
print(cm)

# Display classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Visualize the confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Normal (0)', 'Fraud (1)'],
            yticklabels=['Normal (0)', 'Fraud (1)'])
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.title('Confusion Matrix')
plt.savefig('confusion_matrix.png')
plt.close()

# Find the most important features based on the Decision Tree
feature_importance = pd.DataFrame({
    'Feature': X.columns,
    'Importance': dt_classifier.feature_importances_,
})
feature_importance = feature_importance.sort_values('Importance', ascending=False)
print("\n=== Feature Importance ===")
print(feature_importance.head(10))  # Top 10 most important features

# Visualize feature importance
plt.figure(figsize=(12, 8))
sns.barplot(x='Importance', y='Feature', data=feature_importance.head(10))
plt.title('Top 10 Feature Importance')
plt.tight_layout()
plt.savefig('feature_importance.png')
plt.close()

# Visualize the Decision Tree (only the top 3 levels are drawn, for clarity)
plt.figure(figsize=(20, 10))
plot_tree(
    dt_classifier,
    max_depth=3,                    # Limit visualization depth for clarity
    feature_names=list(X.columns),  # A plain list of str is the safest type across versions
    class_names=['Normal', 'Fraud'],
    filled=True,
    rounded=True,
    fontsize=10,
)
plt.title('Decision Tree Visualization (Limited to Depth 3)')
plt.savefig('decision_tree.png')
plt.close()

print("\n=== Decision Tree Interpretation ===")
print("""
The decision tree diagram shows how the model classifies a transaction as
fraudulent or normal:
1. Each internal node tests one feature against a threshold.
2. The fill color shows the majority class at that node, and its intensity shows
   how homogeneous the node is (darker = purer).
3. Samples that satisfy the test (feature <= threshold) follow the left branch;
   the others follow the right branch.
4. The deeper the tree, the more complex the decision-making process.
5. Leaf nodes represent the final classification decisions.

The most important features (shown in the feature importance plot) are the key
indicators the algorithm uses to detect fraudulent transactions. Time and
transaction amount, along with certain V-features (PCA-transformed features
released in place of the originals for confidentiality), may play significant
roles; the feature importance output shows which ones this tree actually uses.
""")

print("\n=== Analysis Complete ===")
print("Decision Tree model for Credit Card Fraud Detection has been successfully built "
      "and evaluated.")
print("Visualization files have been saved: class_distribution.png, confusion_matrix.png, "
      "feature_importance.png, and decision_tree.png")
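
The script relies on class_weight='balanced' and reports precision, recall, and F1 next to accuracy. A minimal sketch of why, on invented labels with a 998:2 normal-to-fraud split (not the Kaggle data): it shows the weights that 'balanced' assigns, computed with scikit-learn's compute_class_weight, and shows that a DummyClassifier that always predicts "normal" scores very high accuracy while catching no fraud at all.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels with a fraud-like skew: 998 normal (0), 2 fraud (1)
y = np.array([0] * 998 + [1] * 2)
X = np.zeros((len(y), 1))  # features are irrelevant for this baseline

# 'balanced' weight per class = n_samples / (n_classes * count of that class)
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
print(weights)

# Baseline that always predicts the majority (normal) class
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = baseline.predict(X)
print(f"Accuracy: {accuracy_score(y, y_pred):.3f}")         # Accuracy: 0.998
print(f"Recall on fraud: {recall_score(y, y_pred):.3f}")    # Recall on fraud: 0.000
```

Each fraud sample therefore counts about 500 times as much as a normal one during training, and recall on the fraud class exposes a useless model that accuracy alone would hide.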

OUTPUT:
