E21CSEU0770 Lab4

This lab loads the iris dataset and analyzes it with a decision tree classifier, performing cross-validation and hyperparameter tuning. It then loads and explores a credit card fraud dataset, splits it into features and target, and plots the target distribution. Decision tree classifiers are fit with and without class weighting, and their performance is evaluated through cross-validation.


15/09/2023, 15:44 E21CSEU0962_Lab4.ipynb - Colaboratory

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score, GridSearchCV
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import tree

iris = load_iris()

df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target
clf = DecisionTreeClassifier()

scores = cross_val_score(clf, df.drop('target', axis=1), df['target'], cv=5)

print("Cross-validation scores:", scores)
print("Mean accuracy:", scores.mean())

Cross-validation scores: [0.96666667 0.96666667 0.9 1. 1. ]


Mean accuracy: 0.9666666666666668
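As a side note, `cross_val_score` uses stratified folds for classifiers by default, so each of the 5 folds preserves iris's even 50/50/50 class balance. A minimal sketch of that split:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold

# cross_val_score with cv=5 on a classifier is equivalent to
# StratifiedKFold(n_splits=5): every test fold keeps the class ratio.
iris = load_iris()
skf = StratifiedKFold(n_splits=5)
fold_sizes = [len(test_idx) for _, test_idx in skf.split(iris.data, iris.target)]
print(fold_sizes)  # 150 samples / 5 folds = 30 per fold
```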

clf.fit(df.drop('target', axis=1), df['target'])

plt.figure(figsize=(12, 10))
tree.plot_tree(clf, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.show()
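When the graphical plot is hard to read, the same fitted tree can be dumped as text rules with `export_text`; a small self-contained sketch:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# export_text prints the split thresholds as nested if/else rules.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```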

param_grid = {
    'max_depth': [2, 4, 6, 8],
    'min_samples_split': [2, 4, 6, 8],
    'min_samples_leaf': [1, 2, 3]
}

grid_search = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=5, scoring='accuracy')
grid_search.fit(df.drop('target', axis=1), df['target'])

best_params = grid_search.best_params_
best_score = grid_search.best_score_

print("Best Hyperparameters:", best_params)
print("Best Score (Accuracy):", best_score)

Best Hyperparameters: {'max_depth': 4, 'min_samples_leaf': 1, 'min_samples_split': 4}


Best Score (Accuracy): 0.9666666666666668
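Because `GridSearchCV` defaults to `refit=True`, the tuned model is already refit on the full data and available as `best_estimator_`; a self-contained sketch with a reduced grid (the single-parameter grid here is illustrative, not the lab's full grid):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
grid = GridSearchCV(DecisionTreeClassifier(random_state=0),
                    {'max_depth': [2, 4, 6, 8]}, cv=5)
grid.fit(iris.data, iris.target)

# best_estimator_ is refit on all 150 samples and ready to predict.
best_clf = grid.best_estimator_
print(grid.best_params_, best_clf.get_depth())
```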

Q2

import pandas as pd

df = pd.read_csv('creditcard.csv')

print("Dataset Shape:", df.shape)
print("Statistical Summary:")
print(df.describe())

Dataset Shape: (284807, 31)


Statistical Summary:
Time V1 V2 V3 V4 \
count 284807.000000 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05
mean 94813.859575 1.168375e-15 3.416908e-16 -1.379537e-15 2.074095e-15
std 47488.145955 1.958696e+00 1.651309e+00 1.516255e+00 1.415869e+00
min 0.000000 -5.640751e+01 -7.271573e+01 -4.832559e+01 -5.683171e+00
25% 54201.500000 -9.203734e-01 -5.985499e-01 -8.903648e-01 -8.486401e-01
50% 84692.000000 1.810880e-02 6.548556e-02 1.798463e-01 -1.984653e-02
75% 139320.500000 1.315642e+00 8.037239e-01 1.027196e+00 7.433413e-01
max 172792.000000 2.454930e+00 2.205773e+01 9.382558e+00 1.687534e+01

V5 V6 V7 V8 V9 \
count 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05
mean 9.604066e-16 1.487313e-15 -5.556467e-16 1.213481e-16 -2.406331e-15
std 1.380247e+00 1.332271e+00 1.237094e+00 1.194353e+00 1.098632e+00
min -1.137433e+02 -2.616051e+01 -4.355724e+01 -7.321672e+01 -1.343407e+01
25% -6.915971e-01 -7.682956e-01 -5.540759e-01 -2.086297e-01 -6.430976e-01
50% -5.433583e-02 -2.741871e-01 4.010308e-02 2.235804e-02 -5.142873e-02
75% 6.119264e-01 3.985649e-01 5.704361e-01 3.273459e-01 5.971390e-01
max 3.480167e+01 7.330163e+01 1.205895e+02 2.000721e+01 1.559499e+01

... V21 V22 V23 V24 \


count ... 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05
mean ... 1.654067e-16 -3.568593e-16 2.578648e-16 4.473266e-15
std ... 7.345240e-01 7.257016e-01 6.244603e-01 6.056471e-01
min ... -3.483038e+01 -1.093314e+01 -4.480774e+01 -2.836627e+00
25% ... -2.283949e-01 -5.423504e-01 -1.618463e-01 -3.545861e-01
50% ... -2.945017e-02 6.781943e-03 -1.119293e-02 4.097606e-02
75% ... 1.863772e-01 5.285536e-01 1.476421e-01 4.395266e-01
max ... 2.720284e+01 1.050309e+01 2.252841e+01 4.584549e+00

V25 V26 V27 V28 Amount \


count 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 284807.000000
mean 5.340915e-16 1.683437e-15 -3.660091e-16 -1.227390e-16 88.349619
std 5.212781e-01 4.822270e-01 4.036325e-01 3.300833e-01 250.120109
min -1.029540e+01 -2.604551e+00 -2.256568e+01 -1.543008e+01 0.000000
25% -3.171451e-01 -3.269839e-01 -7.083953e-02 -5.295979e-02 5.600000
50% 1.659350e-02 -5.213911e-02 1.342146e-03 1.124383e-02 22.000000
75% 3.507156e-01 2.409522e-01 9.104512e-02 7.827995e-02 77.165000
max 7.519589e+00 3.517346e+00 3.161220e+01 3.384781e+01 25691.160000

Class
count 284807.000000
mean 0.001727
std 0.041527
min 0.000000
25% 0.000000
50% 0.000000
75% 0.000000
max 1.000000

[8 rows x 31 columns]

X = df.drop('Class', axis=1)
y = df['Class']

print("Features Shape (X):", X.shape)
print("Target Shape (y):", y.shape)
Features Shape (X): (284807, 30)
Target Shape (y): (284807,)

import seaborn as sns
import matplotlib.pyplot as plt

sns.countplot(x='Class', data=df)
plt.title('Distribution of Target Variable')
plt.show()
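The imbalance the count plot reveals can also be quantified numerically with `value_counts`. The counts in this sketch are illustrative (990 vs 10), not the real `creditcard.csv` figures:

```python
import pandas as pd

# Hypothetical frame standing in for the fraud data: 990 legitimate
# transactions (Class 0) and 10 fraudulent ones (Class 1).
df = pd.DataFrame({'Class': [0] * 990 + [1] * 10})

counts = df['Class'].value_counts()
ratio = counts[0] / counts[1]  # majority-to-minority ratio
print(counts.to_dict(), "imbalance ratio:", ratio)
```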

from sklearn.tree import DecisionTreeClassifier

dt_model = DecisionTreeClassifier()

from sklearn.model_selection import RepeatedStratifiedKFold

rkf = RepeatedStratifiedKFold(n_splits=10, n_repeats=1, random_state=1)
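As a quick sanity check, `RepeatedStratifiedKFold` produces `n_splits * n_repeats` folds in total, each preserving the class ratio, so with these settings the cross-validation below runs 10 fits:

```python
from sklearn.model_selection import RepeatedStratifiedKFold

rkf = RepeatedStratifiedKFold(n_splits=10, n_repeats=1, random_state=1)

# get_n_splits reports the total number of train/test folds generated.
print(rkf.get_n_splits())  # 10 splits = 10 folds x 1 repeat
```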

from sklearn.model_selection import cross_val_score
from sklearn.metrics import roc_auc_score

roc_auc_scores = cross_val_score(dt_model, X, y, cv=rkf, scoring='roc_auc')
print("ROC-AUC Scores:", roc_auc_scores)
print("Mean ROC-AUC:", roc_auc_scores.mean())

ROC-AUC Scores: [0.87728723 0.85693183 0.89766022 0.88736821 0.88747373 0.8797362
 0.91970103 0.8467805  0.8568263  0.87735757]
Mean ROC-AUC: 0.8787122830612331

dt_model_balanced = DecisionTreeClassifier(class_weight="balanced")

roc_auc_scores_balanced = cross_val_score(dt_model_balanced, X, y, cv=rkf, scoring='roc_auc')
print("Balanced Class ROC-AUC Scores:", roc_auc_scores_balanced)
print("Mean ROC-AUC (Balanced Class):", roc_auc_scores_balanced.mean())

Balanced Class ROC-AUC Scores: [0.87726965 0.84671016 0.8976954 0.86718867 0.86725901 0.85985931
0.91980655 0.85700217 0.87741033 0.89766021]
Mean ROC-AUC (Balanced Class): 0.8767861446135539
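For intuition about what `class_weight="balanced"` does: it assigns each class the weight `n_samples / (n_classes * count)`, so rare classes get proportionally larger weights. A sketch with illustrative labels (990 negatives, 10 positives), not the real fraud data:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: 990 of class 0, 10 of class 1.
y = np.array([0] * 990 + [1] * 10)

# "balanced" weight for a class = n_samples / (n_classes * class_count):
# class 0 -> 1000 / (2 * 990) ~= 0.505, class 1 -> 1000 / (2 * 10) = 50.
weights = compute_class_weight(class_weight="balanced",
                               classes=np.array([0, 1]), y=y)
print(weights)
```

In this lab the mean ROC-AUC barely changed with weighting, which is plausible: reweighting shifts the tree's splits toward the minority class but does not add any new minority examples.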
