E21CSEU0770 Lab4

This lab loads the iris dataset and analyzes it with a decision tree classifier, performing cross-validation and hyperparameter tuning. It then loads and explores a credit card fraud dataset, splits it into features and target, and plots the target distribution. Decision tree classifiers are fit with and without class weighting, and their performance is evaluated through cross-validation.


15/09/2023, 15:44 E21CSEU0962_Lab4.ipynb - Colaboratory

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score, GridSearchCV
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import tree

iris = load_iris()

df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target
clf = DecisionTreeClassifier()

scores = cross_val_score(clf, df.drop('target', axis=1), df['target'], cv=5)

print("Cross-validation scores:", scores)
print("Mean accuracy:", scores.mean())

Cross-validation scores: [0.96666667 0.96666667 0.9 1. 1. ]


Mean accuracy: 0.9666666666666668
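As a side note, `cross_val_score` uses stratified folds for classifiers by default, so each of the 5 folds preserves iris's even 50/50/50 class balance. A minimal sketch of that split:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold

# cross_val_score with cv=5 on a classifier is equivalent to
# StratifiedKFold(n_splits=5): every test fold keeps the class ratio.
iris = load_iris()
skf = StratifiedKFold(n_splits=5)
fold_sizes = [len(test_idx) for _, test_idx in skf.split(iris.data, iris.target)]
print(fold_sizes)  # 150 samples / 5 folds = 30 per fold
```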

clf.fit(df.drop('target', axis=1), df['target'])

plt.figure(figsize=(12, 10))
tree.plot_tree(clf, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.show()
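When the graphical plot is hard to read, the same fitted tree can be dumped as text rules with `export_text`; a small self-contained sketch:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# export_text prints the split thresholds as nested if/else rules.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```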

param_grid = {
    'max_depth': [2, 4, 6, 8],
    'min_samples_split': [2, 4, 6, 8],
    'min_samples_leaf': [1, 2, 3]
}

grid_search = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=5, scoring='accuracy')
grid_search.fit(df.drop('target', axis=1), df['target'])

best_params = grid_search.best_params_
best_score = grid_search.best_score_

print("Best Hyperparameters:", best_params)
print("Best Score (Accuracy):", best_score)

Best Hyperparameters: {'max_depth': 4, 'min_samples_leaf': 1, 'min_samples_split': 4}


Best Score (Accuracy): 0.9666666666666668
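Because `GridSearchCV` defaults to `refit=True`, the tuned model is already refit on the full data and available as `best_estimator_`; a self-contained sketch with a reduced grid (the single-parameter grid here is illustrative, not the lab's full grid):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
grid = GridSearchCV(DecisionTreeClassifier(random_state=0),
                    {'max_depth': [2, 4, 6, 8]}, cv=5)
grid.fit(iris.data, iris.target)

# best_estimator_ is refit on all 150 samples and ready to predict.
best_clf = grid.best_estimator_
print(grid.best_params_, best_clf.get_depth())
```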

Q2

import pandas as pd

df = pd.read_csv('creditcard.csv')

print("Dataset Shape:", df.shape)
print("Statistical Summary:")
print(df.describe())

Dataset Shape: (284807, 31)


Statistical Summary:
Time V1 V2 V3 V4 \
count 284807.000000 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05
mean 94813.859575 1.168375e-15 3.416908e-16 -1.379537e-15 2.074095e-15
std 47488.145955 1.958696e+00 1.651309e+00 1.516255e+00 1.415869e+00
min 0.000000 -5.640751e+01 -7.271573e+01 -4.832559e+01 -5.683171e+00
25% 54201.500000 -9.203734e-01 -5.985499e-01 -8.903648e-01 -8.486401e-01
50% 84692.000000 1.810880e-02 6.548556e-02 1.798463e-01 -1.984653e-02
75% 139320.500000 1.315642e+00 8.037239e-01 1.027196e+00 7.433413e-01
max 172792.000000 2.454930e+00 2.205773e+01 9.382558e+00 1.687534e+01

V5 V6 V7 V8 V9 \
count 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05
mean 9.604066e-16 1.487313e-15 -5.556467e-16 1.213481e-16 -2.406331e-15
std 1.380247e+00 1.332271e+00 1.237094e+00 1.194353e+00 1.098632e+00
min -1.137433e+02 -2.616051e+01 -4.355724e+01 -7.321672e+01 -1.343407e+01
25% -6.915971e-01 -7.682956e-01 -5.540759e-01 -2.086297e-01 -6.430976e-01
50% -5.433583e-02 -2.741871e-01 4.010308e-02 2.235804e-02 -5.142873e-02
75% 6.119264e-01 3.985649e-01 5.704361e-01 3.273459e-01 5.971390e-01
max 3.480167e+01 7.330163e+01 1.205895e+02 2.000721e+01 1.559499e+01

... V21 V22 V23 V24 \


count ... 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05
mean ... 1.654067e-16 -3.568593e-16 2.578648e-16 4.473266e-15
std ... 7.345240e-01 7.257016e-01 6.244603e-01 6.056471e-01
min ... -3.483038e+01 -1.093314e+01 -4.480774e+01 -2.836627e+00
25% ... -2.283949e-01 -5.423504e-01 -1.618463e-01 -3.545861e-01
50% ... -2.945017e-02 6.781943e-03 -1.119293e-02 4.097606e-02
75% ... 1.863772e-01 5.285536e-01 1.476421e-01 4.395266e-01
max ... 2.720284e+01 1.050309e+01 2.252841e+01 4.584549e+00

V25 V26 V27 V28 Amount \


count 2.848070e+05 2.848070e+05 2.848070e+05 2.848070e+05 284807.000000
mean 5.340915e-16 1.683437e-15 -3.660091e-16 -1.227390e-16 88.349619
std 5.212781e-01 4.822270e-01 4.036325e-01 3.300833e-01 250.120109
min -1.029540e+01 -2.604551e+00 -2.256568e+01 -1.543008e+01 0.000000
25% -3.171451e-01 -3.269839e-01 -7.083953e-02 -5.295979e-02 5.600000
50% 1.659350e-02 -5.213911e-02 1.342146e-03 1.124383e-02 22.000000
75% 3.507156e-01 2.409522e-01 9.104512e-02 7.827995e-02 77.165000
max 7.519589e+00 3.517346e+00 3.161220e+01 3.384781e+01 25691.160000

Class
count 284807.000000
mean 0.001727
std 0.041527
min 0.000000
25% 0.000000
50% 0.000000
75% 0.000000
max 1.000000

[8 rows x 31 columns]

X = df.drop('Class', axis=1)
y = df['Class']

print("Features Shape (X):", X.shape)
print("Target Shape (y):", y.shape)
Features Shape (X): (284807, 30)
Target Shape (y): (284807,)

import seaborn as sns
import matplotlib.pyplot as plt

sns.countplot(x='Class', data=df)
plt.title('Distribution of Target Variable')
plt.show()
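The imbalance the count plot reveals can also be quantified numerically with `value_counts`. The counts in this sketch are illustrative (990 vs 10), not the real `creditcard.csv` figures:

```python
import pandas as pd

# Hypothetical frame standing in for the fraud data: 990 legitimate
# transactions (Class 0) and 10 fraudulent ones (Class 1).
df = pd.DataFrame({'Class': [0] * 990 + [1] * 10})

counts = df['Class'].value_counts()
ratio = counts[0] / counts[1]  # majority-to-minority ratio
print(counts.to_dict(), "imbalance ratio:", ratio)
```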

from sklearn.tree import DecisionTreeClassifier

dt_model = DecisionTreeClassifier()

from sklearn.model_selection import RepeatedStratifiedKFold

rkf = RepeatedStratifiedKFold(n_splits=10, n_repeats=1, random_state=1)
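As a quick sanity check, `RepeatedStratifiedKFold` produces `n_splits * n_repeats` folds in total, each preserving the class ratio, so with these settings the cross-validation below runs 10 fits:

```python
from sklearn.model_selection import RepeatedStratifiedKFold

rkf = RepeatedStratifiedKFold(n_splits=10, n_repeats=1, random_state=1)

# get_n_splits reports the total number of train/test folds generated.
print(rkf.get_n_splits())  # 10 splits = 10 folds x 1 repeat
```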

from sklearn.model_selection import cross_val_score
from sklearn.metrics import roc_auc_score

roc_auc_scores = cross_val_score(dt_model, X, y, cv=rkf, scoring='roc_auc')
print("ROC-AUC Scores:", roc_auc_scores)
print("Mean ROC-AUC:", roc_auc_scores.mean())

ROC-AUC Scores: [0.87728723 0.85693183 0.89766022 0.88736821 0.88747373 0.8797362
 0.91970103 0.8467805  0.8568263  0.87735757]
Mean ROC-AUC: 0.8787122830612331

dt_model_balanced = DecisionTreeClassifier(class_weight="balanced")

roc_auc_scores_balanced = cross_val_score(dt_model_balanced, X, y, cv=rkf, scoring='roc_auc')
print("Balanced Class ROC-AUC Scores:", roc_auc_scores_balanced)
print("Mean ROC-AUC (Balanced Class):", roc_auc_scores_balanced.mean())

Balanced Class ROC-AUC Scores: [0.87726965 0.84671016 0.8976954 0.86718867 0.86725901 0.85985931
0.91980655 0.85700217 0.87741033 0.89766021]
Mean ROC-AUC (Balanced Class): 0.8767861446135539
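For intuition about what `class_weight="balanced"` does: it assigns each class the weight `n_samples / (n_classes * count)`, so rare classes get proportionally larger weights. A sketch with illustrative labels (990 negatives, 10 positives), not the real fraud data:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: 990 of class 0, 10 of class 1.
y = np.array([0] * 990 + [1] * 10)

# "balanced" weight for a class = n_samples / (n_classes * class_count):
# class 0 -> 1000 / (2 * 990) ~= 0.505, class 1 -> 1000 / (2 * 10) = 50.
weights = compute_class_weight(class_weight="balanced",
                               classes=np.array([0, 1]), y=y)
print(weights)
```

In this lab the mean ROC-AUC barely changed with weighting, which is plausible: reweighting shifts the tree's splits toward the minority class but does not add any new minority examples.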
