
20MIS1025 - DecisionTree - Ipynb - Colaboratory

The document discusses building and visualizing a decision tree model for classification. It loads and preprocesses a dataset, splits it into training and test sets, builds a decision tree classifier, plots the decision regions and tree structure, and exports the tree to a graphic file. Standardization, stratification, and limiting the maximum depth are used in building the model.

Uploaded by

Sandip Das

8/23/23, 11:40 PM 20MIS1025_DecisionTree.ipynb - Colaboratory

Importing the libraries

from IPython.display import Image
%matplotlib inline

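import os
from pathlib import Path
import warnings

import numpy as np
import pandas as pd

warnings.filterwarnings('ignore')

# Load the training data and encode the class labels: normal -> 0, anomaly -> 1
df = pd.read_csv("KDD_Train.csv")
df.replace(['normal', 'anomaly'], [0, 1], inplace=True)

# Use columns 4 and 9 as the two features; the last column holds the class label
X = df.iloc[:, [4, 9]].values
y = df.iloc[:, -1].values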

print('Class labels:',np.unique(y))

Class labels: [0 1]

df.isnull().sum().sort_values(ascending=False).head()

duration 0
dst_host_count 0
srv_count 0
serror_rate 0
srv_serror_rate 0
dtype: int64

Splitting into 70% training and 30% testing data

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 1, stratify = y)

print('Labels count in y', np.bincount(y))
print('Labels count in y_train', np.bincount(y_train))
print('Labels count in y_test', np.bincount(y_test))

Labels count in y [67343 58630]


Labels count in y_train [47140 41041]
Labels count in y_test [20203 17589]
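
The effect of stratify = y can be checked by hand from the counts printed above. The following is a minimal sketch that only re-uses those printed numbers; the helper name anomaly_share is introduced here for illustration and is not part of the notebook.

```python
# Class counts copied from the np.bincount output above, as (normal, anomaly).
counts = {
    "y": (67343, 58630),
    "y_train": (47140, 41041),
    "y_test": (20203, 17589),
}


def anomaly_share(pair):
    """Fraction of samples labelled 1 (anomaly) in a (normal, anomaly) pair."""
    normal, anomaly = pair
    return anomaly / (normal + anomaly)


# With a stratified split, the share of each class should be nearly
# identical in the full data, the training set, and the test set.
shares = {name: anomaly_share(pair) for name, pair in counts.items()}

# Fraction of all samples that ended up in the test set (requested: 0.3).
test_fraction = sum(counts["y_test"]) / sum(counts["y"])

for name, share in shares.items():
    print(f"{name:8s} anomaly share = {share:.4f}")
print(f"test fraction = {test_fraction:.4f}")
```

The three shares agree to about four decimal places, which is what stratification is meant to guarantee; without it, the class balance of the two subsets could drift from that of the full data.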

Standardizing the features:

from sklearn.preprocessing import StandardScaler

# Standardization rescales each feature to zero mean and unit variance
sc = StandardScaler()
sc.fit(X_train)                        # estimate mean and standard deviation from the training data only
X_train_std = sc.transform(X_train)    # apply the same training-set parameters to both training and test data
X_test_std = sc.transform(X_test)
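
What fit and transform do here can be illustrated without scikit-learn. This is a minimal sketch on a toy single-feature list that is invented for illustration (it is not taken from KDD_Train.csv); it assumes the usual definition z = (x - mean) / std with the population standard deviation, which is what StandardScaler computes.

```python
from statistics import fmean, pstdev

# Toy single-feature data, invented for illustration only.
train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
test = [3.0, 11.0]

# "fit": estimate the mean and the population standard deviation
# from the training data only.
mu = fmean(train)
sigma = pstdev(train)


def standardize(values):
    """Apply z = (x - mu) / sigma using the training-set statistics."""
    return [(x - mu) / sigma for x in values]


# "transform": the same mu and sigma are applied to both sets, so the
# test data never influences the scaling parameters.
train_std = standardize(train)
test_std = standardize(test)

print("mean:", mu, "std:", sigma)
print("standardized test values:", test_std)
```

After the transformation the training values have mean 0 and standard deviation 1, while the test values are only expressed on the same scale and need not have those statistics themselves.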

from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt

def plot_decision_regions(X, y, classifier, test_idx=None, resolution=0.02):

    # setup marker generator and color map
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
    cmap = ListedColormap(colors[:len(np.unique(y))])

    # plot the decision surface
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, alpha=0.3, cmap=cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())


    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y == cl, 0],
                    y=X[y == cl, 1],
                    alpha=0.8,
                    c=colors[idx],
                    marker=markers[idx],
                    label=cl,
                    edgecolor='black')

    # highlight test examples
    if test_idx is not None:
        # circle the examples that belong to the test set
        X_test, y_test = X[test_idx, :], y[test_idx]

        plt.scatter(X_test[:, 0],
                    X_test[:, 1],
                    facecolors='none',
                    edgecolor='black',
                    alpha=1.0,
                    linewidth=1,
                    marker='o',
                    s=100,
                    label='test set')

Decision tree learning

Building a decision tree

from sklearn.tree import DecisionTreeClassifier

tree_model = DecisionTreeClassifier(criterion='gini',max_depth=2,random_state=1)
tree_model.fit(X_train, y_train)
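
The criterion='gini' setting means each split is scored by Gini impurity, 1 - sum(p_k^2) over the class proportions p_k. Below is a minimal sketch of that calculation, using the y_train class counts printed earlier for the root node. The child counts are hypothetical, invented to show the arithmetic, and are not the split the fitted tree actually chose.

```python
def gini(counts):
    """Gini impurity 1 - sum(p_k^2) for a list of per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    return (n_left / n) * gini(left) + (n_right / n) * gini(right)


# Root node: class counts of y_train, copied from the bincount output.
root = [47140, 41041]
root_impurity = gini(root)

# Hypothetical child nodes (assumption, for illustration only); their
# counts add up to the root counts class by class.
left, right = [40000, 5000], [7140, 36041]
impurity_decrease = root_impurity - weighted_gini(left, right)

print(f"root impurity      = {root_impurity:.4f}")
print(f"impurity decrease  = {impurity_decrease:.4f}")
```

The root impurity is close to 0.5, the maximum for two classes, because the training set is nearly balanced. At each node the tree picks the candidate split with the largest impurity decrease, and max_depth=2 stops this process after two levels of splits.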

X_combined = np.vstack((X_train, X_test))
y_combined = np.hstack((y_train, y_test))
# The test rows are stacked after the training rows, so their indices start at len(X_train).
#plot_decision_regions(X_combined, y_combined, classifier=tree_model, test_idx=range(len(X_train), len(X_combined)))

plt.xlabel(df.columns[4])
plt.ylabel(df.columns[9])
plt.legend(loc='upper left')
plt.tight_layout()
#plt.savefig('images/03_20.png', dpi=300)
plt.show()

WARNING:matplotlib.legend:No artists with labels found to put in legend. Note that a

from sklearn import tree

tree.plot_tree(tree_model)
#plt.savefig('images/03_21_1.pdf')
plt.show()

!pip3 install pydotplus

Requirement already satisfied: pydotplus in /usr/local/lib/python3.10/dist-packages (2.0.2)


Requirement already satisfied: pyparsing>=2.0.1 in /usr/local/lib/python3.10/dist-packages (from pydotplus) (3.1.1)

!conda install python-graphviz

/bin/bash: line 1: conda: command not found

!pip install graphviz

Requirement already satisfied: graphviz in /usr/local/lib/python3.10/dist-packages (0.20.1)


from pydotplus import graph_from_dot_data
from sklearn.tree import export_graphviz

dot_data = export_graphviz(tree_model,
                           filled=True,
                           rounded=True,
                           class_names=['normal',
                                        'anomaly'],
                           feature_names=list(df.columns[[4, 9]]),
                           out_file=None)
graph = graph_from_dot_data(dot_data)
graph.write_png('tree.png')

True
