Soft Computing Lab Practical Assignment 2
Assignment No. 2
2.1 Title
Decision Tree Classifier: build a decision tree for the given dataset and find its root node.
2.2 Problem Definition:
A dataset collected in a cosmetics shop, showing details of customers and whether or not they responded to a special offer to buy a new lipstick, is shown in the table below. Use this dataset to build a decision tree, with Buys as the target variable, to help in deciding lipstick purchases in the future. Find the root node of the decision tree. According to the decision tree you have built from this training data set, what is the decision for the test data: [Age < 21, Income = Low, Gender = Female, Marital Status = Married]?
2.3 Prerequisite:
2.4 Software Requirements:
Python 3 with the pandas, scikit-learn, and graphviz packages (as imported in the code below).
2.5 Hardware Requirement:
2.6 Learning Objectives:
Learn how to apply a Decision Tree Classifier to find the root node of a decision tree, and how to classify new test data according to the tree built from the training data set.
2.7 Outcomes:
After completing this assignment, students are able to implement code that creates a decision tree for the given dataset and finds its root node based on the given condition.
2.8 Theory Concepts:
2.8.1 Motivation
Suppose we have the following plot for two classes, represented by black circles and blue squares. Is it possible to draw a single separating line? Perhaps not.
We need two lines here: one separating according to a threshold value of x, and the other for a threshold value of y. A Decision Tree Classifier repetitively divides the working area (plot) into subparts by identifying such lines. (Repetitively, because there may be two distant regions of the same class, separated by others.)
1. Impurity
In the above division, we had a clear separation of classes. But what if we had the following case? Impurity is when we have a trace of one class mixed into another. This can arise for the following reasons:
- We run out of available features to divide the classes upon.
- We tolerate some percentage of impurity (we stop further division) for faster performance. (There is always a tradeoff between accuracy and performance.)
For example, in the second case we may stop our division when fewer than x elements are left in a subpart. This is also known as gini impurity.
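As an illustration, here is a minimal Python sketch of this measure (the helper name gini and the example labels are illustrative, not taken from the assignment):

from collections import Counter

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class probabilities.
    # 0 means a pure node; higher values mean more class mixing.
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(['Yes', 'Yes', 'No']))   # 1 - (4/9 + 1/9) = 0.444...
print(gini(['Yes', 'Yes', 'Yes']))  # 0.0 (pure node)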
2. Entropy
Entropy is the degree of randomness of the elements; in other words, it is a measure of impurity. Mathematically, it can be calculated from the probabilities of the items as:

H(x) = − Σᵢ p(xᵢ) · log₂ p(xᵢ)
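A minimal sketch of this calculation in Python (the helper name entropy and the example labels are mine, not from the assignment):

import math
from collections import Counter

def entropy(items):
    # H(x) = - sum over the distinct classes of p(x_i) * log2(p(x_i))
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())

print(entropy(['Yes', 'No', 'Yes', 'Yes']))  # mixed classes: ~0.811
print(entropy(['Yes', 'Yes']))               # pure set: 0.0 (no randomness)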
3. Information Gain
Suppose we have multiple features to divide the current working set. Which feature should we select for the division? Perhaps the one that gives us the least impurity. Suppose we divide the classes into multiple branches; the information gain at any node is defined as:

Information Gain(n) = Entropy(x) − [weighted average] × Entropy(children for the feature)

This needs a bit of explanation!
Suppose we have the following class to work with initially:
1 1 2 2 3 4 4 4 5
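To make the definition concrete, here is a minimal sketch that computes the information gain for the initial set above (the split into two children is a hypothetical example, not one given in the assignment):

import math
from collections import Counter

def entropy(items):
    # same entropy helper as in the sketch above
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())

parent = [1, 1, 2, 2, 3, 4, 4, 4, 5]        # the initial set shown above
children = [[1, 1, 2, 2, 3], [4, 4, 4, 5]]  # a hypothetical split

# weighted average of the children's entropies
weighted = sum(len(c) / len(parent) * entropy(c) for c in children)

# Information Gain = Entropy(parent) - weighted average of children's entropies
print(entropy(parent) - weighted)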
What is the decision for the test data: [Age < 21, Income = Low, Gender = Female, Marital Status = Married]?
Solution: A
[Hint: construct the decision tree to answer this question]
2.10 Algorithm
1. Read the dataset into a pandas DataFrame.
2. Separate the features (Age, Income, Gender, Marital Status) from the target (Buys).
3. Encode each categorical feature column as integers.
4. Fit a Decision Tree Classifier on the encoded data.
5. Export and render the tree with graphviz; the attribute at the top of the rendered tree is the root node.
Code:
#import packages
import pandas as pd
import graphviz
from sklearn import tree
from sklearn.preprocessing import LabelEncoder

#reading Dataset
dataset = pd.read_csv('E:/Soft Computing Practicals/Lab Practicals Assignment2/dataset.csv')

#separating features (Age, Income, Gender, Marital Status) from the target (Buys)
X = dataset.iloc[:, 1:-1].copy()
Y = dataset.iloc[:, -1].values

#encoding each categorical feature column as integers
le = LabelEncoder()
for i, col in enumerate(X.columns):
    X[col] = le.fit_transform(X[col])
    if (i == 1):
        print("income= High as 0, income= Medium as 2 & income= Low as 1")
    if (i == 2):
        print("gender= Male as 1, gender= Female as 0")
    if (i == 3):
        print("Marital Status= Single as 1, Marital Status= Married as 0")

#training the Decision Tree Classifier (entropy criterion, matching the theory above)
classifier = tree.DecisionTreeClassifier(criterion='entropy')
classifier.fit(X, Y)

#Creating Graph of the trained tree; the attribute at the top is the root node
dot_data = tree.export_graphviz(classifier, out_file=None, feature_names=X.columns,
                                filled=True, rounded=True, special_characters=True)
graph = graphviz.Source(dot_data)
graph.render("tree")
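As a follow-up, a minimal sketch of classifying the test instance from the problem definition with the trained tree. The integer codes are assumptions based on the mappings printed during encoding; the code for Age < 21 depends on how LabelEncoder orders the age categories in the actual dataset:

# test data: [Age < 21, Income = Low, Gender = Female, Marital Status = Married]
# assumed encodings: Age < 21 -> 1 (hypothetical), Income Low -> 1,
# Gender Female -> 0, Marital Status Married -> 0
test = pd.DataFrame([[1, 1, 0, 0]], columns=X.columns)
print(classifier.predict(test))  # the predicted value of Buys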
2.11 Output
(Output: the printed feature encodings and the decision tree rendered by graphviz to tree.pdf.)
2.12 Conclusion
We learned how to create a decision tree for the given dataset and find its root node using a Decision Tree Classifier.