Practice 2+
DECISION TREE CLASSIFIER
• The goal is to create a model that predicts the
value of a target variable by learning simple
decision rules inferred from the data features.
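To make the idea of "simple decision rules" concrete, the sketch below hand-writes a tiny tree as nested if tests. The feature names (rank, experience) and the thresholds are hypothetical and chosen only for illustration; a real classifier learns such rules from the data.

```python
def predict_go(rank, experience):
    """Hand-written decision rules; a trained tree learns these from data.

    The features and thresholds here are hypothetical.
    """
    if rank <= 6:           # first split, on one feature
        return "NO"
    if experience <= 10:    # second split, only reached when rank > 6
        return "NO"
    return "YES"


print(predict_go(rank=7, experience=12))
print(predict_go(rank=5, experience=20))
```

Each test on a feature value corresponds to one internal node of the tree, and each return statement corresponds to a leaf.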
Libraries
1. sklearn :
   1. In Python, sklearn is a machine learning package that includes many ML algorithms.
   2. Here, we use train_test_split (from sklearn.model_selection), DecisionTreeClassifier
      (from sklearn.tree) and accuracy_score (from sklearn.metrics).
2. NumPy :
   1. It is a numerical Python module that provides
      fast math functions for calculations.
   2. It is used to hold data in NumPy arrays and to
      manipulate it.
3. Pandas :
   1. Used to read and write different file formats.
   2. Data manipulation can be done easily with
      dataframes.
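As a rough sketch of how the three libraries fit together, the example below builds a small DataFrame, splits it, trains a tree and scores it. The data and column names (feature_a, feature_b, label) are made up for illustration, and the split ratio and random_state values are arbitrary assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: two numeric features and a binary label.
df = pd.DataFrame({
    "feature_a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "feature_b": [5, 3, 6, 2, 7, 1, 8, 2, 9, 3],
    "label":     [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
})

X = df[["feature_a", "feature_b"]]
y = df["label"]

# Hold out 30% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)  # NumPy array of predicted labels
accuracy = accuracy_score(y_test, predictions)

# accuracy_score is the fraction of predictions that match the true labels.
manual_accuracy = np.mean(predictions == y_test.to_numpy())
print(accuracy, manual_accuracy)
```

pandas holds the table, sklearn does the splitting, training and scoring, and NumPy is the array type that sklearn returns its predictions in.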
Installation of the packages :
• sklearn provides the tools needed to implement machine learning algorithms. It is
published on PyPI under the name scikit-learn and can be installed with pip, together
with NumPy and pandas:
pip install scikit-learn numpy pandas
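After installation, it can be useful to confirm that the packages are importable. The following is a minimal sketch using only the standard library; note that the import name sklearn corresponds to the PyPI distribution scikit-learn.

```python
import importlib.util

# Import names, not install names: "sklearn" is installed as "scikit-learn".
required = ["sklearn", "numpy", "pandas"]

# find_spec returns None when a top-level package cannot be found.
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```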
Read the data set with pandas and print it:
import pandas
df = pandas.read_csv("data.csv")
print(df)
To build a decision tree with sklearn, all data has to be numerical.
We have to convert the non-numerical columns
'Nationality' and 'Go' into numerical values.
Pandas has a map() method that takes a dictionary
describing how to convert the values.
{'UK': 0, 'USA': 1, 'N': 2}
means: convert the value 'UK' to 0, 'USA' to 1, and 'N' to 2.
Change string values into numerical values:
d = {'UK': 0, 'USA': 1, 'N': 2}
df['Nationality'] = df['Nationality'].map(d)
d = {'YES': 1, 'NO': 0}
df['Go'] = df['Go'].map(d)
print(df)
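One detail worth knowing is that map() with a dictionary turns any value not present in the dictionary into NaN. The sketch below illustrates this on a few made-up rows (the real data comes from data.csv); the variable names nationality_codes and go_codes are introduced here only for readability.

```python
import pandas as pd

# Hypothetical sample rows standing in for data.csv.
df = pd.DataFrame({
    "Nationality": ["UK", "USA", "N", "UK"],
    "Go": ["NO", "YES", "YES", "NO"],
})

nationality_codes = {"UK": 0, "USA": 1, "N": 2}
go_codes = {"YES": 1, "NO": 0}

df["Nationality"] = df["Nationality"].map(nationality_codes)
df["Go"] = df["Go"].map(go_codes)
print(df)

# A value missing from the dictionary ('FR' here) becomes NaN.
unmapped = pd.Series(["UK", "FR"]).map(nationality_codes)
print(unmapped.isna().tolist())
```

Checking for NaN after the conversion, for example with df.isna().any(), is a cheap way to catch misspelled or unexpected categories before training.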
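Feature and target columns
The feature columns are the columns we predict from; the target column
holds the values we want to predict.
# features must already hold the list of feature column names
X = df[features]
y = df['Go']
print(X)
print(y)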
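The split into features and target can be sketched on a small stand-in table as follows. Only 'Nationality' and 'Go' are column names taken from the text; 'Age' and 'Experience', the sample values, and the contents of the features list are assumptions for illustration.

```python
import pandas as pd

# Hypothetical, already-numeric data; 'Age' and 'Experience' are
# assumed column names, not taken from data.csv.
df = pd.DataFrame({
    "Age": [36, 42, 23],
    "Experience": [10, 12, 4],
    "Nationality": [0, 1, 2],
    "Go": [0, 0, 1],
})

features = ["Age", "Experience", "Nationality"]  # input columns

X = df[features]  # DataFrame of inputs, one row per sample
y = df["Go"]      # Series holding the target values

print(X.shape, y.shape)
```

Selecting with a list of names returns a two-dimensional DataFrame, while selecting a single name returns a one-dimensional Series, which matches the shapes sklearn estimators expect for X and y.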
Decision tree
• Now we can create the actual decision tree and fit it to our data.
Start by importing the modules we need: