Decision Tree
Problem: Using the salaries.csv data, predict the salary band of an employee given the
company, job and qualification.
Dataset: salaries.csv
Based on this dataset, we want to predict whether a job pays more than 100K $ or not.
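Why did we start the tree at company? Because splitting on company leaves the least entropy in
the resulting groups. Entropy is a measure of randomness, i.e. how mixed the two salary classes
are within a group. [We can also use Gini impurity to decide this.] The lower the entropy after a
split, the more information that split gains, so at every step the tree picks the split with the
highest information gain.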
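To make the entropy argument concrete, here is a minimal Python sketch of entropy and information gain. The rows are a hypothetical toy example, not the salaries.csv data, and `entropy` / `information_gain` are our own helper names. Splitting on `company` separates the classes better than splitting on `job`, so it has the higher information gain and would be chosen first.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Parent entropy minus the size-weighted entropy of the groups
    obtained by splitting on each distinct value of `feature`."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - weighted


# Hypothetical toy rows (not from salaries.csv); 1 = salary more than 100K $
company = ["a", "a", "a", "b", "b", "b"]
job = ["x", "y", "x", "y", "x", "y"]
salary = [1, 1, 1, 0, 0, 1]

print(round(entropy(salary), 3))                    # 0.918
print(round(information_gain(company, salary), 3))  # 0.459
print(round(information_gain(job, salary), 3))      # 0.0
```

Here the `job` split leaves both groups exactly as mixed as the parent, so it gains nothing, while the `company` split produces one pure group.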
Gini impurity measures how mixed the classes remaining in a node are. A good split makes it as low as possible: 0 means the node is pure (only one class left).
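A short sketch of Gini impurity, computed as 1 minus the sum of squared class proportions, followed by a check with scikit-learn's `DecisionTreeClassifier`, whose `criterion` parameter selects `"gini"` (the default) or `"entropy"`. The numeric toy features are hypothetical (column 0 stands for a company code, column 1 for a job code) and `gini` is our own helper name.

```python
from collections import Counter

from sklearn.tree import DecisionTreeClassifier


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


print(gini([1, 1, 1, 1]))  # pure node -> 0.0
print(gini([1, 1, 0, 0]))  # evenly mixed two-class node -> 0.5

# Hypothetical toy data, already numeric: [company code, job code]
X = [[0, 0], [0, 1], [0, 0], [1, 1], [1, 0], [1, 1]]
y = [1, 1, 1, 0, 0, 1]

for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0).fit(X, y)
    # tree_.feature[0] is the index of the feature used at the root node
    print(criterion, clf.tree_.feature[0])  # both criteria pick column 0 (company)
```

On this toy data both criteria agree on the root split; in general they usually, but not always, choose the same splits.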
Program
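# load salaries.csv; the column names used below (company, job, degree,
# salary_more_then_100k) are assumptions, they are not given in the text above
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
df = pd.read_csv("salaries.csv")
target = df["salary_more_then_100k"]  # assumed 0/1 target column
# let us convert each text column into numbers with its own LabelEncoder
input_n = pd.DataFrame({c + "_n": LabelEncoder().fit_transform(df[c])
                        for c in ["company", "job", "degree"]})
model = DecisionTreeClassifier().fit(input_n, target)  # train the tree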
# find accuracy, measured on the same data the tree was trained on
accuracy = model.score(input_n, target)
accuracy  # 1.0
# predict for a person working in google as a sales executive with a masters degree
model.predict([[2, 2, 1]])  # array([0]) -> not more than 100K $
model.predict([[2, 0, 0]])  # array([1]) -> more than 100K $
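The numbers passed to `model.predict` are LabelEncoder codes. LabelEncoder numbers the sorted distinct values of a column starting from 0, so the codes follow alphabetical order. The sketch below uses illustrative company names (not necessarily the full list in salaries.csv) to show the mapping, and how `transform` / `inverse_transform` let you encode a query by name instead of hard-coding its number.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative company values; codes are assigned in sorted (alphabetical) order
companies = ["google", "abc pharma", "facebook", "google"]
le_company = LabelEncoder()
codes = le_company.fit_transform(companies)

print(le_company.classes_.tolist())  # ['abc pharma', 'facebook', 'google']
print(codes.tolist())                # [2, 0, 1, 2]

# Encode a query by name rather than hard-coding the number
print(le_company.transform(["google"]).tolist())     # [2]
print(le_company.inverse_transform([0]).tolist())    # ['abc pharma']
```

Keeping one fitted encoder per column (instead of a single shared `le`) is what makes this name-to-code lookup possible at prediction time.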