Decision Tree Part 1
BY
Mohd Vaseem
M.Tech IIT Delhi
Pursuing PhD IIT Kanpur
(Assistant Professor, NIFT Panchkula)
Classification
• A form of data analysis that extracts a model (a classifier) to predict class labels
– class labels are categorical (discrete or nominal)
– constructs a model from the training set and the values (class labels) of a classifying attribute, then uses the model to classify new data
• Numeric prediction
– models continuous-valued functions, i.e., predicts unknown or missing numeric values
• Typical applications
– Credit/loan approval: loan application is “safe” or “risky”
– Medical diagnosis: tumor is “cancerous” or “benign”
– Fraud detection: transaction is “fraudulent” or not
Supervised vs. Unsupervised Learning
• Supervised learning (classification)
– Supervision: Training data is accompanied by labels
indicating the class of the observations
– New data is classified based on the training set
Testing data:
NAME      RANK            YEARS  TENURED
Tom       Assistant Prof  2      no
Merlisa   Associate Prof  7      no
George    Professor       5      yes
Joseph    Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4) → Tenured?
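To make the testing step concrete, here is a minimal Python sketch that applies a classifier to the testing data above and then to the unseen tuple. The slide does not show the learned model, so the rule in `classify_tenure` (IF rank = 'Professor' OR years > 6 THEN tenured = 'yes') is a hypothetical example chosen only for illustration; `TESTING_DATA`, `classify_tenure`, and `accuracy` are likewise illustrative names.

```python
# Testing data from the slide: (name, rank, years, tenured)
TESTING_DATA = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]


def classify_tenure(rank: str, years: int) -> str:
    """Hypothetical classifier (an assumption, not given on the slide):
    IF rank = 'Professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank == "Professor" or years > 6 else "no"


def accuracy(rows) -> float:
    """Fraction of testing tuples whose predicted label matches the true label."""
    correct = sum(classify_tenure(rank, years) == label for _, rank, years, label in rows)
    return correct / len(rows)


print(f"Accuracy on testing data: {accuracy(TESTING_DATA):.2f}")  # 0.75 (Merlisa is misclassified)
print("Jeff tenured?", classify_tenure("Professor", 4))          # yes
```

Accuracy on the labeled testing data is used to estimate how the model will behave on unseen tuples such as Jeff's; here the hypothetical rule gets three of the four testing tuples right.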
Classification
[Figure: example of linearly separable data vs. data that is not linearly separable]
Decision Trees
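• Divides the feature space by axis-aligned decision boundaries
• Each rectangular region corresponds to a leaf node and is assigned that leaf's class label
• Tree structure:
– Internal node: test on an attribute
– Branch: outcome of the test
– Leaf node: class label
– Pure node: a node whose examples all belong to a single class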
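The node, branch, and leaf roles above can be sketched in Python as follows. This is a minimal illustration, assuming numeric features and binary threshold tests of the form `x[feature] <= threshold`; the classes `Leaf` and `Node`, the function `predict`, and the example tree with its thresholds are all hypothetical. Because every test compares a single feature with a constant, each leaf ends up owning an axis-aligned rectangle of the feature space.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class label stored at the leaf


@dataclass
class Node:
    feature: int      # index of the attribute tested at this internal node
    threshold: float  # the test is: x[feature] <= threshold
    left: "Tree"      # branch taken when the test is true
    right: "Tree"     # branch taken when the test is false


Tree = Union[Leaf, Node]


def predict(tree: Tree, x: list) -> str:
    """Follow branches from the root until a leaf is reached, then return its label."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] <= tree.threshold else tree.right
    return tree.label


# Hypothetical tree over two features (x0, x1). Its three leaves correspond to three
# axis-aligned rectangles: {x0 <= 0.5}, {x0 > 0.5, x1 <= 0.3}, {x0 > 0.5, x1 > 0.3}.
tree = Node(0, 0.5,
            Leaf("A"),
            Node(1, 0.3, Leaf("B"), Leaf("A")))

print(predict(tree, [0.2, 0.9]))  # A
print(predict(tree, [0.8, 0.1]))  # B
print(predict(tree, [0.8, 0.7]))  # A
```

Two different leaves may carry the same label ("A" here), so one class can occupy several rectangles.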
Idea:
1. Use the class counts at the leaves to define probability distributions, so we can measure uncertainty
2. A good attribute splits the examples into subsets that are (ideally) pure
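A minimal Python sketch of both ideas, assuming entropy (in bits) as the measure of uncertainty. The slide does not name a measure; entropy is one common choice. The helper names, the row format, and discretizing YEARS as "> 6" are illustrative assumptions. A lower weighted entropy after a split means the resulting subsets are closer to pure.

```python
from collections import Counter
from math import log2


def distribution(labels):
    """Turn the class counts at a node into a probability distribution."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def entropy(labels) -> float:
    """Assumption: entropy in bits as the uncertainty measure (0.0 for a pure node)."""
    return sum(p * log2(1 / p) for p in distribution(labels).values())


def split_entropy(rows, attribute) -> float:
    """Weighted average entropy of the subsets produced by splitting on `attribute`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row["tenured"])
    n = len(rows)
    return sum(len(g) / n * entropy(g) for g in groups.values())


# Tenure testing data from the earlier slide; YEARS discretized as "> 6" (illustrative choice).
rows = [
    {"rank": "Assistant Prof", "years>6": False, "tenured": "no"},
    {"rank": "Associate Prof", "years>6": True,  "tenured": "no"},
    {"rank": "Professor",      "years>6": False, "tenured": "yes"},
    {"rank": "Assistant Prof", "years>6": True,  "tenured": "yes"},
]

print(entropy([r["tenured"] for r in rows]))  # 1.0 (2 yes, 2 no: maximal uncertainty)
print(split_entropy(rows, "rank"))             # 0.5 (two of the three subsets are pure)
print(split_entropy(rows, "years>6"))          # 1.0 (neither subset is purer than the parent)
```

Under this measure, splitting on RANK is the better attribute for these four tuples, because it produces two pure subsets.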