Supervised Classification & Trees
Introduction
Logistic Regression
Decision trees
Random Forest
K-nearest neighbors
Jeroen VK Rombouts
Supervised Learning: Classification
BIG DATA ANALYTICS
Decision trees
Jeroen VK Rombouts
Decision or classification trees
- When do we stop?
Classification Tree: in practice

[Figure: a training set plotted on an Age-Income plane and the fitted tree. Root split: Age >= 50 leads to YES; for Age < 50, split on Income >= 1500 leads to YES; for Income < 1500, split on Age >= 20 yields the YES / NO leaves. A new observation is dropped down the tree to obtain its prediction.]
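The Age/Income example above can be sketched with scikit-learn; the thresholds (50, 1500, 20) are from the slide, but the training records below are invented for illustration.

```python
# Sketch of the slide's Age/Income tree with scikit-learn
# (the training records below are illustrative, not from the slide).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# columns: [age, income]; target: 1 = YES, 0 = NO
X = np.array([[55,  900],
              [60, 2000],
              [45, 1800],
              [30, 1200],
              [25,  400],
              [18,  300]])
y = np.array([1, 1, 1, 1, 0, 0])

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# prediction: drop a new observation down the fitted tree
pred = tree.predict([[40, 1600]])
```

With such a tiny sample one split already separates the classes; on real data a stopping rule is needed, which is exactly the "when do we stop?" question.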
Binary decision tree

[Figure: a binary tree with splits s2, s3, s4, internal/terminal nodes t4 through t9, and leaves l1 through l5.]
Decision Trees Methods

Year  Method                         Dependent variable (nature and number)   Splitting criterion
1959  Belson                         Dichotomous dependent variable           Maximum deviation between theoretical and observed counts
1963  AID                            Continuous dependent variable            Maximise "between groups" deviance, minimise "within groups" deviance
1970  ELISEE                         Nominal or ordinal dependent variable    Maximise distance between child nodes
1972  THAID                          Nominal dependent variable               Maximum explained variance & distance between centroids
1974  MAID-M                         A few continuous dependent variables     M2 & cRM
1980  CHAID                          Nominal dependent variable               Chi-square
1980  DNP                            Nominal dependent variable               Minimise the Bayes misclassification risk
1984  CART                           Dependent variable                       Maximise the decrease of impurity (e.g. Gini coefficient)
1987  RECPAM                         1 or more dependent variable(s)          Minimise the information loss
1991  Two-stage splitting algorithm  Dependent variable                       Maximise the prediction performance
…
2006  Conditional Inference Trees    Dependent variable                       Significant (p-value) split
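CART's splitting criterion can be made concrete: a minimal sketch of the Gini coefficient and the decrease of impurity for one candidate split (the label lists are invented for illustration).

```python
# Gini impurity and its decrease for a candidate split, as used by CART
# (a minimal sketch; the label lists are illustrative).
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

parent = ["yes"] * 5 + ["no"] * 5     # impurity 0.5
left   = ["yes"] * 4 + ["no"]         # purer child nodes
right  = ["yes"] + ["no"] * 4

# decrease of impurity = parent impurity - weighted child impurity
decrease = gini(parent) - (len(left) * gini(left)
                           + len(right) * gini(right)) / len(parent)
```

CART chooses, among all admissible splits of a node, the one maximising this decrease.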
Steps of a segmentation procedure
1. A set of binary questions: define, for each node, the set of admissible splits (we start with "no node")
5. Quality assessment of the decision rule: estimate the risk via the associated misclassification rate or prediction error
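Step 5 can be sketched in code: estimate the risk of the decision rule by its misclassification rate (the label arrays below are illustrative).

```python
# Misclassification rate as an estimate of the decision rule's risk
# (illustrative labels; in practice use held-out or cross-validated data).
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # observed classes
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])   # classes predicted by the tree

misclassification_rate = (y_true != y_pred).mean()  # fraction of errors
```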
Remarks
Random forests
6. Complex to interpret
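A minimal scikit-learn sketch on simulated data; the feature_importances_ attribute partially offsets the "complex to interpret" drawback, since it ranks the input variables even when the individual trees are unreadable.

```python
# Random forest sketch; feature importances give some interpretability
# even though the ensemble itself is complex (simulated data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)          # only the first feature is informative

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_   # one score per feature, summing to 1
```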
Extreme Gradient Boosting (XGBoost)
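XGBoost itself is a separate library (xgboost); as a sketch of the underlying idea, an additive ensemble of shallow trees fitted by gradient boosting, scikit-learn's GradientBoostingClassifier stands in here, on simulated data.

```python
# Gradient boosting sketch: boosting many shallow trees, the idea that
# XGBoost optimises and scales (scikit-learn stand-in, simulated data).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

gbm = GradientBoostingClassifier(n_estimators=50, max_depth=2,
                                 random_state=0).fit(X, y)
train_acc = gbm.score(X, y)
```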
K-nearest neighbors
2. For a new feature vector a, what is the predicted target value (0 or 1 for standard binary classification)? Compute the squared Euclidean distance to each training observation x_i:

d(x_i, a) = \sum_{j=1}^{p} (x_{ij} - a_j)^2
4. The predicted target for a is the most common class in this set
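The steps above can be sketched directly in NumPy: squared Euclidean distances to every training point, then a majority vote among the k nearest (the training points are invented for illustration).

```python
# Minimal k-NN sketch: squared distances d(x_i, a), then majority vote
# among the k closest training points (illustrative data).
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, a, k=3):
    d = ((X_train - a) ** 2).sum(axis=1)   # d(x_i, a) = sum_j (x_ij - a_j)^2
    nearest = np.argsort(d)[:k]            # indices of the k nearest neighbours
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
```

Usage: knn_predict(X_train, y_train, np.array([0.05, 0.1]), k=3) predicts class 0, since two of the three nearest neighbours belong to class 0.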