Lecture 8
computational
Tools for
4170201
Classification
• Classification (supervised learning) is a
form of data analysis that extracts models describing
important data classes.
[Figure: training data is passed to classification algorithms to build a classifier; the classifier then predicts the class of an unseen record, e.g. (Jeff, Professor, 4): Tenured?]

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes
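To make the idea of a classifier concrete, the sketch below applies a simple rule to the labeled records in the table and then to the unseen record (Jeff, Professor, 4). The rule itself (tenured if the rank is Professor or the years exceed 6) is a hypothetical classifier assumed for illustration only; the slides do not state which model was learned. The names `predict_tenured` and `accuracy` are also made up for this example.

```python
# Labeled records from the table: (name, rank, years, tenured)
records = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]


def predict_tenured(rank, years):
    """Hypothetical classifier (an assumption, not taken from the slides)."""
    return "yes" if rank == "Professor" or years > 6 else "no"


def accuracy(rows):
    """Fraction of labeled records whose predicted label matches the known label."""
    correct = sum(
        predict_tenured(rank, years) == label for _, rank, years, label in rows
    )
    return correct / len(rows)


# Classify the unseen record (Jeff, Professor, 4).
jeff_prediction = predict_tenured("Professor", 4)

print("accuracy on labeled records:", accuracy(records))
print("Jeff tenured?", jeff_prediction)
```

Comparing predictions against known labels, as `accuracy` does, is what makes the setting supervised: the labels are given in advance and the classifier is judged against them.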
Classification
- The primary task performed by classifiers is to assign
labels to objects.
- Labels in classification are predetermined, unlike in
clustering, where we discover the structure and then assign
labels.
- Classification problems are supervised learning problems.
Classification Basic concepts:
Decision Trees
Decision Trees
• Decision Trees are a flexible method very commonly deployed in
classification applications.
simply a classification.
[Figure: tree structure showing the root (parent) node, child nodes, and branches]
Trees
[Figure: a node testing Gender, with branches Female and Male]
Branch: outcome of a test
• Branches refer to the outcome of a decision. When
the decision is numerical, the “greater than” branch
is usually shown on the right and the “less than” branch
on the left.
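The left/right convention for numerical decisions can be sketched as a small data structure. This is a minimal illustration, not a full decision tree implementation: the class `NumericTest`, the function `classify`, the attribute name `years`, and the threshold 6 are all hypothetical. Sending values equal to the threshold to the right branch is also an assumption, since the convention above only covers "less than" and "greater than".

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class NumericTest:
    """Internal node that tests one numeric attribute against a threshold."""
    attribute: str
    threshold: float
    left: Union["NumericTest", str]   # followed when value < threshold
    right: Union["NumericTest", str]  # followed when value >= threshold (tie handling assumed)


def classify(node, record):
    """Follow branches from the root until a leaf (a plain class label) is reached."""
    while isinstance(node, NumericTest):
        if record[node.attribute] < node.threshold:
            node = node.left
        else:
            node = node.right
    return node


# Hypothetical one-level tree: years < 6 goes left ("no"), otherwise right ("yes").
tree = NumericTest("years", 6, left="no", right="yes")

print(classify(tree, {"years": 2}))
print(classify(tree, {"years": 7}))
```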
Advantages of Decision Trees
• Easy to understand.
• Map nicely to a set of production rules.
• Have been applied to real problems.
• Able to process both numerical and
categorical data.
Disadvantages of Decision Trees
• Output attribute must be categorical.
• Limited to one output attribute.
• Decision tree algorithms are unstable (slight
variations in the training set can result in
different attribute selections).
• Trees created from numeric datasets can be
complex (attribute splits for numeric data are
typically binary).
From Trees to Rules
Decision trees map nicely to a set of production rules, which is
one advantage of DTs.
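The mapping from a tree to production rules can be sketched as follows: each path from the root to a leaf becomes one IF ... THEN rule, with the tests along the path joined by AND. The nested-dictionary tree format, the function name `tree_to_rules`, the `Employed` attribute, and the class labels `A` and `B` are hypothetical and chosen only for illustration.

```python
def tree_to_rules(node, conditions=()):
    """Return one production rule per root-to-leaf path of the tree."""
    # A leaf is any non-dict value: it holds the class label.
    if not isinstance(node, dict):
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN class = {node}"]
    rules = []
    # Each branch is one outcome of the node's test.
    for outcome, child in node["branches"].items():
        condition = f"{node['test']} = {outcome}"
        rules.extend(tree_to_rules(child, conditions + (condition,)))
    return rules


# Hypothetical tree: the root tests Gender; the Male branch tests Employed.
tree = {
    "test": "Gender",
    "branches": {
        "Female": "A",
        "Male": {"test": "Employed", "branches": {"Yes": "A", "No": "B"}},
    },
}

for rule in tree_to_rules(tree):
    print(rule)
```

Because every leaf yields exactly one rule, the rule set has as many rules as the tree has leaves.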
Whether water is
present or not?
Check Your Knowledge
Your Thoughts?