
Fakultät für Elektrotechnik und Informatik

Institut für Verteilte Systeme


Fachgebiet Wissensbasierte Systeme (KBS)

Data Mining I
Summer semester 2017

Lecture 5 & 6: Classification


Lectures: Prof. Dr. Eirini Ntoutsi
Exercises: Tai Le Quy and Damianos Melidis
Recap from previous lecture

Apriori improvements
FPGrowth
Compact forms of frequent itemsets
Closed frequent itemsets
Maximal frequent itemsets
FIM and ARM beyond binary, asymmetric data
Categorical data
Continuous data

Data Mining I: Classification 1 & 2 2


Outline

Classification basics
Decision tree classifiers
Splitting attributes
Hypothesis space
Decision tree decision boundaries
Overfitting
Reading material
Things you should know from this lecture

Data Mining I: Classification 1 & 2 3


The classification problem

Given:
a dataset of instances D = {t1, t2, …, tn} and
a set of classes C = {c1, …, ck},
classification is the task of learning a target function/mapping f: D → C that assigns each ti to one of the cj.
The mapping or target function is known as the classification model.

ID  Age  Car type  Risk
1   23   Family    high
2   17   Sports    high
3   43   Sports    high
4   68   Family    low
5   32   Truck     low

Predictor attributes: Age, Car type. Class attribute: Risk = {high, low}
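As an illustrative sketch (assuming scikit-learn, which these slides reference later, and a hand-made one-hot encoding of Car type), the target function f for this toy table can be learned and applied as follows:

# Illustrative sketch: learning f: D -> C for the toy risk table.
# Car type is one-hot encoded by hand: columns = Age, Family, Sports, Truck.
from sklearn.tree import DecisionTreeClassifier

X = [[23, 1, 0, 0],
     [17, 0, 1, 0],
     [43, 0, 1, 0],
     [68, 1, 0, 0],
     [32, 0, 0, 1]]
y = ["high", "high", "high", "low", "low"]

model = DecisionTreeClassifier().fit(X, y)   # the learned classification model f
print(model.predict([[30, 0, 1, 0]]))        # classify a new instance (30-year-old, Sports car)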


Data Mining I: Classification 1 & 2 4
A supervised learning task

Classification is a supervised learning task
Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
New data is classified based on the training set
(Example: the risk table above, where the class attribute Risk = {high, low} is given for every training record)

Clustering is an unsupervised learning task
The class labels of the training data are unknown
Given a set of measurements, observations, etc., the goal is to group the data into groups of similar data (clusters)
Data Mining I: Classification 1 & 2 5


Applications

Credit approval
Classify bank loan applications as e.g. safe or risky.
Fraud detection
e.g., in credit cards
Churn prediction
E.g., in telecommunication companies
Target marketing
Is the customer a potential buyer for a new computer?
Medical diagnosis
Character recognition

Data Mining I: Classification 1 & 2 6


Classification techniques

Decision trees
Bayesian classifiers
Neural networks
Nearest neighbors
Support vector machines
Boosting
Bagging
Random forests
…
http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html#sphx-glr-auto-examples-classification-plot-classifier-comparison-py
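As a rough companion to the comparison linked above, a minimal sketch (not the linked script; the dataset and hyperparameters are illustrative assumptions) that cross-validates a few of these classifier families:

# Illustrative sketch: comparing a few classifier families with 5-fold cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
classifiers = {
    "Decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "kNN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "SVM (RBF)": SVC(gamma="scale"),
    "Naive Bayes": GaussianNB(),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)   # mean accuracy over 5 folds
    print(f"{name:15s} mean accuracy = {scores.mean():.3f}")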

Data Mining I: Classification 1 & 2 7


General approach for building a classification model

(The accompanying figure, not reproduced in this extract, shows the workflow: a training set with known class labels is fed to a learning algorithm, which induces a model; the model is then applied to a testing set of previously unseen records.)

Different learning algorithms or classifiers:
Decision trees
kNN
Neural networks
SVMs
…

Induction: makes broad generalizations from specific observations
- Generates new theory emerging from the data (here: learning the model from the training set)
Deduction: from general to specific
- Tests the theory (here: applying the model to the testing set)

Data Mining I: Classification 1 & 2 15


Outline

Classification basics
Decision tree classifiers
Splitting attributes
Hypothesis space
Decision tree decision boundaries
Overfitting
Reading material
Things you should know from this lecture

Data Mining I: Classification 1 & 2 16


Decision tree (DTs) classifiers
Training set
One of the most popular classification methods

DTs are included in many commercial systems nowadays

Easy to interpret, human readable, intuitive

Simple and fast methods

Many algorithms have been proposed

ID3 (Quinlan 1986)

C4.5 (Quinlan 1993)

CART (Breiman et al 1984)


Data Mining I: Classification 1 & 2 17
Representation 1/2

The learned function is represented by a decision tree!


Representation:
Each internal node specifies a test of some predictor attribute
Each branch descending from a node corresponds to one of the possible values for this attribute
Each leaf node assigns a class label
(The slide figure shows the play-tennis training set and an example tree, with the attribute test, the attribute values and the class values annotated.)

Decision trees classify instances by sorting them down the tree from the root to some leaf node, which provides the classification of the instance.

Data Mining I: Classification 1 & 2 18


Representation 2/2

Decision trees represent a disjunction of conjunctions of constraints on the attribute values of the
instances
Each path from the root to a leaf node corresponds to a conjunction of attribute tests
The tree corresponds to a disjunction of these conjunctions, i.e., (… ∧ … ∧ …) ∨ (… ∧ … ∧ …) ∨ …
We can translate each path into IF-THEN rules (human readable)

IF ((Outlook = Sunny) ^ (Humidity = Normal)),


THEN (Play tennis=Yes)

IF ((Outlook = Rain) ^ (Wind = Weak)),


THEN (Play tennis=Yes)
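For illustration, the two rules above can be written directly as code; a minimal sketch (the fall-through default below is an assumption for illustration, not part of the learned tree shown on the slides):

# The two example root-to-leaf paths written as IF-THEN rules.
# Attribute names and values follow the play-tennis example.
def play_tennis(outlook, humidity, wind):
    if outlook == "Sunny" and humidity == "Normal":
        return "Yes"
    if outlook == "Rain" and wind == "Weak":
        return "Yes"
    return "No"   # illustrative default, not the full learned tree

print(play_tennis("Sunny", "Normal", "Strong"))  # -> Yes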

Data Mining I: Classification 1 & 2 19


How to build a decision tree?

Training set

What decisions do I have to make in order to build a decision tree?

Data Mining I: Classification 1 & 2 20


The basic decision tree learning algorithm

Basic algorithm (ID3, Quinlan 1986), illustrated on the play-tennis training set:
The tree is constructed in a top-down, recursive, divide-and-conquer manner.
At the start, all the training examples are at the root node (#14 in the example).
The question is: which attribute should be tested at the root? (Outlook, Temperature, Humidity or Wind?)
Attributes are evaluated using some statistical measure, which determines how well each attribute alone classifies the training examples.
The best splitting attribute is selected and used as the test attribute at the root (Outlook in the example).
For each possible value of the test attribute, a descendant of the root node is created and the instances are mapped to the appropriate descendant node (Sunny: #5, Overcast: #4, Rain: #5).
The procedure is repeated for each descendant node, so instances are partitioned recursively (the next question at a child node is again: Temperature, Humidity or Wind?).

Data Mining I: Classification 1 & 2 27
The basic decision tree learning algorithm

Pseudocode
(The slide shows the pseudocode alongside the play-tennis training set; the listing itself is not reproduced in this extract, see the sketch below.)

When do we stop partitioning?
All samples for a given node belong to the same class
There are no remaining attributes for further partitioning (in this case, majority voting is used to label the leaf)

Data Mining I: Classification 1 & 2 30
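A compact illustrative sketch of the recursion described above, assuming categorical attributes and information gain (defined in the following slides) as the selection measure; function and variable names are illustrative, not taken from the original pseudocode:

import math
from collections import Counter

# rows: list of dicts mapping attribute name -> value; labels: list of class labels
def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    total = len(labels)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attributes):
    if len(set(labels)) == 1:                       # stop: all samples in one class
        return labels[0]
    if not attributes:                              # stop: no attributes left -> majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    for value in set(row[best] for row in rows):    # one branch per attribute value
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attributes if a != best])
    return tree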


Outline

Classification basics
Decision tree classifiers
Splitting attributes
Hypothesis space
Decision tree decision boundaries
Overfitting
Reading material
Things you should know from this lecture

Data Mining I: Classification 1 & 2 31


Which attribute is the best?

Which attribute to choose for splitting? A1 or A2?

The goal is to select the attribute that is most useful for classifying examples, i.e. the one that makes us most certain about the class after the split.
By useful we mean that the resulting partitioning is as pure as possible.
A partition is pure if all its instances belong to the same class.

Different attribute selection measures:
Information gain, gain ratio, Gini index, …
All are based on the degree of impurity of the parent node (before splitting) vs. the children nodes (after splitting).

Data Mining I: Classification 1 & 2 34
Entropy for measuring impurity of a set of instances

Let S be a collection of positive and negative examples for a binary classification problem, C = {+, -}.
p_+: the percentage of positive examples in S
p_-: the percentage of negative examples in S
Entropy measures the impurity of S:
Entropy(S) = -p_+ · log2(p_+) - p_- · log2(p_-)

Entropy = 0, when all members belong to the same class


Entropy = 1, when there is an equal number of positive and negative examples

Entropy comes from information theory: the higher the entropy, the higher the information content.

Data Mining I: Classification 1 & 2 35


Entropy for measuring impurity of a set of instances

Examples: What is the entropy?

Let S: [9+,5-]:  Entropy(S) = -(9/14)·log2(9/14) - (5/14)·log2(5/14) = 0.940
Let S: [7+,7-]:  Entropy(S) = -(7/14)·log2(7/14) - (7/14)·log2(7/14) = 1
Let S: [14+,0-]: Entropy(S) = -(14/14)·log2(14/14) - (0/14)·log2(0/14) = 0

In the general case (k-classification problem):

Entropy(S) = - Σ_{i=1..k} p_i · log2(p_i)

Data Mining I: Classification 1 & 2 40


Attribute selection measure: Information gain

Used in ID3 (Quinlan, 1986)


It uses entropy, a measure of the purity of the data
The information gain Gain(S, A) of an attribute A relative to a collection of examples S measures the
entropy reduction in S due to splitting on A:

Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

(The first term is the impurity before splitting; the weighted sum is the impurity after splitting on A.)

Information gain measures the expected reduction in entropy due to splitting on A.
The attribute with the highest entropy reduction is chosen for splitting.

Data Mining I: Classification 1 & 2 41


Information Gain example 1

Humidity or Wind? Which attribute to choose for splitting?

Which attribute is chosen?
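The class counts behind this example appear only in the slide figure; assuming the standard play-tennis counts (S: 9 Yes / 5 No; Wind = Weak: 6/2, Strong: 3/3; Humidity = High: 3/4, Normal: 6/1), a short sketch of the computation:

import math

def entropy(pos, neg):
    e, total = 0.0, pos + neg
    for c in (pos, neg):
        if c:
            p = c / total
            e -= p * math.log2(p)
    return e

S = entropy(9, 5)                                                 # 0.940
gain_wind     = S - 8/14 * entropy(6, 2) - 6/14 * entropy(3, 3)   # ~0.048
gain_humidity = S - 7/14 * entropy(3, 4) - 7/14 * entropy(6, 1)   # ~0.151
print(gain_wind, gain_humidity)   # Humidity has the larger gain and is chosen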

Data Mining I: Classification 1 & 2 42


Information Gain example 2

Repeat recursively
(This sequence of slides steps through the recursion on the play-tennis training set, growing the tree one split at a time; the intermediate trees are shown only as figures.)
At each newly created node the same questions are asked: which attribute should we choose for splitting here, and which attribute is chosen?

Data Mining I: Classification 1 & 2 49


Attribute selection measure: Information Gain

Information gain is biased towards attributes with a large number of distinct values.
Consider unique identifiers like an ID or a credit card number:
Such attributes have a high information gain, because they uniquely identify each instance, but we do not want to include them in the decision tree.
E.g., deciding how to treat a customer based on their credit card number is unlikely to generalize to customers we haven't seen before.

Measures have been proposed that correct this issue


Quinlan suggested information gain in his ID3 system and later the gain ratio, both based on entropy.
Gini index

Data Mining I: Classification 1 & 2 50


Lecture 6 starts here

Data Mining I: Classification 1 & 2 51


Attribute selection measure: Gain ratio

C4.5 (a successor of ID3) uses the gain ratio to overcome this problem: it normalizes the information gain of an attribute A by the split information of A:

GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A)

Gain(S, A) measures the information gained with respect to the classification; SplitInfo(S, A) measures the information generated by splitting S into |Values(A)| partitions:

SplitInfo(S, A) = - Σ_{v ∈ Values(A)} P_v · log2(P_v) = - Σ_{v ∈ Values(A)} (|S_v| / |S|) · log2(|S_v| / |S|)

High split info: the partitions have more or less the same size (uniform)
Low split info: few partitions hold most of the tuples (peaks)
If an attribute produces many splits → high SplitInfo() → low GainRatio(). This is the case for, e.g., an ID attribute.

The attribute with the maximum gain ratio is selected as the splitting attribute.

Data Mining I: Classification 1 & 2 55


Example: Gain ratio - Split information

Example (play-tennis training set):

SplitInfo(S, A) = - Σ_{v ∈ Values(A)} (|S_v| / |S|) · log2(|S_v| / |S|)

Humidity = {High, Normal}
SplitInformation(S, Humidity) = -(7/14)·log2(7/14) - (7/14)·log2(7/14) = 1

Wind = {Weak, Strong}
SplitInformation(S, Wind) = -(8/14)·log2(8/14) - (6/14)·log2(6/14) = 0.9852

Outlook = {Sunny, Overcast, Rain}
SplitInformation(S, Outlook) = -(5/14)·log2(5/14) - (4/14)·log2(4/14) - (5/14)·log2(5/14) = 1.5774

Which attribute will be selected?

Data Mining I: Classification 1 & 2 59
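The information gains needed to answer this appear earlier only in figures; assuming the standard play-tennis gains (Outlook ≈ 0.246, Humidity ≈ 0.151, Wind ≈ 0.048), a quick sketch of the comparison:

# Illustrative only: gains are assumed from the standard play-tennis example,
# split infos are the values computed above.
gain       = {"Outlook": 0.246, "Humidity": 0.151, "Wind": 0.048}
split_info = {"Outlook": 1.5774, "Humidity": 1.0, "Wind": 0.9852}

gain_ratio = {a: gain[a] / split_info[a] for a in gain}
print(gain_ratio)                           # Outlook ~0.156, Humidity ~0.151, Wind ~0.049
print(max(gain_ratio, key=gain_ratio.get))  # Outlook is still selected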


Attribute selection measure: Gini Index

Used in CART (Breiman et al, 1984)


Let S be a dataset containing examples from k classes, and let pj be the probability of class j in S. The Gini Index of S is given by:

Gini(S) = 1 - Σ_{j=1..k} p_j²

Gini index considers a binary split for each attribute.
If S is split based on attribute A into two subsets S1 and S2:

Gini(S, A) = (|S1| / |S|) · Gini(S1) + (|S2| / |S|) · Gini(S2)

Reduction in impurity:

ΔGini(S, A) = Gini(S) - Gini(S, A)

The attribute A that provides the smallest Gini(S, A) (or, equivalently, the largest reduction in impurity) is chosen to split the node.

Data Mining I: Classification 1 & 2 60


Attribute selection measure: Gini Index a small example

Let D has 14 instances


9 of class buys_computer = yes
5 in buys_computer = no

The Gini Index of D is:

Gini(D) = 1 - (9/14)² - (5/14)² = 0.459

Data Mining I: Classification 1 & 2 61


Attribute selection measure: Gini Index for non-binary data

Gini index considers a binary split for each attribute


How to find the binary splits?
For a categorical attribute A, we consider all possible subsets that can be formed by values of A (next slides)
For a numerical attribute A, we find the split points of A (next slides)

Data Mining I: Classification 1 & 2 62


Attribute selection measure: Gini index for categorical attributes

Let the categorical attribute Income = {low, medium, high} . How can we convert it
into a binary attribute?

Data Mining I: Classification 1 & 2 63


Attribute selection measure: Gini index for categorical attributes

Let the categorical attribute Income = {low, medium, high} .


To generate the binary splits for Income, we check all possible subsets:
({low,medium} and {high})
({low,high} and {medium})
({medium,high} and {low})

For each subset, we check the Gini Index of setting up a split in that subset
Gini{low,medium} and {high}(D) = ?
Gini{low,high} and {medium}(D) = ?
Gini{medium,high} and {low}(D) = ?

The split that provides the smallest Gini(S,Asplit) (or the largest reduction in impurity) is chosen to
split the node

Data Mining I: Classification 1 & 2 64


Attribute selection measure: Gini index for categorical attributes

For each subset, we check the Gini Index.
For example, the ({low, medium} and {high}) split results in D1 (#10 instances: 6+, 4-) and D2 (#4 instances: 1+, 3-):

Gini_{low,medium},{high}(D) = (10/14) · Gini(D1) + (4/14) · Gini(D2)

The same computation is done for the remaining binary split partitions. Which split should we take?
The split with the smallest Gini value (largest reduction in impurity) is taken: here, the best binary split for Income is on ({medium, high} and {low}).

Data Mining I: Classification 1 & 2 66
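A short sketch of the computation for the split spelled out above; the class counts for the other two candidate splits appear only in the slide figures, so only this split is computed here:

# Gini of the ({low,medium} | {high}) split, using the counts given above:
# D1: 10 instances (6+, 4-), D2: 4 instances (1+, 3-).
def gini(pos, neg):
    total = pos + neg
    return 1 - (pos / total) ** 2 - (neg / total) ** 2

g_split = 10/14 * gini(6, 4) + 4/14 * gini(1, 3)
print(round(g_split, 3))   # ~0.45, to be compared against the other candidate splits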


Attribute selection measure: Gini index for numerical attributes

Let attribute A be a continuous-valued attribute


Must determine the best split point t for A
Sort the values of A in increasing order
Identify adjacent examples that differ in their target classification
Typically, every such pair suggests a potential split threshold t = (a_i + a_{i+1}) / 2
Select the threshold t that yields the best value of the splitting criterion.

t = (48 + 60)/2 = 54 and t = (80 + 90)/2 = 85

2 potential thresholds: Temperature > 54, Temperature > 85


Compute the attribute selection measure (e.g. information gain) for both
Choose the best (Temperature>54 here)
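As a sketch, the threshold search can be written as follows; the temperature values and labels are assumed from the textbook example that the thresholds 54 and 85 above correspond to:

import math

# Assumed example data: sorted temperatures with their play-tennis labels.
temps  = [40, 48, 60, 72, 80, 90]
labels = ["No", "No", "Yes", "Yes", "Yes", "No"]

def entropy(labs):
    n = len(labs)
    return -sum((c / n) * math.log2(c / n) for c in (labs.count(l) for l in set(labs)))

def gain_for_threshold(t):
    left  = [l for v, l in zip(temps, labels) if v <= t]
    right = [l for v, l in zip(temps, labels) if v > t]
    n = len(labels)
    return entropy(labels) - len(left)/n*entropy(left) - len(right)/n*entropy(right)

# Candidate thresholds: midpoints between adjacent values with different labels.
candidates = [(a + b) / 2 for a, b, la, lb in
              zip(temps, temps[1:], labels, labels[1:]) if la != lb]
print(candidates)                               # [54.0, 85.0]
print(max(candidates, key=gain_for_threshold))  # 54.0 yields the higher gain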

Data Mining I: Classification 1 & 2 67


Attribute selection measure: Gini index for numerical attributes

Let t be the threshold chosen from the previous step


Create a boolean attribute based on A and threshold t with two possible outcomes: yes, no
S1 is the set of tuples in S satisfying (A > t), and S2 is the set of tuples in S satisfying (A ≤ t)

How it looks: either a node testing "Temperature > 54" with branches yes/no, or a node "Temperature" with branches "> 54" and "≤ 54"

An example of a tree for the


play tennis problem when
attributes Humidity and Wind
are continuous

Data Mining I: Classification 1 & 2 68


Comparing Attribute Selection Measures

The three measures are commonly used and, in general, return good results, but:
Information gain Gain(S,A):
biased towards multivalued attributes

Gain ratio GainRatio(S,A) :


tends to prefer unbalanced splits in which one partition is much smaller than the others

Gini index:
biased towards multivalued attributes
has difficulty when # of classes is large
tends to favor tests that result in equal-sized partitions and purity in both partitions

Other measures also exist


"Most previously published empirical results concluded that it is not possible to decide which one of the two tests to prefer" (Theoretical Comparison between the Gini Index and Information Gain Criteria, Raileanu and Stoffel, 2004).
https://link.springer.com/article/10.1023/B:AMAI.0000018580.96245.c6

Data Mining I: Classification 1 & 2 69


Outline

Classification basics
Decision tree classifiers
Splitting attributes
Hypothesis space
Decision tree decision boundaries
Overfitting
Reading material
Things you should know from this lecture

Data Mining I: Classification 1 & 2 70


Training vs generalization errors

The errors of a classifier are divided into:
Training errors (also called resubstitution error or apparent error): the errors committed on the training set
Generalization errors: the expected error of the model on previously unseen examples

A good classifier must
1. Fit the training data &
2. Accurately classify records never seen before
i.e., a good model has both a low training error and a low generalization error
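A minimal sketch of measuring both kinds of error empirically (scikit-learn is assumed; the dataset and depth values are illustrative), with the held-out test error standing in for the generalization error:

# Training error vs. test error for decision trees of increasing complexity.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (1, 3, 5, 10, None):   # None = grow the tree fully
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    train_err = 1 - tree.score(X_train, y_train)
    test_err = 1 - tree.score(X_test, y_test)
    print(f"max_depth={depth}: training error={train_err:.3f}, test error={test_err:.3f}")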

Data Mining I: Classification 1 & 2 72


Model overfitting

Model overfitting: a model that fits the training data well (low training error) but has poor generalization power (high generalization error).

Consider a hypothesis h and the error of h over:
The training set: errortrain(h)
The entire distribution D of the data: errorD(h)

Hypothesis h overfits the training data if there is an alternative hypothesis h′ ∈ H such that:
errortrain(h) < errortrain(h′) and errorD(h′) < errorD(h)
Data Mining I: Classification 1 & 2 73


Decision trees overfitting

An induced tree may overfit the training data
Too many branches, some of which may reflect anomalies due to noise or outliers
Very good performance on the training (already seen) samples
Poor accuracy for unseen samples

Example: let us add a noisy/outlier training example (D15) to the training set:
D15  Sunny  Hot  Normal  Strong  No
How would the earlier tree (built upon training examples D1-D14) be affected?

Data Mining I: Classification 1 & 2 74


Underfitting & Overfitting

The training error can be decreased by increasing the model complexity
But a complex model, tailored to the training data, will also have a high generalization error
(The figure shows how the error on both training and test data evolves with the tree complexity.)

Model underfitting: the model has yet to learn the true structure from the training data.
Model overfitting: the model overspecializes to the training data.

Data Mining I: Classification 1 & 2 76


Potential causes of model overfitting

Overfitting due to presence of noise


Overfitting due to lack of representative samples

Data Mining I: Classification 1 & 2 77


Overfitting due to presence of noise

The decision boundary is distorted by the noise point.

Data Mining I: Classification 1 & 2 78


Overfitting due to presence of noise an example

(The figures show a training set, where * marks misclassified instances, and a test set, classified by two trees M1 and M2.)

M1: training error 0%, test error 30%
M2: training error 20%, test error 10%
Data Mining I: Classification 1 & 2 79
Overfitting due to lack of representative samples

Lack of data points in the lower half of the diagram makes it difficult to predict correctly the class
labels of that region
Insufficient number of training records in the region causes the decision tree to predict the test examples
using other training records that are irrelevant to the classification task
Data Mining I: Classification 1 & 2 80
Avoiding overfitting in Decision Trees

Two approaches to avoid overfitting:

Pre-pruning: stop growing the tree when the data split is not statistically significant
do not split a node if this would result in the goodness measure falling below a threshold

Difficult to choose an appropriate threshold

Post-pruning: Grow full tree, then prune it


Get a sequence of progressively pruned trees

How to select best pruned tree?


Measure performance over training data

Measure performance over a separate validation dataset

Add complexity penalty to performance measure
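For concreteness, a sketch of both strategies with scikit-learn (an assumption for illustration; note that scikit-learn's post-pruning uses cost-complexity pruning, i.e., a complexity penalty, rather than reduced-error pruning):

# Pre-pruning vs. post-pruning of decision trees (parameter values are illustrative).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: stop growing early via thresholds on depth, node size and impurity gain.
pre = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5,
                             min_impurity_decrease=0.01, random_state=0).fit(X_train, y_train)

# Post-pruning: grow the full tree, then prune with a complexity penalty (ccp_alpha).
post = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_train, y_train)

for name, clf in (("pre-pruned", pre), ("post-pruned", post)):
    print(name, "leaves:", clf.get_n_leaves(), "test accuracy:", round(clf.score(X_test, y_test), 3))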

Data Mining I: Classification 1 & 2 82


Reduced-error pruning

Split data into training and validation set


Do until further pruning is harmful
Evaluate impact on validation set of pruning each possible node (plus those below it)
Greedily remove the one that most improves the performance on the validation set

Data Mining I: Classification 1 & 2 83


Effect of reduced-error pruning?

How the error in both training and test data evolves with the tree complexity; with and without
pruning

Data Mining I: Classification 1 & 2 84


Outline

Classification basics
Decision tree classifiers
Splitting attributes
Hypothesis space
Decision tree decision boundaries
Overfitting
Reading material
Things you should know from this lecture

Data Mining I: Classification 1 & 2 85


Hypothesis space search in decision tree learning

In classification we want to learn a target function/mapping f: D → C


In case of decision trees, f is represented by a decision tree
Hypothesis space: set of possible decision trees
Search method:
hill-climbing
from simple to complex (top-down)
Only a single current hypothesis is maintained
No backtracking: split attributes are fixed
Local minima risk
Greedy approach
Evaluation function: Information gain
Batch learning: use all training data
Data Mining I: Classification 1 & 2 86
Inductive bias in decision tree learning

Inductive bias: the set of assumptions that, together with the training data, deductively justify the
classifications assigned by the learner to future instances.
What is the policy by which ID3 generalizes from observed training examples to classify unseen instances?
Inductive bias in ID3: it chooses the first acceptable tree it encounters in its simple-to-complex, hill-climbing search through the space of possible trees.
Shorter trees are preferred over larger trees.
Trees that place high-information-gain attributes close to the root are preferred over those that do not.
The inductive bias of ID3 follows from its search strategy (search bias or preference bias)

Data Mining I: Classification 1 & 2 87


Why prefer shorter hypothesis?

Occam's Razor: Prefer the simplest hypothesis that fits the data
Scientists seem to do that: Physicists, for example, prefer simple explanations for the motions of the
planets, over more complex explanations.
Argument:
Since there are fewer short hypotheses than long ones, it is less likely that one will find a short hypothesis
that coincidentally fits the training data
In contrast there are often many very complex hypotheses that fit the current training data but fail to
generalize correctly to subsequent data.

Data Mining I: Classification 1 & 2 88


Outline

Classification basics
Decision tree classifiers
Splitting attributes
Hypothesis space
Decision tree decision boundaries
Overfitting
Reading material
Things you should know from this lecture

Data Mining I: Classification 1 & 2 89


Decision tree decision boundaries

DTs partition the feature space into axis-parallel hyper-rectangles, and label each rectangle with one of the class labels.

These rectangles are called decision regions

Decision boundary: the border line between two neighboring regions of different classes
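To see these axis-parallel regions, a small plotting sketch (scikit-learn, matplotlib and the iris data are assumptions for illustration):

# Visualizing the decision regions of a depth-limited tree on two features.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X = X[:, :2]                                   # two features -> a 2D feature space
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Evaluate the tree on a grid; each area of constant prediction is a decision region.
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 300),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 300))
Z = tree.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3)             # axis-parallel rectangles
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
plt.xlabel("sepal length"); plt.ylabel("sepal width")
plt.show()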

Data Mining I: Classification 1 & 2 90


Decision tree decision boundaries

Data Mining I: Classification 1 & 2 91


Decision tree decision boundaries

Data Mining I: Classification 1 & 2 92


When to consider decision trees

Instances are represented by attribute-value pairs


Instances are represented by a fixed number of attributes, e.g. outlook, humidity, wind and their values, e.g.
(wind=strong, outlook =rainy, humidity=normal)
The easiest situation for a DT is when attributes take a small number of disjoint possible values, e.g. wind={strong, weak}
There are extensions for numerical attributes also, e.g. temperature, income.

The class attribute has discrete output values


Usually binary classification, e.g. {yes, no}, but also for more class values, e.g. {pos, neg, neutral}

The training data might contain errors


DTs are robust to errors: both errors in the class values of the training examples and in the attribute values of these
examples

The training data might contain missing values


DTs can be used even when some training examples have some unknown attribute values

Data Mining I: Classification 1 & 2 93


Outline

Classification basics
Decision tree classifiers
Splitting attributes
Hypothesis space
Decision tree decision boundaries
Overfitting
Reading material
Things you should know from this lecture

Data Mining I: Classification 1 & 2 94


Reading material

This lecture (Decision trees) reading material:


Chapter 3: Decision tree learning, Machine Learning book by Tom Mitchell
Chapter 4: Classification, Introduction to Data Mining book by Tan et al.

Next lecture reading material:


Evaluation of classifiers: Section 4.5, Tan et al. book
Lazy learners (kNN): Section 5.2, Tan et al. book
Bayesian classifiers: Section 5.3, Tan et al. book

Data Mining I: Classification 1 & 2 95


Outline

Classification basics
Decision tree classifiers
Splitting attributes
Hypothesis space
Decision tree decision boundaries
Overfitting
Reading material
Things you should know from this lecture

Data Mining I: Classification 1 & 2 96


Outline

Decision tree learning


Measures for attribute selection
Overfitting
Pruning

Data Mining I: Classification 1 & 2 97
