Chapter 7. Classification and Prediction
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian Classification
Classification by backpropagation
Classification based on concepts from association rule mining
Other Classification Methods
Prediction
Classification accuracy
Summary
12/7/21 Data Mining: Concepts and Techniques 1
Classification vs. Prediction
Classification:
predicts categorical class labels
Typical applications: target marketing, medical diagnosis
The model is represented as classification rules, decision trees,
or mathematical formulae
Model usage: for classifying future or unknown objects
Estimate accuracy of the model: the test set must be independent of
the training set, otherwise over-fitting will occur
Classifier
[Figure: the trained classifier is applied to testing data and to
unseen data, e.g. the query (Jeff, Professor, 4) -> Tenured?]

NAME     RANK            YEARS   TENURED
Tom      Assistant Prof  2       no
Merlisa  Associate Prof  7       no
George   Professor       5       yes
Joseph   Assistant Prof  7       yes
Supervised vs. Unsupervised Learning
Supervised learning (classification)
Supervision: The training data (observations,
measurements, etc.) are accompanied by labels
indicating the class of the observations
New data is classified based on the training set
Unsupervised learning (clustering)
The class labels of training data are unknown
Given a set of measurements, observations, etc. with
the aim of establishing the existence of classes or
clusters in the data
Issues regarding classification and prediction (1): Data Preparation
Data cleaning
Preprocess data in order to reduce noise and handle
missing values
Relevance analysis (feature selection)
Remove the irrelevant or redundant attributes
Data transformation
Generalize and/or normalize data
Issues regarding classification and prediction (2): Evaluating
Classification Methods
Predictive accuracy
Speed and scalability
time to construct the model
time to use the model
Robustness
handling noise and missing values
Scalability
efficiency in disk-resident databases
Interpretability
understanding and insight provided by the model
Goodness of rules
decision tree size
compactness of classification rules
Tree pruning
[Figure: decision tree with root test age? and branches <=30, 30..40,
and >40 leading to yes/no leaves]
Basic algorithm (a greedy algorithm): the tree is constructed in a
top-down recursive divide-and-conquer manner
At start, all the training examples are at the root
Attributes are categorical (if continuous-valued, they are
discretized in advance)
Examples are partitioned recursively based on selected attributes
Test attributes are selected on the basis of a heuristic or
statistical measure (e.g., information gain)
May need other tools, such as clustering, to get the possible
split values
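The greedy top-down loop above can be sketched as a minimal ID3-style
induction routine (the attribute names and the three-row data set are
illustrative, not from the slides; attributes are assumed categorical,
as the algorithm requires):

```python
import math
from collections import Counter

def entropy(labels):
    # Expected information of a label list
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, attrs):
    if len(set(labels)) == 1:            # all examples in one class: leaf
        return labels[0]
    if not attrs:                        # no attributes left: majority leaf
        return Counter(labels).most_common(1)[0][0]

    def remainder(a):
        # Weighted entropy after splitting on attribute a
        total = 0.0
        for v in set(r[a] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[a] == v]
            total += len(subset) / len(labels) * entropy(subset)
        return total

    best = min(attrs, key=remainder)     # highest information gain
    branches = {}
    for v in set(r[best] for r in rows):
        sub_rows = [r for r in rows if r[best] == v]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == v]
        branches[v] = build_tree(sub_rows, sub_labels,
                                 [a for a in attrs if a != best])
    return (best, branches)

rows = [{"age": "<=30", "student": "no"},
        {"age": "<=30", "student": "yes"},
        {"age": ">40", "student": "no"}]
labels = ["no", "yes", "yes"]
tree = build_tree(rows, labels, ["age", "student"])
```

Each recursive call partitions the remaining examples on the attribute
with the lowest post-split entropy, exactly the selection heuristic the
bullets describe.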
I(p, n) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
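As a quick check of the formula, the classic 14-sample training set with
9 positive and 5 negative examples gives I(9, 5) ≈ 0.940 bits (function
name is illustrative):

```python
import math

def info(p, n):
    # I(p, n) from the formula above
    total = p + n
    result = 0.0
    for c in (p, n):
        if c:  # a class with zero examples contributes nothing
            frac = c / total
            result -= frac * math.log2(frac)
    return result

# 9 positive / 5 negative examples: I(9, 5) ≈ 0.940
# an even split is maximally uncertain: I(7, 7) = 1.0
```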
Attribute construction
Create new attributes based on existing ones that are
sparsely represented
This reduces fragmentation, repetition, and replication
Why decision tree induction in data mining?
relatively faster learning speed (than other classification methods)
convertible to simple and easy to understand classification rules
can use SQL queries for accessing databases
comparable classification accuracy with other methods
classification trees
Semantic interpretation problems
Bayes theorem:
P(C|X) = P(X|C)·P(C) / P(X)
P(X) is constant for all classes
P(C) = relative frequency of class C samples
C such that P(C|X) is maximum =
C such that P(X|C)·P(C) is maximum
Problem: computing P(X|C) is infeasible!
An unseen sample X = (rain, hot, high, false):
P(X|p)·P(p) =
P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) =
3/9·2/9·3/9·6/9·9/14 = 0.010582
P(X|n)·P(n) =
P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) =
2/5·2/5·4/5·2/5·5/14 = 0.018286
Sample X is classified in class n (don't play)
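The arithmetic above is easy to verify in code; this sketch multiplies
the quoted conditional-probability fractions under the naive
independence assumption (the helper name is illustrative):

```python
def naive_bayes_score(cond_probs, prior):
    # P(X|C) * P(C) = product of P(x_k|C) over attributes, times P(C)
    score = prior
    for p in cond_probs:
        score *= p
    return score

# X = (rain, hot, high, false), fractions quoted above
p_score = naive_bayes_score([3/9, 2/9, 3/9, 6/9], 9/14)  # ≈ 0.010582
n_score = naive_bayes_score([2/5, 2/5, 4/5, 2/5], 5/14)  # ≈ 0.018286
# n_score > p_score, so X is classified as n
```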
Criticism of neural networks
long training time
[Figure: multilayer feed-forward network, from input vector x_i through
input nodes, hidden nodes, and output nodes to the output vector]

Input to unit j:     I_j = Σ_i w_ij O_i + θ_j
Output of unit j:    O_j = 1 / (1 + e^(-I_j))
Output-node error:   Err_j = O_j (1 - O_j)(T_j - O_j)
Hidden-node error:   Err_j = O_j (1 - O_j) Σ_k Err_k w_jk
Weight update:       w_ij = w_ij + (l) Err_j O_i
Bias update:         θ_j = θ_j + (l) Err_j
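These update rules can be sketched as one backpropagation step on a tiny
2-3-1 network (the layer sizes, initial weights, and learning rate l
below are illustrative assumptions, not values from the slides):

```python
import math

def sigmoid(i_j):
    # O_j = 1 / (1 + e^(-I_j))
    return 1.0 / (1.0 + math.exp(-i_j))

def forward(x, w_ih, w_ho, theta_h, theta_o):
    # I_j = sum_i w_ij * O_i + theta_j, applied layer by layer
    o_h = [sigmoid(sum(w_ih[i][j] * x[i] for i in range(len(x))) + theta_h[j])
           for j in range(len(theta_h))]
    o_out = sigmoid(sum(w_ho[j] * o_h[j] for j in range(len(o_h))) + theta_o)
    return o_h, o_out

def backprop_step(x, target, w_ih, w_ho, theta_h, theta_o, l=0.5):
    o_h, o_out = forward(x, w_ih, w_ho, theta_h, theta_o)
    err_o = o_out * (1 - o_out) * (target - o_out)        # output-node error
    err_h = [o_h[j] * (1 - o_h[j]) * err_o * w_ho[j]      # hidden-node errors
             for j in range(len(o_h))]
    for j in range(len(o_h)):
        for i in range(len(x)):
            w_ih[i][j] += l * err_h[j] * x[i]             # w_ij += (l) Err_j O_i
        w_ho[j] += l * err_o * o_h[j]
        theta_h[j] += l * err_h[j]                        # theta_j += (l) Err_j
    return theta_o + l * err_o                            # updated output bias

# One step on a tiny 2-3-1 network with illustrative weights:
w_ih = [[0.2, -0.3, 0.4], [0.1, 0.25, -0.2]]
w_ho = [0.3, -0.2, 0.1]
theta_h, theta_o = [0.1, 0.2, 0.05], -0.1
x, target = [1.0, 0.0], 1.0
_, out_before = forward(x, w_ih, w_ho, theta_h, theta_o)
theta_o = backprop_step(x, target, w_ih, w_ho, theta_h, theta_o)
_, out_after = forward(x, w_ih, w_ho, theta_h, theta_o)
```

After the step, the network output has moved toward the target, which is
exactly what repeated application of the update rules achieves.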
Association-Based Classification
Other Classification Methods
k-nearest neighbor approach: instances represented as points in a
Euclidean space
Locally weighted regression
Case-based reasoning: uses symbolic representations and
knowledge-based inference
The k-Nearest Neighbor Algorithm
All instances correspond to points in the n-D space.
The nearest neighbors are defined in terms of Euclidean distance.
The target function could be discrete- or real-valued.
For discrete-valued functions, k-NN returns the most common value
among the k training examples nearest to xq.
Voronoi diagram: the decision surface induced by 1-NN for a typical
set of training examples.
[Figure: query point xq among positive (+) and negative (_) training
examples, illustrating the 1-NN decision surface]
Discussion on the k-NN Algorithm
The k-NN algorithm for continuous-valued target functions:
calculate the mean value of the k nearest neighbors
Curse of dimensionality: distance between neighbors can be dominated
by irrelevant attributes; to overcome it, stretch axes or eliminate
the least relevant attributes
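The discrete-valued case (majority vote among the k nearest examples)
can be sketched in a few lines; the two point clusters and k = 3 are
illustrative:

```python
import math
from collections import Counter

def euclidean(a, b):
    # Distance between two points in n-D space
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(training, xq, k=3):
    """Majority vote among the k training examples nearest to xq."""
    nearest = sorted(training, key=lambda pl: euclidean(pl[0], xq))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

data = [((0, 0), "-"), ((0, 1), "-"), ((1, 0), "-"),
        ((5, 5), "+"), ((5, 6), "+"), ((6, 5), "+")]
# a query near the (+) cluster is classified "+"
```

For a continuous-valued target, the vote would be replaced by the mean
of the k neighbors' values, as the bullet above notes.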
Case-Based Reasoning
Also uses: lazy evaluation + analyze similar instances
Difference: Instances are not “points in a Euclidean space”
Example: Water faucet problem in CADET (Sycara et al’92)
Methodology
Instances represented by rich symbolic descriptions (e.g., function
graphs)
Multiple retrieved cases may be combined
Tight coupling between case retrieval, knowledge-based reasoning,
and problem solving
Log-linear models:
The multi-way table of joint probabilities is approximated by a
product of lower-order tables
The target function f is approximated near xq using the linear
function: f^(x) = w0 + w1 a1(x) + … + wn an(x)
Minimize the distance-weighted squared error:
E(xq) = 1/2 Σ_{x ∈ k nearest neighbors of xq} (f(x) − f^(x))² K(d(xq, x))
The gradient descent training rule:
Δw_j = η Σ_{x ∈ k nearest neighbors of xq} K(d(xq, x)) (f(x) − f^(x)) a_j(x)
In most cases, the target function is approximated by a
constant, linear, or quadratic function.
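The gradient descent rule above can be sketched for a scalar input with
attributes a(x) = (1, x); the Gaussian kernel, learning rate η, and the
f(x) = 2x sample data are illustrative assumptions:

```python
import math

def kernel(d, width=1.0):
    # K(d(xq, x)): a weight that decreases with distance (Gaussian here)
    return math.exp(-(d / width) ** 2)

def lwr_gradient_step(w, neighbors, xq, eta=0.05):
    """One application of Delta w_j = eta * sum K(d) (f(x) - f^(x)) a_j(x).

    neighbors: (x, f(x)) pairs with scalar x; f^(x) = w[0] + w[1] * x.
    """
    grad = [0.0, 0.0]
    for x, f_x in neighbors:
        residual = f_x - (w[0] + w[1] * x)   # f(x) - f^(x)
        k = kernel(abs(xq - x))              # d(xq, x) = |xq - x|
        grad[0] += k * residual              # a_0(x) = 1
        grad[1] += k * residual * x          # a_1(x) = x
    return [w[0] + eta * grad[0], w[1] + eta * grad[1]]

neighbors = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # samples of f(x) = 2x
w, xq = [0.0, 0.0], 2.0
for _ in range(2000):
    w = lwr_gradient_step(w, neighbors, xq)
# f^(xq) = w[0] + w[1] * xq is now close to f(xq) = 4
```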
Prediction: Numerical Data
Boosting increases classification accuracy; it is applicable to
decision trees or to a Bayesian classifier
Learn a series of classifiers, where each classifier in the series
pays more attention to the examples misclassified by its predecessor
Boosting requires only linear time and constant space