Chapter 6. Classification and Prediction
The test set should be independent of the training set; otherwise over-fitting will occur
If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known
Process (1): Model Construction
[Figure: training data is fed to classification algorithms, which construct a classifier (the model); the classifier is then applied to testing data and to unseen data such as (Jeff, Professor, 4) to answer the question Tenured?]

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes
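As a minimal sketch of the model-usage step (not from the slides), assume the constructed classifier is the single rule "IF rank = 'Professor' OR years > 6 THEN tenured = yes"; that rule is a hypothetical stand-in for whatever model the algorithm actually learned:

```python
# Minimal sketch: applying an assumed classification rule to labeled and unseen tuples.
# The rule (rank == 'Professor' or years > 6 -> 'yes') is hypothetical, for illustration only.

def predict_tenured(rank, years):
    return "yes" if rank == "Professor" or years > 6 else "no"

test_data = [
    ("Tom", "Assistant Prof", 2),
    ("Merlisa", "Associate Prof", 7),
    ("George", "Professor", 5),
    ("Joseph", "Assistant Prof", 7),
]
for name, rank, years in test_data:
    print(name, predict_tenured(rank, years))

# Unseen tuple (Jeff, Professor, 4): the rule predicts tenured = yes
print("Jeff", predict_tenured("Professor", 4))
```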
Issues: Data Preparation
Data cleaning
Preprocess data in order to reduce noise and handle
missing values
Relevance analysis (feature selection)
Remove the irrelevant or redundant attributes
Data transformation
Generalize and/or normalize data
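To make the normalization step concrete, here is a minimal sketch of min-max normalization in Python; the attribute values and the [0, 1] target range are assumptions for illustration:

```python
# Minimal sketch: min-max normalization of one numeric attribute to [0, 1].
# The sample ages are invented for illustration.

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min and the largest to new_max."""
    old_min, old_max = min(values), max(values)
    span = old_max - old_min
    return [new_min + (v - old_min) / span * (new_max - new_min) for v in values]

ages = [23, 35, 41, 58, 64]
print(min_max_normalize(ages))   # approximately [0.0, 0.29, 0.44, 0.85, 1.0]
```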
Accuracy
classifier accuracy: predicting class label
predictor accuracy: guessing the value of predicted attributes
Speed
time to construct the model (training time)
[Figure: example decision tree; the root node tests age? with branches <=30, 31..40, and >40, and each branch leads, possibly via a further test, to a leaf labeled yes or no.]
Basic algorithm (greedy): the tree is constructed in a top-down recursive divide-and-conquer manner
At start, all the training examples are at the root
Attributes are categorical (continuous-valued attributes are discretized in advance)
Examples are partitioned recursively based on selected attributes (a minimal sketch follows below)
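A minimal sketch of this recursive partitioning over categorical attributes; the attribute-selection measure is passed in as a function because several measures are possible (e.g., information gain or the gini index), and nothing here is taken verbatim from the slides:

```python
# Minimal sketch of greedy top-down decision-tree induction over categorical attributes.
from collections import Counter

def build_tree(examples, attributes, select):
    """examples: list of (attribute->value dict, class label); select: attribute-selection measure."""
    labels = [y for _, y in examples]
    if len(set(labels)) == 1:                       # pure node: leaf with that class
        return labels[0]
    if not attributes:                              # no attributes left: majority-vote leaf
        return Counter(labels).most_common(1)[0][0]
    a = select(examples, attributes)                # heuristic choice, e.g. information gain or gini
    branches = {}
    for v in {x[a] for x, _ in examples}:           # partition the examples on each value of a
        subset = [(x, y) for x, y in examples if x[a] == v]
        branches[v] = build_tree(subset, [b for b in attributes if b != a], select)
    return (a, branches)
```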
Among the candidate binary groupings, gini_{medium,high} is 0.30 and is thus the best, since it is the lowest
All attributes are assumed continuous-valued
May need other tools, e.g., clustering, to get the possible split values
Can be modified for categorical attributes
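To make the gini computation concrete (e.g., the gini_{medium,high} figure above), here is a minimal sketch; the class counts below are invented and do not reproduce the 0.30 value:

```python
# Minimal sketch of the gini index for a binary split. Class counts are invented for illustration.

def gini(counts):
    """Gini impurity of a partition given its per-class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_split(counts1, counts2):
    """Weighted gini of a binary split: (|D1|/|D|) * gini(D1) + (|D2|/|D|) * gini(D2)."""
    n1, n2 = sum(counts1), sum(counts2)
    n = n1 + n2
    return n1 / n * gini(counts1) + n2 / n * gini(counts2)

# e.g. income in {medium, high} -> D1, income in {low} -> D2 (hypothetical class counts)
print(gini_split([6, 1], [2, 5]))   # about 0.33
```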
Why decision tree induction in data mining?
relatively faster learning speed (than other classification methods)
convertible to simple and easy-to-understand classification rules
can use SQL queries for accessing databases
[Figure: a neuron (perceptron); the input vector x = (x_0, ..., x_n) and the weight vector w = (w_0, ..., w_n) feed a weighted sum, the bias $\mu_k$ is subtracted, and an activation function f produces the output y.]
For example: $y = \operatorname{sign}\Bigl(\sum_{i=0}^{n} w_i x_i - \mu_k\Bigr)$
$w_j^{(k+1)} = w_j^{(k)} + \lambda\,\bigl(y_i - \hat{y}_i^{(k)}\bigr)\,x_{ij}$
[Figure: a multi-layer feed-forward neural network; the input vector X enters the input layer, weights $w_{ij}$ connect it to the hidden layer, and the hidden layer feeds the output layer, which produces the output vector.]
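A minimal sketch of the single-unit computation and the weight update above (the learning rate, bias, inputs, and initial weights are invented values; this is the error-driven update for one unit, not the full backpropagation procedure):

```python
# Minimal sketch: perceptron-style prediction y = sign(sum_i w_i*x_i - mu) and the
# weight update w_j <- w_j + lambda * (y - y_hat) * x_j. All numbers are invented.

def predict(weights, x, mu):
    s = sum(w * xi for w, xi in zip(weights, x)) - mu
    return 1 if s >= 0 else -1

def update(weights, x, y, lam, mu):
    """Apply the error-driven update to every weight and return the new weight vector."""
    y_hat = predict(weights, x, mu)
    return [w + lam * (y - y_hat) * xi for w, xi in zip(weights, x)]

w = [0.2, -0.1, 0.4]          # assumed initial weights
x = [1.0, 0.5, -1.5]          # one training tuple
print(update(w, x, y=1, lam=0.1, mu=0.0))   # approximately [0.4, 0.0, 0.1]
```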
How Does a Multi-Layer Neural Network Work?
Instance-based learning:
Store training examples and delay the processing
(“lazy evaluation”) until a new instance must be
classified
Typical approaches
k-nearest neighbor approach: instances are represented as points in a Euclidean space
Locally weighted regression
Case-based reasoning: uses symbolic representations and knowledge-based inference
The k-Nearest Neighbor Algorithm
All instances correspond to points in the n-D space
The nearest neighbors are defined in terms of
Euclidean distance, dist(X1, X2)
Target function could be discrete- or real- valued
For discrete-valued, k-NN returns the most common
value among the k training examples nearest to xq
Voronoi diagram: the decision surface induced by 1-NN for a typical set of training examples
[Figure: a query point x_q among positive (+) and negative (-) training examples; the 1-NN decision surface partitions the space into cells around the training points.]
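A minimal sketch of k-NN classification with Euclidean distance and majority voting; the toy points and the value of k are assumptions for illustration:

```python
# Minimal k-NN sketch: classify a query point by majority vote among its k nearest
# training examples under Euclidean distance. Toy data invented for illustration.
from collections import Counter
from math import dist   # Euclidean distance (Python 3.8+)

def knn_classify(training, xq, k=3):
    """training: list of (point, label) pairs; xq: query point."""
    nearest = sorted(training, key=lambda item: dist(item[0], xq))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

training = [((1.0, 1.0), "+"), ((1.2, 0.8), "+"), ((4.0, 4.2), "-"), ((4.1, 3.9), "-")]
print(knn_classify(training, (1.1, 1.0), k=3))   # -> "+"
```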
Discussion on the k-NN Algorithm
Non-linear regression
$w_1 = \dfrac{\sum_{i=1}^{|D|} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{|D|} (x_i - \bar{x})^2}, \qquad w_0 = \bar{y} - w_1 \bar{x}$
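These are the standard least-squares estimates for a straight-line fit y = w_0 + w_1 x. A minimal sketch computing them (the sample data are invented for illustration):

```python
# Minimal sketch: least-squares estimates w1 and w0 for the line y = w0 + w1 * x.
# The data points are invented for illustration.

def least_squares(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    w1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    w0 = y_bar - w1 * x_bar
    return w0, w1

xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]
print(least_squares(xs, ys))    # approximately (0.09, 1.99), i.e. y is roughly 2x
```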
Measure predictor accuracy: measure how far off the predicted value is
from the actual known value
Loss function: measures the error between $y_i$ and the predicted value $y_i'$
Absolute error: $|y_i - y_i'|$
Squared error: $(y_i - y_i')^2$
Test error (generalization error): the average loss over the test set
Mean absolute error: $\frac{1}{d}\sum_{i=1}^{d} |y_i - y_i'|$
Mean squared error: $\frac{1}{d}\sum_{i=1}^{d} (y_i - y_i')^2$
Relative absolute error: $\dfrac{\sum_{i=1}^{d} |y_i - y_i'|}{\sum_{i=1}^{d} |y_i - \bar{y}|}$
Relative squared error: $\dfrac{\sum_{i=1}^{d} (y_i - y_i')^2}{\sum_{i=1}^{d} (y_i - \bar{y})^2}$
The mean squared error exaggerates the presence of outliers; the (square) root mean squared error is therefore popular, and similarly the root relative squared error
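A minimal sketch computing these measures; the actual and predicted values are invented for illustration:

```python
# Minimal sketch of the predictor error measures above. Sample values are invented.
from math import sqrt

def error_measures(y, y_pred):
    d = len(y)
    y_bar = sum(y) / d
    mae = sum(abs(a - p) for a, p in zip(y, y_pred)) / d
    mse = sum((a - p) ** 2 for a, p in zip(y, y_pred)) / d
    rae = sum(abs(a - p) for a, p in zip(y, y_pred)) / sum(abs(a - y_bar) for a in y)
    rse = sum((a - p) ** 2 for a, p in zip(y, y_pred)) / sum((a - y_bar) ** 2 for a in y)
    return {"MAE": mae, "MSE": mse, "RMSE": sqrt(mse), "RAE": rae, "RRSE": sqrt(rse)}

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.5, 2.0, 8.0]
print(error_measures(y_true, y_pred))
```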
Evaluating the Accuracy of a Classifier or Predictor (I)
Holdout method
Given data is randomly partitioned into two independent sets: a training set for model construction and a test set for accuracy estimation
Random sampling: a variation of holdout; repeat the holdout k times and take the average of the accuracies obtained
Cross-validation (k-fold, where k = 10 is most popular)
Randomly partition the data into k mutually exclusive subsets, each of approximately equal size; at the i-th iteration, use subset D_i as the test set and the remaining subsets together as the training set
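A minimal sketch of k-fold cross-validation; train_model and accuracy_of are hypothetical placeholders standing in for any classifier and any evaluation routine:

```python
# Minimal sketch of k-fold cross-validation.
# train_model() and accuracy_of() are hypothetical callables supplied by the caller.
import random

def k_fold_cross_validation(data, k, train_model, accuracy_of):
    data = list(data)
    random.shuffle(data)                       # randomize before partitioning
    folds = [data[i::k] for i in range(k)]     # k mutually exclusive subsets of roughly equal size
    accuracies = []
    for i in range(k):
        test_set = folds[i]                    # i-th iteration: fold i is the test set
        train_set = [t for j, fold in enumerate(folds) if j != i for t in fold]
        model = train_model(train_set)
        accuracies.append(accuracy_of(model, test_set))
    return sum(accuracies) / k                 # overall accuracy = average over the k folds
```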