
Chapter 6. Classification and Prediction

 What is classification? What is prediction?
 Issues regarding classification and prediction
 Classification by decision tree induction
 Bayesian classification
 Rule-based classification
 Prediction
 Accuracy and error measures
 Summary


Classification vs. Prediction

 Classification
 predicts categorical class labels (discrete or nominal)
 classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses it in classifying new data
 Prediction
 models continuous-valued functions, i.e., predicts unknown or missing values
 Typical applications
 Credit approval
 Target marketing
 Medical diagnosis
 Fraud detection


Classification—A Two-Step Process

 Model construction: describing a set of predetermined classes
 Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
 The set of tuples used for model construction is the training set
 The model is represented as classification rules, decision trees, or mathematical formulae
 Model usage: for classifying future or unknown objects
 Estimate the accuracy of the model
 The known label of each test sample is compared with the classified result from the model
 Accuracy rate is the percentage of test set samples that are correctly classified by the model
 The test set is independent of the training set, otherwise over-fitting will occur
 If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known
Process (1): Model Construction

Training data is fed to a classification algorithm, which constructs the classifier (model).

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

Resulting model:
IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
Process (2): Using the Model in Prediction

The classifier is first evaluated on testing data, then applied to unseen data.

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4) → Tenured?
Supervised vs. Unsupervised Learning

 Supervised learning (classification)
 Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
 New data is classified based on the training set
 Unsupervised learning (clustering)
 The class labels of training data are unknown
 Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Issues: Data Preparation

 Data cleaning
 Preprocess data in order to reduce noise and handle
missing values
 Relevance analysis (feature selection)
 Remove the irrelevant or redundant attributes
 Data transformation
 Generalize and/or normalize data



Issues: Evaluating Classification Methods

 Accuracy
 classifier accuracy: predicting the class label
 predictor accuracy: estimating the value of the predicted attribute
 Speed
 time to construct the model (training time)
 time to use the model (classification/prediction time)
 Robustness: handling noise and missing values
 Scalability: efficiency in disk-resident databases
 Interpretability
 understanding and insight provided by the model
 Other measures, e.g., goodness of rules, such as decision tree size or compactness of classification rules
Decision Tree Induction: Training Dataset

This follows an example from Quinlan's ID3.

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no


Output: A Decision Tree for "buys_computer"

age?
├── <=30   → student?
│            ├── no  → no
│            └── yes → yes
├── 31..40 → yes
└── >40    → credit rating?
             ├── excellent → no
             └── fair      → yes


Algorithm for Decision Tree Induction
 Basic algorithm (a greedy algorithm)
 Tree is constructed in a top-down recursive divide-and-conquer
manner
 At start, all the training examples are at the root
 Attributes are categorical (if continuous-valued, they are
discretized in advance)
 Examples are partitioned recursively based on selected attributes
 Test attributes are selected on the basis of a heuristic or
statistical measure (e.g., information gain)
 Conditions for stopping partitioning
 All samples for a given node belong to the same class
 There are no remaining attributes for further partitioning –
majority voting is employed for classifying the leaf
 There are no samples left
Algorithm for Decision Tree Induction

 Algorithm: Generate_decision_tree. Generates a decision tree from the given data.
 Input: the training samples, represented by discrete-valued attributes; the set of candidate attributes, attribute_list.
 Output: a decision tree.
 Method:
(1)  create a node N;
(2)  if samples are all of the same class C then
(3)      return N as a leaf node labelled with the class C;
(4)  if attribute_list is empty then
(5)      return N as a leaf node labelled with the most common class in samples; // majority voting
(6)  select test_attribute, the attribute among attribute_list with the highest information gain;
(7)  label node N with test_attribute;
(8)  for each known value ai of test_attribute: // partition the samples
(9)      grow a branch from node N for the condition test_attribute = ai;
(10)     let si be the set of samples in samples for which test_attribute = ai;
(11)     if si is empty then
(12)         attach a leaf labelled with the most common class in samples;
(13)     else attach the node returned by Generate_decision_tree(si, attribute_list − test_attribute);
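A minimal Python sketch of this recursive procedure, assuming categorical attributes and training samples given as (feature-dict, label) pairs; the helper names are illustrative, not from the slides:

import math
from collections import Counter

def entropy(labels):
    # Info(D) = -sum over classes of p_i * log2(p_i)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def generate_decision_tree(samples, attribute_list):
    # samples: list of (features_dict, class_label); returns a nested dict or a class label
    labels = [label for _, label in samples]
    if len(set(labels)) == 1:                        # steps (2)-(3): one class only
        return labels[0]
    if not attribute_list:                           # steps (4)-(5): majority voting
        return Counter(labels).most_common(1)[0][0]

    def info_after_split(attr):                      # Info_attr(D)
        total = len(samples)
        info = 0.0
        for v in {f[attr] for f, _ in samples}:
            part = [label for f, label in samples if f[attr] == v]
            info += (len(part) / total) * entropy(part)
        return info

    best = min(attribute_list, key=info_after_split)     # step (6): max gain = min Info_A(D)
    tree = {best: {}}
    remaining = [a for a in attribute_list if a != best]
    for v in {f[best] for f, _ in samples}:          # steps (8)-(10): partition on each value
        subset = [(f, label) for f, label in samples if f[best] == v]
        tree[best][v] = generate_decision_tree(subset, remaining)   # step (13)
    return tree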


Attribute Selection Measure: Information Gain (ID3/C4.5)

 Select the attribute with the highest information gain
 Let pi be the probability that an arbitrary tuple in D belongs to class Ci, estimated by |Ci,D|/|D|
 Expected information (entropy) needed to classify a tuple in D:

    Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)

 Information needed (after using A to split D into v partitions) to classify D:

    Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)

 Information gained by branching on attribute A:

    Gain(A) = Info(D) - Info_A(D)
Attribute Selection: Information Gain

 Class P: buys_computer = "yes" (9 tuples)
 Class N: buys_computer = "no" (5 tuples)

    Info(D) = I(9,5) = -(9/14) \log_2(9/14) - (5/14) \log_2(5/14) = 0.940

 Splitting the training dataset above on age gives:

    age     p_i  n_i  I(p_i, n_i)
    <=30    2    3    0.971
    31…40   4    0    0
    >40     3    2    0.971

    Info_age(D) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694

 Hence:

    Gain(age) = Info(D) - Info_age(D) = 0.246
    Gain(income) = 0.029
    Gain(student) = 0.151
    Gain(credit_rating) = 0.048
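These numbers can be checked with a few lines of Python (a sketch; I(·) is the expected-information helper defined by the formula above):

import math

def I(*counts):
    # Expected information I(p, n, ...) for the class distribution given by counts
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

info_D = I(9, 5)                                                   # ≈ 0.940
info_age = (5/14) * I(2, 3) + (4/14) * I(4, 0) + (5/14) * I(3, 2)  # ≈ 0.694
print(round(info_D, 3), round(info_D - info_age, 3))
# 0.94 0.247  (0.246 with the slides' intermediate rounding)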
Gini Index (CART, IBM IntelligentMiner)

 Ex. D has 9 tuples in buys_computer = "yes" and 5 in "no":

    gini(D) = 1 - (9/14)^2 - (5/14)^2 = 0.459

 Suppose the attribute income partitions D into 10 tuples in D1: {low, medium} and 4 in D2: {high}:

    gini_{income ∈ {low,medium}}(D) = (10/14) gini(D_1) + (4/14) gini(D_2) = 0.443

 The splits {medium, high} and {low, high} give 0.450 and 0.458 respectively, so {low, medium} (equivalently {high}) is the best income split since it minimizes the Gini index
 All attributes are assumed continuous-valued
 May need other tools, e.g., clustering, to get the possible split values
 Can be modified for categorical attributes
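A quick sketch of the same arithmetic; the yes/no counts per partition (7/3 for {low, medium}, 2/2 for {high}) are tallied from the training table:

def gini(*counts):
    # gini = 1 - sum of squared class proportions
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

gini_D = gini(9, 5)                                       # ≈ 0.459
gini_income_split = (10/14) * gini(7, 3) + (4/14) * gini(2, 2)
print(round(gini_D, 3), round(gini_income_split, 3))      # 0.459 0.443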


Overfitting and Tree Pruning

 Overfitting: an induced tree may overfit the training data
 Overfitting results in decision trees that are more complex than necessary
 Too many branches, some may reflect anomalies due to noise or outliers
 Poor accuracy for unseen samples
 Two approaches to avoid overfitting (the second, post-pruning, is on the next slide)
 Prepruning: halt tree construction early—do not split a node if this would result in the goodness measure falling below a threshold
 Difficult to choose an appropriate threshold


Post-pruning

 Trim the nodes of the decision tree in a bottom-up fashion
 If the generalization error improves after trimming, replace the sub-tree by a leaf node
 The class label of the leaf node is determined from the majority class of instances in the sub-tree


Classification in Large Databases

 Classification—a classical problem extensively studied by statisticians and machine learning researchers
 Scalability: classifying data sets with millions of examples and hundreds of attributes with reasonable speed
 Why decision tree induction in data mining?
 relatively faster learning speed (than other classification methods)
 convertible to simple and easy-to-understand classification rules
 can use SQL queries for accessing databases
 comparable classification accuracy with other methods


Scalable Decision Tree Induction Methods in Data Mining Studies

 SLIQ
 builds an index for each attribute; only the class list and the current attribute list reside in memory
 handles disk-resident data sets using disk-resident attribute lists and a memory-resident class list
 a memory restriction remains when the training set is too large: when the class list becomes too large, the performance of SLIQ decreases
 SPRINT
 constructs an attribute-list data structure
 removes all memory restrictions
 designed to be easily parallelized
Scalable Decision Tree Induction Methods in Data Mining Studies (cont.)

 PUBLIC
 integrates tree splitting and tree pruning: stops growing the tree earlier
 RainForest
 separates the scalability aspects from the criteria that determine the quality of the tree
 builds an AVC-list (attribute, value, class label)
 RainForest reports a speedup over SPRINT


Bayesian Classification: Why?
 A statistical classifier: performs probabilistic prediction,
i.e., predicts class membership probabilities
 Foundation: Based on Bayes’ Theorem.
 Performance: A simple Bayesian classifier, naïve Bayesian
classifier, has comparable performance with decision tree
and selected neural network classifiers
 Incremental: Each training example can incrementally
increase/decrease the probability that a hypothesis is
correct — prior knowledge can be combined with observed
data
 Standard: Even when Bayesian methods are
computationally intractable, they can provide a standard
of optimal decision making against which other methods
can be measured
Bayesian Theorem: Basics

 Let X be a data sample ("evidence"): class label is unknown
 Let H be a hypothesis that X belongs to class C
 Classification is to determine P(H|X), the probability that the hypothesis holds given the observed data sample X
 P(H) (prior probability): the initial probability
 E.g., X will buy a computer, regardless of age, income, …
 P(X): the probability that the sample data is observed
 P(X|H) (likelihood): the probability of observing the sample X, given that the hypothesis holds
 E.g., given that X will buy a computer, the probability that X is 31..40 with medium income
Bayesian Theorem

 Given training data X, the posteriori probability of a hypothesis H, P(H|X), follows Bayes' theorem:

    P(H|X) = \frac{P(X|H) P(H)}{P(X)}

 Informally: posteriori = likelihood × prior / evidence
 Predicts that X belongs to Ci iff the probability P(Ci|X) is the highest among all the P(Ck|X) for the k classes
 Practical difficulty: requires initial knowledge of many probabilities, at significant computational cost
Towards Naïve Bayesian Classifier

 Let D be a training set of tuples and their associated class labels, with each tuple represented by an n-D attribute vector X = (x1, x2, …, xn)
 Suppose there are m classes C1, C2, …, Cm
 Classification is to derive the maximum posteriori, i.e., the maximal P(Ci|X)
 This can be derived from Bayes' theorem:

    P(C_i|X) = \frac{P(X|C_i) P(C_i)}{P(X)}

 Since P(X) is constant for all classes, only P(X|C_i) P(C_i) needs to be maximized


Derivation of Naïve Bayes Classifier

 A simplified assumption: attributes are conditionally independent (i.e., no dependence relation between attributes):

    P(X|C_i) = \prod_{k=1}^{n} P(x_k|C_i) = P(x_1|C_i) \times P(x_2|C_i) \times … \times P(x_n|C_i)

 This greatly reduces the computation cost: only counts the class distribution
 If Ak is categorical, P(xk|Ci) is the # of tuples in Ci having value xk for Ak divided by |Ci,D| (# of tuples of Ci in D)
 If Ak is continuous-valued, P(xk|Ci) is usually computed from a Gaussian distribution with mean μ and standard deviation σ:

    g(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}

 and P(x_k|C_i) = g(x_k, \mu_{C_i}, \sigma_{C_i})
Naïve Bayesian Classifier: Training Dataset

Training data: the buys_computer dataset shown earlier.

Classes:
 C1: buys_computer = 'yes'
 C2: buys_computer = 'no'

Data sample to classify:
 X = (age <= 30, income = medium, student = yes, credit_rating = fair)
Naïve Bayesian Classifier: An Example

 Priors P(Ci):
    P(buys_computer = "yes") = 9/14 = 0.643
    P(buys_computer = "no") = 5/14 = 0.357

 Compute P(X|Ci) for each class:
    P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
    P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
    P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
    P(income = "medium" | buys_computer = "no") = 2/5 = 0.4
    P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
    P(student = "yes" | buys_computer = "no") = 1/5 = 0.2
    P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
    P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.4

 X = (age <= 30, income = medium, student = yes, credit_rating = fair)

    P(X | buys_computer = "yes") = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
    P(X | buys_computer = "no") = 0.6 × 0.4 × 0.2 × 0.4 = 0.019

    P(X | buys_computer = "yes") × P(buys_computer = "yes") = 0.028
    P(X | buys_computer = "no") × P(buys_computer = "no") = 0.007

 Therefore, X belongs to class "buys_computer = yes"
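A compact sketch reproducing this calculation by counting over the training tuples (transcribed from the table; function and variable names are illustrative):

from collections import Counter

# (age, income, student, credit_rating, buys_computer) transcribed from the slides
data = [
    ("<=30","high","no","fair","no"), ("<=30","high","no","excellent","no"),
    ("31…40","high","no","fair","yes"), (">40","medium","no","fair","yes"),
    (">40","low","yes","fair","yes"), (">40","low","yes","excellent","no"),
    ("31…40","low","yes","excellent","yes"), ("<=30","medium","no","fair","no"),
    ("<=30","low","yes","fair","yes"), (">40","medium","yes","fair","yes"),
    ("<=30","medium","yes","excellent","yes"), ("31…40","medium","no","excellent","yes"),
    ("31…40","high","yes","fair","yes"), (">40","medium","no","excellent","no"),
]

def naive_bayes(x):
    # Return the class maximizing P(X|Ci) * P(Ci) under the independence assumption
    class_counts = Counter(row[-1] for row in data)
    scores = {}
    for c, n_c in class_counts.items():
        rows = [row for row in data if row[-1] == c]
        p = n_c / len(data)                        # prior P(Ci)
        for k, value in enumerate(x):              # product of P(xk|Ci)
            p *= sum(1 for row in rows if row[k] == value) / n_c
        scores[c] = p
    return max(scores, key=scores.get), scores

print(naive_bayes(("<=30", "medium", "yes", "fair")))
# ('yes', {'no': ≈0.007, 'yes': ≈0.028})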
Example 2

Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N

An unseen sample: X = <rain, hot, high, false>
Play-tennis example: estimating P(xi|C)

Priors: P(p) = 9/14, P(n) = 5/14

outlook:      P(sunny|p) = 2/9     P(sunny|n) = 3/5
              P(overcast|p) = 4/9  P(overcast|n) = 0
              P(rain|p) = 3/9      P(rain|n) = 2/5
temperature:  P(hot|p) = 2/9       P(hot|n) = 2/5
              P(mild|p) = 4/9      P(mild|n) = 2/5
              P(cool|p) = 3/9      P(cool|n) = 1/5
humidity:     P(high|p) = 3/9      P(high|n) = 4/5
              P(normal|p) = 6/9    P(normal|n) = 2/5
windy:        P(true|p) = 3/9      P(true|n) = 3/5
              (and hence P(false|p) = 6/9, P(false|n) = 2/5)
Play-tennis example: classifying X

 An unseen sample X = <rain, hot, high, false>

 P(X|p)·P(p) = P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) = (3/9)·(2/9)·(3/9)·(6/9)·(9/14) = 0.010582

 P(X|n)·P(n) = P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) = (2/5)·(2/5)·(4/5)·(2/5)·(5/14) = 0.018286

 Sample X is classified in class n (don't play)


Naïve Bayesian Classifier: Comments

 Advantages
 easy to implement
 good results obtained in most of the cases
 Disadvantages
 assumption of class-conditional independence, therefore loss of accuracy
 practically, dependencies exist among variables
 e.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.)
 dependencies among these cannot be modeled by the naïve Bayesian classifier
 How to deal with these dependencies? Bayesian belief networks
Bayesian Belief Networks

 A Bayesian belief network allows a subset of the variables to be conditionally independent
 A graphical model of causal relationships
 Represents dependency among the variables
 Gives a specification of the joint probability distribution
 Nodes: random variables
 Links: dependency
 Example: X and Y are the parents of Z, and Y is the parent of P; there is no dependency between Z and P; the graph has no loops or cycles
Bayesian Belief Network: An Example

Network: FamilyHistory and Smoker are the parents of LungCancer and Emphysema; LungCancer is the parent of PositiveXRay and Dyspnea.

The conditional probability table (CPT) for variable LungCancer:

        (FH, S)  (FH, ~S)  (~FH, S)  (~FH, ~S)
LC      0.8      0.5       0.7       0.1
~LC     0.2      0.5       0.3       0.9

The CPT shows the conditional probability for each possible combination of values of a node's parents: the CPT for a variable Z specifies the conditional distribution P(Z | Parents(Z)). For example,

    P(LungCancer = "yes" | FamilyHistory = "yes", Smoker = "yes") = 0.8

Derivation of the probability of a particular combination of values x1, …, xn from the CPTs:

    P(x_1, …, x_n) = \prod_{i=1}^{n} P(x_i | Parents(X_i))
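A sketch of this factorization; only the LungCancer CPT comes from the slide, so the priors for FamilyHistory and Smoker below are illustrative placeholders:

# Each node maps to (parents, CPT); the CPT maps a tuple of parent values
# to P(node = True | parents). Only the "LC" entries come from the slide.
network = {
    "FH": ((), {(): 0.1}),                       # placeholder prior, not from the slide
    "S":  ((), {(): 0.3}),                       # placeholder prior, not from the slide
    "LC": (("FH", "S"), {(True, True): 0.8, (True, False): 0.5,
                         (False, True): 0.7, (False, False): 0.1}),
}

def joint_probability(assignment):
    # P(x1, ..., xn) = product over nodes of P(xi | Parents(Xi))
    p = 1.0
    for node, (parents, cpt) in network.items():
        p_true = cpt[tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p

print(joint_probability({"FH": True, "S": True, "LC": True}))  # 0.1 * 0.3 * 0.8 = 0.024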
Chapter 6. Classification and Prediction

 What is classification? What is prediction?
 Issues regarding classification and prediction
 Classification by decision tree induction
 Bayesian classification
 Rule-based classification
 Prediction
 Accuracy and error measures
 Summary


What Is Prediction?

 (Numerical) prediction is similar to classification
 construct a model
 use the model to predict a continuous or ordered value for a given input
 Prediction is different from classification
 classification predicts a categorical class label
 prediction models continuous-valued functions
 Major method for prediction: regression
 model the relationship between one or more independent or predictor variables and a dependent or response variable
 Regression analysis
 linear and multiple regression
 non-linear regression
 other regression methods: generalized linear model, Poisson regression, log-linear models, regression trees
Linear Regression

 Linear regression: involves a response variable y and a single predictor variable x:

    y = w_0 + w_1 x

 where w0 (y-intercept) and w1 (slope) are regression coefficients
 Method of least squares: estimates the best-fitting straight line:

    w_1 = \frac{\sum_{i=1}^{|D|} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{|D|} (x_i - \bar{x})^2},   w_0 = \bar{y} - w_1 \bar{x}

 Multiple linear regression: involves more than one predictor variable
 training data is of the form (X1, y1), (X2, y2), …, (X|D|, y|D|)
 ex. for 2-D data: y = w0 + w1 x1 + w2 x2
 solvable by an extension of the least-squares method
 many nonlinear functions can be transformed into the above
Regression Example

 The table shows paired data where X is the number of years of work experience of a college graduate and Y is the corresponding salary.

X (Years Experience)  Y (Salary in $1000s)
3                     30
8                     57
9                     64
13                    72
3                     36
6                     43
11                    59
21                    90
1                     20
16                    83

 Fitted line: Y = 23.6 + 3.5X
 Predict the salary for a graduate with 10 years of experience:
    Y = 23.6 + 3.5(10) = 58.6, i.e., $58,600
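A sketch of the least-squares computation on this data (helper name illustrative):

def least_squares(xs, ys):
    # w1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2);  w0 = y_bar - w1 * x_bar
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    w1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
         / sum((x - x_bar) ** 2 for x in xs)
    return y_bar - w1 * x_bar, w1

years  = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
salary = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]
w0, w1 = least_squares(years, salary)
print(round(w0, 1), round(w1, 1))  # 23.2 3.5 (the slide rounds w1 to 3.5 first, giving w0 = 23.6)
print(round(w0 + 10 * w1, 1))      # 58.6, i.e., about $58,600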
Nonlinear Regression

 Some nonlinear models can be modeled by a polynomial function
 A polynomial regression model can be transformed into a linear regression model. For example,

    y = w_0 + w_1 x + w_2 x^2 + w_3 x^3

 is convertible to linear with the new variables x2 = x², x3 = x³:

    y = w_0 + w_1 x + w_2 x_2 + w_3 x_3

 Other functions, such as the power function, can also be transformed to a linear model
 Some models are intractably nonlinear (e.g., a sum of exponential terms)
 it is still possible to obtain least-squares estimates through extensive calculation on more complex formulae
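A sketch of this transformation using numpy (assumed available); the data is illustrative:

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative inputs
y = 1 + 2*x - 0.5*x**2 + 0.1*x**3              # illustrative cubic response

# The new variables x2 = x^2, x3 = x^3 make the cubic a *linear* model in (x, x2, x3),
# so ordinary least squares applies to the design matrix [1, x, x2, x3].
X = np.column_stack([np.ones_like(x), x, x**2, x**3])
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(w, 3))                          # ≈ [ 1.   2.  -0.5  0.1]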
Chapter 6. Classification and Prediction

 What is classification? What is prediction?
 Issues regarding classification and prediction
 Classification by decision tree induction
 Bayesian classification
 Rule-based classification
 Prediction
 Accuracy and error measures
 Summary


Evaluating the Accuracy of a Classifier or Predictor (I)

 Holdout method
 the given data is randomly partitioned into two independent sets
 training set (e.g., 2/3) for model construction
 test set (e.g., 1/3) for accuracy estimation
 (data → training set → derive classifier → estimate accuracy on the test set)
 Random sampling: a variation of holdout
 repeat holdout k times; accuracy = avg. of the accuracies obtained
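A minimal sketch of the holdout partition (function names illustrative; evaluate stands for any train-and-score routine):

import random

def holdout_split(data, train_frac=2/3, seed=None):
    # Randomly partition data into independent training and test sets
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

# Random sampling: repeat holdout k times and average the accuracies, e.g.
# accuracies = [evaluate(*holdout_split(data, seed=i)) for i in range(k)]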
Evaluating the Accuracy of a Classifier or Predictor (II)

 Cross-validation (k-fold, where k = 10 is most popular)
 randomly partition the data into k mutually exclusive subsets D1, …, Dk, each of approximately equal size
 at the i-th iteration, use Di as the test set and the others as the training set
 the accuracy estimate = (overall number of correct classifications from the k iterations) / (total number of samples in the initial data)
 Leave-one-out: k folds where k = # of tuples, for small-sized data
 Stratified cross-validation: folds are stratified so that the class distribution in each fold is approximately the same as that in the initial data
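A sketch of k-fold cross-validation under the accuracy estimate given above; train and predict are assumed, illustrative callables:

import random

def cross_validation_accuracy(data, train, predict, k=10, seed=0):
    # data: list of (features, label); each fold Di serves once as the test set
    rows = list(data)
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]        # k mutually exclusive, near-equal subsets
    correct = 0
    for i in range(k):
        test = folds[i]
        training = [r for j, f in enumerate(folds) if j != i for r in f]
        model = train(training)
        correct += sum(predict(model, x) == y for x, y in test)
    return correct / len(rows)                    # overall correct / total samples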


Ensemble Methods: Increasing the Accuracy

 Ensemble methods
 use a combination of models to increase accuracy
 combine a series of k learned models, M1, M2, …, Mk, with the aim of creating an improved model M*
 Popular ensemble methods
 bagging: averaging the prediction over a collection of classifiers
 boosting: weighted vote with a collection of classifiers
 ensemble: combining a set of heterogeneous classifiers


Bagging: Bootstrap Aggregation

 Analogy: diagnosis based on multiple doctors' majority vote
 Training
 given a set D of d tuples, at each iteration i a training set Di of d tuples is sampled with replacement from D (i.e., bootstrap)
 a classifier model Mi is learned for each training set Di
 Classification: to classify an unknown sample X
 each classifier Mi returns its class prediction
 the bagged classifier M* counts the votes and assigns the class with the most votes to X
 Prediction: can be applied to the prediction of continuous values by taking the average of each prediction for a given test tuple
 Accuracy
 often significantly better than a single classifier derived from D
 for noisy data: not considerably worse, more robust
 proven improved accuracy in prediction
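A sketch of bagging as described above, with bootstrap samples of size d and a majority vote; train and predict are assumed, illustrative callables:

import random
from collections import Counter

def bagging(data, train, predict, x, k=25, seed=0):
    # Learn k models on bootstrap samples of D and classify x by majority vote
    rng = random.Random(seed)
    votes = []
    for _ in range(k):
        boot = [rng.choice(data) for _ in range(len(data))]  # d tuples, with replacement
        votes.append(predict(train(boot), x))
    # (For prediction of continuous values, average the k outputs instead of voting.)
    return Counter(votes).most_common(1)[0][0]               # class with the most votes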
Boosting

 Analogy: consult several doctors, based on a combination of weighted diagnoses—weights assigned based on previous diagnostic accuracy
 How boosting works:
 weights are assigned to each training tuple
 a series of k classifiers is iteratively learned
 after a classifier Mi is learned, the weights are updated to allow the subsequent classifier, Mi+1, to pay more attention to the training tuples that were misclassified by Mi
 the final M* combines the votes of each individual classifier, where the weight of each classifier's vote is a function of its accuracy
 The boosting algorithm can be extended for the prediction of continuous values
 Compared with bagging: boosting tends to achieve greater accuracy, but it also risks overfitting the model to misclassified data
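The slide does not commit to a particular boosting algorithm; the sketch below uses the standard AdaBoost update as an assumed concrete instance (labels in {-1, +1}; train must honor instance weights):

import math

def boosting(data, train, predict, k=10):
    # data: list of (x, y) with y in {-1, +1}; returns a list of (alpha, model) pairs
    n = len(data)
    w = [1.0 / n] * n                            # weights assigned to each training tuple
    ensemble = []
    for _ in range(k):
        model = train(data, w)                   # weighted learner
        err = sum(wi for wi, (x, y) in zip(w, data) if predict(model, x) != y)
        if err == 0 or err >= 0.5:               # perfect or too weak: stop early
            break
        alpha = 0.5 * math.log((1 - err) / err)  # classifier's vote weight
        # Increase weights of misclassified tuples so M_{i+1} pays more attention to them
        w = [wi * math.exp(-alpha * y * predict(model, x))
             for wi, (x, y) in zip(w, data)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, model))
    return ensemble   # final M*: sign(sum(alpha_i * predict(M_i, x)))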
Classifier Accuracy Measures and Confusion Matrix

 t_pos: true positives (e.g., "cancer" samples correctly classified as such)
 t_neg: true negatives ("not_cancer" samples correctly classified as such)
 f_pos: false positives ("not_cancer" samples incorrectly labeled as "cancer")
 f_neg: false negatives ("cancer" samples incorrectly labeled as "not_cancer")
 pos is the number of positive samples; neg is the number of negative samples

              predicted C1  predicted C2
actual C1     t_pos         f_neg
actual C2     f_pos         t_neg
Classifier Accuracy Measures

classes             buy_computer = yes  buy_computer = no  total  recognition (%)
buy_computer = yes  6954                46                 7000   99.34
buy_computer = no   412                 2588               3000   86.27
total               7366                2634               10000  95.42

 Accuracy of a classifier M, acc(M): percentage of test set tuples that are correctly classified by the model M (here (6954 + 2588)/10000 = 95.42%)
 Error rate (misclassification rate) of M = 1 − acc(M)
 Given m classes, CMi,j, an entry in a confusion matrix, indicates the # of tuples in class i that are labeled by the classifier as class j
 Alternative accuracy measures (e.g., for cancer diagnosis):

    sensitivity = t_pos/pos              /* true positive recognition rate */
    specificity = t_neg/neg              /* true negative recognition rate */
    precision = t_pos/(t_pos + f_pos)
    accuracy = sensitivity × pos/(pos + neg) + specificity × neg/(pos + neg)

 This model can also be used for cost-benefit analysis
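A sketch computing these measures from the counts in the table above:

t_pos, f_neg = 6954, 46     # actual buy_computer = yes
f_pos, t_neg = 412, 2588    # actual buy_computer = no
pos, neg = t_pos + f_neg, f_pos + t_neg

sensitivity = t_pos / pos                        # true positive recognition rate
specificity = t_neg / neg                        # true negative recognition rate
precision   = t_pos / (t_pos + f_pos)
accuracy    = sensitivity * pos / (pos + neg) + specificity * neg / (pos + neg)
print(round(sensitivity, 4), round(specificity, 4),
      round(precision, 4), round(accuracy, 4))   # 0.9934 0.8627 0.9441 0.9542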


Predictor Error Measures

 Measure predictor accuracy: how far off the predicted value is from the actual known value
 Loss function: measures the error between yi and the predicted value yi'
 absolute error: |y_i - y_i'|
 squared error: (y_i - y_i')^2
 Test error (generalization error): the average loss over the test set
 mean absolute error:

    \frac{1}{d} \sum_{i=1}^{d} |y_i - y_i'|

 mean squared error:

    \frac{1}{d} \sum_{i=1}^{d} (y_i - y_i')^2

 relative absolute error:

    \frac{\sum_{i=1}^{d} |y_i - y_i'|}{\sum_{i=1}^{d} |y_i - \bar{y}|}

 relative squared error:

    \frac{\sum_{i=1}^{d} (y_i - y_i')^2}{\sum_{i=1}^{d} (y_i - \bar{y})^2}

 The mean squared error exaggerates the presence of outliers
 Popularly used: the (square) root mean-squared error; similarly, the root relative squared error
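A sketch of these measures (plus the root mean-squared variant) for paired actual/predicted lists; the example values are illustrative:

import math

def error_measures(y_true, y_pred):
    # Returns the loss-based error measures defined above
    d = len(y_true)
    y_bar = sum(y_true) / d
    mae = sum(abs(y - yp) for y, yp in zip(y_true, y_pred)) / d
    mse = sum((y - yp) ** 2 for y, yp in zip(y_true, y_pred)) / d
    rae = sum(abs(y - yp) for y, yp in zip(y_true, y_pred)) \
        / sum(abs(y - y_bar) for y in y_true)
    rse = sum((y - yp) ** 2 for y, yp in zip(y_true, y_pred)) \
        / sum((y - y_bar) ** 2 for y in y_true)
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "RAE": rae, "RSE": rse}

print(error_measures([30, 57, 64], [33, 55, 60]))   # illustrative values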
