BCA Semester VI Data Mining Module 3

Association rules

Why association rules (in data mining)
Association rules are if/then statements that help uncover relationships between seemingly unrelated data in a relational database or other information repository.
An example of an association rule: "If a customer buys a dozen eggs, he is 80% likely to also purchase milk."

Market basket analysis
Rule support and confidence are two measures of rule interestingness.
Association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold.
Such thresholds can be set by users or domain experts.
An association rule has two parts, an antecedent (if) and a consequent (then).
An antecedent is an item found in the data. A consequent is an item found in combination with the antecedent.
In data mining, association rules are useful for analyzing and predicting customer behavior. They play an important part in shopping-basket data analysis, product clustering, catalog design, store layout, etc.


An association rule

An itemset is a set of items.
  E.g., {milk, bread, corn} is an itemset.
A k-itemset is an itemset with k items.
An association rule expresses a relationship between two disjoint itemsets X and Y:
  X ⇒ Y
It represents the pattern: when X occurs, Y also occurs.

Use of Association Rules

Association rules do not represent any sort of causality or correlation between the two itemsets.
  X ⇒ Y does not mean X causes Y.
  X ⇒ Y can be different from Y ⇒ X, unlike correlation.
Association rules assist in marketing, targeted advertising, floor planning, inventory control, ...
Frequent Pattern Analysis
Frequent pattern: a pattern (a set of items, subsequences,
substructures, etc.) that occurs frequently in a data set
Motivation: Finding inherent regularities in data
What products were often purchased together?
What kinds of DNA are sensitive to this new drug?
Can we automatically classify web documents?
Applications
Market data analysis, cross-marketing, catalog design, sale
campaign analysis, Web log (click stream) analysis, and DNA
sequence analysis….
Basic Concepts: Frequent Patterns and Association Rules

Transaction-id   Items bought
10               A, B, D
20               A, C, D
30               A, D, E
40               B, E, F
50               B, C, D, E, F

Itemset X = {x1, ..., xk}
Find all the rules X ⇒ Y with minimum support and confidence:
  support, s: probability that a transaction contains X ∪ Y
  confidence, c: conditional probability that a transaction having X also contains Y
Let supmin = 50%, confmin = 50%
Association rules:
  A ⇒ D (60%, 100%)
  D ⇒ A (60%, 75%)
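To make the two measures concrete, here is a small sketch (not part of the original slides) that computes them for the toy database above in Python:

```python
# Toy transaction database from the slide
transactions = {
    10: {"A", "B", "D"},
    20: {"A", "C", "D"},
    30: {"A", "D", "E"},
    40: {"B", "E", "F"},
    50: {"B", "C", "D", "E", "F"},
}

def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db.values()) / len(db)

def confidence(lhs, rhs, db):
    """support(lhs ∪ rhs) / support(lhs)."""
    return support(set(lhs) | set(rhs), db) / support(lhs, db)

print(support({"A", "D"}, transactions))       # 0.6  -> A ⇒ D has 60% support
print(confidence({"A"}, {"D"}, transactions))  # 1.0  -> A ⇒ D has 100% confidence
print(confidence({"D"}, {"A"}, transactions))  # 0.75 -> D ⇒ A has 75% confidence
```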
Mining Association Rules

Two-step approach:
1. Frequent Itemset Generation
   - Generate all itemsets whose support ≥ minsup
2. Rule Generation
   - Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset

Frequent itemset generation is still computationally expensive.
Frequent pattern mining can be classified in various ways, based on the following criteria:

1. Based on the completeness of patterns to be mined: closed frequent itemsets, maximal frequent itemsets, constrained sets, approximate sets, etc.

2. Based on the levels of abstraction involved in the rule set, e.g.:
   i.  buys(X, "computer") ⇒ buys(X, "HP printer")
   ii. buys(X, "laptop computer") ⇒ buys(X, "HP printer")

3. Based on the number of data dimensions involved in the rule: single-dimensional / multidimensional, e.g.:
   i.  buys(X, "computer") ⇒ buys(X, "antivirus software")
   ii. age(X, "30...39") ∧ income(X, "42K...48K") ⇒ buys(X, "high resolution TV")

4. Based on the types of values handled in the rule, e.g.: Boolean association rules / quantitative association rules.

5. Based on the kinds of rules to be mined, e.g.: correlation rules / strong gradient relationships.
   E.g.: "The average sales from Sony Digital Camera increase over 16% when sold together with Sony Laptop Computer": both Sony Digital Camera and Sony Laptop Computer are siblings, where the parent itemset is Sony.

6. Based on the kinds of patterns to be mined: sequential patterns / structured patterns, e.g.: customers may tend to first buy a PC, followed by a digital camera, and then a memory card.
Scalable Methods for Mining Frequent Patterns

The downward closure property of frequent patterns:
  Any subset of a frequent itemset must be frequent.
  If {egg, milk, fruits} is frequent, so is {egg, milk},
  i.e., every transaction having {egg, milk, fruits} also contains {egg, milk}.
Scalable mining methods: three major approaches
  Apriori
  Frequent pattern growth (FP-growth)
  Vertical data format approach
Apriori: A Candidate Generation-and-Test Approach

Apriori pruning principle: if there is any itemset which is infrequent, its supersets should not be generated/tested!

Method:
  Initially, scan the DB once to get the frequent 1-itemsets.
  Generate length (k+1) candidate itemsets from length k frequent itemsets.
  Test the candidates against the DB.
  Terminate when no frequent or candidate set can be generated.
The Apriori Algorithm—An Example   (supmin = 2)

Database TDB
  Tid   Items
  10    A, C, D
  20    B, C, E
  30    A, B, C, E
  40    B, E

1st scan -> C1: {A}:2, {B}:3, {C}:3, {D}:1, {E}:3
            L1: {A}:2, {B}:3, {C}:3, {E}:3

C2 (from L1): {A,B}, {A,C}, {A,E}, {B,C}, {B,E}, {C,E}
2nd scan -> C2 counts: {A,B}:1, {A,C}:2, {A,E}:1, {B,C}:2, {B,E}:3, {C,E}:2
            L2: {A,C}:2, {B,C}:2, {B,E}:3, {C,E}:2

C3 (from L2): {B,C,E}
3rd scan -> L3: {B,C,E}:2
Apriori Algorithm
Method:
  Let k = 1
  Generate frequent itemsets of length 1
  Repeat until no new frequent itemsets are identified:
    Generate length (k+1) candidate itemsets from length k frequent itemsets
    Prune candidate itemsets containing subsets of length k that are infrequent
    Count the support of each candidate by scanning the DB
    Eliminate candidates that are infrequent, leaving only those that are frequent
Note: This algorithm makes several passes over the transaction list.
Important Details of Apriori
How to generate candidates?
Step 1: self-joining Lk
Step 2: pruning
Example of Candidate-generation
L3={abc, abd, acd, ace, bcd}
Self-joining: L3*L3
abcd from abc and abd
acde from acd and ace
Pruning:
acde is removed because ade is not in L3
C4={abcd}
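A short sketch (not from the original slides) of the self-join and prune steps just described, reproducing the L3 example:

```python
from itertools import combinations

def apriori_gen(Lk):
    """Generate C(k+1) from frequent k-itemsets Lk: self-join, then prune."""
    Lk = [tuple(sorted(s)) for s in Lk]
    k = len(Lk[0])
    candidates = set()
    # Self-join: join itemsets that agree on their first k-1 items
    for a, b in combinations(Lk, 2):
        if a[:-1] == b[:-1]:
            candidates.add(tuple(sorted(set(a) | set(b))))
    # Prune: drop any candidate that has an infrequent k-subset
    Lk_set = set(Lk)
    return [c for c in candidates
            if all(s in Lk_set for s in combinations(c, k))]

L3 = [("a","b","c"), ("a","b","d"), ("a","c","d"), ("a","c","e"), ("b","c","d")]
print(apriori_gen(L3))   # [('a', 'b', 'c', 'd')] -- acde is pruned since ade is not in L3
```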
Challenges of Frequent Pattern Mining

Challenges
Multiple scans of transaction database
Huge number of candidates
Tedious workload of support counting for candidates
Improving Apriori: general ideas
Reduce passes of transaction database scans
Minimize number of candidates
Facilitate support counting of candidates
Hashing: Reduce the Number of Candidates
  A k-itemset whose corresponding hashing bucket count is below the threshold cannot be frequent.

Transaction Reduction: a transaction that does not contain any frequent k-itemset cannot contain any frequent (k+1)-itemset. Mark or remove such transactions from subsequent scans.


Partition: Scan Database Only Twice

Any item set that is potentially frequent in DB must be frequent

in at least one of the partitions of DB

Scan 1: Partition database and find local frequent patterns

Scan 2: Consolidate global frequent patterns


Sampling for Frequent Patterns

Select a sample of original database, mine frequent patterns

within sample using Apriori

Scan database once to verify frequent itemsets found in sample.

Scan database again to find missed frequent patterns in the data

set.
Dynamic Itemset Counting: Reduce Number of Scans

The database is partitioned into blocks marked by start points.
New candidate itemsets can be added at any start point.
Pattern-Growth Approach: Mining Frequent
Patterns Without Candidate Generation
Bottlenecks of the Apriori approach
Breadth-first (i.e., level-wise) search
Candidate generation and test
Often generates a huge number of candidates
The FPGrowth Approach
Depth-first search
Avoid explicit candidate generation
Major philosophy: grow long patterns from short ones using only local frequent items
  "abc" is a frequent pattern
  Get all transactions having "abc", i.e., project the DB on abc: DB|abc
  If "d" is a local frequent item in DB|abc, then "abcd" is a frequent pattern
Construct FP-tree from a Transaction Database   (min_support = 3)

TID   Items bought                 (ordered) frequent items
100   {f, a, c, d, g, i, m, p}     {f, c, a, m, p}
200   {a, b, c, f, l, m, o}        {f, c, a, b, m}
300   {b, f, h, j, o, w}           {f, b}
400   {b, c, k, s, p}              {c, b, p}
500   {a, f, c, e, l, p, m, n}     {f, c, a, m, p}

1. Scan the DB once, find the frequent 1-itemsets (single-item patterns).
   Header table: f:4, c:4, a:3, b:3, m:3, p:3
2. Sort frequent items in frequency-descending order: F-list = f-c-a-b-m-p.
3. Scan the DB again and construct the FP-tree.
   (FP-tree figure: root {} -> f:4 -> c:3 -> a:3 -> m:2 -> p:2; a:3 -> b:1 -> m:1; f:4 -> b:1; and root {} -> c:1 -> b:1 -> p:1.)
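A minimal sketch (not from the slides) of steps 1 and 2 above, producing the item counts and the reordered transactions that are fed into the FP-tree construction:

```python
from collections import Counter

# Transactions from the slide
transactions = [
    {"f","a","c","d","g","i","m","p"},
    {"a","b","c","f","l","m","o"},
    {"b","f","h","j","o","w"},
    {"b","c","k","s","p"},
    {"a","f","c","e","l","p","m","n"},
]
min_support = 3

# Step 1: one DB scan -> support counts of single items, keep the frequent ones
counts = Counter(item for t in transactions for item in t)
frequent = {item: c for item, c in counts.items() if c >= min_support}
print(frequent)   # {'f': 4, 'c': 4, 'a': 3, 'b': 3, 'm': 3, 'p': 3} (printing order may vary)

# Step 2: F-list = frequent items in descending frequency
# (ties may come out in either order; the slide uses f-c-a-b-m-p)
f_list = sorted(frequent, key=lambda item: -frequent[item])

# Each transaction is filtered to frequent items and reordered by the F-list
rank = {item: i for i, item in enumerate(f_list)}
ordered = [sorted((i for i in t if i in frequent), key=rank.get) for t in transactions]
print(ordered[0])  # e.g. ['f', 'c', 'a', 'm', 'p']
```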
Find Patterns Having p From p's Conditional Database

Starting at the frequent-item header table in the FP-tree,
traverse the FP-tree by following the link of each frequent item p, and
accumulate all of the transformed prefix paths of item p to form p's conditional pattern base.


Mining Frequent Patterns With FP-trees

Method

For each frequent item, construct its conditional pattern-base,

and then its conditional FP-tree

Repeat the process on each newly created conditional FP-tree

Until the resulting FP-tree is empty, or it contains only one

path — single path will generate all the combinations of its

sub-paths, each of which is a frequent pattern


Conditional (sub) pattern bases derived from the FP-tree and header table above (f:4, c:4, a:3, b:3, m:3, p:3):

Item   Cond. pattern base       Cond. FP-tree        Frequent patterns generated
c      f:3                      f:3                  fc:3
a      fc:3                     fc:3                 fa:3, ca:3, fca:3
b      fca:1, f:1, c:1          (empty)              --
m      fca:2, fcab:1            fca:3                fm:3, cm:3, am:3, fcm:3, fam:3, cam:3, fcam:3
p      fcam:2, cb:1             c:3                  cp:3
Benefits of the FP-tree Structure

Completeness
  Preserves complete information for frequent pattern mining.
  Never breaks a long pattern of any transaction.
Compactness
  Reduces irrelevant information: infrequent items are gone.
  Items are stored in frequency-descending order: the more frequently an item occurs, the more likely it is to be shared.
  The tree is never larger than the original database (not counting node-links and the count fields).
Another example (transactions over items I1-I5):

Step 1: Count item support
  Item   Count
  I1     6
  I2     7
  I3     6
  I4     2
  I5     2

Step 2: Arrange each transaction's items in descending support order
  TID    List of items (before)    List of items (after)
  T100   I1, I2, I5                I2, I1, I5
  T200   I2, I4                    I2, I4
  T300   I2, I3                    I2, I3
  T400   I1, I2, I4                I2, I1, I4
  T500   I1, I3                    I1, I3
  T600   I2, I3                    I2, I3
  T700   I1, I3                    I1, I3
  T800   I1, I2, I3, I5            I2, I1, I3, I5
  T900   I1, I2, I3                I2, I1, I3
FP-TREE: conditional pattern bases and frequent patterns

Item   Conditional pattern base          Conditional FP-tree     Frequent patterns generated
I5     {{I2, I1: 1}, {I2, I1, I3: 1}}    (I2:2, I1:2)            {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
I4     {{I2, I1: 1}, {I2: 1}}            (I2:2)                  {I2, I4: 2}
I3     {{I2, I1: 2}, {I2: 2}, {I1: 2}}   (I2:4, I1:2), (I1:2)    {I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}
I1     {{I2: 4}}                         (I2:4)                  {I2, I1: 4}

Mining frequent itemsets using vertical
data format
Transforming the horizontal data format of the
transaction database D into a vertical data
format:
Itemset TID_set
I1 {T100, T400, T500, T700, T800, T900}
I2 {T100, T200, T300, T400, T600, T800, T900}
I3 {T300, T500, T600, T700, T800, T900}
I4 {T200, T400}
I5 {T100, T800}
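A brief sketch (assumed, not from the slides) of mining with this vertical format: the support of an itemset is the size of the intersection of its items' TID sets.

```python
tid_sets = {
    "I1": {"T100", "T400", "T500", "T700", "T800", "T900"},
    "I2": {"T100", "T200", "T300", "T400", "T600", "T800", "T900"},
    "I3": {"T300", "T500", "T600", "T700", "T800", "T900"},
    "I4": {"T200", "T400"},
    "I5": {"T100", "T800"},
}

def support_count(items):
    """Intersect the TID sets of all items; the size of the result is the support count."""
    tids = set.intersection(*(tid_sets[i] for i in items))
    return len(tids), tids

print(support_count(["I1", "I2"]))        # 4 transactions: T100, T400, T800, T900
print(support_count(["I1", "I2", "I5"]))  # 2 transactions: T100, T800
```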
Generating Association Rules from Frequent Itemsets

Strong association rules satisfy both minimum support and minimum confidence.
  confidence(A ⇒ B) = P(B|A) = support_count(A ∪ B) / support_count(A)
For each frequent itemset l, generate all nonempty proper subsets s and output the rule s ⇒ (l - s) if support_count(l) / support_count(s) ≥ min_conf.
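A small sketch (not part of the slides) of this rule-generation step, using the support counts from the earlier Apriori example for the frequent itemset {B, C, E}:

```python
from itertools import chain, combinations

def gen_rules(freq_itemset, support_count, min_conf):
    """Emit rules s => (l - s) from one frequent itemset l whose confidence
    support_count[l] / support_count[s] meets min_conf."""
    l = frozenset(freq_itemset)
    rules = []
    subsets = chain.from_iterable(combinations(l, r) for r in range(1, len(l)))
    for s in map(frozenset, subsets):
        conf = support_count[l] / support_count[s]
        if conf >= min_conf:
            rules.append((set(s), set(l - s), conf))
    return rules

support_count = {
    frozenset("B"): 3, frozenset("C"): 3, frozenset("E"): 3,
    frozenset("BC"): 2, frozenset("BE"): 3, frozenset("CE"): 2,
    frozenset("BCE"): 2,
}
for lhs, rhs, conf in gen_rules("BCE", support_count, min_conf=0.7):
    print(lhs, "=>", rhs, f"(conf = {conf:.2f})")
# {'B', 'C'} => {'E'} and {'C', 'E'} => {'B'} pass with confidence 1.00
```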
Mining Various Kinds of Association Rules

1. Mining multilevel association

2. Mining multidimensional association

3. Mining quantitative association

4. Mining interesting correlation patterns


1. Mining Multiple-Level Association Rules

Items often form hierarchies.
Items at the lower levels are expected to have lower support.

(Concept hierarchy figure, e.g.: food -> milk, bread; milk -> skim, ...; bread -> wheat, white.)
Multilevel Associations

Mining multilevel associations can proceed as follows.
A top-down, progressive deepening approach:
  First find high-level strong rules:
    milk -> bread [20%, 60%]
  Then find their lower-level "weaker" rules:
    2% milk -> wheat bread [6%, 50%]

Variations on mining multiple-level association rules:
  Level-crossed association rules:
    2% milk -> Wonder wheat bread
  Association rules with multiple, alternative hierarchies:
    2% milk -> Wonder bread
Multilevel Association: Uniform Support vs. Reduced Support

Uniform support: the same minimum support is used for all levels.
  There is only one minimum support threshold.
  No need to examine itemsets containing any item whose ancestors do not have minimum support.
  Example (min_sup = 5% at both levels):
    Level 1: milk [support = 10%]
    Level 2: 2% milk [support = 6%], skim milk [support = 4%]
  (Here skim milk, at 4%, fails the 5% threshold.)

Reduced support: the minimum support threshold is reduced at lower levels.
  Example:
    Level 1 (min_sup = 5%): milk [support = 10%]
    Level 2 (min_sup = 3%): 2% milk [support = 6%], skim milk [support = 4%]
  (Here both lower-level items pass the reduced 3% threshold.)
Multilevel Association: Redundancy Filtering

Some rules may be redundant due to "ancestor" relationships between items.
Consider the following example:
  milk -> wheat bread    [support = 8%, confidence = 70%]
  2% milk -> wheat bread [support = 2%, confidence = 72%]
The second rule is redundant: its support and confidence are close to what we would expect from its ancestor rule, so it adds little new information.
Multilevel Association: Progressive Deepening

A top-down, progressive deepening approach can be used:
  First mine high-level frequent items:
    milk (15%), bread (10%)
  Then mine their lower-level "weaker" frequent itemsets:
    2% milk (5%), wheat bread (4%)
2. Mining Multi-Dimensional Association

Single-dimensional rules:
  buys(X, "milk") ⇒ buys(X, "bread")
Multi-dimensional rules: ≥ 2 dimensions or predicates
  Inter-dimension association rules (no repeated predicates):
    age(X, "19-25") ∧ occupation(X, "student") ⇒ buys(X, "coke")
  Hybrid-dimension association rules (repeated predicates):
    age(X, "19-25") ∧ buys(X, "popcorn") ⇒ buys(X, "coke")
Categorical attributes: finite number of possible values, no ordering among values, e.g., occupation, brand, colour.
Quantitative attributes: numeric values, e.g., age, salary, height, weight, etc.
3.Mining Quantitative Associations

1. Static discretization based on predefined concept


hierarchies (data cube methods)
2. Dynamic discretization based on data distribution
(quantitative rules)
3. Clustering
Static Discretization of Quantitative Attributes

Discretized prior to mining using a concept hierarchy: numeric values are replaced by ranges.
A data cube is well suited for mining: the cells of an n-dimensional cuboid correspond to the predicate sets.
(Cuboid lattice: (); (age), (income), (buys); (age, income), (age, buys), (income, buys); (age, income, buys).)
Quantitative Association Rules

Numeric attributes are dynamically discretized such that the confidence or compactness of the rules mined is maximized.
2-D quantitative association rules: A_quan1 ∧ A_quan2 ⇒ A_cat
Cluster adjacent association rules to form general rules using a 2-D grid.
Example:
  age(X, "34-35") ∧ income(X, "30-50K") ⇒ buys(X, "high resolution TV")
Association Rule Mining to Correlation Analysis

Interestingness measure: correlations (lift)
Consider the following association rule:
  buys(X, "computer games") ⇒ buys(X, "videos")   [support = 40%, confidence = 66%]
This rule is misleading because computer games and videos are actually negatively associated.
A correlation measure is used to augment the support-confidence framework for association rules:
  A ⇒ B [support, confidence, correlation]
One such measure is lift(A, B) = P(A ∪ B) / (P(A) P(B)); lift < 1 indicates a negative correlation, lift > 1 a positive one.
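A tiny sketch (assumed absolute counts, chosen to be consistent with the 40% support and 66% confidence quoted above) showing why lift can contradict a seemingly strong rule:

```python
n = 10000       # total transactions (assumed)
games = 6000    # transactions buying computer games (assumed)
videos = 7500   # transactions buying videos (assumed)
both = 4000     # transactions buying both (assumed)

support = both / n                                   # 0.40
confidence = both / games                            # ~0.67
lift = (both / n) / ((games / n) * (videos / n))     # ~0.89
print(support, round(confidence, 2), round(lift, 2)) # lift < 1: negative correlation
```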
Classification And
Prediction

Classification

Predicts categorical class labels.
A data analysis task where a model or classifier is constructed to predict categorical labels.
Ex: a bank loan officer needs analysis of her data in order to learn which loan applicants are "safe" and which are "risky" for the bank.
A medical researcher wants to analyze bone cancer data in order to predict which one of three specific treatments a patient should receive.
Prediction

Data analysis that predicts continuous-valued functions, or ordered values.
Predicts unknown or missing values.
Regression analysis is a statistical methodology that is most often used for numeric prediction.
Typical applications:
  Credit approval
  Target marketing
  Medical diagnosis
  Fraud detection
Classification—A Two-Step Process

Learning step:
  A classification algorithm builds the classifier by analyzing or "learning from" a training set made up of database tuples and their associated class labels.
  Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute.
  The set of tuples used for model construction is the training set.
  The model is represented as classification rules, decision trees, or mathematical formulae.
Model usage: in the second step the model is used for classifying future or unknown objects.
  Estimate the accuracy of the model:
    The known label of each test sample is compared with the classified result from the model.
    Accuracy rate is the percentage of test set samples that are correctly classified by the model.
    The test set is independent of the training set, otherwise over-fitting will occur.
  If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known.


Process (1): Model Construction

(Figure: training data is fed to a classification algorithm, which produces a classifier/model, e.g. the rule IF rank = 'professor' OR years > 6 THEN tenured = 'yes'.)
Process (2): Using the Model in Prediction

(Figure: the classifier is applied to testing data and then to unseen data, e.g. asking whether the tuple (Jeff, Professor, 4) is tenured.)


Supervised vs. Unsupervised Learning

Supervised learning (classification)
  Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations.
  New data is classified based on the training set.
Unsupervised learning (clustering)
  The class labels of the training data are unknown.
  Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.


Issues: Regarding Classification and Prediction

Preparing the data for classification and prediction
  The following preprocessing steps may be applied to the data to help improve the accuracy, efficiency, and scalability of the classification or prediction process.
Data cleaning
  Preprocess the data in order to reduce noise and handle missing values,
  e.g., by replacing a missing value with the most commonly occurring value for that attribute.
Issues: Regarding Classification and Prediction

Relevance analysis
  Many of the attributes in the data may be redundant.
  Correlation analysis can be used to identify whether any two given attributes are statistically related.
  Remove the irrelevant or redundant attributes.


Issues: Regarding Classification and Prediction

Data transformation
  Data may be transformed by normalization, particularly when neural networks or methods involving distance measurements are used in the learning step.
  Data may be transformed by generalizing it to higher-level concepts; a concept hierarchy may be used for that.
  Generalize and/or normalize data.


Issues: Evaluating Classification Methods

Classification and prediction methods can be compared and evaluated according to the following criteria.

Accuracy:
  The accuracy of a classifier refers to the ability of a given classifier to correctly predict the class label of new or previously unseen data.
  The accuracy of a predictor refers to how well a given predictor can guess the value of the predicted attribute for new or previously unseen data.
Speed
  Time to construct the model (training time)
  Time to use the model (classification/prediction time)
Robustness: the ability of the classifier or predictor to make correct predictions when handling noisy data and missing values.
Scalability: the ability to construct the classifier or predictor efficiently given a large amount of data.
Interpretability: the level of understanding and insight that is provided by the classifier or predictor.
Decision Tree Induction: Training Dataset

(Training data table: 14 tuples with attributes age, income, student, credit_rating and class label buys_computer.)


Output: A Decision Tree for "buys_computer"

age?
  <=30   -> student?
              no  -> buys_computer = no
              yes -> buys_computer = yes
  31..40 -> buys_computer = yes
  >40    -> credit_rating?
              excellent -> buys_computer = yes
              fair      -> buys_computer = no


Classification by Decision Tree

Decision tree induction is the learning of decision trees from class-labeled training tuples.
A decision tree is a tree-like structure where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node holds a class label.
The topmost node in a tree is the root node.
Internal node: rectangle
Leaf node: oval


How a decision tree is used for classification:

Given a tuple X for which the associated class label is unknown, the attribute values of the tuple are tested against the decision tree.
A path is traced from the root to a leaf node, which holds the class prediction for that tuple. Decision trees can easily be converted to classification rules.
Decision tree induction is popular because the construction of decision tree classifiers does not require any domain knowledge or parameter setting.


The benefits of having a decision tree are as follows:

  It does not require any domain knowledge.
  It is easy to comprehend.
  The learning and classification steps of a decision tree are simple and fast.


Decision tree generation consists of two phases:

Tree construction: at the start, all the training examples are at the root; examples are then partitioned recursively based on selected attributes.
Tree pruning: identify and remove branches that reflect noise or outliers.
Use of the decision tree: classify an unknown sample by testing the attribute values of the sample against the decision tree.
Attribute Selection Measure: Information Gain (ID3/C4.5)

Select the attribute with the highest information gain.
Let pi be the probability that an arbitrary tuple in D belongs to class Ci, estimated by |Ci,D| / |D|.
Expected information (entropy) needed to classify a tuple in D:
  Info(D) = - Σ_{i=1}^{m} pi log2(pi)
Information needed (after using A to split D into v partitions) to classify D:
  Info_A(D) = Σ_{j=1}^{v} (|Dj| / |D|) Info(Dj)
Information gained by branching on attribute A:
  Gain(A) = Info(D) - Info_A(D)
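A short sketch (not from the slides) of these formulas in Python, applied to the buys_computer data on the next slide; the per-age class counts for the 31..40 and >40 partitions (4/0 and 3/2) are taken from the standard textbook example and are assumptions here.

```python
from math import log2

def info(class_counts):
    """Expected information (entropy): Info(D) = -sum(p_i * log2(p_i))."""
    n = sum(class_counts)
    return -sum(c / n * log2(c / n) for c in class_counts if c)

# Class distribution of the 14-tuple buys_computer data: 9 "yes", 5 "no"
info_D = info([9, 5])                                       # ~0.940 bits

# Partitions induced by age: <=30 -> 2 yes / 3 no, 31..40 -> 4/0, >40 -> 3/2 (assumed)
partitions = [[2, 3], [4, 0], [3, 2]]
n = sum(sum(p) for p in partitions)
info_age = sum(sum(p) / n * info(p) for p in partitions)    # ~0.694 bits

gain_age = info_D - info_age
print(round(info_D, 3), round(info_age, 3), round(gain_age, 3))
# 0.94 0.694 0.247  (the textbook rounds Gain(age) to 0.246)
```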


Attribute Selection: Information Gain

Class P: buys_computer = "yes" (9 tuples); Class N: buys_computer = "no" (5 tuples), so
  Info(D) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
"age <= 30" has 5 out of the 14 samples, with 2 yes's and 3 no's. Hence
  Info_age(D) = (5/14) Info(2,3) + (4/14) Info(4,0) + (5/14) Info(3,2) = 0.694
  Gain(age) = Info(D) - Info_age(D) = 0.246
Similarly, Gain(income) = 0.029, Gain(student) = 0.151, Gain(credit_rating) = 0.048,
so age is selected as the splitting attribute.


Computing Information-Gain for
Continuous-Value Attributes
Let attribute A be a continuous-valued attribute
Must determine the best split point for A
Sort the value A in increasing order
Typically, the midpoint between each pair of adjacent
values is considered as a possible split point
(ai+ai+1)/2 is the midpoint between the values of ai and ai+1
The point with the minimum expected information
requirement for A is selected as the split-point for A
Split:
D1 is the set of tuples in D satisfying A ≤ split-point, and
D2 is the set of tuples in D satisfying A > split-point
Gain Ratio for Attribute Selection (C4.5)

The information gain measure is biased towards attributes with a large number of values.
C4.5 (a successor of ID3) uses the gain ratio to overcome the problem (a normalization of information gain):
  SplitInfo_A(D) = - Σ_{j=1}^{v} (|Dj| / |D|) log2(|Dj| / |D|)
  GainRatio(A) = Gain(A) / SplitInfo(A)
Ex.: gain_ratio(income) = 0.029 / 0.926 = 0.031
The attribute with the maximum gain ratio is selected as the splitting attribute.
Gini Index (CART, IBM IntelligentMiner)

If a data set D contains examples from n classes, the gini index, gini(D), is defined as
  gini(D) = 1 - Σ_{j=1}^{n} pj^2
where pj is the relative frequency of class j in D.
If a data set D is split on A into two subsets D1 and D2, the gini index gini_A(D) is defined as
  gini_A(D) = (|D1| / |D|) gini(D1) + (|D2| / |D|) gini(D2)
Reduction in impurity:
  Δgini(A) = gini(D) - gini_A(D)
The attribute that provides the smallest gini_split(D) (or the largest reduction in impurity) is chosen to split the node (this requires enumerating all possible splitting points for each attribute).
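A brief sketch (not from the slides) of the gini computation; the class counts for the income split {low, medium} vs {high} (7/3 and 2/2) are assumed from the standard textbook example.

```python
def gini(class_counts):
    """gini(D) = 1 - sum(p_j^2)."""
    n = sum(class_counts)
    return 1 - sum((c / n) ** 2 for c in class_counts)

def gini_split(partitions):
    """Weighted gini after splitting D into the given partitions."""
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * gini(p) for p in partitions)

# buys_computer data: 9 yes / 5 no overall
print(round(gini([9, 5]), 3))                  # 0.459
# Split on income into {low, medium} (7 yes / 3 no) vs {high} (2 / 2) -- assumed counts
print(round(gini_split([[7, 3], [2, 2]]), 3))  # 0.443
```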
Comparing Attribute Selection Measures

The three measures, in general, return good results but


Information gain:
biased towards multivalued attributes
Gain ratio:
tends to prefer unbalanced splits in which one
partition is much smaller than the others
Gini index:
biased to multivalued attributes
has difficulty when # of classes is large
tends to favor tests that result in equal-sized
partitions and purity in both partitions
Overfitting and Tree Pruning
Overfitting: An induced tree may overfit the training data
Too many branches, some may reflect anomalies due to noise or
outliers
Poor accuracy for unseen samples
Two approaches to avoid overfitting
Prepruning: Halt tree construction early—do not split a node if this
would result in the goodness measure falling below a threshold
Difficult to choose an appropriate threshold
Postpruning: Remove branches from a “fully grown” tree—get a
sequence of progressively pruned trees
Use a set of data different from the training data to decide which
is the “best pruned tree”



Enhancements to Basic Decision Tree Induction

Allow for continuous-valued attributes


Dynamically define new discrete-valued attributes that
partition the continuous attribute value into a discrete
set of intervals
Handle missing attribute values
Assign the most common value of the attribute
Assign probability to each of the possible values
Attribute construction
Create new attributes based on existing ones that are
sparsely represented
This reduces fragmentation, repetition, and replication
Bayesian Classification: Why?
A statistical classifier: performs probabilistic prediction, i.e., predicts
class membership probabilities
Foundation: Based on Bayes’ Theorem.
Performance: A simple Bayesian classifier, naïve Bayesian classifier, has
comparable performance with decision tree and selected neural
network classifiers
Incremental: Each training example can incrementally
increase/decrease the probability that a hypothesis is correct — prior
knowledge can be combined with observed data
Standard: Even when Bayesian methods are computationally
intractable, they can provide a standard of optimal decision making
against which other methods can be measured

Bayesian Theorem: Basics

Let X be a data sample ("evidence"); its class label is unknown.
Let H be the hypothesis that X belongs to class C.
Classification is to determine P(H|X), the probability that the hypothesis holds given the observed data sample X.
P(H) (prior probability): the initial probability,
  e.g., that X will buy a computer, regardless of age, income, ...
P(X): the probability that the sample data is observed.
P(X|H) (likelihood): the probability of observing the sample X given that the hypothesis holds,
  e.g., given that X will buy a computer, the probability that X is 31..40 with medium income.
Bayesian Theorem

Given training data X, the posteriori probability of a hypothesis H, P(H|X), follows Bayes' theorem:
  P(H|X) = P(X|H) P(H) / P(X)
Informally, this can be written as
  posteriori = likelihood x prior / evidence
Predict that X belongs to Ci iff the probability P(Ci|X) is the highest among all the P(Ck|X) for the k classes.
Practical difficulty: requires initial knowledge of many probabilities, at significant computational cost.
NAIVE BAYES CLASSIFIER

Naive Bayes is a kind of classifier which uses Bayes' Theorem.
It predicts membership probabilities for each class, such as the probability that a given record or data point belongs to a particular class.
The class with the highest probability is considered the most likely class.
This is also known as the Maximum A Posteriori class, the Ci maximizing P(Ci|X).


This can be derived from Bayes' theorem:
  P(Ci|X) = P(X|Ci) P(Ci) / P(X)
Assumption
  The Naive Bayes classifier assumes that all the features are unrelated to each other: the presence or absence of a feature does not influence the presence or absence of any other feature.


Let's consider a training dataset with 1500 records and 3 classes. We presume that there are no missing values in our data.
We have 3 classes associated with animal types:
  Parrot,
  Dog,
  Fish.
The predictor feature set consists of 4 features:
  Swim, Wings, GreenColor, Dangerous Teeth.
All the features are categorical variables with either of the 2 values: T (True) or F (False).
(A table of per-class feature counts appears here; it is summarized below.)
Parrots have 50 (10%) True values for Swim, i.e., 10% of parrots can swim according to our data; 500 out of 500 (100%) parrots have wings; 400 out of 500 (80%) parrots are green; and 0 (0%) parrots have dangerous teeth.

The animal type Dog shows that 450 out of 500 (90%) can swim, 0 (0%) dogs have wings, 0 (0%) dogs are green, and 500 out of 500 (100%) dogs have dangerous teeth.

The animal type Fish shows that 500 out of 500 (100%) can swim, 0 (0%) fish have wings, 100 (20%) fish are green, and 50 out of 500 (10%) fish have dangerous teeth.


Now it's time to predict classes using the Naive Bayes model. We have taken 2 records that have values for the feature set (Swim, Wings, Green, Teeth), but whose target variable needs to be predicted.
We have to predict the animal type from the feature values, i.e., whether each animal is a Dog, a Parrot or a Fish.
(The worked posterior-probability computations for the two records appear here.)
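A sketch (not from the slides) of the naive Bayes computation for this example. The two query records' feature values are not given in the text, so the record below (Swim=T, Wings=F, Green=T, Teeth=F) is a hypothetical illustration.

```python
priors = {"Parrot": 500 / 1500, "Dog": 500 / 1500, "Fish": 500 / 1500}

# P(feature = True | class) from the counts described above
p_true = {
    "Parrot": {"Swim": 0.10, "Wings": 1.00, "Green": 0.80, "Teeth": 0.00},
    "Dog":    {"Swim": 0.90, "Wings": 0.00, "Green": 0.00, "Teeth": 1.00},
    "Fish":   {"Swim": 1.00, "Wings": 0.00, "Green": 0.20, "Teeth": 0.10},
}

record = {"Swim": True, "Wings": False, "Green": True, "Teeth": False}  # hypothetical

scores = {}
for cls in priors:
    score = priors[cls]
    for feat, value in record.items():
        p = p_true[cls][feat]
        score *= p if value else (1 - p)   # naive (conditional independence) assumption
    scores[cls] = score                    # proportional to P(cls | record)

print(max(scores, key=scores.get), scores)  # 'Fish' gets the highest score here
```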
Using IF-THEN Rules for Classification

Represent the knowledge in the form of IF-THEN rules:
  R: IF age = youth AND student = yes THEN buys_computer = yes
  Rule antecedent/precondition vs. rule consequent
Assessment of a rule: coverage and accuracy
  n_covers  = # of tuples covered by R
  n_correct = # of tuples correctly classified by R
  coverage(R) = n_covers / |D|        /* D: training data set */
  accuracy(R) = n_correct / n_covers
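A minimal sketch (hypothetical toy tuples, not from the slides) of rule coverage and accuracy for R: IF age = youth AND student = yes THEN buys_computer = yes:

```python
D = [  # (age, student, buys_computer) -- assumed toy training set
    ("youth", "yes", "yes"), ("youth", "yes", "no"),
    ("youth", "no", "no"), ("senior", "yes", "yes"), ("middle", "no", "yes"),
]
covered = [t for t in D if t[0] == "youth" and t[1] == "yes"]   # antecedent holds
correct = [t for t in covered if t[2] == "yes"]                 # consequent also holds

coverage = len(covered) / len(D)        # 2/5 = 0.4
accuracy = len(correct) / len(covered)  # 1/2 = 0.5
print(coverage, accuracy)
```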


Rule Extraction from a Decision Tree

Rules are easier to understand than large trees.
One rule is created for each path from the root to a leaf (following the age / student / credit_rating tree shown earlier).
Rules are mutually exclusive and exhaustive.

Example: rule extraction from our buys_computer decision tree
  IF age = young AND student = no THEN buys_computer = no
  IF age = young AND student = yes THEN buys_computer = yes
  IF age = mid-age THEN buys_computer = yes
  IF age = old AND credit_rating = excellent THEN buys_computer = yes
  IF age = old AND credit_rating = fair THEN buys_computer = no
SVM—Support Vector Machines
A new classification method for both linear and nonlinear data
It uses a nonlinear mapping to transform the original training data
into a higher dimension
With the new dimension, it searches for the linear optimal
separating hyperplane (i.e., “decision boundary”)
With an appropriate nonlinear mapping to a sufficiently high
dimension, data from two classes can always be separated by a
hyperplane
SVM finds this hyperplane using support vectors (“essential”
training tuples) and margins (defined by the support vectors)



(Figure: 2-D scatter plot of training tuples from two classes, illustrating a possible separating line.)


SVM—General Philosophy

(Figure: two separating hyperplanes, one with a small margin and one with a large margin; the tuples lying on the margin are the support vectors.)

SVM—Margins and Support Vectors

(Figure: the maximum-margin hyperplane with its margin defined by the support vectors.)


SVM—Linearly Separable

An SVM finds the optimal separating hyperplane, i.e., the one that maximizes the margin of the training data.
A separating hyperplane can be written as
  W · X + b = 0
where W = {w1, w2, ..., wn} is a weight vector and b a scalar (bias).
For 2-D data it can be written as
  w0 + w1 x1 + w2 x2 = 0
The hyperplanes defining the sides of the margin:
  H1: w0 + w1 x1 + w2 x2 ≥ 1   for yi = +1, and
  H2: w0 + w1 x1 + w2 x2 ≤ -1  for yi = -1
Any training tuples that fall on hyperplanes H1 or H2 (i.e., the sides defining the margin) are support vectors.
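A tiny illustration (not from the slides) of fitting a linear SVM and reading off the learned hyperplane W·X + b = 0; it assumes scikit-learn is available and uses made-up 2-D points.

```python
from sklearn.svm import SVC

X = [[1, 1], [2, 1], [1, 2],      # class -1
     [4, 4], [5, 4], [4, 5]]      # class +1
y = [-1, -1, -1, +1, +1, +1]

clf = SVC(kernel="linear", C=1e6)  # large C approximates a hard margin
clf.fit(X, y)

print(clf.coef_, clf.intercept_)      # weight vector W and bias b
print(clf.support_vectors_)           # the tuples lying on the margin
print(clf.predict([[2, 2], [5, 5]]))  # predicts class -1 and class +1
```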


SVM—Linearly Inseparable

Transform the original input data into a higher dimensional


space

Search for a linear separating hyperplane in the new space



Lazy vs. Eager Learning

Lazy learning (e.g., instance-based learning): simply stores training data (or does only minor processing) and waits until it is given a test tuple.
Eager learning (the methods discussed above): given a training set, constructs a classification model before receiving new (e.g., test) data to classify.
Lazy: less time in training but more time in predicting.
Accuracy:
  A lazy method effectively uses a richer hypothesis space, since it uses many local linear functions to form its implicit global approximation to the target function.
  An eager method must commit to a single hypothesis that covers the entire instance space.


Lazy Learner: Instance-Based Methods

Instance-based learning:
Store training examples and delay the processing (“lazy
evaluation”) until a new instance must be classified
Typical approaches
k-nearest neighbor approach
Instances represented as points in a Euclidean space.
Locally weighted regression
Constructs local approximation
Case-based reasoning
Uses symbolic representations and knowledge-based inference



The k-Nearest Neighbor Algorithm

All instances correspond to points in the n-D space.
The nearest neighbors are defined in terms of Euclidean distance, dist(X1, X2).
The target function could be discrete- or real-valued.
For discrete-valued targets, k-NN returns the most common value among the k training examples nearest to xq.


k-NN Algorithm

1. Calculate d(x, xi), i = 1, 2, ..., n, where d denotes the Euclidean distance between the points.
2. Arrange the calculated n Euclidean distances in non-decreasing order.
3. Let k be a positive integer; take the first k distances from this sorted list.
4. Find the k points corresponding to these k distances.
5. Let ki denote the number of points belonging to the ith class among the k points, i.e., ki >= 0.
6. If ki > kj for all i != j, put x in class i.
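A compact sketch (assumed toy data, not from the slides) of the k-NN procedure listed above:

```python
from collections import Counter
from math import dist   # Euclidean distance (Python 3.8+)

train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((3.0, 4.0), "B"),
         ((5.0, 7.0), "B"), ((3.5, 4.5), "B")]

def knn_predict(x, train, k=3):
    # Steps 1-2: compute and sort the Euclidean distances
    neighbours = sorted(train, key=lambda p: dist(x, p[0]))
    # Steps 3-4: keep the k closest points
    top_k = neighbours[:k]
    # Steps 5-6: majority vote over their class labels
    return Counter(label for _, label in top_k).most_common(1)[0][0]

print(knn_predict((2.0, 2.5), train, k=3))   # -> 'A'
```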


Example

(Figure: a query point with k = 3 (inner circle) is assigned to the triangle class, while with k = 5 (outer circle) it is assigned to the square class.)


Choosing the right value for k

Run the k-NN algorithm several times with different values of k and choose the k that reduces the number of errors.
As we decrease the value of k towards 1, our predictions become less stable.
Inversely, as we increase the value of k, our predictions become more stable due to majority voting/averaging.
An increase in errors occurs when the value of k is too high.
In cases where a majority vote is taken among labels, make k an odd number to have a tiebreaker.


Advantages
The algorithm is simple and easy to implement.
There’s no need to build a model, tune several
parameters, or make additional assumptions.
The algorithm is versatile. It can be used for
classification, regression and search.
Disadvantages
The algorithm gets significantly slower as the number
of examples and/or predictors/independent variables
increase.



What Is Prediction?
(Numerical) prediction is similar to classification
construct a model using training set.
use model to predict continuous or ordered value for a given input
Prediction is different from classification
Classification refers to predict categorical class label
Prediction models continuous-valued functions
Major method for prediction: regression
model the relationship between one or more independent or
predictor variables and a dependent or response variable
Regression analysis
Linear regression
Multiple regression
Non-linear regression



Linear Regression

Linear regression involves a response variable y and a single predictor variable x:
  y = w0 + w1 x
where w0 (y-intercept) and w1 (slope) are regression coefficients.
Method of least squares: estimates the best-fitting straight line as the one that minimizes the error between the actual data and the estimate of the line:
  w1 = Σ (xi - x̄)(yi - ȳ) / Σ (xi - x̄)²,   w0 = ȳ - w1 x̄


Example: the 2-D data below can be graphed on a scatter plot.

X (years experience)   Y (salary in 1000s)
3                      30
8                      57
9                      64
13                     72
3                      36
6                      43
11                     59
21                     90
1                      20
16                     83


The plot suggests a linear relationship between the two variables, x and y, so we model the relationship of salary to the number of years of work experience with the equation y = w0 + w1 x.
For the data above, the mean of x is 9.1 and the mean of y is 55.4.
Substituting into the least-squares formulas gives w1 = 3.5 and w0 = 23.6, so the equation of the least-squares line is estimated by
  y = 23.6 + 3.5 x
With this equation we can predict that the salary of a person with 10 years of experience is about 58,600.
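A minimal sketch (not from the slides) of the least-squares fit above, using the table's data:

```python
xs = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
ys = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]

x_bar = sum(xs) / len(xs)          # 9.1
y_bar = sum(ys) / len(ys)          # 55.4

w1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
w0 = y_bar - w1 * x_bar

print(round(w1, 1), round(w0, 1))  # 3.5 23.2 (the slide uses the rounded w1 = 3.5, giving w0 ≈ 23.6)
print(w0 + w1 * 10)                # predicted salary in 1000s for 10 years' experience (~58.6)
```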


Multiple linear regression: involves more than one predictor variable.
Training data is of the form (X1, y1), (X2, y2), ..., (X|D|, y|D|).
Ex.: for 2-D data, we may have y = w0 + w1 x1 + w2 x2.
Solvable by an extension of the least-squares method.


Nonlinear Regression

Some nonlinear models can be modeled by a polynomial function, and many nonlinear functions can be transformed into the linear form above.
A polynomial regression model can be transformed into a linear regression model. For example,
  y = w0 + w1 x + w2 x^2 + w3 x^3
is convertible to linear form with the new variables x2 = x^2 and x3 = x^3:
  y = w0 + w1 x + w2 x2 + w3 x3
This can then be solved using the method of least squares.