
DWM Unit 2

Association rule mining searches for interesting relationships among items in a dataset. Association rules are considered interesting if they satisfy minimum support and confidence thresholds set by users. The Apriori algorithm is an influential algorithm that uses an anti-monotone property to efficiently find frequent itemsets in transactional data for generating association rules. Decision trees are models that classify data by building a tree structure, and use attribute selection measures and pruning methods to build accurate trees.

Uploaded by Vijay Krish
Copyright © Attribution Non-Commercial (BY-NC)

Unit 2:

1. Define Association Rule Mining.


Association rule mining searches for interesting relationships among items in a given data set.
2. When we can say the association rules are interesting?
Association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold. Users or domain experts can set such thresholds.
3. Explain Association rule in mathematical notations.
Let I = {i1, i2, ..., im} be a set of items.
Let D, the task-relevant data, be a set of database transactions, where each transaction T is a set of items such that T ⊆ I.
An association rule is an implication of the form A => B, where A ⊂ I, B ⊂ I, and A ∩ B = ∅. The rule A => B holds in the transaction set D with support s, where s is the percentage of transactions in D that contain A ∪ B. The rule A => B has confidence c in the transaction set D if c is the percentage of transactions in D containing A that also contain B.
4. Define support and confidence in Association rule mining.
Support s is the percentage of transactions in D that contain A ∪ B.
Confidence c is the percentage of transactions in D containing A that also contain B.
Support(A => B) = P(A ∪ B)
Confidence(A => B) = P(B | A)
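These two measures can be computed directly from a transaction list. A minimal Python sketch (the grocery item names are made up for illustration):

```python
# Minimal sketch: support and confidence over a list of transactions.

def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, A, B):
    """P(B | A): among transactions containing A, fraction also containing B."""
    return support(transactions, set(A) | set(B)) / support(transactions, A)

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(transactions, {"bread", "milk"}))       # 2/4 = 0.5
print(confidence(transactions, {"bread"}, {"milk"}))  # 2/3 ≈ 0.667
```

Here support counts A ∪ B over all of D, while confidence normalizes by the transactions containing A, matching the definitions above.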
5. How are association rules mined from large databases?
Step I: Find all frequent itemsets.
Step II: Generate strong association rules from the frequent itemsets.
6. Describe the different classifications of Association rule mining.
Based on types of values handled in the Rule
i. Boolean association rule
ii. Quantitative association rule
Based on the dimensions of data involved
i. Single dimensional association rule
ii. Multidimensional association rule
Based on the levels of abstraction involved
i. Multilevel association rule
ii. Single level association rule
Based on various extensions
i. Correlation analysis
ii. Mining max patterns
7. What is the purpose of Apriori Algorithm?
The Apriori algorithm is an influential algorithm for mining frequent itemsets for Boolean association rules. The name of the algorithm reflects the fact that the algorithm uses prior knowledge of frequent itemset properties.
8. Define anti-monotone property.
If a set cannot pass a test, all of its supersets will fail the same test as well.
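This property is what lets Apriori prune its candidate space: a k-itemset is generated only if all of its (k-1)-subsets are frequent. A minimal sketch (the transactions and min_support value are illustrative assumptions, not from the text):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Find all frequent itemsets. Supersets of infrequent sets are
    pruned before counting (the anti-monotone property)."""
    n = len(transactions)

    def frequent_of(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c for c, k in counts.items() if k / n >= min_support}

    items = {frozenset([i]) for t in transactions for i in t}
    frequent, Lk, k = set(), frequent_of(items), 1
    while Lk:
        frequent |= Lk
        k += 1
        # Join step: build size-k candidates from frequent (k-1)-itemsets.
        cands = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # Prune step: drop candidates with any infrequent (k-1)-subset.
        cands = {c for c in cands
                 if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
        Lk = frequent_of(cands)
    return frequent

result = apriori([{"bread", "milk"}, {"bread", "butter"},
                  {"bread", "milk", "butter"}, {"milk"}], min_support=0.5)
```

With these transactions, {milk, butter} is infrequent, so {bread, milk, butter} is pruned without ever being counted.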
9. How to generate association rules from frequent item sets?
Association rules can be generated as follows:
For each frequent itemset l, generate all nonempty subsets of l.
For every nonempty subset s of l, output the rule "s => (l - s)" if
support_count(l) / support_count(s) >= min_conf,
where min_conf is the minimum confidence threshold.
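The procedure above can be sketched directly in Python; the support counts below are hypothetical values for illustration:

```python
from itertools import combinations

def generate_rules(itemset, support_count, min_conf):
    """For frequent itemset l, output s => (l - s) whenever
    support_count(l) / support_count(s) >= min_conf."""
    l = frozenset(itemset)
    rules = []
    for r in range(1, len(l)):                    # all nonempty proper subsets
        for s in map(frozenset, combinations(l, r)):
            conf = support_count[l] / support_count[s]
            if conf >= min_conf:
                rules.append((set(s), set(l - s), conf))
    return rules

# Hypothetical support counts for a small database.
counts = {
    frozenset({"bread"}): 3,
    frozenset({"milk"}): 3,
    frozenset({"bread", "milk"}): 2,
}
rules = generate_rules({"bread", "milk"}, counts, min_conf=0.6)
```

Since every subset of a frequent itemset is itself frequent, all the needed support counts are already available from the frequent-itemset mining step.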
10. Give a few techniques to improve the efficiency of the Apriori algorithm.
Hash-based technique
Transaction reduction
Partitioning
Sampling
Dynamic itemset counting
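Transaction reduction, for instance, rests on a simple observation: a transaction containing no frequent k-itemset cannot contain any frequent (k+1)-itemset, so it can be skipped in later scans. A minimal sketch (data is illustrative):

```python
def transaction_reduction(transactions, frequent_k):
    """Keep only transactions containing at least one frequent k-itemset;
    the rest cannot contribute to any frequent (k+1)-itemset."""
    return [t for t in transactions if any(c <= t for c in frequent_k)]

transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
frequent_2 = {frozenset({"bread", "milk"})}
reduced = transaction_reduction(transactions, frequent_2)  # 2 transactions left
```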
11. What factors limit the performance of the Apriori candidate generation technique?
It may need to generate a huge number of candidate sets.
It may need to repeatedly scan the database and check a large set of candidates by pattern matching.
12. Describe the method of generating frequent itemsets without candidate generation.
Frequent-pattern growth (or FP-growth) adopts a divide-and-conquer strategy.
Steps:
Compress the database representing frequent items into a frequent-pattern tree, or FP-tree.
Divide the compressed database into a set of conditional databases.
Mine each conditional database separately.
13. Define Iceberg query.
It computes an aggregate function over an attribute or set of attributes in order to find aggregate values above some specified threshold.
Given a relation R with attributes a1, a2, ..., an and b, and an aggregate function agg_f, an iceberg query is of the form:
SELECT R.a1, R.a2, ..., R.an, agg_f(R.b)
FROM relation R
GROUP BY R.a1, R.a2, ..., R.an
HAVING agg_f(R.b) >= threshold
14. Mention few approaches to mining Multilevel Association
Rules
Uniform minimum support for all levels (or uniform support)
Using reduced minimum support at lower levels (or reduced support)
Level-by-level independent
Level-cross filtering by single item
Level-cross filtering by k-item set
15. What are multidimensional association rules?
Association rules that involve two or more dimensions or predicates.
Interdimension association rule: a multidimensional association rule with no repeated predicate or dimension.
Hybrid-dimension association rule: a multidimensional association rule with multiple occurrences of some predicates or dimensions.
16. Define constraint-Based Association Mining.
Mining is performed under the guidance of various kinds of constraints
provided by the user.
The constraints include the following
Knowledge type constraints
Data constraints
Dimension/level constraints
Interestingness constraints
Rule constraints.
17. Define the concept of classification.
Classification is a two-step process:
A model is built describing a predefined set of data classes or concepts. The model is constructed by analyzing database tuples described by attributes.
The model is used for classification.
18. What is Decision tree?
A decision tree is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and leaf nodes represent classes or class distributions. The topmost node in a tree is the root node.
19. What is Attribute Selection Measure?
The information gain measure is used to select the test attribute at each node in the decision tree. Such a measure is referred to as an attribute selection measure or a measure of the goodness of split.
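Information gain is the entropy of the class labels minus the weighted entropy after splitting on an attribute. A minimal sketch (the attribute name and toy data are hypothetical):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Info(D): expected bits needed to classify a sample in D."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Gain(attr) = Info(D) - weighted Info after splitting on attr."""
    n = len(labels)
    gain = entropy(labels)
    for value in {r[attr] for r in rows}:
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

rows = [{"outlook": "sunny"}, {"outlook": "sunny"},
        {"outlook": "rain"}, {"outlook": "rain"}]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, "outlook"))  # 1.0 (a perfect split)
```

At each node, the attribute with the highest information gain is chosen as the test attribute.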
20. Describe Tree pruning methods.
When a decision tree is built, many of the branches will reflect anomalies in the training data due to noise or outliers. Tree pruning methods address this problem of overfitting the data.
Approaches:
Pre-pruning
Post-pruning
21. Define Pre Pruning
A tree is pruned by halting its construction early. Upon halting, the node becomes a leaf. The leaf may hold the most frequent class among the subset of samples.
22. Define Post Pruning.
Post-pruning removes branches from a "fully grown" tree. A tree node is pruned by removing its branches.
E.g.: the cost-complexity pruning algorithm
23. What is meant by Pattern?
A pattern represents the knowledge discovered from the data.
24. Define the concept of prediction.
Prediction can be viewed as the construction and use of a model to assess the class of an unlabeled sample, or to assess the value or value ranges of an attribute that a given sample is likely to have.
