
Chapter 5 – Mining Frequent Patterns and Cluster Analysis

(SOLUTION)

1. Explain market basket analysis with an example.

ANS: Market Basket Analysis:
• A typical example of frequent itemset mining is market basket analysis. This process analyzes customer buying habits by finding associations between the different items that customers place in their "shopping baskets".
• The discovery of these associations can help retailers develop marketing strategies by gaining insight into which items are frequently purchased together by customers.
• Example: If customers are buying milk, how likely are they to also buy bread (and what kind of bread) on the same trip to the supermarket?
• This information can lead to an increase in sales by helping retailers do selective marketing and plan their shelf space.
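The idea above can be illustrated with a few lines of Python that count how often pairs of items occur together in the same basket (the baskets below are made-up illustrative data, not from the text):

```python
from itertools import combinations
from collections import Counter

# Hypothetical shopping baskets (illustrative data)
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "butter"},
]

# Count how often each unordered pair of items appears in the same basket
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequently co-purchased pairs suggest candidates for
# selective marketing and shelf placement
print(pair_counts.most_common(2))
```

In this toy data, (bread, milk) and (bread, butter) each occur together in two of the four baskets, which is exactly the kind of co-occurrence signal market basket analysis looks for.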

2. Explain the Apriori algorithm for finding frequent itemsets using candidate generation.

OR

3. Explain finding frequent itemsets using candidate generation.
ANS: Frequent Itemset Mining:
• Frequent itemset mining finds frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.
• Several algorithms have been used for generating frequent itemsets and association rules, such as the Apriori algorithm and the FP-Growth algorithm.
• The Apriori algorithm finds interesting associations among a large set of data items; it was one of the first algorithms proposed for the association rule mining problem.

Any suitable example:

Dhrupesh Sir 9699692059 DWM


Consider the given database D and a minimum support of 50%. Apply the Apriori algorithm and find frequent itemsets with a confidence greater than 70%.

Solution:
Calculate min_supp=0.5*4=2
(0.5: given minimum support, 4: total transactions in database D)
Step 1: Generate candidate list C1 from D
C1=

Step 2: Scan D for count of each candidate and find the support.
C1=

Step 3: Compare candidate support count with min_supp (i.e. 2)


(prune or remove the itemset which have support count less than
min_supp i.e. 2)
L1=

Step 4: Generate candidate list C2 from L1



(k-itemsets converted to k+1 itemsets)
C2 =

Step 5: Scan D for count of each candidate and find the support.
C2=

Step 6: Compare candidate support count with min_supp (i.e. 2)


(prune or remove the itemset which have support count less than
min_supp i.e. 2)
L2=

Step 7: Generate candidate list C3 from L2 (k-itemsets converted to k+1 itemsets)
C3=



Step 8: Scan D for count of each candidate and find the support.
C3=

Step 9: Compare candidate support count with min_supp (i.e. 2)


(prune or remove the itemset which have support count less than
min_supp i.e. 2)
L3=

The frequent itemset is {2, 3, 5}.
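The candidate-generation loop of Apriori described in the steps above can be sketched in Python. Because the candidate and frequent-itemset tables of the worked example were not reproduced in this document, the database D below is an illustrative assumption, chosen to be consistent with the stated answer {2, 3, 5} at minimum support 2:

```python
def apriori(transactions, min_supp):
    """Return every frequent itemset (frozenset) mapped to its support count."""
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]  # C1: candidate 1-itemsets
    frequent = {}
    k = 1
    while candidates:
        # Scan D and count each candidate's support
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # Prune candidates below min_supp to obtain Lk
        Lk = {c: n for c, n in counts.items() if n >= min_supp}
        frequent.update(Lk)
        # Join Lk with itself to generate the next candidate list C(k+1)
        prev = list(Lk)
        candidates = list({a | b for a in prev for b in prev if len(a | b) == k + 1})
        k += 1
    return frequent

# Illustrative database D (an assumption; min_supp = 0.5 * 4 transactions = 2)
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = apriori(D, min_supp=2)
print(max(result, key=len))  # the largest frequent itemset
```

The loop mirrors the worked steps: scan for counts, prune against min_supp, then join the surviving k-itemsets into (k+1)-itemset candidates until no candidates remain.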

4. Describe association rule of data mining.


ANS:
• Frequent pattern mining is also called association rule mining.
• It is an analytical process that finds frequent patterns, associations, or causal structures in data sets found in various kinds of databases, such as relational databases, transactional databases, and other data repositories.
• Association rule mining searches for interesting relationships among items in a given dataset. The strengths of association rule analysis are:

1. It produces clear and understandable results.


2. It supports undirected data mining.
3. It works on variable-length data.
4. The computations it uses are simple to understand.

Example:
• Suppose a marketing manager of an electronics shop would like to determine which items are frequently purchased together within the same transactions.
• An example of such a rule:
buys(X, "computer") → buys(X, "antivirus software") [support = 1%, confidence = 50%]
where X is a variable representing a customer.
• Here, support and confidence are two measures of rule interestingness.



• A confidence, or certainty, of 50% means that if a customer buys a computer, there is a 50% chance that the customer will buy antivirus software as well.
• A support of 1% means that 1% of all the transactions under analysis showed that the computer and antivirus software were purchased together. This association rule involves a single attribute or predicate (i.e., buys) that repeats.
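The two measures above can be computed directly from a transaction list. A minimal sketch; the four transactions below are illustrative data, so the resulting numbers differ from the 1% / 50% figures quoted in the rule above:

```python
# Illustrative transactions (hypothetical data, not from the source)
transactions = [
    {"computer", "antivirus"},
    {"computer"},
    {"printer"},
    {"computer", "antivirus", "printer"},
]

n = len(transactions)
both = sum(1 for t in transactions if {"computer", "antivirus"} <= t)
computer = sum(1 for t in transactions if "computer" in t)

support = both / n            # fraction of ALL transactions containing both items
confidence = both / computer  # chance of antivirus GIVEN a computer was bought

print(f"support = {support:.0%}, confidence = {confidence:.0%}")
```

Note the different denominators: support divides by all transactions, while confidence divides only by the transactions containing the rule's antecedent.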

5. Define and describe cluster analysis.


ANS:
• Clustering is the process of grouping a set of data objects into multiple groups/clusters so that objects within a cluster have high similarity, but are very dissimilar to objects in other clusters.
• Clustering is an unsupervised machine learning technique that divides data into meaningful sub-groups, called clusters.

Description of Cluster Analysis:
• Clustering is a data mining technique used to place data elements into related groups without advance knowledge of the group definitions.
• Dissimilarities and similarities are assessed based on the attribute values describing the objects and often involve distance measures.
• Cluster analysis, or simply clustering, is the process of partitioning a set of data objects (or observations) into subsets.
• Each subset is a cluster, such that objects in a cluster are similar to one another, yet dissimilar to objects in other clusters. The set of clusters resulting from a cluster analysis can be referred to as a clustering.



Requirements of Cluster Analysis:
• Scalability: We need highly scalable clustering algorithms to deal with large databases.
• Ability to deal with different kinds of attributes: Algorithms should be applicable to any kind of data, such as interval-based (numerical), categorical, and binary data.
• Discovery of clusters with arbitrary shape: The clustering algorithm should be capable of detecting clusters of arbitrary shape. It should not be restricted to distance measures that tend to find spherical clusters of small size.
• High dimensionality: The clustering algorithm should be able to handle not only low-dimensional data but also high-dimensional spaces.
• Ability to deal with noisy data: Databases contain noisy, missing, or erroneous data. Some algorithms are sensitive to such data and may produce poor-quality clusters.
• Interpretability: The clustering results should be interpretable, comprehensible, and usable.

Example: K-means (any relevant example)

Apply the k-means algorithm to create 3 clusters for the given set of values:
{2, 3, 6, 8, 9, 12, 15, 18, 22}

Answer:
Set of values: 2,3,6,8,9,12,15,18,22
1. Randomly split the given set of values into 3 clusters and calculate the mean value of each.
K1: 2, 8, 15 mean=8.3
K2: 3, 9, 18 mean=10
K3: 6, 12, 22 mean=13.3

2. Reassign the values to clusters as per the mean calculated and calculate
the mean again.
K1: 2,3,6,8,9 mean=5.6
K2: mean=0
K3: 12,15,18,22 mean=16.75

3. Reassign the values to clusters as per the mean calculated and calculate
the mean again.
K1: 3,6,8,9 mean=6.5
K2: 2 mean=2
K3: 12,15,18,22 mean=16.75



4. Reassign the values to clusters as per the mean calculated and calculate
the mean again.
K1: 6,8,9 mean=7.67
K2: 2,3 mean=2.5
K3: 12,15,18,22 mean=16.75
5. Reassign the values to clusters as per the mean calculated and calculate
the mean again.
K1: 6,8,9 mean=7.67
K2: 2,3 mean=2.5
K3: 12,15,18,22 mean=16.75

6. The means of all three clusters remain the same, so the algorithm stops.

The final 3 clusters are {6, 8, 9}, {2, 3}, {12, 15, 18, 22}.
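The reassign-and-recompute loop worked by hand above can be written as a short routine. A minimal sketch: the starting means below are an arbitrary choice (k-means results depend on initialization), so the converged partition can legitimately differ from the hand-worked one:

```python
def kmeans_1d(values, centers, iters=100):
    """Minimal 1-D k-means: assign values to the nearest mean, then update means."""
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign v to the cluster whose current mean is closest
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Recompute each mean; an empty cluster keeps its old mean
        new_centers = [sum(c) / len(c) if c else m for c, m in zip(clusters, centers)]
        if new_centers == centers:  # converged: means stopped changing
            break
        centers = new_centers
    return clusters, centers

values = [2, 3, 6, 8, 9, 12, 15, 18, 22]
clusters, means = kmeans_1d(values, centers=[2, 8, 15])  # arbitrary starting means
print(clusters)
```

The stopping condition is the same as step 6 above: the algorithm terminates when a full pass of reassignment leaves every cluster mean unchanged.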

6. State applications of cluster analysis (clustering).


ANS:
Cluster analysis is broadly used in many applications such as market research, pattern recognition, data analysis, and image processing. The following are possible areas where clustering is used:
1. Clustering can help in many fields, such as biology, where plants and animals are classified by their properties. In biology, it is used for the determination of plant and animal taxonomies, for the categorization of genes with similar functionality, and for insight into structures inherent in populations.
2. Clustering can also help marketers discover distinct groups in their customer base and characterize those groups based on purchasing patterns.
3. In an earth observation database, clustering makes it easier to find areas of similar land use. It helps to identify groups of houses and apartments by type, value, and location. Clustering is also helpful in earthquake studies, where clustering observed earthquake epicenters helps identify dangerous zones.
4. Clustering is also used in outlier detection applications such as detection of credit card fraud and intrusion detection systems.
5. The clustering of documents on the web is also helpful for information discovery.
6. Clustering is used in insurance companies for identifying groups of motor insurance policy holders with a high average claim cost, and for identifying frauds.



7. Clustering is helpful in astronomy, where it helps to find groups of similar stars and galaxies.

7. List clustering methods and explain any two.


ANS: There are various clustering techniques/algorithms in data mining, organized into the following categories:

1. Partitioning Method: Partitioning-based clustering algorithms divide the dataset into initial 'k' clusters and iteratively improve the clustering quality based on an objective function. Algorithms: k-means, Partitioning Around Medoids (PAM), CLARA, CLARANS, Expectation Maximization (EM).

2. Hierarchical Method: Hierarchical clustering algorithms seek to build a hierarchy of clusters. They start with some initial clusters and gradually converge to the solution in either a top-down or bottom-up approach. Algorithms: Agglomerative hierarchical clustering, Divisive hierarchical clustering.

3. Density-Based Method: Density-based clustering algorithms assume that clusters are dense regions in space separated by regions of lower density. A dense cluster is a region which is "density connected", i.e. the density of points in that region is greater than a minimum threshold. Since these algorithms expand clusters based on dense connectivity, they can find clusters of arbitrary shapes. Algorithms: DBSCAN.

4. Grid-Based Method: In grid-based clustering algorithms, the entire dataset is overlaid by a regular hypergrid. The clusters are then formed by combining dense cells. Algorithms: CLIQUE.
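Of the methods listed, the hierarchical bottom-up (agglomerative) approach can be sketched in a few lines. This is a minimal illustration using complete linkage on 1-D points; the data reuses the values from the k-means example earlier and is purely illustrative:

```python
def agglomerative(points, k):
    """Bottom-up hierarchical clustering with complete linkage on 1-D points."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the two farthest members
                d = max(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

print(agglomerative([2, 3, 6, 8, 9, 12, 15, 18, 22], k=3))
```

This shows the defining trait of hierarchical methods from the table above: rather than refining a fixed k-partition, it repeatedly merges the two closest clusters, building the hierarchy bottom-up until the desired number of clusters remains.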

8. Describe the requirements of clustering in data mining.


ANS: Clustering is a fast-growing and challenging research field. In this section, we will learn why clustering is required in data mining:
1. Scalability: We need highly scalable clustering algorithms to deal with large databases. Clustering on only a sample of a given large data set may lead to biased results. Therefore, highly scalable clustering algorithms are needed.
2. Ability to deal with different kinds of Attributes: Many
algorithms are designed to cluster numeric (interval-based) data.
However, applications may require clustering other data types,
such as binary, nominal (categorical) and ordinal data or mixtures
of these data types. Recently, more and more applications need
clustering techniques for complex data types such as graphs,
sequences, images, and documents.
3. Discovery of Clusters with Arbitrary Shape: The clustering algorithm should be capable of detecting clusters of arbitrary shape. It should not be restricted to distance measures that tend to find spherical clusters of small size.
4. High Dimensionality: The clustering algorithm should not only
be able to handle low-dimensional data but also the high
dimensional space.
5. Ability to deal with Noisy Data: Most real-world data sets contain outliers and/or missing, unknown, or erroneous data. Some algorithms are sensitive to such data and may lead to poor-quality clusters. Therefore, we need clustering methods that are robust to noise.
6. Interpretability and Usability: The clustering results should be interpretable, comprehensible, and usable; i.e., clustering may need to be tied in with specific semantic interpretations and applications. It is important to study how an application goal may influence the selection of clustering features and clustering methods.
