
Chapter 5 – Mining Frequent Patterns and Cluster Analysis

(SOLUTION)

1. Explain market basket analysis with an example.

ANS: Market Basket Analysis:
• A typical example of frequent itemset mining is market basket analysis. This process analyzes customer buying habits by finding associations between the different items that customers place in their "shopping baskets".
• The discovery of these associations can help retailers develop marketing strategies by gaining insight into which items are frequently purchased together by customers.
• Example: If customers are buying milk, how likely are they to also buy bread (and what kind of bread) on the same trip to the supermarket?
• This information can lead to an increase in sales by helping retailers do selective marketing and plan their shelf space.
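The idea above can be illustrated with a few lines of Python that count how often pairs of items occur together in the same basket (the baskets below are made-up illustrative data, not from the text):

```python
from itertools import combinations
from collections import Counter

# Hypothetical shopping baskets (illustrative data)
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "butter"},
]

# Count how often each unordered pair of items appears in the same basket
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequently co-purchased pairs suggest candidates for
# selective marketing and shelf placement
print(pair_counts.most_common(2))
```

In this toy data, (bread, milk) and (bread, butter) each occur together in two of the four baskets, which is exactly the kind of co-occurrence signal market basket analysis looks for.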

2. Explain the Apriori algorithm for finding frequent itemsets using candidate generation.

OR

3. Explain finding frequent itemsets using candidate generation.
ANS: Frequent Itemset Mining:
• Frequent itemset mining finds frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.
• Several algorithms have been used for generating frequent itemsets and association rules, such as the Apriori algorithm and the FP-Growth algorithm.
• The Apriori algorithm finds interesting associations among a large set of data items; it was one of the first algorithms proposed for the association rule mining problem.

Any suitable example:

Dhrupesh Sir 9699692059 DWM


Consider the given database D and a minimum support of 50%. Apply the Apriori algorithm and find frequent itemsets with a confidence greater than 70%.

Solution:
Calculate min_supp=0.5*4=2
(0.5: given minimum support, 4: total transactions in database D)
Step 1: Generate candidate list C1 from D
C1=

Step 2: Scan D for count of each candidate and find the support.
C1=

Step 3: Compare candidate support count with min_supp (i.e. 2)


(prune or remove the itemset which have support count less than
min_supp i.e. 2)
L1=

Step 4: Generate candidate list C2 from L1



(k-itemsets converted to k+1 itemsets)
C2 =

Step 5: Scan D for count of each candidate and find the support.
C2=

Step 6: Compare candidate support count with min_supp (i.e. 2)


(prune or remove the itemset which have support count less than
min_supp i.e. 2)
L2=

Step 7: Generate candidate list C3 from L2 (k-itemsets converted to k+1 itemsets)
C3=



Step 8: Scan D for count of each candidate and find the support.
C3=

Step 9: Compare candidate support count with min_supp (i.e. 2)


(prune or remove the itemset which have support count less than
min_supp i.e. 2)
L3=

The frequent itemset is {2, 3, 5}.
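The candidate-generation loop of Apriori described in the steps above can be sketched in Python. Because the candidate and frequent-itemset tables of the worked example were not reproduced in this document, the database D below is an illustrative assumption, chosen to be consistent with the stated answer {2, 3, 5} at minimum support 2:

```python
def apriori(transactions, min_supp):
    """Return every frequent itemset (frozenset) mapped to its support count."""
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]  # C1: candidate 1-itemsets
    frequent = {}
    k = 1
    while candidates:
        # Scan D and count each candidate's support
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # Prune candidates below min_supp to obtain Lk
        Lk = {c: n for c, n in counts.items() if n >= min_supp}
        frequent.update(Lk)
        # Join Lk with itself to generate the next candidate list C(k+1)
        prev = list(Lk)
        candidates = list({a | b for a in prev for b in prev if len(a | b) == k + 1})
        k += 1
    return frequent

# Illustrative database D (an assumption; min_supp = 0.5 * 4 transactions = 2)
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = apriori(D, min_supp=2)
print(max(result, key=len))  # the largest frequent itemset
```

The loop mirrors the worked steps: scan for counts, prune against min_supp, then join the surviving k-itemsets into (k+1)-itemset candidates until no candidates remain.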

4. Describe association rule of data mining.


ANS:
• Frequent pattern mining is also called association rule mining.
• It is an analytical process that finds frequent patterns, associations, or causal structures in data sets found in various kinds of databases, such as relational databases, transactional databases, and other data repositories.
• Association rule mining searches for interesting relationships among items in a given dataset. The strengths of association rule analysis are:

1. It produces clear and understandable results.


2. It supports undirected data mining.
3. It works on variable-length data.
4. The computations it uses are simple to understand.

Example:
• Suppose a marketing manager of an electronics shop would like to determine which items are frequently purchased together within the same transactions.
• An example of such a rule:
buys(X, "computer") → buys(X, "antivirus software") [support = 1%, confidence = 50%]
where X is a variable representing a customer.
• Here, support and confidence are two measures of rule interestingness.



• A confidence, or certainty, of 50% means that if a customer buys a computer, there is a 50% chance that the customer will buy antivirus software as well.
• A support of 1% means that 1% of all the transactions under analysis showed that the computer and antivirus software were purchased together. This association rule involves a single attribute or predicate (i.e., buys) that repeats.
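The two measures above can be computed directly from a transaction list. A minimal sketch; the four transactions below are illustrative data, so the resulting numbers differ from the 1% / 50% figures quoted in the rule above:

```python
# Illustrative transactions (hypothetical data, not from the source)
transactions = [
    {"computer", "antivirus"},
    {"computer"},
    {"printer"},
    {"computer", "antivirus", "printer"},
]

n = len(transactions)
both = sum(1 for t in transactions if {"computer", "antivirus"} <= t)
computer = sum(1 for t in transactions if "computer" in t)

support = both / n            # fraction of ALL transactions containing both items
confidence = both / computer  # chance of antivirus GIVEN a computer was bought

print(f"support = {support:.0%}, confidence = {confidence:.0%}")
```

Note the different denominators: support divides by all transactions, while confidence divides only by the transactions containing the rule's antecedent.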

5. Define and describe cluster analysis.


ANS:
• Clustering is the process of grouping a set of data objects into multiple groups/clusters so that objects within a cluster have high similarity, but are very dissimilar to objects in other clusters.
• Clustering is an unsupervised machine learning technique that divides data into meaningful sub-groups, called clusters.

Description of Cluster Analysis:
• Clustering is a data mining technique used to place data elements into related groups without advance knowledge of the group definitions.
• Dissimilarities and similarities are assessed based on the attribute values describing the objects and often involve distance measures.
• Cluster analysis, or simply clustering, is the process of partitioning a set of data objects (or observations) into subsets.
• Each subset is a cluster, such that objects in a cluster are similar to one another, yet dissimilar to objects in other clusters. The set of clusters resulting from a cluster analysis can be referred to as a clustering.



Requirements of Cluster Analysis:
• Scalability: We need highly scalable clustering algorithms to deal with large databases.
• Ability to deal with different kinds of attributes: Algorithms should be applicable to any kind of data, such as interval-based (numerical), categorical, and binary data.
• Discovery of clusters with arbitrary shape: The clustering algorithm should be capable of detecting clusters of arbitrary shape. It should not be restricted to distance measures that tend to find spherical clusters of small size.
• High dimensionality: The clustering algorithm should be able to handle not only low-dimensional data but also high-dimensional spaces.
• Ability to deal with noisy data: Databases contain noisy, missing, or erroneous data. Some algorithms are sensitive to such data and may produce poor-quality clusters.
• Interpretability: The clustering results should be interpretable, comprehensible, and usable.

Example: K-means (any relevant example)

Apply the k-means algorithm to create 3 clusters for the given set of values:
{2, 3, 6, 8, 9, 12, 15, 18, 22}

Answer:
Set of values: 2,3,6,8,9,12,15,18,22
1. Randomly split the given set of values into 3 clusters and calculate the mean value of each.
K1: 2, 8, 15 mean=8.3
K2: 3, 9, 18 mean=10
K3: 6, 12, 22 mean=13.3

2. Reassign the values to clusters as per the mean calculated and calculate
the mean again.
K1: 2,3,6,8,9 mean=5.6
K2: mean=0
K3: 12,15,18,22 mean=16.75

3. Reassign the values to clusters as per the mean calculated and calculate
the mean again.
K1: 3,6,8,9 mean=6.5
K2: 2 mean=2
K3: 12,15,18,22 mean=16.75



4. Reassign the values to clusters as per the mean calculated and calculate
the mean again.
K1: 6,8,9 mean=7.67
K2: 2,3 mean=2.5
K3: 12,15,18,22 mean=16.75
5. Reassign the values to clusters as per the mean calculated and calculate
the mean again.
K1: 6,8,9 mean=7.67
K2: 2,3 mean=2.5
K3: 12,15,18,22 mean=16.75

6. The means of all three clusters remain the same, so the algorithm stops.

The final 3 clusters are {6, 8, 9}, {2, 3}, {12, 15, 18, 22}.
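The reassign-and-recompute loop worked by hand above can be written as a short routine. A minimal sketch: the starting means below are an arbitrary choice (k-means results depend on initialization), so the converged partition can legitimately differ from the hand-worked one:

```python
def kmeans_1d(values, centers, iters=100):
    """Minimal 1-D k-means: assign values to the nearest mean, then update means."""
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign v to the cluster whose current mean is closest
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Recompute each mean; an empty cluster keeps its old mean
        new_centers = [sum(c) / len(c) if c else m for c, m in zip(clusters, centers)]
        if new_centers == centers:  # converged: means stopped changing
            break
        centers = new_centers
    return clusters, centers

values = [2, 3, 6, 8, 9, 12, 15, 18, 22]
clusters, means = kmeans_1d(values, centers=[2, 8, 15])  # arbitrary starting means
print(clusters)
```

The stopping condition is the same as step 6 above: the algorithm terminates when a full pass of reassignment leaves every cluster mean unchanged.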

6. State applications of cluster analysis (clustering).


ANS:
Cluster analysis is broadly used in many applications such as market research, pattern recognition, data analysis, and image processing. The following are possible areas where clustering is used:
1. Clustering can help in many fields, such as biology, where plants and animals are classified by their properties. In biology, it is used for the determination of plant and animal taxonomies, for the categorization of genes with similar functionality, and for insight into structures inherent in populations.
2. Clustering can also help marketers discover distinct groups in their customer base and characterize those groups based on purchasing patterns.
3. In an earth observation database, clustering makes it easier to find areas of similar land use. It helps to identify groups of houses and apartments by type, value, and location. Clustering is also helpful in earthquake studies, where clustering observed earthquake epicenters helps identify dangerous zones.
4. Clustering is also used in outlier detection applications such as detection of credit card fraud and intrusion detection systems.
5. The clustering of documents on the web is also helpful for information discovery.
6. Clustering is used in insurance companies for identifying groups of motor insurance policy holders with a high average claim cost, and for identifying frauds.



7. Clustering is helpful in astronomy, where it helps to find groups of similar stars and galaxies.

7. List clustering methods and explain any two.


ANS: There are various clustering techniques/algorithms in data mining, organized into the following categories:

1. Partitioning Method: Partitioning-based clustering algorithms divide the dataset into initial 'k' clusters and iteratively improve the clustering quality based on an objective function. Algorithms: k-means, Partitioning Around Medoids (PAM), CLARA, CLARANS, Expectation Maximization (EM).

2. Hierarchical Method: Hierarchical clustering algorithms seek to build a hierarchy of clusters. They start with some initial clusters and gradually converge to the solution in either a top-down or bottom-up approach. Algorithms: Agglomerative hierarchical clustering, Divisive hierarchical clustering.

3. Density-Based Method: Density-based clustering algorithms assume that clusters are dense regions in space separated by regions of lower density. A dense cluster is a region which is "density connected", i.e. the density of points in that region is greater than a minimum threshold. Since these algorithms expand clusters based on dense connectivity, they can find clusters of arbitrary shapes. Algorithms: DBSCAN.

4. Grid-Based Method: In grid-based clustering algorithms, the entire dataset is overlaid by a regular hypergrid. The clusters are then formed by combining dense cells. Algorithms: CLIQUE.
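Of the methods listed, the hierarchical bottom-up (agglomerative) approach can be sketched in a few lines. This is a minimal illustration using complete linkage on 1-D points; the data reuses the values from the k-means example earlier and is purely illustrative:

```python
def agglomerative(points, k):
    """Bottom-up hierarchical clustering with complete linkage on 1-D points."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the two farthest members
                d = max(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

print(agglomerative([2, 3, 6, 8, 9, 12, 15, 18, 22], k=3))
```

This shows the defining trait of hierarchical methods from the table above: rather than refining a fixed k-partition, it repeatedly merges the two closest clusters, building the hierarchy bottom-up until the desired number of clusters remains.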

8. Describe the requirements of clustering in data mining.


ANS: Clustering is a fast-growing and challenging research field. In this section, we will learn why clustering is required in data mining:
1. Scalability: We need highly scalable clustering algorithms to deal with large databases. Clustering on only a sample of a given large data set may lead to biased results. Therefore, highly scalable clustering algorithms are needed.
2. Ability to deal with different kinds of Attributes: Many
algorithms are designed to cluster numeric (interval-based) data.
However, applications may require clustering other data types,
such as binary, nominal (categorical) and ordinal data or mixtures
of these data types. Recently, more and more applications need
clustering techniques for complex data types such as graphs,
sequences, images, and documents.
3. Discovery of Clusters with Arbitrary Shape: The clustering algorithm should be capable of detecting clusters of arbitrary shape. It should not be restricted to distance measures that tend to find spherical clusters of small size.
4. High Dimensionality: The clustering algorithm should not only
be able to handle low-dimensional data but also the high
dimensional space.
5. Ability to deal with Noisy Data: Most real-world data sets contain outliers and/or missing, unknown, or erroneous data. Some algorithms are sensitive to such data and may lead to poor-quality clusters. Therefore, we need clustering methods that are robust to noise.
6. Interpretability and Usability: The clustering results should be interpretable, comprehensible, and usable; i.e., clustering may need to be tied in with specific semantic interpretations and applications. It is important to study how an application goal may influence the selection of clustering features and clustering methods.
