Mining Frequent Patterns and Associations

Uploaded by

dhruu2503
Copyright
© All Rights Reserved
Mining frequent patterns and associations

1.1 Market Basket Analysis:


Market basket analysis is a data mining technique used to uncover purchase patterns in a retail setting. In simple terms, it analyzes the combinations of products that are bought together.

The technique studies the purchases made by customers, for example in a supermarket, and identifies the items that are frequently purchased together. Companies can use the results to design promotions, offers, and sales. More generally, data mining is applied in areas such as the following:

● Sales and marketing use data mining to provide better customer service, improve cross-selling opportunities, and increase direct mail response rates.

● Customer retention benefits from pattern identification and the prediction of likely defections.

● Risk assessment and fraud detection use data mining to identify inappropriate or unusual behavior.

Market basket analysis works mainly with association rules of the form {IF} -> {THEN}.

● IF is the antecedent: an item (or set of items) found within the data.

● THEN is the consequent: an item (or set of items) found in combination with the antecedent.
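
As a rough illustration of the {IF} -> {THEN} form, the sketch below scores one rule with the two standard measures, support and confidence, which the later sections on frequent itemsets and rule generation rely on. The baskets and the helper names `support` and `confidence` are made up for this example and are not taken from any library.

```python
# Hypothetical toy transactions; each basket is a set of item names.
transactions = [
    {"bread", "milk"},
    {"bread", "jam", "milk"},
    {"milk", "eggs"},
    {"bread", "jam"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

# Rule {bread} -> {milk}: IF bread is in the basket THEN milk is too.
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

On this toy data, bread and milk appear together in 2 of 4 baskets (support 0.50), and 2 of the 3 baskets with bread also contain milk (confidence about 0.67).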

Types of Market Basket Analysis

There are three types of market basket analysis:

1. Descriptive market basket analysis: This type of analysis looks for patterns and connections among the items in a market basket. It is mostly used to understand consumer behavior, including which products are purchased in combination and what the most typical item combinations are. By understanding which products are frequently bought together, retailers can place products in their stores more profitably.

2. Predictive market basket analysis: This type of analysis predicts future purchases based on past purchasing patterns. Machine learning algorithms analyze large volumes of data to predict which products are most likely to be bought together in the future. With these predictions, retailers can make data-driven decisions about which products to carry, how to price them, and how to optimize shop layouts.

3. Differential market basket analysis: This type of analysis compares two sets of market basket data to identify differences between them. It is commonly used to compare the behavior of different customer segments, or the behavior of customers over time. With its help, retailers can respond to shifting consumer behavior by adjusting their marketing and sales tactics.
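
One simple way to picture the differential variant is to compute how often each item pair occurs in two sets of baskets and then look at the difference. The sketch below does this for two made-up segments; the data, the `weekday`/`weekend` split, and the function name `pair_support` are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations

def pair_support(baskets):
    """Map each unordered item pair to the fraction of baskets containing it."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return {pair: n / len(baskets) for pair, n in counts.items()}

# Hypothetical baskets for two customer segments (or two time periods).
weekday = [{"bread", "milk"}, {"bread", "milk"}, {"milk", "eggs"}, {"bread", "jam"}]
weekend = [{"soda", "chips"}, {"bread", "milk"}, {"soda", "chips"}, {"soda", "milk"}]

a, b = pair_support(weekday), pair_support(weekend)
# Change in support (weekend minus weekday) for every pair seen in either segment.
diff = {pair: b.get(pair, 0.0) - a.get(pair, 0.0) for pair in set(a) | set(b)}
for pair, delta in sorted(diff.items(), key=lambda kv: -abs(kv[1])):
    print(pair, f"{delta:+.2f}")
```

Pairs with a large positive or negative difference are the ones whose popularity changes most between the two segments.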

Benefits of Market Basket Analysis

1. Enhanced Customer Understanding: Market basket analysis offers insights into customer behavior, including which products customers buy together and which products they buy most frequently. Retailers can use this information to better understand their customers and make informed decisions.

2. Improved Inventory Management: By examining market basket data, retailers can determine which products are slow sellers and which ones are commonly bought together. Retailers can use this information to make well-informed choices about which products to stock and how to manage their inventory most effectively.

3. Better Pricing Strategies: A better understanding of the connection between product prices and consumer behavior can help retailers develop pricing plans that boost sales and profitability.

4. Sales Growth: Market basket analysis can help businesses determine which products are most frequently bought together and where they should be positioned in the store to grow sales. By improving store layouts and product positioning, retailers can boost revenue and enhance the customer shopping experience.

Applications of Market Basket Analysis

1. Retail: Market basket analysis is frequently used in the retail sector to examine consumer buying patterns and inform decisions about product placement, inventory management, and pricing tactics. Retailers can use it to identify which items are slow sellers and which ones are commonly bought together, and then modify their inventory management strategy accordingly.

2. E-commerce: Market basket analysis can help online merchants better understand customer buying habits and make data-driven decisions about product recommendations and targeted advertising campaigns. It can also be used to examine the behaviour of visitors to a website and pinpoint problem areas.

3. Finance: Market basket analysis can be used to evaluate investor behaviour and forecast the types of investment products that investors will likely buy in the future. This information can be used to create tailored investment strategies that improve the performance of investment portfolios.

4. Telecommunications: The telecommunications business can use market basket analysis to evaluate consumer behaviour and make data-driven decisions about which goods and services to provide. Using this data can improve customer satisfaction and the customer experience.

5. Manufacturing: The manufacturing sector can use market basket analysis to evaluate consumer behaviour and make data-driven decisions about which products to produce and which materials to use in the production process. Using this knowledge can increase effectiveness and cut costs.

1.2 Frequent Itemsets, Closed Itemsets, and Association Rules:

Frequent itemset mining is a market basket analysis technique that finds patterns in the shopping behavior of users across different shopping platforms. The relationships it finds are expressed as association rules. Frequent itemset (or frequent pattern) mining is widely used because frequent patterns underlie many other data mining tasks, including the mining of correlations, sequential patterns, and constraint-based patterns. Specifically, the technique is used to find sets of products that are frequently bought together.

An itemset is frequent if its support is greater than or equal to the minimum support.

An itemset is closed in a data set if there exists no proper superset that has the same support count as the original itemset.

A closed frequent itemset is an itemset that is both closed and frequent.
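
To make the definition of a closed itemset concrete, here is a small brute-force sketch that lists the frequent itemsets of a toy data set and keeps only those with no superset of equal support count. The transactions and the `min_count` value are assumptions for the example, and the exhaustive enumeration is only practical at toy sizes.

```python
from itertools import combinations

# Hypothetical toy transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b", "c"},
    {"a", "d"},
]
min_count = 2  # assumed minimum support count

def count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions)

items = sorted(set().union(*transactions))

# Enumerate every itemset and keep the frequent ones.
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = frozenset(combo)
        c = count(s)
        if c >= min_count:
            frequent[s] = c

def is_closed(itemset):
    """True if no superset has the same support count.

    Checking one-item extensions is enough, because adding items
    can never increase the support count.
    """
    return all(
        count(itemset | {extra}) < frequent[itemset]
        for extra in items if extra not in itemset
    )

closed_frequent = {s: c for s, c in frequent.items() if is_closed(s)}
for s, c in closed_frequent.items():
    print(sorted(s), c)
```

Here {b} has support count 3, the same as {a, b}, so {b} is frequent but not closed; the closed frequent itemsets are {a}, {a, b}, and {a, b, c}.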

Association Rules

Association rule mining searches for frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.

Applications

● Analysis of banking data

● Cross-marketing (e.g. put chocolates next to the strawberries)

● Catalog design

Association rules help to predict the occurrence of one item based on the occurrences of other items in a set of transactions.

Examples

● People who buy bread tend to also buy milk

● People who buy milk tend to also buy eggs

● People who buy soda tend to also buy potato chips

● People who buy bread tend to also buy jam

1.3 Frequent Pattern Mining:

The goal of frequent pattern mining, a core data mining technique, is to find recurring patterns or itemsets in large datasets. It looks for groups of items that regularly appear together in order to expose underlying relationships and interdependencies. Market basket analysis, web usage mining, and bioinformatics are a few areas where this method is important.

It helps organizations understand customer preferences, optimize cross-selling tactics, and improve recommendation systems by revealing patterns of consumer behavior.
Techniques for Frequent Pattern Mining

Apriori Algorithm

One of the most popular methods, the Apriori algorithm, uses a level-wise, step-by-step procedure to find frequent itemsets. It starts by creating candidate itemsets of length 1, determining their support, and eliminating any that fall below the predetermined minimum support. It then repeatedly joins the frequent itemsets from the previous step to produce candidate itemsets that are one item larger, and counts their support in turn.

The procedure is repeated until no more frequent itemsets can be found. The Apriori approach is commonly used because of its simplicity, but because it requires one database scan per level and can generate many candidates, it can be computationally expensive on big datasets.
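
The level-wise procedure described above can be sketched in a few lines. This is a minimal, unoptimized illustration, not a reference implementation: the function name `apriori`, the use of an absolute `min_count` instead of a relative support, and the toy baskets are all assumptions.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frozenset: support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count_candidates(candidates):
        # One pass over the data per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_count}

    # Level 1: every single item is a candidate.
    items = {frozenset([i]) for t in transactions for i in t}
    level = count_candidates(items)
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = count_candidates(candidates)
        frequent.update(level)
        k += 1
    return frequent

baskets = [
    {"bread", "milk"},
    {"bread", "jam", "milk"},
    {"milk", "eggs"},
    {"bread", "jam"},
]
result = apriori(baskets, min_count=2)
for itemset, n in result.items():
    print(sorted(itemset), n)
```

On this data the candidate {bread, jam, milk} is pruned without being counted, because its subset {jam, milk} is not frequent. That pruning is what keeps the number of candidates manageable.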

FP-growth Algorithm

The FP-growth algorithm takes a different approach to frequent pattern mining. It builds a compact data structure known as the FP-tree that represents the dataset, and it does not generate candidate itemsets. The algorithm constructs the FP-tree in two passes over the data and then mines frequent itemsets directly from the tree by recursively building smaller conditional FP-trees.

By skipping candidate generation, FP-growth can be much faster than Apriori, and it needs only two passes over the dataset. It is particularly helpful for large datasets in which many transactions share items, since shared prefixes keep the tree compact.
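
The following is a compact sketch of the idea, written for readability rather than speed. It builds an FP-tree from weighted transactions and mines it by recursing on conditional pattern bases. The class name `Node`, the functions `build_tree` and `fp_growth`, and the toy baskets are illustrative assumptions, not a standard API.

```python
from collections import Counter, defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, weight) pairs; return header links and counts."""
    freq = Counter()
    for items, w in weighted_transactions:          # pass 1: count items
        for i in set(items):
            freq[i] += w
    freq = {i: c for i, c in freq.items() if c >= min_count}
    root = Node(None, None)
    header = defaultdict(list)                      # item -> tree nodes holding it
    for items, w in weighted_transactions:          # pass 2: insert paths
        # Keep frequent items, ordered by descending count (ties by name).
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for i in ordered:
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += w
    return header, freq

def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Yield (itemset, support_count) pairs without generating candidates."""
    header, freq = build_tree(weighted_transactions, min_count)
    for item, nodes in header.items():
        pattern = suffix | {item}
        yield pattern, freq[item]
        # Conditional pattern base: the prefix paths that lead to `item`.
        base = []
        for n in nodes:
            path, p = [], n.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, n.count))
        yield from fp_growth(base, min_count, pattern)

baskets = [
    {"bread", "milk"},
    {"bread", "jam", "milk"},
    {"milk", "eggs"},
    {"bread", "jam"},
]
result = dict(fp_growth([(b, 1) for b in baskets], min_count=2))
for itemset, n in result.items():
    print(sorted(itemset), n)
```

The original data is read only inside the first `build_tree` call (once to count items, once to insert paths); every recursive call works on the much smaller conditional pattern bases.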

Eclat Algorithm

Eclat stands for Equivalence Class Clustering and bottom-up Lattice Traversal, and it is a popular frequent pattern mining method. It explores the itemset lattice using a depth-first search and works on a vertical data format, in which each item is stored with the set of transactions that contain it.

Eclat intersects these sets of transaction identifiers (TIDs) to compute the support of larger itemsets. The technique is known for its simplicity and low memory requirements, making it appropriate for mining frequent itemsets in vertical databases.
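
A minimal sketch of the TID-set idea follows. It first converts the baskets to the vertical format and then extends itemsets depth-first by intersecting TID sets. The function name `eclat`, the absolute `min_count`, and the toy data are assumptions made for the example.

```python
from collections import defaultdict

def eclat(transactions, min_count):
    """Return {frozenset: support_count} using TID-set intersections."""
    # Vertical format: item -> set of IDs of the transactions containing it.
    tidsets = defaultdict(set)
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets[item].add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: list of (item, tidset) pairs that can extend `prefix`.
        for idx, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Intersect with later items only, so each itemset is visited once.
            narrower = []
            for other, other_tids in candidates[idx + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_count:
                    narrower.append((other, shared))
            extend(itemset, narrower)   # depth-first step

    start = sorted(
        ((i, t) for i, t in tidsets.items() if len(t) >= min_count),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent

baskets = [
    {"bread", "milk"},
    {"bread", "jam", "milk"},
    {"milk", "eggs"},
    {"bread", "jam"},
]
result = eclat(baskets, min_count=2)
for itemset, n in result.items():
    print(sorted(itemset), n)
```

The support of an itemset is simply the size of its TID set, so the transactions are never rescanned after the vertical format has been built.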

Applications of Frequent Pattern Mining

Market Basket Analysis

Market basket analysis uses frequent pattern mining to understand consumer buying patterns. Businesses learn about product associations by recognizing itemsets that commonly appear together in transactions. This knowledge enables companies to improve recommendation systems and cross-selling efforts. Retailers can use this approach to make data-driven decisions that improve customer satisfaction and boost sales.

Web usage mining

Web usage mining examines user navigation patterns to learn more about how people use websites. Frequent pattern mining makes it possible to identify recurring navigation and session patterns, which can be used to personalize websites and improve their performance. By studying how users interact with a website, businesses can change content, layout, and navigation to improve the user experience and boost engagement.

Bioinformatics

In bioinformatics, frequent pattern mining makes it possible to identify relevant DNA patterns. By examining large genomic databases for recurring patterns, researchers can gain insights into genetic variants, disease associations, and drug development. Frequent pattern mining algorithms help uncover important DNA sequences and patterns that support disease diagnosis, personalized medicine, and the development of new therapeutic strategies.

1.4 Association Rule Mining:


Given a minimum confidence threshold, association rules are generated by going through all possible splits of each frequent itemset into an antecedent and a consequent, and pruning the rules that do not meet the confidence criterion.

The steps for generating strong association rules are as follows:

● For each frequent itemset I, generate all nonempty proper subsets of I.

● For every such subset S of I, consider the rule:

○ S --> (I - S)

○ If support_count(I) / support_count(S) >= minimum confidence threshold, then the rule is a strong association rule.
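
The steps above can be sketched as follows. The input is assumed to be a dictionary of frequent itemsets with their support counts, such as a frequent itemset miner would produce; the function name `strong_rules`, the counts, and the threshold of 0.7 are illustrative assumptions.

```python
from itertools import combinations

def strong_rules(frequent, min_conf):
    """Generate rules S -> (I - S) whose confidence meets `min_conf`.

    `frequent` maps frozenset -> support count and must contain every
    subset of each itemset (true for any complete set of frequent itemsets).
    """
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a nonempty antecedent and consequent
        for r in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), r):
                s = frozenset(subset)
                conf = count / frequent[s]  # support_count(I) / support_count(S)
                if conf >= min_conf:
                    rules.append((s, itemset - s, conf))
    return rules

# Hypothetical support counts for a small set of frequent itemsets.
frequent = {
    frozenset({"bread"}): 3,
    frozenset({"milk"}): 3,
    frozenset({"jam"}): 2,
    frozenset({"bread", "milk"}): 2,
    frozenset({"bread", "jam"}): 2,
}
rules = strong_rules(frequent, min_conf=0.7)
for antecedent, consequent, conf in rules:
    print(sorted(antecedent), "-->", sorted(consequent), f"{conf:.2f}")
```

With these counts only {jam} --> {bread} is strong (confidence 2/2 = 1.0); {bread} --> {jam} has confidence 2/3 and is pruned, which shows that confidence is not symmetric.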

1.5 Improving the Efficiency of Apriori:

Methods To Improve Apriori Efficiency

1. Hash-Based Technique: This method uses a hash table to count candidate k-itemsets. While scanning the database, a hash function maps each k-itemset to a bucket and increments that bucket's count. A k-itemset whose bucket count is below the minimum support cannot be frequent, so it can be removed from the candidate set.

2. Transaction Reduction: This method reduces the number of transactions scanned in later iterations. A transaction that does not contain any frequent k-itemset cannot contain any frequent (k+1)-itemset, so it is marked or removed.

3. Partitioning: This method requires only two database scans to mine the frequent itemsets. It relies on the fact that for any itemset to be potentially frequent in the database, it must be frequent in at least one of the partitions of the database.

4. Sampling: This method picks a random sample S from database D and then searches for frequent itemsets in S. Because only a sample is searched, some globally frequent itemsets may be missed. Using a lower min_sup on the sample reduces this risk.

5. Dynamic Itemset Counting: This technique can add new candidate itemsets at any marked start point of the database during the scan, instead of only at the beginning of a complete scan.
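
As an illustration of the partitioning idea, the sketch below mines each partition separately, takes the union of the locally frequent itemsets as global candidates, and then counts those candidates in one more pass. The local miner is a brute-force stand-in chosen for brevity; the function names, the relative threshold `min_frac`, and the data are assumptions for this example.

```python
from itertools import combinations
from math import ceil

def local_frequent(partition, min_frac):
    """Brute-force frequent itemsets within one partition (toy sizes only)."""
    need = ceil(min_frac * len(partition))
    items = sorted(set().union(*partition))
    found = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            if sum(s <= t for t in partition) >= need:
                found.add(s)
    return found

def partition_mine(transactions, min_frac, n_parts):
    """Two-scan mining: local candidates first, then one global count."""
    size = ceil(len(transactions) / n_parts)
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    # Scan 1: the union of locally frequent itemsets is the candidate set.
    candidates = set().union(*(local_frequent(p, min_frac) for p in parts))
    # Scan 2: count every candidate over the whole database.
    need = ceil(min_frac * len(transactions))
    counts = {c: sum(c <= t for t in transactions) for c in candidates}
    return {c: n for c, n in counts.items() if n >= need}

baskets = [
    {"bread", "milk"},
    {"bread", "jam", "milk"},
    {"milk", "eggs"},
    {"bread", "jam"},
]
result = partition_mine(baskets, min_frac=0.5, n_parts=2)
for itemset, n in result.items():
    print(sorted(itemset), n)
```

The first scan can produce false candidates (itemsets frequent in one partition only), which is why the second scan is needed. It cannot miss a globally frequent itemset, because such an itemset must reach the relative threshold in at least one partition.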

Applications Of Apriori Algorithm


Some fields where Apriori is used:
1. In education: extracting association rules from data on admitted students, based on their characteristics and specialties.
2. In the medical field: for example, analysis of patient databases.
3. In forestry: analysis of the probability and intensity of forest fires using forest fire data.
4. In industry: Apriori is reportedly used by companies such as Amazon in recommender systems and by Google for the auto-complete feature.

1.6 Multilevel Association Rule and Multidimensional Association Rule:


Multilevel Association Rule :
Association rules generated by mining data at multiple levels of abstraction are called multiple-level or multilevel association rules.
Multilevel association rules can be mined efficiently using concept hierarchies under a support-confidence framework.
Rules at a high concept level may only restate common sense, while rules at a low concept level may not always be useful.

Using uniform minimum support for all levels :

● When a uniform minimum support threshold is used, the search procedure is simplified.

● The method is also simple in that users need to specify only a single minimum support threshold.

● The same minimum support threshold is used when mining at each level of abstraction (for example, when mining from “computer” down to “laptop computer”). Both “computer” and “laptop computer” may be found to be frequent, while “desktop computer” is not.

Approaches to multilevel association rule mining :


1. Uniform Support –

When a uniform minimum support threshold is used, the search procedure is simplified. The method is also simple in that users need to specify only a single minimum support threshold. An optimization can be adopted, based on the knowledge that an ancestor is a superset of its descendants: the search avoids examining itemsets that contain any item whose ancestors do not have minimum support. The uniform support approach, however, has some difficulties. It is unlikely that items at lower levels of abstraction will occur as frequently as those at higher levels of abstraction. If the minimum support threshold is set too high, it could miss several meaningful associations occurring at low abstraction levels. If it is set too low, it may generate many uninteresting associations at high abstraction levels. This provides the motivation for the following approach.

2. Reduced Support –

Each level of abstraction has its own minimum support threshold, and the lower the level, the smaller the threshold. For mining multilevel associations with reduced support, there are several alternative search strategies:

● Level-by-level independent –

This is a full-breadth search, where no background knowledge of frequent itemsets is used for pruning. Each node is examined, regardless of whether its parent node is found to be frequent.

● Level-cross filtering by single item –

An item at the i-th level is examined if and only if its parent node at the (i-1)-th level is frequent. In other words, we investigate a more specific association from a more general one. If a node is frequent, its children will be examined; otherwise, its descendants are pruned from the search.

● Level-cross filtering by k-itemset –

A k-itemset at the i-th level is examined if and only if its corresponding parent k-itemset at the (i-1)-th level is frequent.

3. Group-based support –

The user or an expert supplies group-wise threshold values for support and confidence. Groups are chosen based on product price or itemset, because experts often have insight into which groups are more important than others.

Example –

Experts may be interested in the purchase patterns of laptops or clothes, in the electronics and non-electronics categories respectively. A low support threshold is therefore set for these groups so that their purchase patterns receive attention.
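
To make the reduced-support approach with level-cross filtering by single item more concrete, here is a small sketch over a made-up two-level concept hierarchy. The hierarchy in `parent`, the transactions, and the per-level thresholds in `min_count` are all assumptions chosen for illustration.

```python
# Hypothetical two-level concept hierarchy: specific item -> general category.
parent = {
    "laptop": "computer",
    "desktop": "computer",
    "inkjet": "printer",
    "laser": "printer",
}
transactions = [
    {"laptop", "inkjet"},
    {"desktop", "inkjet"},
    {"laptop", "laser"},
    {"desktop"},
    {"laptop", "inkjet"},
]
# Assumed minimum support counts: stricter at the general level (1),
# reduced at the more specific level (2).
min_count = {1: 5, 2: 3}

def frequent_items(baskets, threshold):
    """Count single items and keep those meeting the threshold."""
    counts = {}
    for basket in baskets:
        for item in basket:
            counts[item] = counts.get(item, 0) + 1
    return {i: c for i, c in counts.items() if c >= threshold}

# Level 1: replace every item by its category before counting.
level1 = frequent_items([{parent[i] for i in t} for t in transactions], min_count[1])

# Level 2 with level-cross filtering by single item:
# only examine items whose parent category was frequent at level 1.
filtered = [{i for i in t if parent[i] in level1} for t in transactions]
level2 = frequent_items(filtered, min_count[2])

print(level1)
print(level2)
```

In this example "printer" occurs in 4 baskets and misses the level-1 threshold of 5, so "inkjet" is never examined even though its count of 3 would meet the level-2 threshold. A level-by-level independent search would have found it, at the cost of examining every node.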

Multidimensional Association Rules :

In multidimensional association rules, attributes can be categorical or quantitative.

● Quantitative attributes are numeric and have an implicit ordering.

● Numeric attributes must be discretized.

● A multidimensional association rule involves more than one dimension (predicate).

● Example – buys(X, “IBM Laptop computer”) -> buys(X, “HP Inkjet Printer”)

Approaches to mining multidimensional association rules :

There are three approaches to mining multidimensional association rules:

1. Using static discretization of quantitative attributes :

● Discretization is static and happens prior to mining.

● Discretized attributes are treated as categorical.

● Use the Apriori algorithm to find all frequent k-predicate sets (this requires k or k+1 table scans). Every subset of a frequent predicate set must also be frequent.

Example –

If, in a data cube, the 3-D cuboid (age, income, buys) is frequent, then (age, income), (age, buys), and (income, buys) are also frequent.

Note –

Data cubes are well suited to mining because they make mining faster. The cells of an n-dimensional data cuboid correspond to the predicate sets.

2. Using dynamic discretization of quantitative attributes :

● Known as mining quantitative association rules.

● Numeric attributes are dynamically discretized.

Example –
age(X, "20..25") Λ income(X, "30K..41K") -> buys(X, "Laptop Computer")

3. Using distance-based discretization with clustering :

This is a dynamic discretization process that considers the distance between data points. It involves a two-step mining process:

● Perform clustering to find the intervals involved.

● Obtain association rules by searching for groups of clusters that occur together.

The resulting rules may satisfy –

● Clusters in the rule antecedent are strongly associated with clusters in the rule consequent.

● Clusters in the antecedent occur together.

● Clusters in the consequent occur together.
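
The static discretization approach (the first of the three above) can be pictured with a short sketch: numeric attributes are binned before mining, and each record becomes a set of categorical predicates that any frequent itemset miner can then process. The bin edges, the records, and the predicate labels below are assumptions invented for the example.

```python
def age_bin(age):
    # Hypothetical bin edges chosen for illustration.
    if age < 30:
        return "age=20..29"
    if age < 40:
        return "age=30..39"
    return "age=40+"

def income_bin(income):
    # Hypothetical two-way split of income.
    return "income=low" if income < 40_000 else "income=high"

# Hypothetical customer records: (age, income, item bought).
records = [
    (23, 32_000, "laptop"),
    (25, 38_000, "laptop"),
    (34, 52_000, "printer"),
    (45, 61_000, "printer"),
    (22, 35_000, "laptop"),
]

# Each record becomes a set of categorical predicates, one per dimension.
predicate_sets = [
    {age_bin(a), income_bin(inc), f"buys={item}"} for a, inc, item in records
]

# Support count of one 3-predicate set, and confidence of the rule
# age=20..29 AND income=low -> buys=laptop.
target = {"age=20..29", "income=low", "buys=laptop"}
antecedent = {"age=20..29", "income=low"}
count = sum(target <= p for p in predicate_sets)
conf = count / sum(antecedent <= p for p in predicate_sets)
print(count, conf)
```

Because the bins are fixed before mining, the result depends heavily on where the edges are placed, which is the limitation the dynamic and distance-based approaches try to address.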
