
Vertical Data Format For Frequent Pattern Mining

Frequent Pattern Mining (FPM) is a data mining technique that identifies recurring patterns within datasets, crucial for applications like market basket analysis and fraud detection. The vertical data format enhances FPM by representing transactional data in a way that allows efficient pattern mining through TID-lists, reducing computational overhead. While it offers scalability and efficiency, challenges include high memory usage and inefficiencies with sparse datasets, suggesting a need for hybrid approaches and optimizations.

Uploaded by

elshaday desalegn


1 Introduction

1.1 What is Frequent Pattern Mining?

Frequent Pattern Mining (FPM) is a technique within data mining, which is the process
of extracting meaningful patterns, trends, and insights from large datasets. Data mining
uses methods from statistics, machine learning, and database systems to uncover hidden
relationships and support decision-making across various domains.

FPM identifies recurring patterns, associations, or structures within a dataset. These
patterns often represent relationships among items, events, or attributes that appear
together frequently. FPM is fundamental for discovering useful insights in transactional
databases, time-series data, and other structured datasets. It serves as the basis for
advanced mining techniques like association rule mining, sequential pattern mining, and
graph mining.

The significance of FPM lies in its ability to extract actionable knowledge from large
datasets. It is widely used in areas such as market basket analysis, recommendation
systems, fraud detection, and bioinformatics. For example, it can identify frequently
purchased product combinations in retail, which businesses can leverage for cross-selling
strategies (Fournier-Viger et al., 2022).

1.2 What is vertical data format?

The vertical data format is a representation of transactional data where each item is
associated with a list of transaction IDs (TIDs) in which it appears. This format differs
from the horizontal format, where each transaction is listed with the items it contains.
The vertical format is especially beneficial for algorithms that identify frequent itemsets
by intersecting TID-lists, as it reduces database scans and computational overhead.
Because support counting becomes a set intersection, the format offers significant
performance gains on dense datasets (Leung et al., 2018).

The vertical data format is widely used in frequent pattern mining due to its efficiency
and scalability. It represents each item as a Transaction ID List (TID-list), simplifying
support counting by performing intersections of TID-lists instead of repeatedly scanning
the database. This format reduces computational overhead and is particularly beneficial
for dense datasets, where frequent items are numerous. Algorithms like ECLAT leverage
the vertical format to discover frequent itemsets efficiently, making it suitable for
applications such as market basket analysis, web usage mining, and bioinformatics.
Additionally, the vertical format supports parallelization and incremental mining, enabling
scalability for large or dynamic datasets. Its compact representation and compatibility
with hybrid approaches further enhance its utility in frequent pattern mining tasks
(Dwivedi & Satti, 2015; Meenakshi, 2015).

2 How Vertical Data Format works

The vertical data format algorithm for frequent pattern mining involves the following
steps:

2.1 Convert Dataset to Vertical Format

The horizontal transaction dataset is transformed into a vertical format. Each item is
associated with a Transaction ID List (TID-list), which contains all transactions where
the item appears. For example:

• Item A: {1, 2, 3}

• Item B: {1, 3, 4}

• Item C: {2, 3, 4}
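
The conversion can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from any cited paper: the function name `to_vertical` and the horizontal dataset are assumptions, with the transactions chosen only so that they reproduce the TID-lists listed above.

```python
from collections import defaultdict

def to_vertical(transactions):
    """Map each item to the set of transaction IDs (its TID-list)."""
    tidlists = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            tidlists[item].add(tid)
    return dict(tidlists)

# Hypothetical horizontal dataset, chosen to match the TID-lists above.
horizontal = {
    1: {"A", "B"},
    2: {"A", "C"},
    3: {"A", "B", "C"},
    4: {"B", "C"},
}

vertical = to_vertical(horizontal)
for item in sorted(vertical):
    print(item, sorted(vertical[item]))
```

A single pass over the horizontal data is enough to build every TID-list; later steps work only on the resulting dictionary.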

2.2 Identify Frequent Single Items

1. Calculate the support for each item by counting the number of transactions in its
TID-list.

2. Retain items with support greater than or equal to the minimum support threshold
(min-sup).
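
As a rough sketch of this step, the support of an item is just the length of its TID-list, so filtering is a one-line comprehension. The helper name `frequent_items` and the extra item `D` are assumptions added only to show an item being discarded.

```python
def frequent_items(tidlists, min_sup):
    """Keep items whose TID-list holds at least min_sup transactions."""
    return {item: tids for item, tids in tidlists.items() if len(tids) >= min_sup}

# TID-lists from Section 2.1, plus a hypothetical rare item D.
tidlists = {
    "A": {1, 2, 3},
    "B": {1, 3, 4},
    "C": {2, 3, 4},
    "D": {4},
}

kept = frequent_items(tidlists, min_sup=2)
print(sorted(kept))  # D has support 1 and is dropped
```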

2
2.2.1 How to choose minimum support threshold

Choosing the minimum support threshold means balancing two opposing risks:

• A lower minimum support threshold may yield too many patterns, many of which
might be irrelevant.

• A higher minimum support threshold reduces computational complexity but risks
missing important patterns.
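
The trade-off can be made concrete by counting how many itemsets survive at different thresholds. The sketch below uses brute-force enumeration, which is only practical for tiny examples, on the three TID-lists from Section 2.1; the function name `count_frequent` is an assumption.

```python
from itertools import combinations

tidlists = {"A": {1, 2, 3}, "B": {1, 3, 4}, "C": {2, 3, 4}}

def count_frequent(tidlists, min_sup):
    """Count itemsets whose intersected TID-list has at least min_sup entries."""
    items = sorted(tidlists)
    count = 0
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            tids = set.intersection(*(tidlists[i] for i in combo))
            if len(tids) >= min_sup:
                count += 1
    return count

counts = {min_sup: count_frequent(tidlists, min_sup) for min_sup in (1, 2, 3, 4)}
print(counts)
```

On this data the count falls from 7 itemsets at min-sup = 1 to 6 at min-sup = 2, 3 at min-sup = 3, and none at min-sup = 4.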

2.3 Generate Candidate Itemsets

1. Combine frequent k-itemsets to generate candidate (k + 1)-itemsets.

2. Compute the TID-list for each candidate by intersecting the TID-lists of its subsets.
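
One common way to realise these two steps is a prefix join: two k-itemsets are combined when they agree on their first k - 1 items, and the candidate's TID-list is the intersection of the two parents' TID-lists. The text above does not prescribe a particular join rule, so the prefix join and the name `generate_candidates` are assumptions in this sketch.

```python
def generate_candidates(frequent_k):
    """Join k-itemsets sharing a (k-1)-prefix; intersect their TID-lists.

    frequent_k maps sorted item tuples to TID sets.
    """
    keys = sorted(frequent_k)
    candidates = {}
    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            if a[:-1] == b[:-1]:  # same prefix, different last item
                candidates[a + (b[-1],)] = frequent_k[a] & frequent_k[b]
    return candidates

frequent_1 = {("A",): {1, 2, 3}, ("B",): {1, 3, 4}, ("C",): {2, 3, 4}}
candidates_2 = generate_candidates(frequent_1)
for itemset, tids in candidates_2.items():
    print(itemset, sorted(tids))
```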

2.4 Prune Non-Frequent Itemsets

For each candidate (k + 1)-itemset, calculate its support by determining the size of its
TID-list. Retain only those candidates that meet the minimum support threshold.

2.5 Repeat Until No More Candidates Are Generated

The process continues iteratively, generating larger itemsets and pruning non-frequent
ones, until no further frequent itemsets can be identified.
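
Putting Sections 2.2 to 2.5 together, the whole procedure might look like the level-wise loop below. It is a sketch under the same assumptions as before (prefix join, itemsets stored as sorted tuples, the hypothetical name `mine_frequent_itemsets`) and is not tuned for large datasets.

```python
def mine_frequent_itemsets(tidlists, min_sup):
    """Level-wise mining over TID-lists; returns {itemset tuple: TID set}."""
    # Level 1: frequent single items.
    level = {(item,): tids for item, tids in tidlists.items() if len(tids) >= min_sup}
    result = {}
    while level:
        result.update(level)
        keys = sorted(level)
        next_level = {}
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                if a[:-1] == b[:-1]:
                    tids = level[a] & level[b]   # support via intersection
                    if len(tids) >= min_sup:     # prune non-frequent candidates
                        next_level[a + (b[-1],)] = tids
        level = next_level  # an empty level ends the loop
    return result

tidlists = {"A": {1, 2, 3}, "B": {1, 3, 4}, "C": {2, 3, 4}}
found = mine_frequent_itemsets(tidlists, min_sup=2)
for itemset in sorted(found, key=lambda s: (len(s), s)):
    print(itemset, len(found[itemset]))
```

With min-sup = 2 the loop stops after the second level, because the only 3-item candidate, {A, B, C}, appears in a single transaction.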

3 Demonstration of how Vertical Data Format works

Let’s take an online retail store (like Amazon) tracking what customers add to their
shopping carts. The store wants to identify frequently purchased product combinations
to improve recommendations and promotions.

Dataset (Horizontal Format)

Here’s a dataset showing what customers purchased in each transaction:

| Transaction ID (TID) | Items                  |
|----------------------|------------------------|
| 1                    | Laptop, Mouse, Headset |
| 2                    | Laptop, Mouse          |
| 3                    | Laptop, Headset        |
| 4                    | Mouse, Headset         |
| 5                    | Laptop, Mouse, Headset |

Table 1: Example dataset in horizontal format

Step 1: Convert to Vertical Format

Transform the dataset into a vertical format where each product is associated with the
transactions it appears in.

| Item    | TID-List     |
|---------|--------------|
| Laptop  | {1, 2, 3, 5} |
| Mouse   | {1, 2, 4, 5} |
| Headset | {1, 3, 4, 5} |

Table 2: Dataset converted to vertical format

Step 2: Identify Frequent Single Items

Calculate the support (number of transactions) for each item. Assume min-sup = 2.

| Item    | Support | Frequent? |
|---------|---------|-----------|
| Laptop  | 4       | Yes       |
| Mouse   | 4       | Yes       |
| Headset | 4       | Yes       |

Table 3: Frequent single items

Step 3: Generate Frequent 2-Itemsets

Combine frequent single items into pairs and calculate support by intersecting their
TID-lists.

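| Itemset           | TID-List (Intersection) | Support | Frequent? |
|-------------------|-------------------------|---------|-----------|
| {Laptop, Mouse}   | {1, 2, 5}               | 3       | Yes       |
| {Laptop, Headset} | {1, 3, 5}               | 3       | Yes       |
| {Mouse, Headset}  | {1, 4, 5}               | 3       | Yes       |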

Table 4: Frequent 2-itemsets
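
The pair supports in Table 4 can be checked with Python set intersection. This is a small verification sketch that uses the TID-lists from Table 2 and min-sup = 2; the variable names are illustrative.

```python
from itertools import combinations

tidlists = {
    "Laptop": {1, 2, 3, 5},
    "Mouse": {1, 2, 4, 5},
    "Headset": {1, 3, 4, 5},
}
min_sup = 2

pairs = {}
for a, b in combinations(tidlists, 2):
    shared = tidlists[a] & tidlists[b]  # transactions containing both items
    if len(shared) >= min_sup:
        pairs[(a, b)] = shared

for itemset, tids in pairs.items():
    print(itemset, sorted(tids), len(tids))
```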

Step 4: Generate Frequent 3-Itemsets

Combine frequent 2-itemsets into 3-itemsets and calculate support by intersecting their
TID-lists.

| Itemset                  | TID-List (Intersection) | Support | Frequent? |
|--------------------------|-------------------------|---------|-----------|
| {Laptop, Mouse, Headset} | {1, 5}                  | 2       | Yes       |

Table 5: Frequent 3-itemsets

Step 5: Final Frequent Patterns

The frequent patterns (with support) are:

• Single items: {Laptop: 4}, {Mouse: 4}, {Headset: 4}

• 2-itemsets: {Laptop, Mouse: 3}, {Laptop, Headset: 3}, {Mouse, Headset: 3}

• 3-itemset: {Laptop, Mouse, Headset: 2}
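
The whole demonstration can be reproduced end to end, starting from Table 1. The sketch below uses a depth-first traversal, the order in which ECLAT is commonly described, rather than the level-by-level order of the steps above; both orders give the same frequent itemsets. The function names `to_vertical` and `eclat` are assumptions for illustration.

```python
from collections import defaultdict

def to_vertical(transactions):
    """Build item -> TID set from a horizontal {tid: items} dataset."""
    tidlists = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            tidlists[item].add(tid)
    return tidlists

def eclat(prefix, items, min_sup, out):
    """items: (item, TID set) pairs that are frequent together with prefix."""
    for i, (item, tids) in enumerate(items):
        itemset = prefix + (item,)
        out[itemset] = len(tids)
        # Extend the current itemset with each later item, keeping frequent ones.
        suffix = []
        for other, other_tids in items[i + 1:]:
            shared = tids & other_tids
            if len(shared) >= min_sup:
                suffix.append((other, shared))
        eclat(itemset, suffix, min_sup, out)

horizontal = {
    1: ["Laptop", "Mouse", "Headset"],
    2: ["Laptop", "Mouse"],
    3: ["Laptop", "Headset"],
    4: ["Mouse", "Headset"],
    5: ["Laptop", "Mouse", "Headset"],
}
min_sup = 2

vertical = to_vertical(horizontal)
start = sorted((item, tids) for item, tids in vertical.items() if len(tids) >= min_sup)
patterns = {}
eclat((), start, min_sup, patterns)

for itemset in sorted(patterns, key=lambda s: (len(s), s)):
    print(itemset, patterns[itemset])
```

Items are sorted alphabetically, so the 3-itemset is reported as ('Headset', 'Laptop', 'Mouse') with support 2.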

Step 6: How this helps

The store can now use these patterns for actionable insights:

• Product Recommendations: If a customer buys a Laptop and Mouse, recommend
a Headset.

• Bundling Discounts: Offer a discount for buying Laptop, Mouse, and Headset
together, as they frequently co-occur.

• Stock Optimization: Ensure the Mouse and Headset are available alongside
Laptops to meet customer demand.
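
One way to judge how strong such a recommendation is, although the walkthrough above does not compute it, is the confidence measure from association rule mining: the support of the whole itemset divided by the support of the "if" part. The sketch below is an added illustration that uses only the supports from Step 5; the function name `confidence` is an assumption.

```python
# Supports taken from Step 5.
support = {
    frozenset({"Laptop"}): 4,
    frozenset({"Mouse"}): 4,
    frozenset({"Headset"}): 4,
    frozenset({"Laptop", "Mouse"}): 3,
    frozenset({"Laptop", "Headset"}): 3,
    frozenset({"Mouse", "Headset"}): 3,
    frozenset({"Laptop", "Mouse", "Headset"}): 2,
}

def confidence(antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    a = frozenset(antecedent)
    return support[a | frozenset(consequent)] / support[a]

rule = confidence({"Laptop", "Mouse"}, {"Headset"})
print(round(rule, 2))
```

Of the three transactions that contain both a Laptop and a Mouse, two also contain a Headset, so the rule holds in two out of three cases.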

4 Conclusion

The vertical data format in frequent pattern mining offers several strengths. It enables
efficient support counting by using TID-lists and set intersections, reducing the need for
multiple database scans and improving computational efficiency, particularly for dense
datasets. It also provides a compact representation, minimizing redundancy by associating
each item only with the transactions in which it appears. This format is scalable,
well-suited for parallel processing, and effective for dense datasets, where it avoids
unnecessary computations on infrequent items. Additionally, it supports incremental mining,
adapting easily to changes in data.

However, it has some weaknesses. Storing TID-lists for every item can lead to high memory
usage, particularly for large datasets. In sparse datasets, TID-list intersections can
become inefficient, as they involve many short or empty lists, which increases computational
costs. Moreover, for long frequent itemsets, the number of candidate combinations
and TID-list intersections grows exponentially, resulting in high computational overhead.

The way forward for the vertical data format involves adopting hybrid approaches that
combine it with other formats to balance memory and efficiency, using optimization
techniques like compressed TID-lists, and using parallel computing frameworks for large
datasets. Additionally, adapting algorithms for sparse data and expanding the format to
support advanced mining tasks could enhance its applicability and performance.
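
As one possible illustration of a more compact TID-list, each list can be stored as a bitmap in which bit t is set when transaction t contains the item. Intersection then becomes a bitwise AND and support a count of set bits. This is only a sketch of the general idea, not necessarily the compression scheme used in the cited work; the helper names `to_bitmap` and `support` are assumptions.

```python
def to_bitmap(tids):
    """Encode a TID set as an integer with bit t set for each TID t."""
    bitmap = 0
    for tid in tids:
        bitmap |= 1 << tid
    return bitmap

def support(bitmap):
    """Number of transactions, i.e. number of set bits."""
    return bin(bitmap).count("1")

laptop = to_bitmap({1, 2, 3, 5})
mouse = to_bitmap({1, 2, 4, 5})

both = laptop & mouse  # bitwise AND plays the role of set intersection
print(support(both))
```

Bitmaps pay off mainly on dense data, where most bits are set; on sparse data a plain list of TIDs can be smaller.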

5 References

1. Leung, C. K., Zhang, H., Souza, J., Lee, W. (2018). Scalable vertical mining for big
data analytics of frequent itemsets. In Database and Expert Systems Applications:
29th International Conference, DEXA 2018, Regensburg, Germany, September 3–6,
2018, Proceedings, Part I 29 (pp. 3-17). Springer International Publishing.

2. Dwivedi, N., Satti, S. R. (2015). Vertical-format Based Frequent Pattern Mining-A
Hybrid Approach. Journal of Intelligent Computing, 6(4), 119.

3. Mohsin, M., Ahmed, M. R., Ahmed, T. (2016). Closed frequent pattern mining
using vertical data format: depth first approach. IJSSET, 2(3), 230-238.

4. Meenakshi, A. (2015). Survey of frequent pattern mining algorithms in horizontal
and vertical data layouts. Int J Adv Comput Sci Technol, 4(4).

5. Fournier-Viger, P., Gan, W., Wu, Y., Nouioua, M., Song, W., Truong, T., Duong,
H. (2022, April). Pattern mining: Current challenges and opportunities. In Inter-
national Conference on Database Systems for Advanced Applications (pp. 34-49).
Cham: Springer International Publishing.
