A Novel Weighted FP-Stream Algorithm For IoT Data Streams

This document summarizes a research paper presented at the 2020 IEEE International Conference on Big Data about a novel weighted FP-Stream algorithm for analyzing IoT data streams. The paper proposes enhancements to the conventional FP-Stream algorithm to make it more adaptive to concept drifts while retaining applicability to data streams. Specifically, it adds weights during pattern pruning based on pattern freshness, prioritizing newer patterns and allowing older patterns to be forgotten more quickly. The performance of the proposed algorithm is evaluated using data from an IoT testbed and is shown to perform better than conventional FP-Stream at handling concept drifts in streaming IoT data.


2020 IEEE International Conference on Big Data (Big Data)

A Novel Weighted FP-Stream Algorithm for IoT Data Streams
Halil Ibrahim DEDE, Computer Engineering Dept., Gazi University, Ankara, Turkey, [email protected]
Cemile TIMURKAAN, Computer Engineering Dept., Gazi University, Ankara, Turkey, [email protected]
Metehan GUZEL, Graduate School of Natural and Applied Sciences, Gazi University, Ankara, Turkey, [email protected]
Suat OZDEMIR, Computer Engineering Dept., Hacettepe University, Ankara, Turkey, [email protected]

2020 IEEE International Conference on Big Data (Big Data) | 978-1-7281-6251-5/20/$31.00 ©2020 IEEE | DOI: 10.1109/BigData50022.2020.9378069

Abstract—The Internet of Things (IoT) is a technology that is widely used in daily life. It makes it easier for devices to connect with each other. As a result of the high connectivity between devices, enormous volumes of data are collected. Such data is called big streaming data, and useful information can be extracted from it with data mining techniques. One of the most used processing methods is Frequent Itemset (Pattern) Mining (FIM), which detects recurring and common patterns over data streams. In this paper, a new algorithm based on the frequently used FP-Stream algorithm is presented. The proposed algorithm enhances the conventional FP-Stream algorithm to make it more adaptive to concept drifts while retaining its applicability to data streams. Conventional FP-Stream algorithms store all detected patterns. By adding weights based on pattern freshness during the pruning process, the proposed algorithm prioritizes newer patterns, so it learns new patterns and forgets older ones swiftly. Performance evaluations are performed using data acquired from an IoT testbed established in the KAVEM Lab of Gazi University. Evaluation results indicate that the proposed algorithm performs significantly better than the conventional FP-Stream algorithm.

Keywords—streaming data mining; frequent patterns; logarithmic tilted-time window; internet of things; tail pruning; weighting

I. INTRODUCTION

The Internet of Things (IoT) is a technology that helps all objects communicate with each other. IoT aims to improve the quality of life. It obtains data from related objects and provides users with meaningful information [1]. IoT is a term used for a network composed of numerous highly connected objects that penetrates daily life in a pervasive manner. Services and applications developed upon IoT increase the quality of life for humans. It is estimated that by 2020 there will be 50 billion connected devices in IoT networks [1]. By connecting all these devices together, it becomes easier to process and obtain information. Using these capabilities, numerous smart applications are introduced to our lives, such as smart homes and smart cities. Since IoT systems are always running, large volumes of data are generated continuously. The processing of this data is crucial to improve life quality further. For this purpose, numerous concepts are introduced to the literature, such as data stream mining, big data, and stream data analysis [1].

IoT networks create a virtual representation of the real world by using numerous sensors. Detection of recurring events can be used for predicting events or errors. By exploring meaningful relations between events, more complex events can be formulated. For this purpose, association rules are used. Association rule mining structures have been created to ensure that sensitive systems work at high performance. They establish how the data are related to each other, and they require a database to work on. The minimum support and minimum confidence values determine how strong the relationship is. Association rules can produce a single output to be used or an output that can be an input to other mining operations. One of the oldest and most frequently used algorithms for association rule mining is the Apriori algorithm.

The Apriori algorithm requires multiple passes over the data to detect the relations within it. But in data streams, where the data volume is large and the data arrive continuously, it is not feasible to process a data point multiple times. Therefore, streaming data call for algorithms that require a minimal number of passes over the data. For this purpose, the FP-Growth algorithm and the data structure it uses, namely the FP-Tree, were proposed [2]. By performing reduction operations with the FP-Growth algorithm, higher performance and more frequent patterns are obtained [3].

A search of the literature revealed that, for datasets that include large numbers of identical transactions, pruning old patterns takes a considerable amount of time and the ability to detect new patterns is lacking. To overcome this problem, a weight parameter is introduced to the conventional FP-Stream algorithm. In addition to the weight parameter, the proposed Weighted FP-Stream algorithm decreases the storage used for patterns, and therefore effectively reduces memory and time complexities and prioritizes recent transactions. In short, the proposed algorithm is faster and better able to keep up with concept drifts. This research is made specifically for data stream mining applications over IoT. Data used to test the Weighted FP-Stream algorithm are collected from the KAVEM Lab at Gazi University. Data are preprocessed and used to compare the proposed algorithm to the conventional FP-Stream
algorithm. Test results indicate that the proposed algorithm is able to detect recent patterns with higher accuracy in a shorter time and using less memory.

The rest of the paper is organized as follows. In Section 2, we summarize the related works. In Section 3, we introduce our proposed Weighted FP-Stream algorithm. In Section 4, we briefly introduce the dataset used in our research and the data preprocessing performed. In Section 5, we compare the FP-Stream and Weighted FP-Stream algorithms. In Section 6, we evaluate the Weighted FP-Stream algorithm's performance and finally, in Section 7, we conclude our work.

II. RELATED WORKS

Important research work has been done to improve the FP-Stream algorithm and to develop other algorithms using the weight parameter. We summarize the important ones in this section.

In association rule mining problems, the weighted downward closure property has emerged along with the idea of using a weight parameter. Owing to this, the rules are prioritized. The minimum support value is kept from becoming large and transactions are processed with the weight parameter. Thus, more effective and beneficial results are obtained. This method also makes post-processing possible, and maintenance is performed in that research [4]. In the research on WMFP (Weighted Maximal Frequent Item) [5], only the patterns obtained through streaming data mining were selected for the purpose. In this research, conducted with a decreasing minimum support value, the number of frequent patterns was observed to increase. With the MFP (Maximum Frequent Pattern) algorithm, various compressions are performed for frequent patterns without adding the weight parameter. This provides a performance increase by preventing the generation of patterns that are unlikely to be used. Nodes may move up or down in some situations, so the nodes are not only pruned but also displaced. In this research, which is based on MWS (Maximal Frequent Pattern Mining with Weight Conditions over Data Streams), weighting is performed for patterns and high performance is obtained by reducing the number of scans. Performance increased considerably, especially in single-path cases. In the single-pass Weighted Frequent Pattern Mining research [6], weighting is performed in ascending weight order (IWFPTWA) and descending frequency order (IWFPTFD). A performance increase is provided by creating candidate patterns. With these processes, scalability has increased in structures using incremental databases. In another research using the weighting method [7], the WFPMDS (Weighted Frequent Pattern Mining over Data Stream) technique is used to obtain frequent patterns in streaming data mining. With this technique, frequent patterns are prioritized, eliminating the need for multiple scans as new data arrives. Single-pass and sliding window structures are used in this technique. The DSWFP model [8] is developed as an improvement of the WPF algorithm. It uses a sliding window structure. Unlike other research, it aims to keep up with the streaming data rate and to be more stable. It is developed especially to keep up with the speed of streaming sensor and web data. With its processed weight information, it performs double pruning that provides the downward closure property. With this research, the process of mining dynamic data has accelerated.

In another research about frequent pattern mining in streaming data [9], high performance is achieved on noisy data by a weighting method. In frequent pattern mining that preserves the anti-monotone property, infrequent patterns are pruned early. Owing to the weighting, stricter rules are applied to the noisy data and the patterns obtained are frequent. The research is done on both noisy and noiseless data, and high performance is achieved on both.

In the study called "Shaking Points Structure" [10], which develops the FP-Stream algorithm, tail pruning is removed and less mining processing is performed. However, this has increased the memory requirement of the system. On the other hand, the time for the prune process is reduced, so the process cost is balanced. The prune operation is done by keeping the frequency information of each node in the tree structure and removing the values below the determined threshold value from the structure. The threshold value is reduced if the related pattern is not found in the incoming transaction. In this structure, a sliding window method is used. In another FP-Tree development research [11], the FP-Tree algorithm is developed by using normalized weight values. This research has similar features to the work we have done. However, transactions are stored using the sliding window structure. In addition, sub-frequent patterns were not examined. The ranges used to normalize the weight parameter are given by the user as input. In the SSM algorithm research [12], data stream mining operations were performed for sequential patterns. In this research, the D-List, PLWAP-Tree and FSP-Tree structures are used together. The patterns obtained are similar to those of the FP-Stream algorithm. It is thought that performance would increase by using a sliding window structure.

III. THE PROPOSED ALGORITHM

The aim of the FP-Stream algorithm is to mine frequent patterns more efficiently. The FP-Stream algorithm scans each data point once and counts the number of items. The count of an item is called its support value. By comparing each item's support with the minimum support and maximum error thresholds, the frequency category of each item is determined. The FP-Stream algorithm mines only sub-frequent and frequent items [13]. By using the weight parameter, recent items are made more dominant than older ones. The introduction of weight parameters results in higher performance and less memory usage.

In this section, information is given about the algorithm used. Preliminaries, the FP-Stream algorithm and the proposed Weighted FP-Stream algorithm are examined.

A. Preliminaries

FP-Stream is an algorithm for mining streaming data. It detects frequent patterns and creates a structure that updates itself dynamically. Table I lists the terms of the FP-Stream algorithm [13] that are used in the Weighted FP-Stream algorithm, together with the other terms and their explanations. The weight, average window weight and average weight parameters are used only by the Weighted FP-Stream algorithm. The other parameters are used by both algorithms. The weight parameter is normalized within the range [0-1]. By this
way, it becomes easier to work with outlier values. The average weight parameter is 0.5, which is the average of the minimum and maximum weight values. The weight parameter is updated dynamically as each new batch arrives and is scanned.

TABLE I. PRELIMINARIES OF FP-STREAM AND WEIGHTED FP-STREAM
Term              Explanation
$I$               Itemset, combination of single items $i_a$
$\sigma$          Minimum support
$\varepsilon$     Maximum support error
$T$               Time period
$F$               Frequency
$w$               Weight
$w_i$             Window
$w_i \bar{w}$     Average window weight
$\bar{w}$         Average weight

In the FP-Stream structure, frequent and sub-frequent patterns are captured using the FP-Tree structure, and the logarithmic tilted-time windows store these patterns. Table II gives the frequency conditions of the patterns. Sub-frequent patterns are stored because they may become frequent in the future.

TABLE II. PATTERN CATEGORIZATION ON FP-STREAM
Pattern         Categories
Frequent        support > $\sigma$
Sub-Frequent    support < $\sigma$ and support ≥ $\varepsilon$
Infrequent      support < $\varepsilon$

TABLE III. LOGARITHMIC TILTED-TIME WINDOW
(one row per arriving batch; each unit is followed by its intermediate buffer in square brackets)
f(1,1) []
f(2,2) []    f(1,1) []
f(3,3) []    f(2,2) [f(1,1)]
f(4,4) []    f(3,3) []            f(2,1) []
f(5,5) []    f(4,4) [f(3,3)]      f(2,1) []
f(6,6) []    f(5,5) []            f(4,3) [f(2,1)]
f(7,7) []    f(6,6) [f(5,5)]      f(4,3) [f(2,1)]
f(8,8) []    f(7,7) []            f(6,5) []            f(4,1) []
f(9,9) []    f(8,8) [f(7,7)]      f(6,5) []            f(4,1) []
f(10,10) []  f(9,9) []            f(8,7) [f(6,5)]      f(4,1) []
f(11,11) []  f(10,10) [f(9,9)]    f(8,7) [f(6,5)]      f(4,1) []
f(12,12) []  f(11,11) []          f(10,9) []           f(8,5) [f(4,1)]
f(13,13) []  f(12,12) [f(11,11)]  f(10,9) []           f(8,5) [f(4,1)]
f(14,14) []  f(13,13) []          f(12,11) [f(10,9)]   f(8,5) [f(4,1)]
f(15,15) []  f(14,14) [f(13,13)]  f(12,11) [f(10,9)]   f(8,5) [f(4,1)]
f(16,16) []  f(15,15) []          f(14,13) []          f(12,9) []       f(8,1) []
f(17,17) []  f(16,16) [f(15,15)]  f(14,13) []          f(12,9) []       f(8,1) []
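
The buffering pattern of Table III can be reproduced with a short sketch. The code below is an illustration in Python under stated assumptions, not the authors' Java implementation: the names `Unit`, `TiltedTimeWindow`, `add`, `snapshot` and `total` are hypothetical, and each unit is assumed to hold only a summed frequency for one pattern. A new batch enters the first unit, the displaced unit moves one level down, and a level whose unit and buffer are both occupied merges the two and passes the result further down.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Unit:
    newest: int  # index of the most recent batch covered by this unit
    oldest: int  # index of the oldest batch covered by this unit
    freq: int    # summed frequency of the pattern over those batches

    def merged_with(self, older: "Unit") -> "Unit":
        # Compress two adjacent units into one coarser unit.
        return Unit(self.newest, older.oldest, self.freq + older.freq)

    def label(self) -> str:
        return f"f({self.newest},{self.oldest})"


class TiltedTimeWindow:
    """Hypothetical logarithmic tilted-time window for a single pattern."""

    def __init__(self) -> None:
        self.main: List[Optional[Unit]] = []    # one unit per level
        self.buffer: List[Optional[Unit]] = []  # one intermediate buffer per level
        self.batches = 0

    def add(self, freq: int) -> None:
        self.batches += 1
        carry: Optional[Unit] = Unit(self.batches, self.batches, freq)
        level = 0
        while carry is not None:
            if level == len(self.main):
                self.main.append(None)
                self.buffer.append(None)
            displaced = self.main[level]
            self.main[level] = carry
            if displaced is None:
                carry = None                    # free unit: nothing to shift
            elif level == 0:
                carry = displaced               # first unit always shifts down
            elif self.buffer[level] is None:
                self.buffer[level] = displaced  # park the old unit in the buffer
                carry = None
            else:
                # No free space: merge old unit and buffer, shift the result down.
                carry = displaced.merged_with(self.buffer[level])
                self.buffer[level] = None
            level += 1

    def snapshot(self) -> List[Tuple[str, Optional[str]]]:
        return [(m.label(), b.label() if b else None)
                for m, b in zip(self.main, self.buffer)]

    def total(self) -> int:
        return sum(u.freq for u in self.main + self.buffer if u is not None)
```

Calling `add` once per batch and printing `snapshot()` after each call walks through the rows of Table III. Merging never discards frequency counts, so `total()` always equals the sum of the frequencies added so far.
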
The FP-Stream algorithm includes the FP-Tree and logarithmic tilted-time window structures. With the logarithmic tilted-time window, frequent patterns are stored with certain compressions to save memory space. The number of units held in the structure is reduced logarithmically by keeping the windows in a logarithmic manner. For example, 366 x 24 x 4 = 35,136 units are needed in a natural tilted-time window for an annual data retention. Instead of this, the logarithmic tilted-time window structure can perform the same operation with $\lceil \log_2(365 \times 24 \times 4) \rceil + 1 = 17$ units. For each division operation, fixed-size batches are used. Tail pruning is done with the T information and the ε parameter, and mining operations are done on the FP-Tree with the FP-Growth algorithm [13].

1) FP-Stream algorithm: The FP-Stream algorithm aims to find frequent patterns in data streams. It includes the FP-Tree and FP-Growth algorithms. Each node of an FP-Stream tree contains the tilted-time window and the support value information of its item. According to the minimum support and maximum support error values, it is decided whether the items in the related itemset are frequent or not. The processed data are collected in batches according to their frequency, and the data in batches are kept in memory with the structure called the tilted-time window [13].

The use of a tilted-time window allows the units held in memory to be reduced. This window uses buffering. Keeping the batches together provides ease of operation and increased performance. Batches are held logarithmically, as given in Table III.

The first added transaction is held alone in the first unit. When the old first transaction moves to the next level, if the buffer of the next unit is empty, the old first transaction is transferred directly to that unit and the content of the unit is transferred to its intermediate buffer. If there is no free space, the batch to be transferred and the next batch are compressed and transferred to the next unit. This continues until all the batches have settled. The number of units needed for N batches is given by Equation 1 [13].

$\lceil \log_2 N + 1 \rceil$    (1)

The incoming batch is transferred to the FP-Tree structure and patterns are determined in accordance with the FP-Growth algorithm. With the f_list formation, a structure is created that keeps information on the usage frequency of the data and orders the data accordingly. If all data in the incoming batch are added and the incoming itemset is in the FP-Tree structure, the corresponding batch is added to the logarithmic tilted-time table for the related itemset. Tail pruning is performed. If the table is empty, the mining process is completed as a result of the FP-Growth algorithm. Thus, the FP-Stream structure is created. Depth-first search is performed in the created structure, and if an itemset is not mined in the incoming batch, zero is added to the related itemset. Tail pruning continues. When the processes are completed, frequent patterns are observed in the created structure [13].

2) FP-Tree algorithm: It is an algorithm that finds frequent patterns with the FP-Tree structure, which is established to reduce the number of scans of the Apriori algorithm. The Apriori algorithm cannot achieve accurate results with a small
amount of data. Cartesian products are used in the Apriori algorithm, which increases the cost of calculating and storing the patterns obtained. The FP-Tree algorithm responds to the need to create a structure that can be updated dynamically to find frequent patterns. It is quite easy to sort the patterns by frequency [14, 15].

To use this structure, first of all, a blank FP-Tree structure is created. The minimum support value is determined. Then, all data in the database are scanned once and the support value is found for each item. Items that are not smaller than the minimum support value are frequent and are listed in f_list in descending order. Every transaction is added to the tree structure with its items ordered according to the item frequencies in the f_list. For each added item, the value of that node increases by one [15].

3) FP-Growth algorithm: The mining process of the FP-Tree structure is performed with the FP-Growth algorithm. Operations begin with the item with the least frequency in the f_list structure. If the related item is on a single path, the frequent pattern created by all items up to the root node is taken and the lowest support value becomes the conditional pattern-base value of that pattern. If there is more than one branch for that item, the number of conditional pattern-bases is equal to the number of branches formed, and the same operations are performed for each branch. For each item, the values of all items in the conditional pattern-base are examined. Patterns that are larger than the minimum support value form the conditional FP-tree structure. The FP-Growth algorithm uses the divide-and-conquer method, so it provides a higher-performance data mining operation than the Apriori algorithm [15].

B. Weighted FP-Stream Algorithm

The purpose of this algorithm is to increase the importance of the frequent patterns of the current batch. In the FP-Stream algorithm, it is observed that, after frequent patterns are obtained from large amounts of data, the algorithm would have to run millions of times before fresh patterns become more dominant than the old patterns. Using the weight parameter in the tail pruning process prevents this situation and ensures the freshness of the data, so that more current data are accepted as frequent patterns. The algorithm is presented in Figure 1.

INPUT
  1. An FP-Stream structure
  2. σ, ε and batch size thresholds
  3. Incoming batch to store transactions
OUTPUT
  1. Updated FP-Stream structure
  2. Frequent patterns
METHOD
  1. Initialize an empty FP-tree.
  2. Sort each item by its frequency on f_list and insert all transactions into the FP-tree structure (only f_list items are inserted) by the FP-Growth algorithm.
  3. Use tail pruning on the FP-tree structure.
    a. If I is in the structure:
      i. Add the frequency of I to the tilted-time window.
      ii. Conduct tail pruning by using the weight factor.
      iii. If the table is empty, stop mining of supersets, else continue to mine supersets of I.
    b. If I is not in the structure:
      i. If the frequency of I in that batch is not less than the product of the maximum support error and the batch size, then insert I into the structure. Otherwise stop mining and use tail pruning.
      ii. Scan the structure by depth-first search. If any of the nodes has no children, it will be a leaf node.

Fig 1. Pseudocode for Weighted FP-Stream algorithm

IV. DATA PREPROCESSING

The collected sensor data include the timestamp and the corresponding sensor values. In order to find frequent patterns with the FP-Stream algorithm, these data should be categorized. The sensors have different measurement frequencies. Therefore, the data are grouped in 10-minute time units by averaging the readings of the same type of sensor. Grouping is done by taking into account the standard deviation of each sensor's data. The readings of the indoor air quality, temperature, humidity, light density, sharp and PIR sensors are kept in streaming batches. Standard deviations were found according to the values of each sensor feature, and the data of each feature were grouped and categorized. The prepared data were given as input to the coded FP-Stream algorithm and frequent patterns were found.

As mentioned before in this paper, real sensor data collected from the KAVEM Lab of Gazi University are used. The collected unprocessed data are shown in Figure 2. Basically, the records contain the time, the sensor and the sensor value. Temperature, indoor air quality, light density, humidity, PIR and sharp sensors are held together in the related data. During the data processing, records containing 2 sensor values are separated. Noise and outlier data were determined and necessary arrangements were made.

Fig 2. Unprocessed sensor data

There were some outliers in the light density sensor data. The values had to be greater than 0, but there were 6 values less than zero. These are caused by sensor failure and are dropped.

The relevant data were grouped at 10-minute intervals and a feature was created for each sensor. If several readings were collected from a sensor in 10 minutes, their average was taken. Then, considering the standard deviations of the relevant sensor data, the data were categorized and the itemsets to be used in the study were obtained. The value ranges used for each sensor in this categorization are given in Table IV.

TABLE IV. CATEGORY RANGES OF SENSOR DATA
Sensor                Value Ranges
Indoor Air Quality    0-6
Light Density         0-25
Celsius               0-6
Humidity              0-8
Sharp                 0-1
PIR                   0-1
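
The preprocessing steps described above (dropping negative light readings, averaging per 10-minute unit, categorizing by standard deviation) can be sketched in Python as follows. This is a minimal illustration, not the authors' notebook: the function names, the sensor label `light_density` and the `(timestamp, sensor, value)` record layout are hypothetical, and the categorization rule (number of whole standard deviations above the sensor's minimum) is an assumption, since the text does not state the exact rule behind Table IV.

```python
from collections import defaultdict
from statistics import mean, pstdev

BIN_SECONDS = 600  # 10-minute time unit, as described in the text


def bin_and_average(readings):
    """Average readings per (10-minute slot, sensor).

    readings: iterable of (unix_seconds, sensor_name, value) tuples (assumed layout).
    """
    buckets = defaultdict(list)
    for ts, sensor, value in readings:
        if sensor == "light_density" and value < 0:
            continue  # negative light readings are treated as sensor failures
        buckets[(ts // BIN_SECONDS, sensor)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def categorize(averages):
    """Turn averaged readings into one categorical transaction per slot.

    Assumed rule: category = whole standard deviations above the sensor minimum.
    """
    by_sensor = defaultdict(list)
    for (_, sensor), value in averages.items():
        by_sensor[sensor].append(value)

    transactions = defaultdict(dict)
    for (slot, sensor), value in averages.items():
        values = by_sensor[sensor]
        step = pstdev(values) or 1.0  # constant sensors fall into category 0
        transactions[slot][sensor] = int((value - min(values)) // step)
    return dict(transactions)
```

Each resulting transaction maps sensor names to category numbers for one 10-minute slot. A slot simply lacks the sensors that produced no reading in it.
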

The data obtained by combining the data in 10-minute time intervals, creating attributes for each sensor and categorizing the data according to the standard deviations of the related sensors are shown in Figure 3.

Fig. 3. Processed sensor data

Fig. 4. Processed data graph

The number of values obtained for the new attributes created in line with the value ranges in Table IV is given in Table V, which lists the number of acquired records for each sensor type. The date attribute is not used in the algorithm. After preprocessing, the data are ready for the algorithm.

TABLE V. VALUE COUNTS IN SENSOR DATA
Sensor                Non-null Data
date                  39641
indoor air quality    39376
light density         32889
celsius               33188
humidity              33190
sharp                 8298
pir                   11229

In Figure 4, the data between indexes 1000 and 2000 are shown as a graph. As can be seen in Figure 4, the fact that not all transactions contain all sensor types causes some disconnections in the graph. Categorizing the data at larger intervals instead of 10-minute intervals would largely prevent these disconnections. The values that the data take in each transaction are shown in the Jupyter Notebook study where the data were preprocessed.

V. THE COMPARISON OF FP-STREAM AND WEIGHTED FP-STREAM ALGORITHMS

Unlike the original FP-Stream algorithm, the Weighted FP-Stream algorithm contains a weight value. The weights are normalized to values between 0 and 1. These weights are multiplied by the frequency values of the patterns, the minimum support and maximum support error values are multiplied by an average weight value of 0.5, and tail pruning takes place. Thus, the algorithm allows more frequent patterns to be kept in the pruning process. The original tail pruning formula is given in Equation 2.

$\exists l, \forall i,\ l \le i \le n:\ f_I(t_i) < \sigma w_i$ AND $\forall l',\ l \le l' \le n:\ \sum_{i=l}^{l'} f_I(t_i) < \varepsilon \sum_{i=l}^{l'} w_i$    (2)

Equation 2 is a combination of two conditions joined by the AND operator, and pruning takes place only when both conditions are fulfilled. The condition to the left of the operator is true if every frequency value of an itemset is less than the product of the respective window size and the σ value. The condition to the right of the operator is true if the sum of the frequency values of an itemset is less than the product of the total window size and the ε value. In the event that both conditions are true, tail pruning is performed.

The tail pruning formula in the developed Weighted FP-Stream algorithm is given in Equation 3.

$\exists l, \forall i,\ l \le i \le n:\ f_I(t_i)\, w < \sigma\, w_i\, \bar{w}$ AND $\forall l',\ l \le l' \le n:\ \sum_{i=l}^{l'} f_I(t_i)\, w < \varepsilon \sum_{i=l}^{l'} w_i\, \bar{w}$    (3)

In the Weighted FP-Stream tail pruning formula, frequencies are multiplied by the weight of the relevant pattern, and the minimum support and maximum support error values are multiplied by the average weight. Weights are adjusted to be higher in current data and lower in
older data. Thus, current frequent patterns are made more dominant in the algorithm.

VI. PERFORMANCE EVALUATION

The proposed algorithm is coded in the Java programming language. The data preprocessing parts are coded in the Python programming language, and then all the processed data are divided into batches. The performance evaluations are done on these batches. The test environment has a 2.60 GHz CPU, 16 GB RAM and the Windows 10 OS.

The numbers of frequent patterns obtained with the FP-Stream algorithm and the Weighted FP-Stream algorithm using the same data and parameters are given in Table VI. The minimum support threshold is fixed at 0.1 and the maximum support error threshold is fixed at 0.01. The batch size is fixed at 1000 and 39 batches are used. As expected, all common patterns derived from the Weighted FP-Stream algorithm are also included in the results of the original FP-Stream algorithm.

TABLE VI. FREQUENT PATTERN NUMBERS OBTAINED BY FP-STREAM AND WEIGHTED FP-STREAM ALGORITHM
Algorithm                      FP-Stream Algorithm    Weighted FP-Stream Algorithm
Number of Frequent Patterns    710                    58

The FP-Stream algorithm finds 710 frequent patterns and the Weighted FP-Stream algorithm finds 58 frequent patterns. 53 patterns are common to both algorithms.

Fig. 5. Freshness of the patterns

When the Weighted FP-Stream and FP-Stream algorithms are examined as in Figure 5, we can see that only the fresh patterns are obtained in the Weighted FP-Stream algorithm and the old patterns are deleted from the memory. While the number of patterns found by FP-Stream increases per unit time, there is no such relation for Weighted FP-Stream because of the weight parameter. This is an important property when the data size is big and the user is interested only in the fresh patterns.

VII. CONCLUSION

In this article, the proposed Weighted FP-Stream algorithm aims to eliminate outdated data and give priority to fresh data. For this purpose, a weight parameter is introduced to the algorithm. The weight parameter is used in the tail pruning process to keep the tree structure fresh. Weight values between 0 and 1 are used to indicate the recentness of transactions. To show the effectiveness of the Weighted FP-Stream algorithm, it is compared to the conventional FP-Stream algorithm using the same parameters. The FP-Stream algorithm obtains 710 frequent patterns and our Weighted FP-Stream algorithm obtains 58 frequent patterns. As expected, the Weighted FP-Stream algorithm obtains fewer but more current frequent patterns. 53 of the patterns are common to both algorithms' results. The proposed algorithm can keep more recent frequent patterns with higher performance and less memory usage, which is important when the data size is big and resources are limited.

ACKNOWLEDGMENT

This work was supported by the Scientific and Technological Research Council of Turkey under Grant 118E212.

REFERENCES

[1] Kök, İ., Şimşek, M. U., & Özdemir, S. (2017, December). A deep learning model for air quality prediction in smart cities. In 2017 IEEE International Conference on Big Data (Big Data) (pp. 1983-1990). IEEE.
[2] Ezeife, C. I., & Su, Y. (2002, May). Mining incremental association rules with generalized FP-tree. In Conference of the Canadian Society for Computational Studies of Intelligence (pp. 147-160). Springer, Berlin, Heidelberg.
[3] Zhang, W., Liao, H., & Zhao, N. (2008, December). Research on the FP growth algorithm about association rule mining. In 2008 International Seminar on Business and Information Management (Vol. 1, pp. 315-318). IEEE.
[4] Tao, F., Murtagh, F., & Farid, M. (2003, August). Weighted association rule mining using weighted support and significance framework. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 661-666).
[5] Yun, U., Lee, G., & Ryu, K. H. (2014). Mining maximal frequent patterns by considering weight conditions over data streams. Knowledge-Based Systems, 55, 49-65.
[6] Ahmed, C. F., Tanbeer, S. K., Jeong, B. S., Lee, Y. K., & Choi, H. J. (2012). Single-pass incremental and interactive mining for weighted frequent patterns. Expert Systems with Applications, 39(9), 7976-7994.
[7] Ahmed, C. F., Tanbeer, S. K., & Jeong, B. S. (2009, June). Efficient mining of weighted frequent patterns over data streams. In 2009 11th IEEE International Conference on High Performance Computing and Communications (pp. 400-406). IEEE.
[8] Wang, J., & Zeng, Y. (2011). DSWFP: Efficient mining of weighted frequent pattern over data streams. In 2011 Eighth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), Shanghai (pp. 942-946). doi: 10.1109/FSKD.2011.6019763.
[9] Yun, U., & Ryu, K. H. (2011). Approximate weighted frequent pattern mining with/without noisy environments. Knowledge-Based Systems, 24(1), 73-82.
[10] Gouider, M. S., & Zarrouk, M. (2012). Frequent Patterns mining in time-sensitive Data Stream. International Journal of Computer Science Issues (IJCSI), 9(4), 117.
[11] Kim, Y. H., Kim, W. Y., & Kim, U. M. (2010). Mining frequent itemsets with normalized weight in continuous data streams. Journal of information processing systems, 6(1), 79-90.
[12] Ezeife, C. I., & Monwar, M. (2007). SSM: A Frequent Sequential Data Stream Patterns Miner. In 2007 IEEE Symposium on Computational Intelligence and Data Mining, Honolulu, HI (pp. 120-126). doi: 10.1109/CIDM.2007.368862.
[13] Giannella, C., Han, J., Pei, J., Yan, X., & Yu, P. S. (2003). Mining frequent patterns in data streams at multiple time granularities. Next generation data mining, 212, 191-212.
[14] Internet: T-61.6020: Popular Algorithms in Data Mining and Machine Learning, http://www.cis.hut.fi/Opinnot/T-61.6020/2008/fptree.pdf (23.05.2020)
[15] Qiu, Y., Lan, Y. J., & Xie, Q. S. (2004, August). An improved algorithm of mining from FP-tree. In Proceedings of 2004 International Conference on Machine Learning and Cybernetics (IEEE Cat. No. 04EX826) (Vol. 3, pp. 1665-1670). IEEE.