A Novel Weighted FP-Stream Algorithm For IoT Data Streams

This document summarizes a research paper presented at the 2020 IEEE International Conference on Big Data about a novel weighted FP-Stream algorithm for analyzing IoT data streams. The paper proposes enhancements to the conventional FP-Stream algorithm to make it more adaptive to concept drifts while retaining applicability to data streams. Specifically, it adds weights during pattern pruning based on pattern freshness, prioritizing newer patterns and allowing older patterns to be forgotten more quickly. The performance of the proposed algorithm is evaluated using data from an IoT testbed and is shown to perform better than conventional FP-Stream at handling concept drifts in streaming IoT data.


2020 IEEE International Conference on Big Data (Big Data) | 978-1-7281-6251-5/20/$31.00 ©2020 IEEE | DOI: 10.1109/BigData50022.2020.9378069

A Novel Weighted FP-Stream Algorithm for IoT Data Streams

Halil Ibrahim DEDE, Computer Engineering Dept., Gazi University, Ankara, Turkey
Cemile TIMURKAAN, Computer Engineering Dept., Gazi University, Ankara, Turkey
Metehan GUZEL, Graduate School of Natural and Applied Sciences, Gazi University, Ankara, Turkey
Suat OZDEMIR, Computer Engineering Dept., Hacettepe University, Ankara, Turkey

Abstract: The Internet of Things (IoT) is a technology that is widely used in daily life. It makes it easier for devices to connect with each other. As a result of this high connectivity, enormous volumes of data are collected. Such data are called big streaming data, and useful information can be extracted from them with data mining techniques. One of the most widely used methods is Frequent Itemset (Pattern) Mining (FIM), which detects recurring and common patterns over data streams. In this paper, a new algorithm based on the frequently used FP-Stream algorithm is presented. The proposed algorithm enhances the conventional FP-Stream algorithm to make it more adaptive to concept drifts while retaining its applicability to data streams. Conventional FP-Stream algorithms store all detected patterns. By adding weights based on pattern freshness during the pruning process, the proposed algorithm prioritizes newer patterns; it thereby learns new patterns and forgets older ones swiftly. Performance evaluations are performed using data acquired from an IoT testbed established in the KAVEM Lab of Gazi University. Evaluation results indicate that the proposed algorithm performs significantly better than the conventional FP-Stream algorithm.

Keywords: streaming data mining; frequent patterns; logarithmic tilted-time window; internet of things; tail pruning; weighting

I. INTRODUCTION

The Internet of Things (IoT) is a technology that helps all objects communicate with each other. IoT aims to improve the quality of life. It obtains data from related objects and provides users with meaningful information [1]. The term IoT is used for a network composed of numerous highly connected objects that penetrates daily life in a pervasive manner. Services and applications built on IoT increase the quality of life for humans. It is estimated that by 2020 there will be 50 billion connected devices in IoT networks [1]. Connecting all these devices together makes it easier to process and obtain information. Using these capabilities, numerous smart applications, such as smart homes and smart cities, have been introduced into our lives. Since IoT systems are always running, large volumes of data are generated continuously. Processing these data is crucial to improve life quality further. For this purpose, numerous concepts have been introduced to the literature, such as data stream mining, big data, and stream data analysis [1].

IoT networks create a virtual representation of the real world by using numerous sensors. Detection of recurring events can be used for predicting events or errors. By exploring meaningful relations between events, more complex events can be formulated. For this purpose, association rules are used. Association rule mining structures have been created to ensure that sensitive systems work at high performance. Association rule mining reveals how data items are related to each other, and it requires a database to work on. The minimum support and minimum confidence values determine how strong a relationship is. Association rules can produce a single output to be used directly or an output that serves as input to other mining operations. One of the oldest and most frequently used algorithms for association rule mining is the Apriori algorithm.

The Apriori algorithm requires multiple passes over the data to detect relations. In data streams, however, where the data volume is large and the stream is continuous, it is not feasible to process a data point multiple times. Therefore, streaming data call for algorithms that require a minimal number of passes over the data. For this purpose, the FP-Growth algorithm and its data structure, the FP-Tree, were proposed [2]. By performing reduction operations with the FP-Growth algorithm, higher performance and more frequent patterns are obtained [3].

A search of the literature revealed that, for datasets that include large numbers of identical transactions, pruning old patterns takes considerable time and the ability to detect new patterns is lacking. To overcome this problem, a weight parameter is introduced into the conventional FP-Stream algorithm. In addition, the proposed Weighted FP-Stream algorithm decreases the storage used for patterns; it therefore reduces memory and time complexity and prioritizes recent transactions. In short, the proposed algorithm is faster and better able to keep up with concept drifts. This research targets data stream mining applications over IoT. The data used to test the Weighted FP-Stream algorithm were collected in the KAVEM Lab at Gazi University. The data are preprocessed and used to compare the proposed algorithm to the conventional FP-Stream
algorithm. Test results indicate that the proposed algorithm is able to detect recent patterns with higher accuracy, in a shorter time, and using less memory.

The rest of the paper is organized as follows. In Section 2, we summarize the related works. In Section 3, we introduce our proposed Weighted FP-Stream algorithm. In Section 4, we briefly introduce the dataset used in our research and the data preprocessing performed. In Section 5, we compare the FP-Stream and Weighted FP-Stream algorithms. In Section 6, we evaluate the Weighted FP-Stream algorithm's performance, and finally in Section 7, we conclude our work.

II. RELATED WORKS

Important research has been done to improve the FP-Stream algorithm and to develop other algorithms that use a weight parameter. We summarize the most important works in this section.

In association rule mining, the weighted downward closure property emerged along with the idea of using a weight parameter. With it, rules are prioritized. The minimum support value is kept from becoming large, and transactions are processed with the weight parameter. Thus, more effective and beneficial results are obtained. This method also makes post-processing possible, and maintenance is performed in that work [4]. In the WMFP (Weighted Maximal Frequent Item) research [5], only the patterns obtained through streaming data mining were selected for the purpose. In that research, the number of frequent patterns was observed to increase as the minimum support value decreased. With the MFP (Maximum Frequent Pattern) algorithm, various compressions are performed on frequent patterns without adding the weight parameter. This increases performance by avoiding patterns that are unlikely to be used. Nodes can move up or down in some situations, so nodes are not only pruned but also displaced. In that research, which is based on MWS (Maximal Frequent Pattern Mining with Weight Conditions over Data Streams), patterns are weighted and high performance is obtained by reducing the number of scans. Performance increased considerably, especially in single-path cases. In the single-pass Weighted Frequent Pattern Mining research [6], weighting is performed in ascending weight order (IWFPTWA) and descending frequency order (IWFPTFD). A performance increase is achieved by creating candidate patterns. With these processes, scalability has increased in structures that use incremental databases. In another study that uses weighting [7], the WFPMDS (Weighted Frequent Pattern Mining over Data Stream) technique is used to obtain frequent patterns in streaming data mining. With this technique, frequent patterns are prioritized, eliminating the need for multiple scans as new data arrive. Single-pass and sliding-window structures are used in this technique. The DSWFP model [8] was developed to improve the WPF algorithm. It uses a sliding window structure. Unlike other studies, it aims to keep up with the streaming data rate and to be more stable. It was developed especially to keep up with the speed of streaming sensor and web data. Using its processed weight information, it performs double pruning that provides the downward closure property. This research accelerated the mining of dynamic data.

In another study on frequent pattern mining in streaming data [9], high performance on noisy data is achieved through weighting. In frequent pattern mining, where the anti-monotone property holds, infrequent patterns are pruned early. Owing to the weighting, stricter rules are applied to the noisy data and the patterns obtained are frequent. The study covers both noisy and noiseless data, and high performance is achieved on both.

In the "Shaking Points Structure" study [10], which improves the FP-Stream algorithm, tail pruning is removed and less mining is performed. However, this increases the memory requirement of the system. On the other hand, the time spent on pruning is reduced, so the processing cost is balanced. Pruning is done by keeping the frequency information of each node in the tree structure and removing values below the determined threshold from the structure. The threshold is reduced if the related pattern is not found in the incoming transaction. A sliding window method is used in this structure. In another FP-Tree study [11], the FP-Tree algorithm is extended with normalized weight values. That work has features similar to ours. However, transactions are stored using a sliding window structure, and sub-frequent patterns are not examined. The ranges used to normalize the weight parameter are given by the user as input. In the SSM algorithm study [12], data stream mining is performed for sequential patterns, using the D-List, PLWAP-Tree and FSP-Tree structures together. The patterns obtained are similar to those of the FP-Stream algorithm. It is thought that performance would increase by using a sliding window structure.

III. THE PROPOSED ALGORITHM

The aim of the FP-Stream algorithm is to mine frequent patterns more efficiently. The FP-Stream algorithm scans each data point once and counts the occurrences of items. The count of an item is called its support value. By comparing each item's support with the minimum support and maximum error thresholds, the frequency category of each item is determined. The FP-Stream algorithm mines only sub-frequent and frequent items [13]. By using the weight parameter, recent items are made more dominant than older ones. Introducing weight parameters results in higher performance and less memory usage.

This section describes the algorithms used: the preliminaries, the FP-Stream algorithm, and the proposed Weighted FP-Stream algorithm.

A. Preliminaries

FP-Stream is an algorithm for mining streaming data. It detects frequent patterns and creates a structure that updates itself dynamically. The terms of the FP-Stream algorithm [13] that are used in the Weighted FP-Stream algorithm, together with the additional terms, are explained in Table I. The weight, average window weight, and average weight parameters are used only by the Weighted FP-Stream algorithm. The other parameters are used by both algorithms. The weight parameter is normalized to the range [0, 1]. In this
way, it is easier to work with outlier values. The average weight parameter is 0.5, which is the average of the minimum and maximum weight values. The weight parameter is updated dynamically as each new batch arrives and is scanned.

TABLE I. PRELIMINARIES OF FP-STREAM AND WEIGHTED FP-STREAM
Term      Explanation
I         Itemset, combination of single items i_a
σ         Minimum support
ε         Maximum support error
T         Time period
F         Frequency
w         Weight
w_i       Window
w_i w̄     Average window weight
w̄         Average weight

In the FP-Stream structure, frequent and sub-frequent patterns are captured using the FP-Tree structure, and logarithmic tilted-time windows store these patterns. Table II gives the frequency conditions of the patterns. Sub-frequent patterns are stored because they may become frequent in the future.

TABLE II. PATTERN CATEGORIZATION ON FP-STREAM
Pattern        Categories
Frequent       support ≥ σ
Sub-Frequent   support < σ and support ≥ ε
Infrequent     support < ε

TABLE III. LOGARITHMIC TILTED-TIME WINDOW
f(1,1) []
f(2,2) []    f(1,1) []
f(3,3) []    f(2,2) [f(1,1)]
f(4,4) []    f(3,3) []             f(2,1) []
f(5,5) []    f(4,4) [f(3,3)]       f(2,1) []
f(6,6) []    f(5,5) []             f(4,3) [f(2,1)]
f(7,7) []    f(6,6) [f(5,5)]       f(4,3) [f(2,1)]
f(8,8) []    f(7,7) []             f(6,5) []           f(4,1) []
f(9,9) []    f(8,8) [f(7,7)]       f(6,5) []           f(4,1) []
f(10,10) []  f(9,9) []             f(8,7) [f(6,5)]     f(4,1) []
f(11,11) []  f(10,10) [f(9,9)]     f(8,7) [f(6,5)]     f(4,1) []
f(12,12) []  f(11,11) []           f(10,9) []          f(8,5) [f(4,1)]
f(13,13) []  f(12,12) [f(11,11)]   f(10,9) []          f(8,5) [f(4,1)]
f(14,14) []  f(13,13) []           f(12,11) [f(10,9)]  f(8,5) [f(4,1)]
f(15,15) []  f(14,14) [f(13,13)]   f(12,11) [f(10,9)]  f(8,5) [f(4,1)]
f(16,16) []  f(15,15) []           f(14,13) []         f(12,9) []       f(8,1) []
f(17,17) []  f(16,16) [f(15,15)]   f(14,13) []         f(12,9) []       f(8,1) []
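The merging scheme tabulated in Table III can be reproduced with a short simulation. The following is a minimal sketch, not the authors' Java implementation: the names `TiltedTimeWindow`, `Level`, `add` and `render` are hypothetical, each unit f(a,b) is stored as a tuple (newest batch, oldest batch, frequency), and the rule that the newest batch never uses a buffer is inferred from the rows of Table III.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A unit f(a, b) covers batches b..a and stores their summed frequency.
Unit = Tuple[int, int, int]  # (newest batch, oldest batch, frequency)


@dataclass
class Level:
    main: Optional[Unit] = None
    buffer: Optional[Unit] = None  # intermediate buffer, printed as [...]


class TiltedTimeWindow:
    """Hypothetical helper that mimics the rows of Table III."""

    def __init__(self) -> None:
        self.current: Optional[Unit] = None  # newest batch, never buffered
        self.levels: List[Level] = []
        self.batches = 0

    def add(self, frequency: int) -> None:
        """Record the frequency observed in the next incoming batch."""
        self.batches += 1
        carry = self.current
        self.current = (self.batches, self.batches, frequency)
        k = 0
        while carry is not None:
            if k == len(self.levels):
                self.levels.append(Level())
            level = self.levels[k]
            old, level.main = level.main, carry
            if old is None:
                carry = None  # the unit was free
            elif level.buffer is None:
                level.buffer, carry = old, None  # park the old unit in the buffer
            else:
                # Buffer occupied: merge the two older units and push them up.
                carry = (old[0], level.buffer[1], old[2] + level.buffer[2])
                level.buffer = None
            k += 1

    def render(self) -> str:
        """Format the window in the notation of Table III."""
        def show(unit: Unit) -> str:
            return f"f({unit[0]},{unit[1]})"

        parts = [] if self.current is None else [show(self.current) + " []"]
        for level in self.levels:
            buf = "[]" if level.buffer is None else f"[{show(level.buffer)}]"
            parts.append(f"{show(level.main)} {buf}")
        return " ".join(parts)


window = TiltedTimeWindow()
for _ in range(8):
    window.add(1)  # one occurrence of the itemset per batch
print(window.render())  # f(8,8) [] f(7,7) [] f(6,5) [] f(4,1) []
```

Merged units keep the summed frequency of the batches they cover, so older history is stored at coarser granularity without losing the total count.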
The FP-Stream algorithm includes the FP-Tree and logarithmic tilted-time window structures. With the logarithmic tilted-time window, frequent patterns are stored in compressed form to save memory space. Keeping the windows in a logarithmic manner reduces the number of units held in the structure logarithmically. For example, a natural tilted-time window needs 366 x 24 x 4 = 35,136 units to retain one year of data. The logarithmic tilted-time window structure covers the same period with log2(366 x 24 x 4) + 1 ≈ 17 units. Fixed-size batches are used for each division operation. Tail pruning is done with the T information and the ε parameter, and mining operations are done on the FP-Tree with the FP-Growth algorithm [13].

1) FP-Stream algorithm: The FP-Stream algorithm aims to find frequent patterns in data streams. It includes the FP-Tree and FP-Growth algorithms. Each node of an FP-Stream tree contains the tilted-time window and the support value information of its item. Based on the minimum support and maximum support error values, it is decided whether the items in the related itemset are frequent. The processed data are collected in batches according to their frequency, and the batches are kept in memory with the structure called the tilted-time window [13].

The use of a tilted-time window reduces the number of units held in memory. This window uses buffering. Keeping the batches together simplifies operation and increases performance. Batches are held logarithmically, as shown in Table III.

The first added transaction is held alone in the first unit. At the next level, if the buffer of the next unit is empty, the old first transaction is transferred directly to that unit and the content previously held in that unit is moved to its intermediate buffer. If there is no free space, the batch to be transferred and the next batch are compressed and transferred to the next unit. This continues until all the batches have settled. The number of units required for N batches is found with Equation 1 [13].

$$\lceil \log_2 N + 1 \rceil \qquad (1)$$

An incoming batch is transferred to the FP-Tree structure, and patterns are determined in accordance with the FP-Growth algorithm. The f_list is formed to keep the usage frequency of the items and to order the data accordingly. Once all data in the incoming batch have been added, if an incoming itemset is already in the FP-Tree structure, its frequency in the batch is added to the logarithmic tilted-time table of that itemset. Tail pruning is then performed. If the table becomes empty, mining with the FP-Growth algorithm stops for that itemset. Thus, the FP-Stream structure is created. A depth-first search is performed over the created structure, and if an itemset was not mined in the incoming batch, zero is added to its table. Tail pruning continues. When these processes are completed, the frequent patterns can be read from the created structure [13].

2) FP-Tree algorithm: This algorithm finds frequent patterns with the FP-Tree structure, which was established to reduce the number of scans required by the Apriori algorithm. The Apriori algorithm cannot achieve accurate results with a small
amount of data. Because the Apriori algorithm uses Cartesian products, the cost of calculating and storing the obtained patterns increases. The FP-Tree algorithm responds to the need for a structure that can be updated dynamically to find frequent patterns. It also makes it easy to sort the patterns by frequency [14, 15].

To use this structure, an empty FP-Tree is created first and the minimum support value is determined. Then, all data in the database are scanned once and the support value of each item is found. Items whose support is not smaller than the minimum support value are frequent and are listed in the f_list in descending order. Every transaction is then added to the tree structure with its items ordered according to the f_list. For each added item, the count of the corresponding node increases by one [15].

3) FP-Growth algorithm: The FP-Tree structure is mined with the FP-Growth algorithm. Operations begin with the least frequent item in the f_list. If the item lies on a single path, the frequent pattern formed by all items up to the root node is taken, and the lowest support value becomes the conditional pattern-base value of that pattern. If the item appears on more than one branch, the conditional pattern base has as many entries as there are branches, and the same operations are performed for each branch. For each item, the values of all items in the conditional pattern base are examined. Patterns whose support is larger than the minimum support value form the conditional FP-tree. The FP-Growth algorithm uses the divide-and-conquer method and therefore mines data with higher performance than the Apriori algorithm [15].

B. Weighted FP-Stream Algorithm

The purpose of this algorithm is to increase the importance of the frequent patterns of the current batch. In the FP-Stream algorithm, it is observed that once frequent patterns have been obtained from large amounts of data, the algorithm has to run millions of times before fresh patterns become more dominant than the old ones. To prevent this situation and ensure the freshness of the data, a weight parameter is used in the tail pruning process so that more current data are accepted as frequent patterns. The algorithm is presented in Figure 1.

INPUT
  1. An FP-Stream structure
  2. σ, ε and batch size thresholds
  3. Incoming batch to store transactions
OUTPUT
  1. Updated FP-Stream structure
  2. Frequent patterns
METHOD
  1. Initialize an empty FP-tree.
  2. Sort each item by its frequency in the f_list and insert all transactions into the FP-tree structure (only f_list items are inserted) by the FP-Growth algorithm.
  3. Use tail pruning on the FP-tree structure.
     a. If I is in the structure:
        i. Add the frequency of I to the tilted-time window.
        ii. Conduct tail pruning by using the weight factor.
        iii. If the table is empty, stop mining of supersets; otherwise continue to mine supersets of I.
     b. If I is not in the structure:
        i. If the frequency of I in that batch is not less than the product of the maximum support error and the batch size, insert I into the structure. Otherwise stop mining and use tail pruning.
        ii. Scan the structure by depth-first search. If any of the nodes has no children, it will be a leaf node.

Fig. 1. Pseudocode for Weighted FP-Stream algorithm

IV. DATA PREPROCESSING

The collected sensor data include a timestamp and the corresponding sensor values. To find frequent patterns with the FP-Stream algorithm, these data have to be categorized. The sensors have different measurement frequencies. Therefore, the data were grouped into 10-minute time units by averaging the readings of the same sensor type. Grouping took into account the standard deviation of each sensor's data. The readings of the indoor air quality, temperature, humidity, light density, sharp, and PIR sensors are kept in streaming batches. Standard deviations were computed from the values of each sensor feature, and the data of each feature were grouped and categorized. The prepared data were given as input to the implemented FP-Stream algorithm, and frequent patterns were found.

As mentioned before, real sensor data collected in the KAVEM Lab of Gazi University are used. The raw data are shown in Figure 2. Each record consists of a time, a sensor, and a sensor value. The temperature, indoor air quality, light density, humidity, PIR, and sharp sensors are held together in the data. During data processing, records containing 2 sensor values are separated. Noise and outlier data were identified and the necessary corrections were made.

Fig. 2. Unprocessed sensor data

The light density sensor data contained some outliers. The values have to be greater than 0, but 6 values were less than zero. These were caused by sensor failure and were dropped.

The data were grouped at 10-minute intervals and a feature was created for each sensor. If several readings were collected from a sensor within 10 minutes, they were averaged. Then, considering the standard deviations of the relevant sensor data, the data were categorized and the itemsets used in the study were obtained. The value ranges used for categorizing each sensor's data are given in Table IV.

TABLE IV. CATEGORY RANGES OF SENSOR DATA
Sensor               Value Ranges
Indoor Air Quality   0-6
Light Density        0-25
Celsius              0-6
Humidity             0-8
Sharp                0-1
PIR                  0-1
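The text does not spell out how the standard deviations are turned into the category ranges of Table IV. One plausible reading, shown here purely as an assumption, is to cut each sensor's readings into bins one standard deviation wide, starting at the minimum reading. The function name `categorize` and the sample readings are hypothetical.

```python
import statistics
from typing import List, Sequence


def categorize(values: Sequence[float]) -> List[int]:
    """Map each reading to an integer category (assumed rule: one bin per
    standard deviation, counted upward from the smallest reading)."""
    lowest = min(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return [0 for _ in values]  # constant signal: a single category
    return [int((value - lowest) // std) for value in values]


# Hypothetical 10-minute temperature averages in degrees Celsius.
celsius = [20.0, 21.0, 22.0, 23.0, 24.0]
print(categorize(celsius))  # [0, 0, 1, 2, 2]
```

With this rule the number of categories depends on how many standard deviations the readings span, which would explain why the sensors in Table IV have ranges of different widths.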

The data obtained by combining the readings in 10-minute intervals, creating an attribute for each sensor, and categorizing the values according to the standard deviations of the related sensors are shown in Figure 3.

Fig. 3. Processed sensor data

Fig. 4. Processed data graph
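The 10-minute grouping that produces the processed records can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' preprocessing notebook: the record layout (timestamp, sensor name, reading), the function name `group_10_minutes`, the sensor label "light density" and the sample timestamps are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

Record = Tuple[datetime, str, float]  # (timestamp, sensor name, reading)


def group_10_minutes(records: Iterable[Record]) -> Dict[datetime, Dict[str, float]]:
    """Average the readings of each sensor inside every 10-minute window."""
    buckets: Dict[datetime, Dict[str, List[float]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for timestamp, sensor, value in records:
        if sensor == "light density" and value < 0:
            continue  # negative light readings are treated as sensor failures
        window = timestamp.replace(
            minute=timestamp.minute - timestamp.minute % 10,
            second=0,
            microsecond=0,
        )
        buckets[window][sensor].append(value)
    return {
        window: {name: sum(vals) / len(vals) for name, vals in sensors.items()}
        for window, sensors in sorted(buckets.items())
    }


raw = [
    (datetime(2020, 1, 1, 10, 1), "celsius", 20.0),
    (datetime(2020, 1, 1, 10, 7), "celsius", 22.0),
    (datetime(2020, 1, 1, 10, 3), "humidity", 40.0),
    (datetime(2020, 1, 1, 10, 4), "light density", -3.0),
    (datetime(2020, 1, 1, 10, 12), "celsius", 25.0),
]
rows = group_10_minutes(raw)
# The 10:00 window holds celsius 21.0 and humidity 40.0;
# the 10:10 window holds only celsius 25.0.
```

A window in which a sensor reported nothing simply lacks that attribute, so not every resulting transaction contains every sensor type.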

The number of values obtained for the new attributes, created in line with the value ranges in Table IV, is given in Table V, which lists the number of non-null records for each sensor type. The date attribute is not used in the algorithm. After preprocessing, the data are ready for the algorithm.

TABLE V. VALUE COUNTS IN SENSOR DATA
Sensor               Non-null Data
date                 39641
indoor air quality   39376
light density        32889
celsius              33188
humidity             33190
sharp                8298
pir                  11229

Figure 4 plots the data with indexes between 1000 and 2000. As can be seen in Figure 4, not every transaction contains all sensor types, which causes gaps in the plotted lines. Categorizing the data at intervals larger than 10 minutes would largely prevent these gaps. The values taken by all the data for each transaction are shown in the Jupyter Notebook in which the data were preprocessed.

V. THE COMPARISON OF FP-STREAM AND WEIGHTED FP-STREAM ALGORITHMS

Unlike the original FP-Stream algorithm, the Weighted FP-Stream algorithm contains a weight value. Weights are normalized between 0 and 1. The frequency values of the patterns are multiplied by these weights, the minimum support and maximum support error values are multiplied by the average weight value of 0.5, and then tail pruning takes place. Thus, the algorithm allows more frequent patterns to be kept in the pruning process. The original tail pruning formula is given in Equation 2.

$$\exists l,\ \forall i,\ l \le i \le n,\ f_I(t_i) < \sigma w_i \quad \text{AND} \quad \forall l',\ l \le l' \le n,\ \sum_{i=l}^{l'} f_I(t_i) < \varepsilon \sum_{i=l}^{l'} w_i \qquad (2)$$

Equation 2 combines two conditions with the AND operator, and pruning takes place only when both are fulfilled. The condition to the left of the operator is true if every frequency value of an itemset is less than the product of the respective window size and σ. The condition to the right of the operator is true if the sum of the frequency values of the itemset is less than the product of the total window size and ε. If both conditions are true, tail pruning is performed.

The tail pruning formula of the developed Weighted FP-Stream algorithm is given in Equation 3.

$$\exists l,\ \forall i,\ l \le i \le n,\ f_I(t_i)\, w < \sigma\, w_i\, \bar{w} \quad \text{AND} \quad \forall l',\ l \le l' \le n,\ \sum_{i=l}^{l'} f_I(t_i)\, w < \varepsilon \sum_{i=l}^{l'} w_i\, \bar{w} \qquad (3)$$

Here w is the weight applied to the frequency of the pattern in window t_i, w_i is the window size, and w̄ is the average weight (Table I).
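Equations 2 and 3 can be compared side by side with a small sketch. The function below is an illustrative assumption rather than the authors' implementation: `tail_start` and its parameters are hypothetical names, windows are indexed from newest (0) to oldest (n), and the per-window weights in the example are made-up values that simply decrease with age. With `weights=None` the test is Equation 2; with weights it is Equation 3, where both thresholds are scaled by the average weight 0.5.

```python
from typing import Optional, Sequence


def tail_start(freqs: Sequence[float], sizes: Sequence[float],
               sigma: float, eps: float,
               weights: Optional[Sequence[float]] = None,
               avg_weight: float = 0.5) -> Optional[int]:
    """Return the smallest index l whose tail l..n may be pruned, or None."""
    n = len(freqs)
    if weights is None:
        weights, scale = [1.0] * n, 1.0  # Equation 2: no weighting
    else:
        scale = avg_weight  # Equation 3: thresholds scaled by the average weight
    for l in range(n):
        tail = range(l, n)
        # Left condition: every window in the tail is below the sigma threshold.
        if not all(freqs[i] * weights[i] < sigma * sizes[i] * scale for i in tail):
            continue
        # Right condition: every running sum is below the eps threshold.
        f_sum = w_sum = 0.0
        for i in tail:
            f_sum += freqs[i] * weights[i]
            w_sum += sizes[i] * scale
            if f_sum >= eps * w_sum:
                break
        else:
            return l
    return None


freqs = [150, 5, 25, 50]          # frequencies, newest window first
sizes = [1000, 1000, 2000, 4000]  # window sizes in transactions
weights = [1.0, 0.6, 0.2, 0.1]    # hypothetical freshness weights
plain = tail_start(freqs, sizes, sigma=0.1, eps=0.01)
weighted = tail_start(freqs, sizes, sigma=0.1, eps=0.01, weights=weights)
# plain is None: Equation 2 keeps every window.
# weighted == 1: Equation 3 allows windows 1..3 to be dropped.
```

In this example the old windows carry weights below the average of 0.5, so their weighted frequencies fall under the scaled thresholds and the tail is forgotten earlier than in the unweighted test.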

In the Weighted FP-Stream tail pruning formula, frequencies are multiplied by the weight of the relevant pattern, while the minimum support and maximum support error values are multiplied by the average weight.
Weights are adjusted to be higher in current data and lower in
older data. Thus, current frequent patterns are made more dominant in the algorithm.

VI. PERFORMANCE EVALUATION

The proposed algorithm is implemented in the Java programming language. The data preprocessing parts are implemented in Python, and all the processed data are then divided into batches. The performance evaluations are carried out on these batches. The test environment has a 2.60 GHz CPU, 16 GB RAM, and the Windows 10 OS.

Table VI gives the number of frequent patterns obtained with the FP-Stream algorithm and the Weighted FP-Stream algorithm using the same data and parameters. The minimum support threshold is fixed at 0.1 and the maximum support error threshold at 0.01. The batch size is fixed at 1000, and 39 batches are used. As expected, all common patterns derived from the Weighted FP-Stream algorithm are also included in the results of the original FP-Stream algorithm.

TABLE VI. FREQUENT PATTERN NUMBERS OBTAINED BY FP-STREAM AND WEIGHTED FP-STREAM ALGORITHM
Algorithm                     FP-Stream Algorithm   Weighted FP-Stream Algorithm
Number of Frequent Patterns   710                   58

The FP-Stream algorithm finds 710 frequent patterns, and the Weighted FP-Stream algorithm finds 58. 53 patterns are common to both algorithms.

Fig. 5. Freshness of the patterns

When the Weighted FP-Stream and FP-Stream algorithms are examined in Figure 5, we can see that only the fresh patterns are obtained with the Weighted FP-Stream algorithm and that the old patterns are deleted from memory. While the number of patterns found by FP-Stream increases per unit time, no such relation exists for Weighted FP-Stream because of the weight parameter. This is an important property when the data size is big and the user is interested only in the fresh patterns.

VII. CONCLUSION

In this article, the proposed Weighted FP-Stream algorithm aims to eliminate outdated data and give priority to fresh data. For this purpose, a weight parameter is introduced to the algorithm. The weight parameter is used in the tail pruning process to keep the tree structure fresh. Weight values between 0 and 1 indicate the recentness of transactions. To show the effectiveness of Weighted FP-Stream, the algorithm is compared to conventional FP-Stream using the same parameters. The FP-Stream algorithm yields 710 frequent patterns and our Weighted FP-Stream algorithm yields 58. As expected, the Weighted FP-Stream algorithm obtains fewer and more current frequent patterns. 53 of the patterns are common to both algorithms' results. The proposed algorithm can keep more recent frequent patterns with higher performance and less memory usage, which is important when the data size is big and resources are limited.

ACKNOWLEDGMENT

This work was supported by the Scientific and Technological Research Council of Turkey under Grant 118E212.

REFERENCES

[1] Kök, İ., Şimşek, M. U., & Özdemir, S. (2017, December). A deep learning model for air quality prediction in smart cities. In 2017 IEEE International Conference on Big Data (Big Data) (pp. 1983-1990). IEEE.
[2] Ezeife, C. I., & Su, Y. (2002, May). Mining incremental association rules with generalized FP-tree. In Conference of the Canadian Society for Computational Studies of Intelligence (pp. 147-160). Springer, Berlin, Heidelberg.
[3] Zhang, W., Liao, H., & Zhao, N. (2008, December). Research on the FP growth algorithm about association rule mining. In 2008 International Seminar on Business and Information Management (Vol. 1, pp. 315-318). IEEE.
[4] Tao, F., Murtagh, F., & Farid, M. (2003, August). Weighted association rule mining using weighted support and significance framework. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 661-666).
[5] Yun, U., Lee, G., & Ryu, K. H. (2014). Mining maximal frequent patterns by considering weight conditions over data streams. Knowledge-Based Systems, 55, 49-65.
[6] Ahmed, C. F., Tanbeer, S. K., Jeong, B. S., Lee, Y. K., & Choi, H. J. (2012). Single-pass incremental and interactive mining for weighted frequent patterns. Expert Systems with Applications, 39(9), 7976-7994.
[7] Ahmed, C. F., Tanbeer, S. K., & Jeong, B. S. (2009, June). Efficient mining of weighted frequent patterns over data streams. In 2009 11th IEEE International Conference on High Performance Computing and Communications (pp. 400-406). IEEE.
[8] J. Wang and Y. Zeng, "DSWFP: Efficient mining of weighted frequent pattern over data streams," 2011 Eighth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), Shanghai, 2011, pp. 942-946, doi: 10.1109/FSKD.2011.6019763.
[9] Yun, U., & Ryu, K. H. (2011). Approximate weighted frequent pattern mining with/without noisy environments. Knowledge-Based Systems, 24(1), 73-82.
[10] Gouider, M. S., & Zarrouk, M. (2012). Frequent Patterns mining in time-sensitive Data Stream. International Journal of Computer Science Issues (IJCSI), 9(4), 117.
[11] Kim, Y. H., Kim, W. Y., & Kim, U. M. (2010). Mining frequent itemsets with normalized weight in continuous data streams. Journal of information processing systems, 6(1), 79-90.
[12] C. I. Ezeife and M. Monwar, "SSM : A Frequent Sequential Data Stream Patterns Miner," 2007 IEEE Symposium on Computational Intelligence and Data Mining, Honolulu, HI, 2007, pp. 120-126, doi: 10.1109/CIDM.2007.368862.
[13] Giannella, C., Han, J., Pei, J., Yan, X., & Yu, P. S. (2003). Mining frequent patterns in data streams at multiple time granularities. Next generation data mining, 212, 191-212.
[14] Internet: T-61.6020: Popular Algorithms in Data Mining and Machine Learning, http://www.cis.hut.fi/Opinnot/T-61.6020/2008/fptree.pdf (23.05.2020)
[15] Qiu, Y., Lan, Y. J., & Xie, Q. S. (2004, August). An improved algorithm of mining from FP-tree. In Proceedings of 2004 International Conference on Machine Learning and Cybernetics (IEEE Cat. No. 04EX826) (Vol. 3, pp. 1665-1670). IEEE.


