Reading Assignment 1
Purpose of study:
The primary goal of this study is to survey technological advances in data mining across domains such as engineering, business analytics, and medicine, and to analyze the surveyed approaches in terms of their capability for knowledge discovery. Several previously published approaches are examined in this respect, and the study gives a good picture of their advantages and potential applications.
The authors identified several problems during the research work, as follows:
According to Vanahalli et al., bioinformatics routinely produces high-dimensional datasets, the result of a large number of features combined with a small number of samples. Traditional algorithms spend most of their running time mining a vast number of sparsely distributed small and mid-sized itemsets that carry no important or notable information. The work therefore concentrated on mining high-cardinality itemsets, commonly referred to as colossal itemsets, which are crucial in many domains, including bioinformatics. Existing frequent colossal itemset mining algorithms have failed to identify the complete set of significant frequent colossal itemsets.
Related Literature:
In 2018, Bai et al. described high utility itemset mining (HUIM) as a fresh area of data mining. Their investigation and analysis identified the following issues:
1. Periodization of frequent itemsets based on rating, and the mining of positive-negative rules, are absent.
2. Little research has been done on particular projections of the database using upper- and lower-bound pruning techniques and mean-value abundance.
3. Prior research lacked combined data on the number of candidate sets, the generation of frequent sets, and accuracy.
4. Better approaches to knowledge discovery are needed for negative rule mining and threshold-level tasks.
5. Mining weighted frequent itemsets in data streams requires further research.
In 2016, Shrivastava et al. noted that mining frequent itemsets is central to association rule mining (ARM), which is used in a variety of industries, including education, market basket analysis, banking, retail, and more. Rare itemsets, in particular, can be both interesting and useful. Using the utility pattern rare itemset (UPRI) approach, they examined high-utility rare itemsets and were able to identify them.
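The paper does not detail UPRI's internals, but the two quantities such a method trades off, utility and support, can be illustrated with a minimal Python sketch. All transactions, quantities, and unit profits below are assumed for illustration only:

```python
# Minimal sketch of itemset utility and support, the two quantities a
# high-utility rare itemset miner such as UPRI trades off. All data
# values here are illustrative assumptions, not from the paper.

# Each transaction maps item -> purchased quantity.
transactions = [
    {"a": 2, "b": 1},
    {"a": 1, "c": 3},
    {"b": 2, "c": 1},
]
profit = {"a": 5, "b": 3, "c": 1}  # assumed unit profit per item

def utility(itemset, db):
    """Sum of quantity * unit profit over transactions containing the whole itemset."""
    return sum(
        sum(t[i] * profit[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )

def support(itemset, db):
    """Fraction of transactions that contain the itemset."""
    return sum(all(i in t for i in itemset) for t in db) / len(db)

# {a, b} appears in only one of three transactions (rare) yet has high utility.
print(utility({"a", "b"}, transactions), support({"a", "b"}, transactions))
```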
In 2016, Li et al. suggested that useful knowledge can be mined from rare itemsets. Their 2L-XMMMS model is distinctive in that it assigns each item one of two minimum supports, which lets it mine both frequent and rare itemsets.
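The paper does not spell out the 2L-XMMMS procedure, so the sketch below only illustrates the two-level minimum support idea, under the assumed MSApriori-style convention that an itemset is judged against the smallest threshold among its items:

```python
# Sketch of two-level minimum supports: frequent items are judged at a
# high threshold, rare items at a low one. The judging convention
# (minimum of the member items' thresholds) is an MSApriori-style assumption.

transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"d"}]

HIGH_MIS, LOW_MIS = 0.5, 0.25               # the two support levels
mis = {"a": HIGH_MIS, "b": HIGH_MIS, "c": LOW_MIS, "d": LOW_MIS}

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

def passes(itemset):
    """Keep an itemset if its support meets the smallest MIS among its items."""
    return support(itemset) >= min(mis[i] for i in itemset)

print(passes({"a", "b"}))  # judged at 0.5: support 0.5 -> kept
print(passes({"d"}))       # rare item judged at 0.25: support 0.25 -> kept
```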
In 2017, Ghorbani noted that standard methodologies for finding frequent itemsets presuppose static datasets, with constraints imposed uniformly across the whole dataset. This does not hold when the data change over time. The main goal of their study is to improve the efficiency of mining frequent itemsets on temporal data.
In 2017, He et al. argued that data mining is crucial for big data. To increase mining productivity, they put forward MAFIM, an approach based on the FP-tree and MapReduce. Data distribution is carried out using MapReduce, and the FP-tree is used to compute frequent itemsets. Once the local mining results have been collected, the center node uses MapReduce to build the global frequent itemsets. Their findings show that the MAFIM algorithm is fast and well structured.
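As a rough illustration of the distribute-count-merge pattern this description implies, here is a single-process Python sketch. It simulates MapReduce partitions with plain lists and counts itemsets directly rather than building FP-trees, so it is a simplification of the idea, not an implementation of MAFIM:

```python
# Single-process sketch of the MapReduce pattern behind MAFIM:
# each (simulated) node counts itemsets in its local partition, then a
# center step merges local counts into global frequent itemsets.
# Real MAFIM builds FP-trees on distributed nodes; this sketch does not.
from collections import Counter
from itertools import combinations

partitions = [                      # data split across simulated nodes
    [{"a", "b"}, {"a", "c"}],
    [{"a", "b", "c"}, {"b", "c"}],
]

def map_count(part, k=2):
    """'Map' step: count k-itemsets within one partition."""
    local = Counter()
    for t in part:
        local.update(combinations(sorted(t), k))
    return local

def reduce_merge(counters):
    """'Reduce' step at the center node: merge local counts."""
    total = Counter()
    for c in counters:
        total.update(c)
    return total

global_counts = reduce_merge(map_count(p) for p in partitions)
min_count = 2
print([set(s) for s, n in global_counts.items() if n >= min_count])
```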
In 2017, Phuong and Duy introduced the EHAUI-Tree approach for mining high average-utility itemsets, which can incorporate new database records without restarting the system. The utility of the updated data is computed first; then, based on the updated utility values and the previous high average-utility upper bound (HAUUB), the affected itemsets are recalculated and refreshed.
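The summary leaves the bound itself implicit; the sketch below shows the standard average-utility upper bound computation commonly used under this name, with illustrative data, while the EHAUI-Tree's incremental update machinery is omitted:

```python
# Sketch of the average-utility upper bound: an item's bound is the sum,
# over transactions containing it, of that transaction's maximum
# single-item utility (tmu). Data values are illustrative assumptions.

profit = {"a": 4, "b": 2, "c": 1}       # assumed unit profits
transactions = [
    {"a": 1, "b": 2},   # item utilities a=4, b=4 -> tmu 4
    {"b": 3, "c": 2},   # b=6, c=2               -> tmu 6
    {"a": 2, "c": 5},   # a=8, c=5               -> tmu 8
]

def tmu(t):
    """Transaction maximum utility: the largest single-item utility in t."""
    return max(q * profit[i] for i, q in t.items())

def auub(item):
    """Upper bound on the average utility of any itemset containing `item`."""
    return sum(tmu(t) for t in transactions if item in t)

for item in profit:
    print(item, auub(item))   # prune items whose bound falls below threshold
```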
In 2017, Zulkurnain and Shah observed that a flood of data arises in various industries, including banking, telecommunications, scientific operations, and so on. Data mining can be used to extract usable information from this flood, and by obtaining useful information from large datasets it supports decision-making processes.
In 2017, Hong et al. proposed employing erasable-itemset (EI) mining to find itemsets that can be dropped without harming factory profitability. For collections of erasable items, they offered an incremental mining method, built on the fast-update (FUP) concept. In an environment of intermittent data updates, their results reveal that the suggested method runs faster than the batch technique.
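A minimal sketch of the erasability test itself (the FUP-style incremental maintenance is not reproduced): products carry profits and component sets, and a set of components is erasable when dropping it forfeits at most a set fraction of total profit. All data values are assumed:

```python
# Sketch of the erasable-itemset test: components in `itemset` become
# unavailable, so every product needing any of them loses its profit.
# The itemset is erasable if that loss stays within a threshold.
# Product data are illustrative; the paper's incremental FUP-based
# maintenance is not modeled here.

products = [            # (profit, components required to manufacture)
    (100, {"m1", "m2"}),
    (200, {"m2", "m3"}),
    (50,  {"m4"}),
]
TOTAL_PROFIT = sum(p for p, _ in products)

def gain(itemset):
    """Profit forfeited if the components in `itemset` are dropped."""
    return sum(p for p, comps in products if comps & itemset)

def erasable(itemset, threshold=0.2):
    return gain(itemset) <= threshold * TOTAL_PROFIT

print(erasable({"m4"}))   # losing m4 costs 50/350 (~14%) -> erasable
print(erasable({"m2"}))   # losing m2 costs 300/350      -> not erasable
```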
In 2017, Ismail et al. suggested that mining high-utility patterns is a method for finding groups of profitable goods that can deliver a strong advantage from a customer database.
In 2017, Jiang and He unveiled a more functional data structure and the non-recursive FPNR-growth technique. Their experimental results indicate that the FPNR-growth algorithm outperforms the FP-growth approach, being efficient in both mining time and storage.
In 2017, Mohammed et al. suggested that the most frequent itemset in a database should be the one that is not subsumed by other itemsets. The Honey Bee Algorithm they employ is a simple, robust, population-based stochastic algorithm modeled on the foraging behavior of honey bees.
In 2017, Wang et al. argued that extracting frequent patterns from data streams is a crucial component of data mining. Their article discusses their solutions and how these relate to frequent itemsets and sliding windows.
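To make the sliding-window idea concrete, here is a minimal Python sketch that keeps only the most recent transactions and recounts itemsets on each arrival. Production stream miners maintain compressed summaries instead, so this illustrates the windowing concept rather than Wang et al.'s method:

```python
# Sketch of sliding-window frequent-itemset counting over a stream:
# only the `window` most recent transactions contribute support, so
# old patterns expire as new transactions arrive.
from collections import Counter, deque
from itertools import combinations

def stream_frequent(stream, window=3, min_count=2, k=2):
    win = deque(maxlen=window)          # oldest transaction falls out
    for t in stream:
        win.append(t)
        counts = Counter()
        for tx in win:
            counts.update(combinations(sorted(tx), k))
        yield {s for s, n in counts.items() if n >= min_count}

stream = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"c", "d"}]
for step, frequent in enumerate(stream_frequent(stream)):
    print(step, frequent)
```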
SECTION 2:
Overall Assessment:
This paper is easy to read and understand. It gives clear explanations, with examples for every topic, which made it easy for me to follow. The author did a great job of elaborating the pros and cons of the problems raised by other authors, and also explained future directions. Through publication, the study, including its scientific and practical contributions, is shared with others in the field. This raises awareness of the new understanding among researchers and practitioners with similar interests, thereby advancing both comprehension and its application.
Future Research:
In the future, correlations between comparable data can be explored and cutoff conditions evaluated. More research is required on candidate generation and refinement. A variety of hybrid combinations of DM methods have been proposed.
SECTION 3:
Questions:
1. What solutions and practical approaches exist for the problems the authors faced?
2. What approaches are there other than ILP?