Reading Assignment 1

Uploaded by rishitha

SECTION 1:

Summary of the paper


Singh, B., & Dubey, V. (2018, April 21). A review on knowledge discovery and data mining techniques.
International Journal of Advanced Technology and Engineering Exploration, 5(41), 70–77.
https://doi.org/10.19101/ijatee.2018.541006

Purpose of study:
The primary goal of this study is to survey knowledge discovery and data mining techniques across domains such as engineering, business analytics, and medicine, and to analyze each approach in terms of its knowledge-discovery capability. Several previously published approaches are reviewed for this purpose, and the survey gives readers a clear picture of their advantages and potential applications.

The authors identify several open problems in prior work. They are as follows:
According to Vanahalli et al., bioinformatics datasets are typically high-dimensional, combining a large number of attributes with a small number of samples. Traditional algorithms spend most of their running time mining enormous numbers of small and moderately sized itemsets that carry little notable information. Recent work has therefore concentrated on mining high-cardinality ("colossal") itemsets, which are crucial in many domains, including bioinformatics; however, existing frequent colossal itemset mining algorithms fail to discover the complete set of significant frequent colossal itemsets.

Related Literature:
In 2018, Bai et al. described high utility itemset mining (HUIM) as a relatively new area of data mining. Their investigation and analysis identified the following open issues:
1. Periodization of frequent itemsets based on rating, and positive-negative mining rules, are absent.
2. Little research exists on specific database projections using upper- and lower-bound pruning techniques and mean-value estimates.
3. Prior work lacked combined data on the number of candidate sets, the generation of frequent sets, and accuracy.
4. Better knowledge-discovery approaches are needed for negative rule mining and threshold setting.
5. Mining weighted frequent itemsets in data streams requires further research.
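
To illustrate the basic idea behind HUIM discussed above, here is a minimal brute-force sketch (the toy data, item profits, and thresholds are illustrative assumptions, not taken from the paper): each transaction records purchased quantities, each item has a unit profit, and an itemset is "high utility" if its total profit across the database meets a threshold.

```python
from itertools import combinations

# Toy transaction database (illustrative): item -> purchased quantity.
transactions = [
    {"a": 2, "b": 1},
    {"a": 1, "c": 3},
    {"a": 1, "b": 2, "c": 1},
]
# External utility (e.g. unit profit) per item -- an assumed example.
profit = {"a": 5, "b": 2, "c": 1}

def utility(itemset, txn):
    """Utility of an itemset in one transaction (0 if not fully contained)."""
    if not set(itemset) <= txn.keys():
        return 0
    return sum(profit[i] * txn[i] for i in itemset)

def high_utility_itemsets(min_util):
    """Brute-force enumeration of itemsets whose total utility >= min_util."""
    items = sorted({i for t in transactions for i in t})
    result = {}
    for r in range(1, len(items) + 1):
        for iset in combinations(items, r):
            total = sum(utility(iset, t) for t in transactions)
            if total >= min_util:
                result[iset] = total
    return result

print(high_utility_itemsets(min_util=15))  # {('a',): 20, ('a', 'b'): 21}
```

Real HUIM algorithms avoid this exponential enumeration with upper-bound pruning (the trimming techniques mentioned in issue 2 above); the sketch only shows the utility measure itself.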

In 2016, Shrivastava et al. noted that frequent itemset mining is central to association rule mining (ARM), with applications in many industries, including remedial education, market basket analysis, banking, and retail. Itemsets that occur rarely but carry high utility can be both interesting and useful; using a utility pattern rare itemset (UPRI) approach, they mined high-utility rare itemsets.
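
The ARM setting mentioned above rests on two standard measures, support and confidence. A minimal sketch with assumed toy market-basket data (not from the paper):

```python
# Toy market-basket data (illustrative assumption).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of baskets containing every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent):
    """Confidence of rule antecedent -> consequent: P(both) / P(antecedent)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

print(support({"bread", "milk"}))       # 0.5
print(confidence({"bread"}, {"milk"}))  # 2/3
```

A rare-itemset approach such as UPRI would look precisely at itemsets whose support is low but whose utility (profit) is high, which plain support-based ARM discards.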

Li et al. suggested in 2016 that useful knowledge can be mined from rare itemsets. Their 2L-XMMMS model is distinctive in that it assigns each item one of two minimum supports, allowing it to mine both frequent and rare itemsets.

In 2017, Ghorbani observed that standard methods for finding frequent itemsets presuppose static datasets, with constraints imposed uniformly across the whole dataset. This does not hold when data arrive over time; the main goal of their study is therefore to increase the efficiency of mining recurring itemsets on temporal data.

He et al. argued in 2017 that data mining is crucial for big data. To increase mining efficiency, they put forward MAFIM, an approach based on the FP-tree and MapReduce. Data distribution is carried out using MapReduce, while the FP-tree is used for frequent itemset computation. Once the local mining results are combined, the center node uses MapReduce to produce the global frequent itemsets. Their findings show that the MAFIM algorithm is fast and well structured.
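
The distribute-then-merge structure described above can be sketched in a simplified map/reduce style. This is not the MAFIM algorithm itself (it omits the FP-tree and uses brute-force pair counting on assumed toy partitions); it only shows local counting per partition followed by a global merge at a center node:

```python
from collections import Counter
from itertools import combinations

def map_phase(partition):
    """Count candidate itemsets (here: item pairs) locally in one partition."""
    counts = Counter()
    for txn in partition:
        for pair in combinations(sorted(txn), 2):
            counts[pair] += 1
    return counts

def reduce_phase(local_counts, min_support):
    """Merge local counts at the center node; keep globally frequent pairs."""
    total = Counter()
    for c in local_counts:
        total.update(c)
    return {iset: n for iset, n in total.items() if n >= min_support}

# Two assumed data partitions, as if distributed to two worker nodes.
partitions = [
    [{"a", "b"}, {"a", "b", "c"}],
    [{"b", "c"}, {"a", "b"}],
]
print(reduce_phase([map_phase(p) for p in partitions], min_support=2))
# {('a', 'b'): 3, ('b', 'c'): 2}
```

In MAFIM the per-partition work would be FP-tree mining rather than pair enumeration, but the merge of local results into global frequent itemsets follows the same shape.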

The average utility itemset tree (EHAUI-Tree) approach, which can incorporate the latest database updates without restarting the system, was introduced by Phuong and Duy in 2017. The estimate for the updated data is computed first; then, based on the updated information value and the previous high average-utility upper bound (HAUUB), the affected itemsets are recalculated and renewed.

In 2017, Zulkurnain and Shah observed that an overflow of data occurs across industries, including banking, telecommunications, and scientific operations. Data mining can extract usable information from this flood of data; by obtaining useful information from large datasets, it supports decision-making processes.

It was suggested by Hong et al. in 2017 to employ erasable-itemset (EI) mining to find itemsets that can be removed without harming factory profitability. For erasable itemsets they offered an incremental mining method, frequently based on the fast-update (FUP) concept. In an incrementally updated data environment, their results show that the proposed method runs faster than the batch technique.
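
The erasability criterion behind EI mining can be illustrated with a small sketch (the product data, components, and threshold are assumed for illustration): an itemset of components is "erasable" if dropping every product that uses any of those components loses at most a given fraction of total profit.

```python
# Toy product data (illustrative assumption): (component items, profit).
products = [
    ({"a", "b"}, 100),
    ({"b", "c"}, 50),
    ({"c"}, 30),
]
total_profit = sum(p for _, p in products)

def gain(itemset):
    """Profit lost if the components in itemset become unavailable."""
    itemset = set(itemset)
    return sum(p for comps, p in products if comps & itemset)

def is_erasable(itemset, max_loss_ratio):
    """An itemset is erasable if its loss stays within the allowed ratio."""
    return gain(itemset) <= max_loss_ratio * total_profit

print(is_erasable({"c"}, 0.5))  # True  (loses 80 of 180)
print(is_erasable({"a"}, 0.5))  # False (loses 100 of 180)
```

Incremental methods such as the FUP-based one cited above avoid recomputing these gains from scratch when new product records arrive.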

Ismail et al. suggested in 2017 that mining high-benefit instances is a method for finding groups of profitable goods that can deliver significant benefit from a customer database.

In 2017, Jiang and He presented a more functional data structure and a non-recursive FPNR-growth technique. Their experimental results indicate that the FPNR-growth algorithm outperforms the FP-growth approach in both mining time and storage.

Mohammed et al. suggested in 2017 that the most interesting itemset in a database is the one not covered by other itemsets. The Honey Bee Algorithm they use is a simple, robust, population-based stochastic search algorithm inspired by the foraging behavior of honey bees.

Wang et al. suggested in 2017 that extracting common patterns from data streams is a crucial component of data mining. Their article discusses their solutions and how these relate to frequent itemsets and sliding windows.
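
The sliding-window idea mentioned above can be sketched minimally (class name, window size, and stream data are assumed for illustration, not from the cited work): keep counts only over the most recent transactions, evicting the oldest as new ones arrive.

```python
from collections import Counter, deque

class SlidingWindowCounter:
    """Maintain item counts over the last `window` transactions of a stream."""

    def __init__(self, window):
        self.window = window
        self.buffer = deque()
        self.counts = Counter()

    def add(self, txn):
        """Add one transaction; evict the oldest if the window is full."""
        self.buffer.append(txn)
        self.counts.update(txn)
        if len(self.buffer) > self.window:
            old = self.buffer.popleft()
            self.counts.subtract(old)

    def frequent(self, min_count):
        """Items meeting the support threshold within the current window."""
        return {i for i, n in self.counts.items() if n >= min_count}

sw = SlidingWindowCounter(window=3)
for txn in [{"a", "b"}, {"a"}, {"b", "c"}, {"a", "c"}]:
    sw.add(txn)
print(sw.frequent(min_count=2))  # {'a', 'c'} -- first txn has left the window
```

Stream algorithms in the literature extend this from single items to itemsets and use compact summaries instead of buffering raw transactions, but the eviction-on-arrival pattern is the same.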

Results & Conclusions:

To build a useful knowledge discovery system and mine frequent itemsets, this study surveys several data mining techniques. It discusses the prospects, intuitions, and influence of mining positive and negative associations, and it includes currently published results as well as future recommendations. From the study and discussion above, it is clear that a powerful data mining knowledge-realization model is needed, one that can handle candidate generation and mine both positive and negative association rules.
Contributions:
This study establishes a safe systematics model for accident cases. The two crucial elements of the strategy are comparable-units analysis and similarity-degree assessment. A benefit of this approach is the ability to collect both subjective and objective information about various incidents. The authors also investigated the descriptive capability of a data mining technique based on multi-relational inductive logic programming (ILP), examining ILP's ability to capture and merge (implicit) multiple relationships in E-COMPARED data.

SECTION 2:
Overall Assessment:
This paper is simple to read and understand. It gives clear explanations with examples for every topic, which made it easy for me to follow. The author did a great job elaborating the pros and cons of the problems raised by other authors, and he also explained future directions. Publishing the study, including its scientific and practical contributions, distributes it to others in the field; this raises awareness of new findings among researchers and practitioners with similar interests, advancing both comprehension and its application.

Future Research:
In the future, correlations between comparable data can be explored, and cutoff conditions can be evaluated. More research is required on candidate generation and improvement, and a variety of hybrid combinations of data mining methods have been proposed.

New Knowledge Learned:

I learned about approaches based on knowledge discovery, high utility itemset mining, association rule mining, the utility pattern rare itemset approach, and the FP-tree and MapReduce based MAFIM approach.

SECTION 3:
Questions:
1. What are some solutions and practical approaches to the problems faced by the authors?
2. What approaches exist other than ILP?

Link to the video:


https://drive.google.com/drive/folders/1HGE_6H0eXFOz5bGwKtTNeh6H4lShWwi2?usp=sharing
