Assignment Data Mining
(21262)
Assignment: Data Mining
MSCS (1st semester)
2 | An Overview of a Crime Detection System Using the Art of Data Mining | Sritha Zith Dey Babu, Digvijay Pandey, Ismail Sheik
3 | A Survey on Malware Detection Using Data Mining Techniques | Yanfang Ye, Tao Li, Donald Adjeroh, S. Sitharama Iyengar
4 | Data Mining for the Internet of Things: Literature Review and Challenges | Feng Chen, Pan Deng, Jiafu Wan, Daqiang Zhang, Athanasios V. Vasilakos, and Xiaohui Rong
6 | An Overview of Data Mining Application in Judicial Cases to Identify Patterns of Crime and Crime Detection | Nima Norouzi, Elham Ataei
7 | Using Data Mining to Detect Health Care Fraud and Abuse: A Review of Literature | Hossein Joudaki, Arash Rashidian, Behrouz Minaei-Bidgoli, Mahmood Mahmoodi, Bijan Geraili, Mahdi Nasiri & Mohammad Arab
9 | Big Data Analytics for Security and Criminal Investigations | M.I. Pramanik, Raymond Y.K. Lau, Wei T. Yue, Yunming Ye and Chunping Li
10 | Data Mining in Anti-Money Laundering Field | Noriaki Yasaka
YEAR | JOURNAL | HEC CATEGORY | IMPACT FACTOR | PROPOSED ALGO / METHOD
The bridge of data between the police station and system of data mining will report about further real and upcoming crimes | Network datasets used | (1) Crime month. (2) Crime day of the week. (3) The crime time
Naive Bayes Classifier (NBC) | Using 121 datasets | parameters including the feature selection method (e.g., Document Frequency (DF), Gain Ratio (GR), or FS) in malware detection
Classification | CLOUDS: a decision tree classifier for large datasets | the parameters of the transformation remain the same for every time series regardless of its nature, related research including DFT [86], wavelet functions related topic [87], and PAA [72].
Expert Systems with Applications and Decision Support Systems | European region was only reported | Country, Frequency, and Percentage (%)
Their detection system reached an AUC of 91% on their experimental dataset, rising to 99.7% with additional components | Both experiments reused the Second Life and Entropia datasets | an evaluation against manually identified ground truth would be stronger justification of the method’s validity
The results are promising, with 76% detection accuracy on real-world ‘blacklist’ data and sound evidence that the approach can provide effective warnings several months ahead of the official release of blacklists | Crime continues to remain a severe threat to all communities and nations across the globe alongside the sophistication in technology and processes that are being exploited to enable highly complex criminal activities | Data also points toward the need for more training and investment in educating and empowering youth with knowledge on the advantages
We found that crimes also vary with the criminal's age, health, or political power | To predict crime and analyze crime activity | The data mining system will also help people to make a direct connection with police and the law of justice officials
Their results showed that the FS was very accurate using just the top 50 features | To protect legitimate users from these threats, anti-malware software products from different companies, including Comodo, Kaspersky, Kingsoft, and Symantec, | Advantage of being exhaustive in detecting malicious logic. In other words, static analysis does not have the coverage problem that dynamic
Data mining technologies are integrated with IoT technologies for decision making support and system optimization. | Data mining involves discovering novel, interesting, and potentially useful patterns from large data sets and applying algorithms to the extraction of hidden information. | advantage of distributed parallel computer systems
data mining technique in detecting financial fraud with 13%, followed by both neural network and decision tree with 11%, while support vector machine is represented by 9% and naïve Bayes by 6%. Besides fraud detection. | This paper aims to review research studies conducted to detect financial fraud using data mining tools within one decade and communicate the current trends to academic scholars and industry practitioners | Besides this benefit, researchers can take advantage of knowing the most frequently used methods
the results should be evaluated, and their importance should be explained. Usually, in classification problems, a confusion matrix is a useful tool | The main purpose of this research is to study the method based on data mining. This method can be studied in various fields such as identification, forecasting, and crime prevention using data mining tools and algorithms and using the existing database and their military arrangement at the crime scene to prevent crime. | advantage of the industrial opportunity to reap the true benefits of advanced computing technology in areas that include position monitoring, intelligent control, and database knowledge discovery
Recommend seven general steps for mining health care claims (or insurance claims) to detect fraud and abuse (after preprocessing of data): 1) Identifying the most important attributes of data by domain experts | The scale of this problem is large enough to make it a priority issue for health systems. | As advantages, Shin et al. used a simple definition of anomaly score and extracted 38 features for detecting abuse
This increases the risk of reviewer bias and human error affecting results, especially with regards to the classification of borderline cases | Technical issues with a module of the Dark Web Portal | Behind web data comes the other significant data source, email, which has the advantage of being both long established and well used
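
The entry for the judicial-cases paper notes that, in classification problems, an evaluation matrix (commonly called a confusion matrix) is a useful tool for judging results. The following is a minimal sketch of that idea in Python, not the method of any surveyed paper. The labels "crime" and "none", the sample data, and the function names `confusion_counts` and `summarize` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive="crime"):
    """Count true/false positives and negatives for a binary classifier.

    `positive` is the label treated as the class of interest
    (a hypothetical label used only for this illustration).
    """
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def summarize(counts):
    """Derive accuracy, precision, and recall from the four counts."""
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up example data, not taken from any of the reviewed studies.
actual = ["crime", "crime", "none", "none", "crime", "none"]
predicted = ["crime", "none", "none", "crime", "crime", "none"]

counts = confusion_counts(actual, predicted)
metrics = summarize(counts)
print(dict(counts))
print(metrics)
```

Reporting precision and recall alongside accuracy matters in fraud and crime detection because positive cases are usually rare, so a classifier can score high accuracy while missing most of them.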