Reverse Accessible in Local Outlier Factor Density Based Recognition
e-ISSN: 2348-795X
Available at https://edupediapublications.org/journals
Volume 03 Issue 10
June 2016
Abstract: Recent data mining work on outlier detection shows that, as the dimensionality of a dataset increases, there exist hubs and antihubs: hubs are points that frequently occur in k-nearest-neighbor (kNN) lists, while antihubs are points that rarely appear in kNN lists. The proposed system develops and compares unsupervised outlier detection models, giving details on the development and analysis of two detection methods, the Local Outlier Factor (LOF) and the Local Distance-Based Outlier Factor (LDOF), and improves on previous systems with respect to speed, complexity, and efficiency. Classification algorithms are used to find the relevant features and to classify data according to the chosen criteria; such data mining techniques suffer from the increasing complexity, size, and variety of data sets. The proposed incremental LOF algorithm achieves detection performance equivalent to the iterated static LOF algorithm while requiring significantly less computational time. In addition, the incremental LOF algorithm dynamically adapts as data points change, which is very important for applications whose data profiles change over time. Moreover, we also give a broad comparison of a number of different outlier factor models.
Index Terms: Clustering-based, density-based and model-based approaches; nearest neighbour; outlier detection; discrimination; outliers; data mining; clustering; neural network.
briefly mention hubness in the context of graph construction for semi-supervised learning. There have also been attempts to avoid the influence of hub points in 1-NN time-series classification, apparently without explicit awareness of the phenomenon (Islam et al., 2008), and to account for possible skewness of the distribution of N1 in reverse [8] nearest-neighbor search, where Nk(x) denotes the number of times point x occurs among the k nearest neighbors of all other points in the data set. None of these papers, however, analyzes the causes of hubness or generalizes it to other applications. Early outlier detection algorithms determine outliers only once all the data records (samples) are present in the dataset; we refer to these as static outlier detection algorithms. In contrast, incremental outlier detection techniques identify outliers as soon as a new data record appears in the dataset. Incremental outlier detection has also been used within the more general framework of activity monitoring [18]. In addition, [19] proposed broad requirements that incremental algorithms need to meet, and [21] used on-line discounting distributional learning of a Gaussian mixture model with scoring based on the estimated probability density function. [8] proposes an outlier ranking based on an object's deviation in a set of relevant subspace projections; it excludes irrelevant projections that show no clear difference between outliers and regular objects, and finds objects that deviate in multiple relevant subspaces. The study in [9] distinguishes three problems caused by the "curse of dimensionality" in the context of data mining, searching, and indexing applications: poor discrimination of distances caused by concentration, and the presence of irrelevant and redundant attributes, all of which impair the usability of traditional similarity and distance measures. A parameter-free outlier detection algorithm [10] computes the Ordered Distance Difference Outlier Factor: it formulates a new outlier score for each instance by considering the differences of ordered distances, and then uses this value to compute the outlier score.

3. Density Based approaches
Distance-based approaches are known to face the local density problem created by the varying degrees of cluster density that exist in a dataset. To solve this problem, density-based approaches have been proposed. The basic idea of density-based approaches is that the density around an outlier differs markedly from the density around its neighbors [14]. The density of an object's neighborhood is compared with that of its neighbors' neighborhoods; if there is a significant difference between the densities, the object can be considered an outlier. To implement this idea, several outlier detection methods have been developed recently [11]; these detection methods estimate the density around an object in different ways. [15] developed the Local Outlier Factor (LOF), which is among the most commonly used methods in outlier detection. LOF has inspired variations such as the Local Correlation Integral (LOCI) [16], the Local Distance-Based Outlier Factor (LDOF) [17], and Local Outlier Probabilities (LoOP) [18]. Below we review some density-based outlier detection techniques. Many outlier detection methods have been proposed to date; the existing methods can be broadly classified as distribution (statistical)-based, clustering-based, density-based, and model-based approaches [13]. Statistical approaches [12] assume that the data follows some standard or predetermined distributions, and this type of approach aims to find the outliers that do not follow such
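To make the density-based idea concrete, the following is a minimal sketch of the LOF computation as defined by Breunig et al. [12], using brute-force distance computation. This is an illustrative re-implementation, not the paper's code; the function and variable names are our own.

```python
import numpy as np

def lof_scores(X, k=3):
    """Minimal Local Outlier Factor sketch (after Breunig et al. [12]).
    Scores near 1 indicate inliers; scores well above 1 indicate a point
    lying in a sparser region than its neighbors, i.e. a likely outlier."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                   # exclude self-distances
    knn = np.argsort(D, axis=1)[:, :k]            # indices of k nearest neighbors
    k_dist = D[np.arange(n), knn[:, -1]]          # k-distance of each point
    # reachability distance: reach(p, o) = max(k_dist(o), d(p, o))
    reach = np.maximum(k_dist[knn], D[np.arange(n)[:, None], knn])
    lrd = k / reach.sum(axis=1)                   # local reachability density
    return lrd[knn].mean(axis=1) / lrd            # LOF = avg neighbor lrd / own lrd
```

On a tight cluster plus one distant point, the cluster members score close to 1 while the distant point scores far above 1.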
distributions. The methods in this category always assume that typical examples follow a particular data distribution. Nevertheless, we cannot always have this kind of prior knowledge of the data distribution in practice, especially for high-dimensional real data sets [13].

… neural networks, decision trees or k-means, they require a training dataset to allow the network to learn. They autonomously cluster the input vectors through node placement, allowing the underlying data distribution to be modeled and the normal and abnormal classes to be differentiated [18]. They assume that related vectors share common feature values and rely on identifying these features and their values to topologically model the data distribution. The neural network uses the class labels to adjust its weights and thresholds so that the network can correctly classify the training data. These methods are also used to detect noise and novel data [19]. The neural network is thus a crucial methodology that plays an important role in outlier detection.
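The text does not give details of the node-placement step. As a rough illustration of the underlying idea only (prototype vectors placed to model the data distribution, with distance to the nearest prototype as a novelty score), here is a minimal k-means-based sketch; the function names and parameters are our own assumptions, not the paper's method.

```python
import numpy as np

def fit_nodes(X, n_nodes=2, iters=20, seed=0):
    """Tiny k-means: place `n_nodes` prototype vectors so they model the
    underlying data distribution (a stand-in for the self-organizing
    node placement described in the text)."""
    rng = np.random.default_rng(seed)
    nodes = X[rng.choice(len(X), n_nodes, replace=False)]
    for _ in range(iters):
        # assign each vector to its nearest prototype, then recenter
        assign = np.linalg.norm(X[:, None] - nodes[None], axis=-1).argmin(axis=1)
        for j in range(n_nodes):
            if (assign == j).any():
                nodes[j] = X[assign == j].mean(axis=0)
    return nodes

def novelty_score(x, nodes):
    """Distance to the nearest learned prototype: large values suggest
    the vector does not belong to any 'normal' cluster."""
    return np.linalg.norm(nodes - x, axis=1).min()
```

A vector far from every prototype receives a much larger score than one inside a modeled cluster.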
scanned and added to the set of outliers (steps 05–13). In this algorithm, the key step is computing the changed value of the entropy. With the use of a hashing technique, we can determine the frequency of an attribute value from the corresponding hash table in O(1) expected time. Hence, we can determine the decreased entropy value in O(m) expected time, since the changed value depends only on the attribute values of the record to be temporarily removed [17].

… a problem in which the item is excluded before we can make the choice. The problem formulated in this way gives rise to many overlapping subproblems -- a hallmark of dynamic programming, and indeed, dynamic programming …
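As an illustration of the hashing argument above, the following sketch (our own, not the code of [17]) keeps one hash table of value frequencies per attribute, plus a cached sum of c·log c terms, so that the entropy after temporarily removing one record needs only one O(1) expected-time lookup and one term update per attribute, i.e. O(m) overall.

```python
from collections import Counter
from math import log

class EntropyTracker:
    """Track per-attribute entropy of a categorical dataset so that the
    entropy after temporarily removing one record can be computed in
    O(m) expected time (m = number of attributes)."""

    def __init__(self, records):
        self.n = len(records)
        self.m = len(records[0])
        # one hash table (Counter) of value frequencies per attribute
        self.counts = [Counter(rec[j] for rec in records) for j in range(self.m)]
        # cache S_j = sum over values of c*log(c); entropy_j = log(n) - S_j/n
        self.S = [sum(c * log(c) for c in cnt.values()) for cnt in self.counts]

    def total_entropy(self, n=None, S=None):
        n = self.n if n is None else n
        S = self.S if S is None else S
        return sum(log(n) - s / n for s in S)  # sum of per-attribute entropies

    def entropy_without(self, record):
        """Entropy with `record` temporarily removed: per attribute, one
        O(1) expected-time frequency lookup plus one term update."""
        n = self.n - 1
        S_new = []
        for j, v in enumerate(record):
            c = self.counts[j][v]           # O(1) expected frequency lookup
            s = self.S[j] - c * log(c)      # drop the old c*log(c) term
            if c > 1:
                s += (c - 1) * log(c - 1)   # add the decremented term back
            S_new.append(s)
        return self.total_entropy(n, S_new)
```

Removing a record touches only the m affected count entries, so no full rescan of the dataset is needed.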
One of the simplest methods for showing that a greedy algorithm is correct is to use a "greedy stays ahead" argument. This style of proof works by showing that, according to some measure, the greedy algorithm is always at least as far ahead as the optimal solution during each iteration of the algorithm. Once you have established this, you can then use this fact to show that the greedy algorithm must be optimal.
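As a standard illustration of this proof style (not an algorithm from this paper), greedy interval scheduling picks the compatible interval that finishes earliest; the "stays ahead" measure is the finish time after each pick.

```python
def max_nonoverlapping(intervals):
    """Greedy interval scheduling: repeatedly take the compatible interval
    that finishes earliest.  'Greedy stays ahead': after k picks, greedy's
    k-th finish time is <= any other schedule's k-th finish time, so greedy
    never selects fewer intervals than an optimal schedule."""
    chosen = []
    last_end = float("-inf")
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:          # compatible with intervals chosen so far
            chosen.append((start, end))
            last_end = end
    return chosen
```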
required in five datasets, and Figure 3 shows the comparative study of outlier detection rate for the existing and proposed outlier detection algorithms.

Fig 6: Data insertion time required
Figure 7: Outlier detected

… outlier-detection methods and the hubness phenomenon, extending the previous examinations of (anti)hubness to large values of k and exploring the relationship between hubness and data sparsity. Based on this analysis, we formulated the IQR, Greedy, and AntiHub methods for semi-supervised and unsupervised outlier detection, discussed their properties, and proposed a derived method which improves speed and accuracy, reducing the false-positive and false-negative rates and improving the efficiency of density-based outlier detection.

7. CONCLUSIONS
Outlier detection is very important and has applications in a wide variety of fields, so it is important to learn how to detect outliers. The main objective of this paper is to review various outlier detection techniques and to study how these techniques are categorized. We can conclude that the methods used for outlier detection are application specific. The training algorithm and testing algorithm are used for training and testing the classifier. Reducing the search close to the class boundaries saves computation time in identifying such nuggets. Results from the evaluation on the real-world WDBC data set revealed that the proposed approach achieves better performance than the existing classification algorithm. We proposed a derived method which improves speed and accuracy, reducing the false-positive and false-negative rates and improving the efficiency of density-based outlier detection. Future implementations will explore machine learning techniques such as supervised and semi-supervised methods.

8. FUTURE WORK
Future work on deleting data records from the database is needed. More specifically, it would be interesting to design an algorithm with exponential decay of weights, where the most recent data records have the highest influence on the local density estimation. In addition, an extension of the proposed methodology to create incremental versions of other emerging outlier detection algorithms, such as the Connectivity-based Outlier Factor (COF), is also worth considering. Additional real-life data sets will be used to evaluate the proposed algorithm, and ROC curves will be applied to quantify its performance.
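The exponential-decay idea in the future-work discussion can be sketched as follows. This is a toy illustration under our own assumptions (weight w_i = exp(-λ·(now - t_i)) and a simple Gaussian kernel), not the paper's algorithm; the parameter λ and all names are ours.

```python
import numpy as np

def decayed_weights(timestamps, now, lam=0.1):
    """Exponential time-decay: weight w_i = exp(-lam * (now - t_i)),
    so the most recent records dominate the density estimate."""
    t = np.asarray(timestamps, dtype=float)
    return np.exp(-lam * (now - t))

def weighted_density(dists, weights):
    """Toy weighted kernel density at a query point, given distances to
    the stored records and their time-decay weights."""
    return float(np.sum(weights * np.exp(-dists ** 2)) / np.sum(weights))
```

A record observed at the current time carries weight 1, while older records contribute exponentially less to the local density estimate.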
[4] Nilam Upasani and Hari Om, "Evolving fuzzy min-max neural network for outlier detection," in International Conference on Advanced Computing Technologies and Applications (ICACTA-2015), Elsevier.

[5] N. Abe, B. Zadrozny, and J. Langford, "Outlier detection by active learning," in Proc. 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, New York, NY, USA, pp. 504–509, 2006.

[6] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: A survey," ACM Comput. Surv., vol. 41, no. 3, p. 15, 2009.

[7] Milos Radovanovic, Alexandros Nanopoulos, and Mirjana Ivanovic, "Reverse nearest neighbors in unsupervised distance-based outlier detection," IEEE Transactions on Knowledge and Data Engineering, 2014.

[11] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: A survey," ACM Comput. Surv., vol. 41, no. 3, p. 15, 2009.

[12] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, "LOF: Identifying density-based local outliers," SIGMOD Rec., vol. 29, no. 2, pp. 93–104, 2000.

[13] K. Zhang, M. Hutter, and H. Jin, "A new local distance-based outlier detection approach for scattered real-world data," in Proc. 13th Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD), pp. 813–822, 2009.

[14] W. Jin, A. K. H. Tung, J. Han, and W. Wang, "Ranking outliers using symmetric neighborhood relationship," in Proc. 10th Pacific-Asia Conf. on Advances in Knowledge Discovery and Data Mining (PAKDD), pp. 577–593, 2006.
N V S K Vijaya Lakshmi K is working as an Assistant Professor in the Dept. of IT, Sir C R Reddy College of Engineering, Eluru, Andhra Pradesh. She has 5 years of experience.