A Fuzzy Proximity Relation Approach for Outlier Detection in Mixed Data
T. Sangeetha, G. Mary A
Soft Computing Letters 3 (2021) 100027, https://doi.org/10.1016/j.socl.2021.100027
Keywords: Data Mining, Entropy, Fuzzy Proximity, Mixed Data, Rough Sets, Weighted Density

Abstract: Data mining is an emerging technology in which researchers explore innovative ideas across different domains, particularly for detecting anomalies. Instances in a dataset which deviate considerably from the common patterns of the other instances are known as anomalies. Ambiguity and uncertainty are inherent in real-world data, and Rough Set Theory is a proven methodology for dealing with them. Research work done until this point has focused on data of numeric or categorical type and fails when the attributes are of mixed type. Here, by using fuzzy proximity and ordering relations, the numerical data is converted to categorical data. This article presents an approach for detecting outliers in mixed data in which the weighted density values of attributes and objects are calculated. The proposed approach is compared with existing outlier detection methods, taking the hiring dataset as an example, and is benchmarked on Harvard Dataverse datasets to prove its efficiency and performance.
treated as normal objects, and the remaining objects that are not labeled are identified as outliers. Each pattern is assigned an outlier score, and a threshold value is fixed to determine the degree of outlierness [18]. The similarity of data cannot be measured reliably when there is much noise, and similarity and density measures do not suit high-dimensional data [8]. Researchers are nowadays focusing on detecting outliers in high-dimensional data, because most earlier works detect outliers only for qualitative or quantitative data [23]. The proposed approach suits mixed data with a high level of significance. Different methodologies for outlier detection techniques are shown in Fig. 1.

2. Outlier detection method

2.1. Supervised method

This technique models both data uniformity and anomaly. Specialists label similar objects, and objects that do not match the model of ordinary objects are marked as exceptions or outliers [1]. Normal data objects appear far more often than outlier objects, so this method has two imbalanced classes (normal and outliers). A small amount of sample data taken for training will not adequately represent the outlier distribution. However, labeling a true object as an outlier should not be allowed; avoiding this is even more important than the outlier detection itself.

2.2. Unsupervised methods

In a few applications, objects are not labeled as "usual" or "exception"; consequently, an unsupervised learning technique must be utilized. Clustering can separate normal objects from outlier objects [9]: objects which deviate from normal behavior form one cluster, and the remaining objects fall into the normal category. One issue in unsupervised strategies is that data that does not belong to any group might be considered noise but not an outlier [35].

2.3. Semi-supervised methods

These can be viewed as the utilization of semi-supervised learning strategies. In particular, the available labeled objects can be utilized, or the unlabelled objects close to them, to prepare a layout for normal objects. The layout of the ordinary objects can then be utilized to identify outliers: the items which do not fit into the layout of normal objects are anomalies [4]. To enhance the quality of outlier detection, one can also take assistance from models of unsupervised strategies.

3. Rough set theory and fuzzy approximation space

During the 1980s, Zdzislaw Pawlak [27], a Polish mathematician, developed a mathematical tool called rough sets, built from crisp lower and upper approximation concepts. It does not need any prior or extra information about the data concerned. There is a strict association between vague and uncertain data, and the rough set approach demonstrates a clear association between these two ideas: vagueness is associated with sets, while uncertainty is associated with elements of sets. Data analysis with rough sets uses decision tables with structured rows and columns [12]. The columns of a table are attributes classified into two groups, condition attributes and decision attributes, and each row specifies an object which induces some decision or result. If some conditions are satisfied, then the decision rule is certain; otherwise, it is uncertain.

Rough set theory also involves the notion of similarity. Consider the information table IT = (W, X, Y, Z), where W is the universe, which should be nonempty, X is the set of attributes, and Y and Z are the conditional and decisional attributes [13]. The components of W are objects, entities, items, or investigations; attributes are also referred to as features, aspects, or characteristics.

Assume S = (V, RT), a subset Y ⊆ V, and an equivalence relation RT ∈ IND(S). The lower and upper approximations of Y are defined as follows:

R̲T(Y) = ∪{X ∈ V/RT : X ⊆ Y}

R̄T(Y) = ∪{X ∈ V/RT : X ∩ Y ≠ ∅}

From this, Boundary(Y) = R̄T(Y) − R̲T(Y) is called the RT-boundary of Y. The boundary sets are included in the upper approximation but not in the lower approximation. Rough sets are defined through the lower and upper approximations: a set is rough when its boundary region is nonempty (R̄T(Y) ≠ R̲T(Y)), and crisp when the boundary is empty.
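The constructions above are compact enough to sketch in code. The fragment below is an illustrative Python sketch, not the paper's implementation (the authors' experiments used C); the decision table, its attribute names, and the target set are hypothetical, and only the definitions of U/IND, the lower and upper approximations, and the boundary region are taken from this section.

from collections import defaultdict

# Hypothetical decision table: object -> attribute values (illustrative only).
TABLE = {
    "o1": {"colour": "red",  "size": "small"},
    "o2": {"colour": "red",  "size": "small"},
    "o3": {"colour": "blue", "size": "small"},
    "o4": {"colour": "blue", "size": "large"},
}

def indiscernibility(table, attrs):
    # U/IND(attrs): group objects that take identical values on attrs.
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())

def lower_approx(partition, y):
    # Union of equivalence classes wholly contained in Y (surely in Y).
    return set().union(*(c for c in partition if c <= y))

def upper_approx(partition, y):
    # Union of equivalence classes intersecting Y (possibly in Y).
    return set().union(*(c for c in partition if c & y))

partition = indiscernibility(TABLE, ["colour"])   # [{o1, o2}, {o3, o4}]
y = {"o1", "o2", "o3"}                            # a target concept Y
low, up = lower_approx(partition, y), upper_approx(partition, y)
print(low)       # {'o1', 'o2'}
print(up)        # {'o1', 'o2', 'o3', 'o4'}
print(up - low)  # boundary {'o3', 'o4'} is nonempty, so Y is rough here

A nonempty boundary is exactly the rough case described above; partitioning on both attributes instead would shrink the boundary and move Y toward a crisp set.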
3.1. Membership relation and approximation

The membership relation is derived from approximation spaces; both membership and set approximation are related to knowledge only [28]. The representation is shown below:

l ∈_T L if and only if l ∈ T̲(L)

l ∈^T L if and only if l ∈ T̄(L)

Here ∈_T reads "l surely belongs to L with respect to T" and ∈^T reads "l possibly belongs to L with respect to T"; these are the lower and upper membership relations, respectively. Fig. 2 depicts the set approximation.

3.2. Fuzzy approximation space with rough sets

In general, fuzzy sets are used to handle issues in the understandability of patterns, incomplete and noisy data, multimedia information, and intercommunication between persons, and to resolve them quickly within the determined time [14]. The minimal and maximal approximations of a fuzzy set B in Z are the fuzzy sets T↓B and T↑B in Z, given by

(T↓B)(r) = inf_{s∈S} I(R(s, r), B(s))

(T↑B)(r) = sup_{s∈S} T(R(s, r), B(s))

where T denotes a t-norm and I an implicator. T↓B and T↑B can also be read as the degree of inclusion of T_r in B and the degree of overlap of T_r and B, respectively [10]; this parallels the crisp conditions r ∈ T̲B only if [r]_T ⊆ B and r ∈ T̄B only if [r]_T ∩ B ≠ ∅.

4. Related work

Datasets form different clusters based on different labeling techniques; a data item that, compared with these formed clusters, does not belong to any cluster is identified as an outlier [7]. For single-class classification, a support vector data description (SVDD) method was used: it determines a hypersphere that includes all normal data within its space, and the objects which lie outside the hypersphere are termed outliers. In k-means clustering, objects that are found to be similar under a feature vector are formed into clusters, and any object that does not group under any cluster is an outlier. In the local outlier factor (LOF) method, the relative distance of an object to its neighborhood points is calculated; if the value shows a high deviation, the object is an outlier [34].

Multivariate Outlier Detection (MOD) is a traditional strategy for the detection of outliers. It typically flags those observations that are found a long way from the center of the data distribution; a few distance measures are employed for such detection [19]. The Mahalanobis distance is an outstanding rule which relies upon estimated parameters of the multivariate distribution.

The rough membership function is also used to detect outliers from real-world datasets. One of the most popular distance-based approaches is the Manhattan distance; when the threshold value increases, this technique outdoes the performance of statistical approaches and distance-based methods, and its efficiency can be improved by fixing a proper threshold value. The clustering technique provides more accuracy than the distance-based approach [37,38]. Small clusters can be constructed by using Partitioning Around Medoids (PAM) to detect outliers from the dataset.

In neural networks, the data is trained and tested; a network is used to clear the ambiguity in patterns and is also an effective tool to retrieve knowledge from large databases. The rough set method combined with a neural network is well suited to data mining problems, and a backpropagation algorithm has been employed using rough sets to avoid inconsistencies in the data. The neural-system learning model uses backpropagation; neurobiologists and psychologists initially ignited this field to create and test a computational analog of the neuron. The neural system is arranged so that input/output units are associated with weights [25]. Backpropagation learns by processing a training set of tuples iteratively and contrasting the system's prediction for an individual tuple with the known target. The target might be the class label known for the training tuple (classification problems) or a continuous value (prediction); for each training tuple, the weights are adjusted to minimize the mean squared error between the system's prediction and the real target value [20].

Rough entropy has been used to measure the uncertainty of data, with each object and attribute given a weighted density value to detect outliers, but clustering of the data had not been done [21]. The clustering approach can be improved by using rough K-means (RKM) with a preliminary centroid selection method [22]. A better cluster validity index is achieved by the improved entropy-based rough K-means (ERKM) method. In multi-granulation rough sets, the decision is made by "OR" instead of "AND" logic.
Table 1. Study on different outlier detection methods

1. Support Vector Data Description (SVDD)
   Advantages: Detects outliers well in smaller sample sizes and produces effective results for more intricate and scanty datasets.
   Disadvantages: If the sample sizes become larger, outlier detection is difficult.
2. k-means clustering
   Advantages: Even if the dataset is huge, outlier detection is possible.
   Disadvantages: Generally, outliers are to be discarded, but in this method outliers form a separate group.
3. Local Outlier Factor (LOF)
   Advantages: The point at the smallest distance to a denser cluster is considered an outlier, whereas in general outlier detection approaches the point at the smallest distance would not be considered an outlier.
   Disadvantages: A threshold value must be fixed to detect outliers, and its fixation depends on the problem and the user.
4. Multivariate Outlier Detection (MOD)
   Advantages: Detects outliers (of n features) in n-dimensional space.
   Disadvantages: Finding distributions of an n-dimensional space is difficult, so training of the dataset is required.
5. Partitioning Around Medoids (PAM)
   Advantages: Compared to other available partitioning algorithms, outliers are less noticeable in the PAM method.
   Disadvantages: Choosing the k medoids is random; it gives a different result for the same dataset.
6. Backpropagation Method
   Advantages: A deeper understanding of the data is not required.
   Disadvantages: Particularly sensitive to noisy data.
7. Rough k-Means (RKM)
   Advantages: The weighted density method uses the Gaussian function to detect outliers in a vague dataset.
   Disadvantages: The approach is susceptible when separating objects which overlap between clusters.
8. Entropy Rough k-Means (ERKM)
   Advantages: Outliers are removed effectively, which results in the formation of quality clusters.
   Disadvantages: Centroid selection is random, based on the rough k-means (RKM) method.
When the two attributes have contradictions and inconsistencies, multi-granulation with a rough set framework has been used [40], but it needs effective computation.

The traditional approach to outlier detection was the statistical method, which applies to single-dimensional datasets alone. The model suitably fits perceptible real-world datasets where the categorical data has been converted into numerical data for the processing of statistical methods [5], but this increases the processing time for tangled datasets. The simplest outlier detection method, which needs no prior information for processing the data, is the proximity-based technique. However, the calculation of distances between all objects grows exponentially; the time complexity is directly proportional to the number of objects n and the dimensionality m, so it is not suitable for high-dimensional data.

The parametric method is suitable for larger datasets because it has a built-in distribution model. If any model fits the prescribed dataset, then the outcome will be accurate. The data model grows with paradigmatic complexity, not with the size of the data; the only condition is that the predefined model should fit the available dataset. Nonparametric methods need prior information to process. In some cases, the prior knowledge will not be available, or the computation cost will be high [32]. Many datasets not only use a determined data model but also follow a random distribution model; this may be applicable to regression and principal component analysis methods. In the pre-processing stage, parameter settings are to be made, and later they should be processed.

An outer perception, or anomaly, seems to diverge extraordinarily from the other individuals among which it occurs; a perception (or a subset of perceptions) gives the impression of conflicting with the rest of the data [24]. Exceptions are defined as points lying outward from the cluster but at the same time isolated from the noise [30]. Patterns that do not conform to the well-defined notion of normal behavior are outliers, and the regions of network structure differ from what is expected under normal behavior [26].

Social network anomaly detection focuses on outlier detection techniques developed in the machine learning and statistical domains [31]. Intrusion detection with anomaly detection was proposed through system calls [33]. First, decision-makers' preferences for each choice are evaluated and the concept of pre-decisions is introduced, resulting in an incomplete fuzzy decision system [43]. Then, using the defined similarity relation, the weighted conditional probabilities are determined. The concept of relative utility functions is next introduced, followed by a method for determining relative utility function values. Then, in incomplete fuzzy decision systems, a three-way decision model is built and applied to the modeling of incomplete multi-attribute decision-making issues [44].

On intuitionistic fuzzy-valued information systems (IFVIS), three alternative sorting decision-making procedures involve subtracting intuitionistic fuzzy numbers, sorting functions, and intimacy coefficients [45]. The outranked set is created for each alternative, and a hybrid information table is presented that includes a multi-attribute decision-making matrix and a loss function table. Multi-attribute decision-making (MADM) is a crucial component of modern decision sciences [46]; it refers to a decision problem of selecting the best alternative, or ranking alternatives, based on numerous attributes. A three-way decision has also been included in a multi-scale decision information system, which offers a novel approach to addressing multi-attribute decision-making concerns in such systems [47]. In addition, a review has been made of outlier detection using data mining methods. The pros and cons of the different outlier detection methods are shown in Table 1.

5. Proposed model

Detecting outliers is a major data mining task that has received significant consideration across different research groups and application domains. Numerous methods have been created to identify outliers, but only on numerical data; those methods cannot be applied directly to categorical data. So the fuzzy proximity relation is introduced to convert numerical data to categorical data [36]. Then the density and uncertainty of every object and attribute are calculated. For a stable dataset, the threshold value is fixed high, and for an unstable dataset, a lower threshold value is fixed. In this way, outliers are removed so as to improve the execution of data mining algorithms. As shown in Fig. 3, at the pre-processing stage the mixed data is converted to categorical data using the fuzzy proximity relation; in post-processing, a rough set entropy-based weighted density outlier detection approach is applied to determine the outliers.

5.1. Rough set entropy-based weighted density outlier detection algorithm

A dataset may include missing data and some negative and null values, which are outliers; such a dataset is said to be vague and incomplete. To handle this scenario, a rough set weighted density-based outlier detection method is proposed. In the pre-processing stage, numerical data is converted to categorical data using the fuzzy proximity relation and is then ordered. In the post-processing stage, similar objects are identified with respect to attributes using the indiscernibility relation, and the complement entropy measure is used to calculate uncertainty values; the weighted density values are calculated from the number of indiscernible objects divided by the total number of objects for each attribute. Finally, the user fixes the threshold value; if the calculated value of an object is less than the threshold, it is treated as an outlier. The following definitions are used to detect outliers once the table has been converted from mixed to categorical type.
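Since the fuzzy proximity relation itself is given by the paper's Definition 1, the pre-processing step just described can only be sketched under an assumption: the Python fragment below substitutes a simple normalized proximity, R(x, y) = 1 - |x - y| / (max - min), as a hypothetical stand-in, chains objects whose pairwise proximity reaches the almost-indiscernibility level ω into groups, and assigns ordered categorical labels by group mean. The numeric experience values are made up for illustration.

def proximity_matrix(values):
    # Stand-in proximity R(x, y) = 1 - |x - y| / (max - min);
    # the paper's Definition 1 may differ.
    span = (max(values) - min(values)) or 1.0
    n = len(values)
    return [[1.0 - abs(values[i] - values[j]) / span for j in range(n)]
            for i in range(n)]

def omega_groups(r, omega):
    # Chain objects with proximity >= omega into groups
    # (connected components via a small union-find).
    parent = list(range(len(r)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(len(r)):
        for j in range(i + 1, len(r)):
            if r[i][j] >= omega:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(r)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

experience = [12.0, 11.5, 4.0, 3.0, 12.5, 3.5, 2.0, 4.5, 3.0, 2.5]  # made up
groups = omega_groups(proximity_matrix(experience), omega=0.90)
groups.sort(key=lambda g: sum(experience[i] for i in g) / len(g), reverse=True)
for label, g in zip(["High", "Low"], groups):
    print(label, ["E%d" % (i + 1) for i in sorted(g)])
# High ['E1', 'E2', 'E5'] and Low for the rest, mirroring the shape of the
# worked example that follows (Tables 2-4)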
Table 2. Hiring Dataset - Mixed Type (objects E1-E10 with the attributes Degree, Experience, French, and Reference; Table 3 gives the fuzzy proximity relation computed on Experience, and Table 4 the converted categorical values)

Table 3. Fuzzy Proximity Relation - Experience Attribute
R1    E1      E2      E3      E4      E5      E6      E7      E8      E9      E10
E1    1.0000  0.9053  0.7907  0.6494  0.9123  0.7470  0.5946  0.7620  0.6836  0.6316
E2    0.9053  1.0000  0.8832  0.7353  0.8191  0.8379  0.6770  0.8534  0.7715  0.7165
E3    0.7907  0.8832  1.0000  0.8475  0.7084  0.9539  0.7858  0.9697  0.8853  0.8276
E4    0.6494  0.7353  0.8475  1.0000  0.5748  0.8929  0.9362  0.8772  0.9616  0.9796
E5    0.9123  0.8191  0.7084  0.5748  1.0000  0.6667  0.5239  0.6809  0.6068  0.5582
E6    0.7470  0.8379  0.9539  0.8929  0.6667  1.0000  0.8302  0.9842  0.9311  0.8728
E7    0.5946  0.6770  0.7858  0.9362  0.5239  0.8302  1.0000  0.8149  0.8980  0.9566
E8    0.7620  0.8534  0.9697  0.8772  0.6809  0.9842  0.8149  1.0000  0.9153  0.8572
E9    0.6836  0.7715  0.8853  0.9616  0.6068  0.9311  0.8980  0.9153  1.0000  0.9412
E10   0.6316  0.7165  0.8276  0.9796  0.5582  0.8728  0.9566  0.8572  0.9412  1.0000

Let the almost indiscernibility be ω ≥ 90%. From Table 2, the objects E1, E2, E5 are ω-identical; similarly, E3, E4, E6, E7, E8, E9, E10 are ω-identical:

U/Rω1 = {{E1, E2, E5}, {E3, E4, E6, E7, E8, E9, E10}}

Based on the similarity value ω, the attribute Experience is ordered into two groups. The numerical Experience values of the objects {E1, E2, E5} are greater, so they are classified as High, and the remaining objects {E3, E4, E6, E7, E8, E9, E10} are classified as Low. The numeric Experience attribute is thus converted to a categorical one, as shown in Table 4.

Table 4. Converted Table - Mixed to Categoric Type
Objects  Degree  Experience  French  Reference
E1       MBA     High        Yes     Excellent
E2       MSc     High        Yes     Good
E3       MSc     Low         No      Neutral
E4       MBA     Low         No      Good
E5       MCA     High        Yes     Good
E6       MCA     Low         Yes     Neutral
E7       MBA     Low         Yes     Excellent
E8       MSc     Low         No      Excellent
E9       MCA     Low         No      Good
E10      MBA     Low         Yes     Neutral

Next, obtain the indiscernibility relation for each attribute. Objects that possess indiscernible values for an attribute are grouped as follows:

U/IND(Degree) = {{E1, E4, E7, E10}, {E2, E3, E8}, {E5, E6, E9}}

U/IND(Experience) = {{E1, E2, E5}, {E3, E4, E6, E7, E8, E9, E10}}

U/IND(French) = {{E1, E2, E5, E6, E7, E10}, {E3, E4, E8, E9}}

U/IND(Reference) = {{E1, E7, E8}, {E2, E4, E5, E9}, {E3, E6, E10}}

The complement entropy function is then calculated for each attribute from the obtained indiscernibility relation:

CE(Degree) = (4/10)(1 − 4/10) + (3/10)(1 − 3/10) + (3/10)(1 − 3/10) = 33/50

CE(Experience) = (3/10)(1 − 3/10) + (7/10)(1 − 7/10) = 21/50

CE(French) = 24/50; CE(Reference) = 33/50

Each attribute weight is then calculated from the attributes' complement entropy values:

Weight of Attribute(Degree) = 17/54; Weight of Attribute(Experience) = 29/54

The weighted density value of each object is computed in the same way; for example, W(E10) = 0.88. With the threshold fixed at θ = 0.7, the objects E1, E2, and E5 fall below it and are outliers. The normal and outlier objects are shown in Fig. 4.

7. Experimental results

The working model of the outlier detection algorithm on mixed datasets can be understood from experiments conducted on a hiring dataset that has 120 objects with four conditional attributes of numerical and categorical values. It was implemented on an Intel Pentium processor with 1 GB RAM under the Windows 10 operating system. Existing methods like the distance-based, density-based, local outlier factor, and class outlier factor methods were analyzed using RapidMiner 7.0. The concept of rough sets was implemented using C, a flexible language suited to implementing mathematical models. The proposed algorithm was run on the hiring dataset, which is of mixed type: the fuzzy proximity relation method was used to convert numerical values to categorical values, which were then ordered. A rough set entropy-based weighted density outlier detection method has been applied for effective outlier detection.
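The complement entropy values above can be checked mechanically. The sketch below reruns the post-processing stage on the categorical data of Table 4: the U/IND partitions and the CE values (33/50, 21/50, 24/50, 33/50) come out exactly as computed above, while the attribute weights and per-object weighted densities use an assumed normalization rather than the paper's Definitions 4 and 5, so the absolute scores differ from printed values such as W(E10) = 0.88, even though the same three objects, E1, E2, and E5, score lowest.

from collections import defaultdict

TABLE4 = {  # Table 4, Converted Table - Mixed to Categoric Type
    "E1":  ("MBA", "High", "Yes", "Excellent"),
    "E2":  ("MSc", "High", "Yes", "Good"),
    "E3":  ("MSc", "Low",  "No",  "Neutral"),
    "E4":  ("MBA", "Low",  "No",  "Good"),
    "E5":  ("MCA", "High", "Yes", "Good"),
    "E6":  ("MCA", "Low",  "Yes", "Neutral"),
    "E7":  ("MBA", "Low",  "Yes", "Excellent"),
    "E8":  ("MSc", "Low",  "No",  "Excellent"),
    "E9":  ("MCA", "Low",  "No",  "Good"),
    "E10": ("MBA", "Low",  "Yes", "Neutral"),
}
ATTRS = ("Degree", "Experience", "French", "Reference")
N = len(TABLE4)

def ind_classes(col):
    # U/IND for one attribute: objects grouped by equal values.
    classes = defaultdict(set)
    for obj, row in TABLE4.items():
        classes[row[col]].add(obj)
    return list(classes.values())

def complement_entropy(col):
    # CE = sum over classes X in U/IND of (|X|/|U|) * (1 - |X|/|U|).
    return sum(len(c) / N * (1 - len(c) / N) for c in ind_classes(col))

ce = {a: complement_entropy(i) for i, a in enumerate(ATTRS)}
print(ce)  # Degree ~0.66 (33/50), Experience 0.42, French 0.48, Reference 0.66

# Assumed attribute weighting: 1 - CE, normalized over all attributes
# (the paper's Definition 4 normalizes differently, cf. 17/54 and 29/54 above).
total = sum(1.0 - v for v in ce.values())
weight = {a: (1.0 - ce[a]) / total for a in ce}

def weighted_density(obj):
    # Weighted share of objects indiscernible from obj on each attribute,
    # per the textual description in Section 5.1 (assumed form).
    row = TABLE4[obj]
    return sum(weight[a] *
               sum(1 for r in TABLE4.values() if r[i] == row[i]) / N
               for i, a in enumerate(ATTRS))

print(sorted(TABLE4, key=weighted_density)[:3])  # ['E2', 'E5', 'E1'],
# the three outlier objects of the worked example above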
Fig. 5 shows the comparison chart for the existing and proposed outlier detection methods. In the distance-based outlier detection method, each data point is ranked based on the distance to its k-th nearest neighbor [39], and the top n data points are declared outliers; it detects ten outlier objects. In the density-based outlier detection method DensityBased(p, P), an object that deviates by at least distance P from a proportion p of all data objects is considered an outlier; this method does not detect any outlier objects. In the local outlier factor method, a local outlier factor is calculated for each object based on the local density measure and compared with its l nearest neighbors [41]; the objects having lower density values than their neighbors are termed outliers, and it detects seven outlier objects. In the class outlier factor method, each data point in the sample is ranked based on ClassOutlierFactor(S, N), where S represents the top-class outlier and N represents the number of nearest neighbors; this algorithm detects ten outlier objects. Further, our proposed rough set entropy-based weighted density outlier detection method detects outliers by computing the weighted density value of all objects and attributes; it detects 18 outlier objects [42]. Our proposed algorithm's performance and efficiency are high compared to the existing methods because it calculates weighted density values for every object and attribute, so that a true object will never be detected as an outlier. The comparison chart of the various outlier detection methods is shown in Fig. 5.

Also, benchmark datasets such as the annthyroid, breast cancer, and letter datasets have been taken from the Harvard Dataverse to show the proposed algorithm's efficiency; it has been compared with other existing outlier detection methodologies such as the local outlier factor (LOF), feature bagging (FB), isolation forest (IF), K-nearest neighbor (KNN), average KNN, and the histogram-based outlier score (HBOS).
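For reference, the distance-based baseline described above is easy to reproduce outside RapidMiner. The sketch below ranks each point by the distance to its k-th nearest neighbor and declares the top n points outliers, following the description attributed to [39]; the sample coordinates are hypothetical.

def kth_nn_distance(points, k):
    # Euclidean distance from each point to its k-th nearest neighbour.
    scores = []
    for i, p in enumerate(points):
        dists = sorted(
            sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
            for j, q in enumerate(points) if j != i
        )
        scores.append(dists[k - 1])
    return scores

def top_n_outliers(points, k, n):
    # Indices of the n points with the largest k-th NN distance.
    scores = kth_nn_distance(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]

data = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (5.0, 5.2), (0.95, 1.05)]
print(top_n_outliers(data, k=2, n=1))  # [3]: the isolated point (5.0, 5.2)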
Fig. 6. Comparison of Proposed and Existing Outlier Detection Methods with Benchmark Datasets.

The local outlier factor determines the density of an object from the distances of its neighbors. Feature bagging selects the features of subsamples randomly and finally combines the values of all base detectors, using the local outlier factor. Isolation forest observes data by constructing a tree [15]; the isolation score singles out outliers and is well suited to high-dimensional data. In the histogram-based outlier score approach, outliers are detected by constructing histograms; it is an unsupervised learning method that generates scores by considering the features independently. KNN identifies the nearest neighbors of an object, calculates scores based on the distance, and identifies outliers from them. In the average KNN method, super samples are constructed for the individual classes; the test data is given as input, and average KNN searches for samples available in, or close to, the super samples, while the others are identified as outliers. The comparison chart of the proposed method with the existing outlier detection algorithms on the benchmark datasets is shown in Fig. 6.

Other approaches, like the fuzzy bipolar soft set and the Pythagorean fuzzy bipolar soft set, are compared with the proposed method to prove its efficiency. The fuzzy-based bipolar soft set is used to analyze patients with the help of membership degrees and to decide whether a patient has hypomania, depression, or bipolar disorder; it is mostly used in decision-making systems. The Pythagorean fuzzy bipolar soft set, on the other hand, is mostly used in group decision-making situations. Personalization of the acquired findings is avoided because a common idea is derived from the opinions of all doctors. The proposed method, by contrast, identifies indiscernible values, computes entropy, and then calculates the weighted density value of each object and attribute to detect outliers.

7.1. Measures for performance evaluation

The performance evaluation on the benchmark datasets is measured by calculating accuracy, specificity, sensitivity, precision, and F1 score. The accuracy of a classifier is calculated as the number of objects which are correctly classified over the total number of objects available. The formula to calculate accuracy is as follows:

Accuracy = (TP + TN) / (TP + FN + TN + FP)

where TP is True Positive, FP is False Positive, TN is True Negative, and FN is False Negative. Sensitivity, or recall, measures the proportion of true positive values which are correctly identified, whereas specificity measures the proportion of true negative values which are correctly detected. The values are obtained from the formulas shown below:

Specificity = TN / (TN + FP)

Sensitivity = TP / (TP + FN)

Recall = TP / (TP + FN)

Precision, or positive predictive value, measures the relevant objects among the retrieved objects. The formula to calculate precision is as follows:

Precision = TP / (TP + FP)

The F1 score provides the balance between precision and recall when the distribution of classes is not even; it is worst when its value is 0 and best when it is 1. The formula to calculate the F1 score is as follows:

F1 Score = (2 * Precision * Recall) / (Precision + Recall)
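The five measures translate directly into code; the helper below is a small sketch that computes them from confusion-matrix counts. The example counts are made up (they are not taken from the benchmark runs), although they happen to give scores close to the REBWDOD column of Table 5.

def evaluation_measures(tp, fp, tn, fn):
    # Accuracy, specificity, sensitivity/recall, precision and F1 score,
    # as defined in Section 7.1 (assumes no zero denominators).
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # recall and sensitivity share one formula
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "specificity": tn / (tn + fp),
        "sensitivity": recall,
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }

print(evaluation_measures(tp=440, fp=0, tn=22, fn=2))
# {'accuracy': 0.9957..., 'specificity': 1.0, 'sensitivity': 0.9955...,
#  'precision': 1.0, 'f1': 0.9977...}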
The performance evaluation on benchmark datasets such as the annthyroid, breast cancer, and letter datasets is shown in Table 5, Table 6, and Table 7.

Table 5. Performance Evaluation - Annthyroid Dataset
Sl.No  Measures     LOF      REBWDOD
1      Accuracy     98.16%   99.57%
2      Specificity  1.0      1.0
3      Sensitivity  0.9813   0.9955
4      Precision    1.0      1.0
5      F1 Score     0.9906   0.9978

Table 6. Performance Evaluation - Breast Cancer Dataset
Sl.No  Measures     LOF      REBWDOD
1      Accuracy     99.18%   99.46%
2      Specificity  1.0      1.0
3      Sensitivity  0.99     0.99
4      Precision    1.0      1.0
5      F1 Score     0.9958   0.9972

Table 7. Performance Evaluation - Letter Dataset
Sl.No  Measures     LOF      REBWDOD
1      Accuracy     97.56%   98.69%
2      Specificity  1.0      1.0
3      Sensitivity  0.97     0.98
4      Precision    1.0      1.0
5      F1 Score     0.9872   0.9930

7.2. Analysis of efficacy

Three sorts of tests have been conducted to see how each algorithm's performance changes as factors such as the size of the data set, the dimensionality of the data set, and the number of outliers change [48]. In comparison to the local outlier factor method, the WDOD approach takes less time with respect to data size, data dimensionality, and the number of outliers. Based on the results of these experiments, the WDOD technique appears to be particularly suitable for big data sets with high dimensionality and for data sets with a high number of outliers. The WDOD algorithm's execution time also grows substantially more slowly than that of the local outlier factor algorithm. As a result, when the data size is huge and the attributes are many, the suggested WDOD algorithm can ensure efficient execution in detecting outliers, as shown in Fig. 7, Fig. 8, and Fig. 9.

8. Conclusion

In this paper, outlier detection for a mixed dataset has been proposed. In the pre-processing stage, fuzzy proximity relations with order information rules convert numeric attributes to categorical ones. The rough set entropy-based weighted density outlier detection method is then carried out to detect outliers in the post-processing stage. Research works carried out so far detect outliers only for numeric or categorical data; mixed data was not considered. The proposed model detects outliers in the hiring dataset, which has mixed data, by calculating weighted density values so that a normal object will not be detected as an outlier. The proposed algorithm is further benchmarked on Harvard Dataverse datasets such as the annthyroid, breast cancer, and letter datasets and compared with the existing local outlier factor method to prove its efficiency and performance. As the number of objects and attributes increases, the proposed method ensures efficient execution in detecting outliers. Future work will focus on detecting outliers where the input is dynamic, and on multi-granulation sets. The proposed work has some limitations: the fixation of the threshold value sometimes results in regular objects becoming outliers and outliers becoming regular objects.

9. Acknowledgements

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors; no funding was received for this research.
Conflict of interest

The process of writing and the content of the article do not give grounds for raising the issue of a conflict of interest.

Availability of data and material

Not applicable.

Code availability

Not applicable.

Authors' contributions

The corresponding author claims the major contribution of the paper, including formulation, analysis, and editing. The second author provided guidance to verify the analysis results and edited the manuscript.

This article is a completely original work of its authors; it has not been published before and will not be sent to other publications until the journal's editorial board decides not to accept it for publication.
Algorithm 1. The algorithm for the proposed model is shown below:

Input: Dataset DS(W, α, β) and a threshold value θ.
Output: Set S holding the outlier objects.
Step 1: Start.
Step 2: Input the dataset of mixed type.
Step 3: Use the fuzzy proximity relation and ordering to convert numeric data into categorical data.
Step 4: Let S = ∅.
Step 5: For each attribute βi ∈ β:
Step 6: Calculate the indiscernibility function U/IND(βi) according to Definition 2;
Step 7: Calculate the complement entropy according to Definition 3;
Step 8: For each attribute βi ∈ β, compute the weighted density according to Definition 4;
Step 9: For each object αi ∈ W, calculate the weighted density according to Definition 5;
Step 10: If WeightedDensity(αi) < θ then
Step 11: S = S ∪ {αi}.
Step 12: Return S.
Step 13: Stop.

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] E. Achtert, H.P. Kriegel, L. Reichert, E. Schubert, R. Wojdanowski, A. Zimek, Visual evaluation of outlier detection models, in: Proc. Int. Conf. on Database Systems for Advanced Applications (DASFAA), Tsukuba, Japan, 2010. https://link.springer.com/chapter/10.1007/978-3-642-12098-5_34.
[2] C.C. Aggarwal, P.S. Yu, Outlier detection for high dimensional data, in: Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'01), Santa Barbara, CA, 2001, pp. 37–46. https://dl.acm.org/doi/abs/10.1145/375663.375668.
[3] A. Arning, R. Agrawal, P. Raghavan, A linear method for deviation detection in large databases, in: Proc. Int. Conf. on Knowledge Discovery and Data Mining (KDD), Portland, OR, 1996. https://www.aaai.org/Papers/KDD/1996/KDD96-027.pdf.
[4] P. Ashok, G.M.K. Nawaz, Outlier detection method on UCI repository dataset by entropy-based rough K-means, Defence Science Journal 11 (2016) 113–121.
[5] V. Barnett, T. Lewis, Outliers in Statistical Data, John Wiley and Sons, 1994.
[6] S.D. Bay, M. Schwabacher, Mining distance-based outliers in near-linear time with randomization and a simple pruning rule, in: Proc. Int. Conf. on Knowledge Discovery and Data Mining (KDD), Washington, DC, 2003. https://dl.acm.org/doi/abs/10.1145/956750.956758.
[7] R.J. Beckman, R.D. Cook, Outliers, Technometrics 25 (2) (1983) 119–149. https://doi.org/10.1080/00401706.1983.10487840.
[8] M.M. Breunig, H.P. Kriegel, R.T. Ng, J. Sander, LOF: identifying density-based local outliers, in: Proc. ACM SIGMOD Conference, 2000, pp. 93–104. https://dl.acm.org/doi/abs/10.1145/342009.335388.
[9] V. Chandola, A. Banerjee, V. Kumar, Anomaly detection: a survey, ACM Computing Surveys 41 (3) (2009). https://dl.acm.org/doi/abs/10.1145/1541880.1541882.
[10] A.G. Christy, M. Gandhi, S.V. Subramaniyan, Cluster based outlier detection for cluster data, 5 (5) (2012) 363–387.
[11] D. Dasgupta, F.A. Nino, A comparison of negative and positive selection algorithms in novel pattern detection, in: Proc. IEEE Int. Conf. on Systems, Man, and Cybernetics, Nashville, TN, vol. 1, 2000, pp. 125–130. https://ieeexplore.ieee.org/abstract/document/884976.
[12] M. Ester, H.P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in: Proc. Int. Conf. on Knowledge Discovery and Data Mining (KDD), Portland, OR, 1996. https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf.
[13] F. Jiang, Y. Sui, C. Cao, Outlier detection based on rough membership function, in: Rough Sets and Current Trends in Computing, LNCS 4259, 2006, pp. 388–397. https://link.springer.com/chapter/10.1007/11908029_41.
[14] S. Forrest, C. Warrender, B. Pearlmutter, Detecting intrusions using system calls: alternate data models, in: Proc. IEEE Symposium on Security and Privacy, Washington, DC, 1999, pp. 133–145. https://ieeexplore.ieee.org/abstract/document/766910.
[15] A. Ghoting, S. Parthasarathy, M. Otey, Fast mining of distance-based outliers in high-dimensional spaces, in: Proc. SIAM Int. Conf. on Data Mining (SDM), Bethesda, MD, 2006. https://link.springer.com/article/10.1007/s10618-008-0093-2.
[16] F.E. Grubbs, Procedures for detecting outlying observations in samples, Technometrics 11 (1) (1969) 19–21. https://www.tandfonline.com/doi/abs/10.1080/00401706.1969.10490657.
[17] D. Hawkins, Identification of Outliers, Monographs on Applied Probability and Statistics, 1980.
[18] V. Hodge, J. Austin, A survey of outlier detection methodologies, Artificial Intelligence Review 22 (2) (2004) 85–126. https://link.springer.com/article/10.1023/B:AIRE.0000045502.10941.a9.
[19] J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques, 2012.
[20] E.M. Knorr, R.T. Ng, A unified approach for mining outliers, in: Proc. Conf. of the Centre for Advanced Studies on Collaborative Research (CASCON), Toronto, Canada, 1997. https://www.aaai.org/Papers/KDD/1997/KDD97-044.pdf.
[21] E.M. Knorr, R.T. Ng, Algorithms for mining distance-based outliers in large datasets, in: Proc. Int. Conf. on Very Large Data Bases (VLDB), New York, NY, 1998. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.55.8026&rep=rep1&type=pdf.
[22] E.M. Knorr, R.T. Ng, Finding intensional knowledge of distance-based outliers, in: Proc. Int. Conf. on Very Large Data Bases (VLDB), Edinburgh, Scotland, 1999. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.9005&rep=rep1&type=pdf.
[23] A. Lazarevic, L. Ertoz, V. Kumar, A. Ozgur, J. Srivastava, A comparative study of anomaly detection schemes in network intrusion detection, in: Proc. Third SIAM International Conference on Data Mining, 2003. https://epubs.siam.org/doi/abs/10.1137/1.9781611972733.3.
[24] A. McCallum, K. Nigam, L.H. Ungar, Efficient clustering of high-dimensional data sets with application to reference matching, in: Proc. ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD), Boston, MA, 2000. https://dl.acm.org/doi/abs/10.1145/347090.347123.
[25] M. Markou, S. Singh, Novelty detection: a review, part 1: statistical approaches, Signal Processing 83 (12) (2003) 2481–2497. https://www.sciencedirect.com/science/article/abs/pii/S0165168403002020.
[26] M. Markou, S. Singh, Novelty detection: a review, part 2: neural network based approaches, Signal Processing 83 (12) (2003) 2499–2521. https://www.sciencedirect.com/science/article/abs/pii/S0165168403002032.
[27] Z. Pawlak, Rough sets, International Journal of Computer and Information Sciences 11 (1982) 341–356. https://link.springer.com/article/10.1007/BF01001956.
[28] M.I. Petrovskiy, Outlier detection algorithms in data mining systems, Programming and Computer Software 29 (4) (2003) 228–237. https://link.springer.com/article/10.1023/A:1024974810270.
[29] F. Preparata, M. Shamos, Computational Geometry: An Introduction, Springer-Verlag.
[30] R. Bansal, N. Gaur, S.N. Singh, Outlier detection: applications and techniques in data mining, in: IEEE Conference, 2016, pp. 2862–2870. https://ieeexplore.ieee.org/abstract/document/7508146.
[31] R. Kannan, H. Woo, C.C. Aggarwal, Outlier detection for text data: an extended version, 2017. https://arxiv.org/abs/1701.01325.
[32] R. Kaur, S. Singh, A review of social network-centric anomaly detection techniques, 23 (1) (2016) 61–69. https://www.inderscienceonline.com/doi/abs/10.1504/IJCNDS.2016.080582.
[33] P.J. Rousseeuw, A.M. Leroy, Robust Regression and Outlier Detection, John Wiley & Sons, Inc., New York, NY, USA.
[34] I. Ruts, P.J. Rousseeuw, Computing depth contours of bivariate point clouds, Computational Statistics and Data Analysis 23 (1996) 153–168. https://www.sciencedirect.com/science/article/abs/pii/S0167947396000278.
[35] D. Snyder, MS thesis, Department of Computer Science, Florida State University, 2001.
[36] S. Mitra, S.K. Pal, P. Mitra, Data mining in soft computing framework: a survey, IEEE Transactions on Neural Networks 13 (1) (2002) 3–14. https://ieeexplore.ieee.org/abstract/document/977258.
[37] E. Savage, G. Becker, R. Zamudio, A survey of link mining and anomaly detection, 34 (3) (2015) 645–654.
[38] J. Tang, Z. Chen, A.W. Fu, D.W. Cheung, Capabilities of outlier detection schemes in large datasets, framework and methodologies, Knowledge and Information Systems 11 (1) (2006) 45–84. https://link.springer.com/article/10.1007/s10115-005-0233-6.
[39] V. Kumar, S. Kumar, A.K. Singh, Outlier detection: a clustering based approach, IJISME 1 (7) (2013) 383–387. ISSN 2319-6386.
[40] X. Zhao, J. Liang, F. Cao, A simple and effective outlier detection algorithm for categorical data, International Journal of Machine Learning and Cybernetics 5 (2014) 469–477. https://link.springer.com/article/10.1007%2Fs13042-013-0202-4.
[41] Z.A. Bakar, R. Mohamed, A. Ahmad, M.M. Deris, A comparative study for outlier detection techniques in data mining, 10 (4) (2006) 371–395. https://ieeexplore.ieee.org/abstract/document/4017846.
[42] T. Zhang, R. Ramakrishnan, M. Livny, BIRCH: an efficient data clustering method for very large databases, in: Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD), Montreal, Canada, 1996. https://dl.acm.org/doi/abs/10.1145/235968.233324.
[43] J. Zhan, J. Ye, W. Ding, P. Liu, A novel three-way decision model based on utility theory in incomplete fuzzy decision systems, IEEE Transactions on Fuzzy Systems (2021).
[44] K. Zhang, J. Zhan, W.Z. Wu, On multi-criteria decision-making method based on a fuzzy rough set model with fuzzy α-neighborhoods, IEEE Transactions on Fuzzy Systems (2020).
[45] J. Zhan, H. Jiang, Y. Yao, Three-way multi-attribute decision-making based on outranking relations, IEEE Transactions on Fuzzy Systems (2020).
[46] J. Ye, J. Zhan, W. Ding, H. Fujita, A novel fuzzy rough set model with fuzzy neighborhood operators, Information Sciences 544 (2021) 266–297.
[47] W. Wang, J. Zhan, C. Zhang, Three-way decisions based multi-attribute decision making with probabilistic dominance relations, Information Sciences 559 (2021) 75–96.
[48] J. Deng, J. Zhan, W.Z. Wu, A three-way decision methodology to multi-attribute decision-making in multi-scale decision information systems, Information Sciences 568 (2021) 175–198.