
An Exhaustive Research On The Application of Intrusion Detection

Uploaded by

Jayson Lariza

Hindawi

Journal of Sensors
Volume 2021, Article ID 5558860, 11 pages
https://doi.org/10.1155/2021/5558860

Research Article
An Exhaustive Research on the Application of Intrusion Detection
Technology in Computer Network Security in Sensor Networks

Yajing Wang,1 Juan Ma,1 Ashutosh Sharma,2 Pradeep Kumar Singh,3 Gurjot Singh Gaba,4 Mehedi Masud,5 and Mohammed Baz6

1 Internet of Things Technology Department, Shanxi Vocational & Technical College of Finance & Trade, Taiyuan, 030031 Shanxi, China
2 Institute of Computer Technology and Information Security, Southern Federal University, Russia
3 Department of CSE, ABES Engineering College, Ghaziabad, Uttar Pradesh, India
4 School of Electronics and Electrical Engineering, Lovely Professional University, Phagwara, Punjab 144411, India
5 Department of Computer Science, College of Computers and Information Technology, Taif University, P. O. Box 11099, Taif 21944, Saudi Arabia
6 Department of Computer Engineering, College of Computer and Information Technology, Taif University, PO Box. 11099, Taif 21994, Saudi Arabia

Correspondence should be addressed to Yajing Wang; [email protected] and Mehedi Masud; [email protected]

Received 17 February 2021; Revised 7 May 2021; Accepted 13 May 2021; Published 29 May 2021

Academic Editor: Omprakash Kaiwartya

Copyright © 2021 Yajing Wang et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Intrusion detection is crucial to computer network security; this work therefore aims to strengthen network security protection by proposing preventive techniques. An outlier detection and semisupervised clustering algorithm based on shared nearest neighbors is proposed, which addresses intrusion detection by converting it into a problem of mining outliers in a network behavior dataset. The algorithm uses shared nearest neighbors as the similarity measure, judges whether a data point is an outlier according to its number of nearest neighbors, and performs semisupervised clustering on the dataset after the outliers are deleted. During semisupervised clustering, a large amount of prior knowledge is added, and the dataset is clustered according to the principle of graph segmentation. The novelty of the proposed algorithm lies in detecting outliers while effectively avoiding dependence on parameters, thus eliminating the influence of outliers on clustering. Two real datasets, lymphography and glass, are used for simulation. The simulation results show that the proposed algorithm can effectively detect outliers and has a good clustering effect. Furthermore, the experiments reveal that the outlier detection-based SCA-SNN algorithm performs best on the dataset without outliers, which validates its clustering performance. Compared with other state-of-the-art anomaly detection methods, anomaly detection based on outlier mining does not require a training process and thus overcomes the problems that current anomaly detection faces because of incomplete normal patterns in training samples.
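
The shared-nearest-neighbor idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of Euclidean distance, the mutual-neighbor rule used to build each point's neighbor set, and the `min_links` cutoff are all assumptions made for illustration. The similarity of two points is taken as the number of k-nearest neighbors they share, and a point whose neighbor set is nearly empty is flagged as an outlier.

```python
import numpy as np

def knn_sets(X, k):
    """k-nearest-neighbor index set of every row of X (Euclidean distance, an assumption)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
    order = np.argsort(d, axis=1)[:, :k]
    return [set(row.tolist()) for row in order]

def snn_similarity(neigh):
    """SNN similarity: shared k-nearest-neighbor count, kept only for mutual neighbors."""
    n = len(neigh)
    S = np.zeros((n, n), dtype=int)
    linked = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            if j in neigh[i] and i in neigh[j]:      # mutual-neighbor rule (assumed)
                linked[i, j] = linked[j, i] = True
                S[i, j] = S[j, i] = len(neigh[i] & neigh[j])
    return S, linked

def snn_outliers(X, k, min_links):
    """Indices of points whose mutual-neighbor set has fewer than min_links members."""
    _, linked = snn_similarity(knn_sets(X, k))
    return np.where(linked.sum(axis=1) < min_links)[0]
```

For example, with five points spaced roughly one unit apart on a line and a sixth point far away, `snn_outliers(X, k=2, min_links=1)` flags only the distant point, because no other point lists it among its two nearest neighbors.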

1. Introduction

With the widespread growth of the Internet and online platforms, network security has become indispensable [1, 2]. Computer networks face many threats today, such as software bugs and intrusions. Bugs arise from the extensive functionality and large size of software and operating systems. Intruders who are not authorized to access data may steal useful private information without the consent of network users. Firewalls are placed between two or more computers to isolate their networks according to predefined rules or policies, but firewalls alone are not enough to protect against such attacks. This is where intrusion detection systems play a vital role: they stop cyber attacks and analyze the security problems at the time of an intrusion so that similar situations can be handled in the future [3–5]. Intrusion detection systems collect computer network information to track possible attacks or misuse that violates ethical concerns [6, 7]. Several types of network data fall within the scope of intrusion detection, such as network traffic data, system status files, and system-level test data [8–10]. The applications of network intrusion detection systems are depicted in Figure 1.

The network traffic processing application converts traffic into various network parameter patterns that are helpful in management. The prevention system is responsible for detecting threats, and threat classification is done by signature matching, which matches the input against already known patterns. The other applications include threat reporting and anomaly detection, which inspects traffic signatures [11, 12].
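
Signature matching as described above can be pictured with a short Python sketch. The signature names and byte patterns below are hypothetical placeholders, not taken from any real rule set, and a plain substring test stands in for whatever matching engine a production system would use.

```python
from typing import Dict, List

# Hypothetical signature database: name -> byte pattern (illustrative only).
SIGNATURES: Dict[str, bytes] = {
    "sql-injection": b"' OR 1=1",
    "path-traversal": b"../../",
}

def classify(payload: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the names of all signatures whose pattern occurs in the payload."""
    return [name for name, pattern in signatures.items() if pattern in payload]
```

An input that matches no stored pattern yields an empty list, which is why signature matching alone cannot recognize previously unseen attacks and is complemented by anomaly detection.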
With the rapid development and application of computer network technology and the increasing number of computer network users, ensuring the security of information on the network has become a key technology of computer networks [13–15]. Various security mechanisms have been developed to protect computer networks, such as user authorization and authentication, access control, data encryption, and data backup, but these mechanisms can no longer meet current network security needs [16]. Network intrusions and attacks are still not uncommon. Therefore, intrusion detection has emerged as one of the key technologies in information and network security assurance. Introducing intrusion detection technology is equivalent to introducing a closed-loop security strategy [17, 18] into the computer system.

This article addresses intrusion detection by converting it into a problem of mining outliers in the network behavior dataset. A preventive technique for computer network security is proposed that detects outliers using semisupervised clustering algorithms based on shared nearest neighbors. The nearest neighbor similarity criterion is used to judge whether a data point is an outlier according to its number of nearest neighbors, and on this basis, semisupervised clustering is performed after the outliers are deleted. The novelty of the proposed algorithm lies in detecting outliers while effectively avoiding dependence on parameters, thus eliminating the influence of outliers on clustering. This work uses real datasets for simulation and compares the approach with other anomaly detection technologies. The comparison reveals that anomaly detection based on outlier mining does not require a training process, which overcomes the problems that current anomaly detection faces because of incomplete normal patterns in training samples. Furthermore, the proposed algorithm effectively detects outliers and provides good clustering outcomes based on the similarity.

The rest of this article is arranged as follows: Section 2 presents the state-of-the-art literature review, followed by the research methods in Section 3. Section 4 provides the results and discussion of the experimental analysis done for the two datasets, followed by the concluding remarks in Section 5.

Figure 1: Applications of network intrusion detection system.

2. Literature Review

Domestic research on intrusion detection technology and methods started relatively late, but with in-depth exploration by universities, scientific research institutes, and enterprises, development has been rapid, and many new detection theories and results have been produced. Current research on intrusion detection technology mainly covers neural networks, data mining, support vector machines, artificial immunity, etc., and involves smart grids, industrial infrastructure, industrial networks, and other related fields [19–22].

Sun et al. proposed an improved method of cascading transmission edges. A character interval is used to represent several consecutive characters, which effectively reduces the number of transmission edges. The methods before and after the improvement were compared experimentally. The results show that the number of transmission edges can be reduced to 10% of that before the improvement, thereby increasing the efficiency of deep packet inspection [23].

Haojie et al. analyzed the potential security threats of 5G in-vehicle networks and focused on intrusion detection methods for in-vehicle networks. Four experimental scenarios were selected from potential attacks on the vehicle network, and real car data were collected to compile various attack databases for the first time. To find the appropriate method to identify different attacks, four lightweight intrusion detection methods were proposed to identify abnormal behavior of the vehicle network. In addition, the research compared the detection performance of the four methods using comprehensive evaluation indicators. The
evaluation results provide the best lightweight detection solution for the vehicle network. This article helps to understand the advantages of the tested methods in the detection performance of in-vehicle networks. Furthermore, it promotes the application of detection technologies to safety issues in the automotive industry [24].

Zhang et al. took the intrusion detection system (IDS) as the research object, established an IDS model based on data mining, and compared it with a traditional IDS in six experiments. The detection rate, false-negative rate, and false-positive rate of the two methods were obtained in each experiment. The experiments conclude that the intrusion detection system using data mining has better network protection and security performance and a stronger ability to detect network vulnerability intrusions. Thus, this research provides a new way to detect and study network protection security loopholes [25].

Kumar et al. proposed a model in which a set of training examples obtained with a network analyzer (i.e., Wireshark) is used to construct an HMM. Since Wireshark is not an intrusion detection system, the obtained file trace can be used as a training example to test the HMM model. The model also predicts the probability value of each test sequence and indicates whether the sequence is abnormal. The article also presents a numerical example that calculates the best observation sequence for the HMM and the state sequence probability [26].

The innovation of this paper is that the problem of intrusion detection is converted into the problem of mining outliers in the network behavior dataset. Compared with other anomaly detection technologies, anomaly detection based on outlier mining does not require a training process, which overcomes the high false alarm rate that current anomaly detection faces because of incomplete normal patterns in training samples. This paper describes the outlier mining algorithm based on similarity.

3. Research Methods

3.1. Classification of Intrusion Detection. Based on a study of existing intrusion detection methods, intrusion detection technology can be classified from different angles:

(1) According to the source of detection data, there are three categories: host-based intrusion detection technology, network-based intrusion detection technology, and combined host- and network-based intrusion detection technology. All three have their own advantages and disadvantages and can complement each other. However, a complete intrusion detection system must be distributed across both the host and the network

(2) According to the detection technology, it is divided into anomaly detection technology and misuse detection technology. Anomaly detection technology, also called behavior-based intrusion detection technology, assumes that all intrusions have abnormal characteristics. Misuse detection technology, also known as knowledge-based intrusion detection technology, expresses intrusion behavior as attack patterns and attack signatures

(3) According to the working method, it can be divided into offline detection and online detection. Offline detection is a non-real-time system that analyzes audit events after the fact and checks for intrusions. Online detection is a real-time system that includes real-time network data packet analysis and real-time host audit analysis

(4) According to the system network architecture, it is divided into centralized detection technology, distributed detection technology, and layered detection technology. In layered detection, the analysis result is transmitted to the adjacent upper layer, and the detection system of the higher layer only analyzes the results of the layer below. In addition, the hierarchical detection system makes the system more scalable by analyzing the hierarchical data [27–30].

3.2. Intrusion Detection System and Working Principle. An intrusion detection system is a system used to detect various intrusion behaviors. It is an important part of the network security system. By monitoring the operating status of the network and computer systems, it discovers attack attempts, attack behaviors, or attack results and then promptly issues an alarm or makes a corresponding response to ensure the confidentiality, integrity, and availability of system resources. Intrusion detection systems have been widely used and researched as an important means of resisting network intrusion attacks [31, 32]. The basic intrusion detection system for computer network security is depicted in Figure 2.

The intrusion detection system is a typical “snooping device.” It does not bridge multiple physical network segments (usually it has only one listening port). It does not need to forward any traffic; it only needs to passively and silently collect the messages it cares about on the network. Based on the collected messages, the intrusion detection system extracts the corresponding statistical traffic characteristics and compares them against its built-in intrusion knowledge base [33, 34]. According to a preset threshold, message traffic with a high degree of matching is considered an attack. The intrusion detection system then raises an alarm or carries out a limited counterattack according to its configuration. The principle of intrusion detection is shown in Figure 3.

The workflow of an intrusion detection system is roughly divided into the following steps:

(1) Information collection. The first step of intrusion detection is information collection, which includes the content of network traffic and the status and behavior of user connection activities
(2) Signal analysis. The collected information is generally analyzed by three technical means: pattern matching, statistical analysis, and integrity analysis. The first two methods are used for real-time intrusion detection, while integrity analysis is used for postmortem analysis

(3) Real-time recording, alarm, or limited counterattack. The fundamental task of an IDS is to respond appropriately to intrusions. These responses include detailed log records, real-time alarms, and limited counterattacks against the source. The only technical approaches to identifying intrusions are those based on user characteristics, intruder characteristics, and activity. The structure of the intrusion detection system is shown in Figure 4

Figure 2: Basic intrusion detection system for computer network security. (Diagram labels: signature based detection, host intrusion detection, network intrusion detection, anomaly based detection.)

Figure 3: Intrusion detection principle. (Flowchart labels: current user behavior, user history behavior, intrusion detection, detection, intrusion yes/no, disconnect, record, restore.)

3.3. Intrusion Detection Technology Methods. At present, there are many standard intrusion detection methods; a few are described below.

(1) Neural network anomaly detection. This method is self-learning and adapts to user behavior, and it can effectively process the actually monitored information and judge the possibility of intrusion. The prediction error rate for the next event reflects, to a certain extent, how abnormal the user behavior is. At present, this method is widely used, but it is not yet mature, and there is no complete product [35–38]

(2) Probabilistic statistical anomaly detection. This method is based on modeling historical user behavior. Based on earlier evidence or models, the audit system monitors the user’s use of the system in real time and tests it with a statistical model against the user behavior probabilities stored in the system. When suspicious user behavior is found, it tracks, monitors, and records the user’s behavior

(3) Expert system misuse detection. Expert systems are often used to detect intrusion behaviors with known characteristics. In an expert detection system, the knowledge of security experts is expressed through rules with an If-Then structure (or a compound structure). Therefore,
establishing an expert system depends on the completeness of the knowledge base, which in turn depends on the completeness and real-time nature of audit records

(4) Model-based intrusion detection. Intruders often follow specific behavioral procedures when attacking a system, such as the behavioral sequence of guessing passwords. Such a sequence constitutes a model with specific behavioral characteristics. Based on the attack intent that the model represents, malicious attack attempts can be detected in real time

Figure 4: Intrusion detection system structure. (Flowchart labels: information collection; pattern matching, statistical analysis, integrity analysis; real-time intrusion detection analysis, post-event intrusion detection analysis; identification of intrusion; real-time recording, alarm, or limited analysis.)

Intrusion technology has undergone rapid changes in scale and method, and intrusion methods and techniques have also progressed. Outlier mining is an important research direction in intrusion detection technology. Outlier mining extracts a small amount of abnormal data, which is novel and significantly different from conventional data patterns, from a large amount of complex data. Outliers are often anomalous data mixed into a large amount of high-dimensional data, and these anomalous data can have serious consequences. Currently, many scholars in the field of intrusion detection research apply cluster analysis to anomaly detection. However, an analysis of the characteristics of intrusions suggests that outlier mining is more suitable for anomaly-based intrusion detection than clustering, because there is a clear difference between normal and abnormal behavior and, in real applications, the number of abnormal behaviors is much lower than the number of normal behaviors [39–44]. Compared with the entire network behavior, intrusion behavior is a small amount of abnormal data that can be treated as isolated points in the dataset, which better reflects the nature of the intrusion. Therefore, intrusion detection can be converted into the problem of mining outliers in the network behavior dataset. Compared with other anomaly detection technologies, anomaly detection based on outlier mining does not require a training process. Therefore, it overcomes the high false alarm rate that current anomaly detection faces because of incomplete normal patterns in training samples.

Figure 5: System structure of anomaly intrusion detection based on similarity and isolated point analysis method. (Diagram labels: data collector, analyzer, alarm, user database.)

Table 1: Data object distribution of the lymphography dataset.

Class              Categories                  Percentage
General category   Categories 2 and 3          95.9%
Outlier class      Categories 1 and 4          4.1%

Table 2: Data object distribution of the glass dataset.

Class              Categories                  Percentage
General category   Categories 1, 2, 3, and 7   89.8%
Outlier class      Categories 5 and 6          10.2%

Table 3: Outlier detection results on the lymphography dataset.

K value   Direct isolation   Derivative outliers   Correct isolation points   Accuracy
8         3                  12                    4                          66.7% (4/6)
12        5                  10                    4                          66.7% (4/6)
16        8                  15                    6                          66.7% (4/6)

Table 4: Outlier detection results on the glass dataset.

K value   Direct isolation   Derivative outliers   Correct isolation points   Accuracy
8         10                 24                    16                         66.7% (16/22)
10        12                 28                    18                         73.7% (18/22)
16        16                 33                    22                         100% (22/22)

3.4. Steps Involved in Proposed Intrusion Detection Algorithm. The outlier mining algorithm proposed in this article is based on the similarity index and proceeds in the following steps.

Step 1. Enter the dataset: a matrix with n rows and m columns, indicating that each of the n records in the original network record set for intrusion detection has m characteristic attributes. Suppose the domain $X = \{x_1, x_2, \cdots, x_n\}$ is the set of objects to be detected, and each object has m indicators, namely, $x_i = \{x_{i1}, x_{i2}, \cdots, x_{im}\}$, $i = 1, 2, \cdots, n$, expressed as a data matrix:

$$X = \begin{pmatrix} x_{11} & \cdots & x_{1m} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nm} \end{pmatrix}. \tag{1}$$

Step 2. Find the set of isolated points among the n objects: to judge the degree of dispersion of each object in X, first calculate the similarity coefficient $r_{ij}$ between each pair of objects and form a similarity coefficient matrix, namely,

$$R = \begin{pmatrix} r_{11} & \cdots & r_{1n} \\ \vdots & \ddots & \vdots \\ r_{n1} & \cdots & r_{nn} \end{pmatrix}, \tag{2}$$

$$r_{ij} = 1 - \sqrt{\frac{1}{n} \sum_{k=1}^{m} \left( x_{ik} - x_{jk} \right)^2}, \tag{3}$$

$$p_i = \sum_{j=1}^{n} r_{ij}. \tag{4}$$

Here, $p_i$ is the sum of the ith row of the similarity coefficient matrix. The smaller the value, the farther object i is from the other objects, which makes it a candidate for the isolated point set.
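
Equations (3) and (4), together with the relative dispersion of equation (5) that follows, can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, the features are assumed to be scaled to [0, 1] beforehand, the 1/n factor follows equation (3) as printed, and the dispersion is expressed as a fraction instead of a percentage.

```python
import numpy as np

def similarity_matrix(X):
    """r_ij = 1 - sqrt((1/n) * sum_k (x_ik - x_jk)^2), following equation (3) as printed."""
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]          # pairwise attribute differences
    return 1.0 - np.sqrt((diff ** 2).sum(axis=2) / n)

def dispersion(R):
    """lambda_i = (p_max - p_i) / p_max, with p_i the i-th row sum of R (equations (4), (5))."""
    p = R.sum(axis=1)
    return (p.max() - p) / p.max()

def detect_outliers(X, threshold):
    """Indices of objects whose dispersion reaches the threshold (assumed to be a fraction)."""
    lam = dispersion(similarity_matrix(X))
    return np.where(lam >= threshold)[0]
```

With four records clustered near (0.1, 0.1) and one record at (0.9, 0.95), the distant record has a much smaller row sum than the others, so a threshold of 0.1 isolates it alone. The threshold value itself is a tuning choice that the text leaves open.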

$$\lambda_i = \frac{p_{\max} - p_i}{p_{\max}} \times 100\%. \tag{5}$$

Here, $\lambda$ is the threshold, and objects with $\lambda_i \geq \lambda$ are considered outliers.

(1) Anomaly intrusion detection system based on similarity and outlier analysis: the anomaly intrusion detection system obtains the audit data of the system and uses them as the raw data describing the intrusion detection data source and user behavior characteristics. It then uses the similarity and outlier analysis method to divide the raw data into a normal dataset and an isolated dataset. Finally, the isolated dataset is used to determine whether the system is under attack

(2) System structure: the anomaly intrusion detection system based on similarity and outlier analysis is composed of a collector, an analyzer, an alarm, and a user database. The structure diagram is shown in Figure 5

(3) Working principle of the proposed algorithm:

(i) The data collector mainly collects the original audit data and then transmits the data to the data analyzer

(ii) The data analyzer has the functions of data transmission and data analysis. On the one hand, it receives the alarm information from the qualitative data and transmits it to the alarm. On the other hand, it transmits the quantitative data to the user database

(iii) The user database uses an outlier mining algorithm based on the similarity sum to divide the quantitative data into a normal dataset and an outlier dataset. It then transfers these two datasets to the analyzer

(iv) The analyzer receives the alarm information from the outlier dataset, transmits the information to the alarm, and then transmits the normal dataset to the user database

(v) The user database is updated with the transmitted normal dataset so that the user database in the intrusion detection system can accurately describe user behavior characteristics

The algorithm for intrusion detection is described as follows:

Step 1. Get the original data $x_i$ of the current user’s resource usage at a certain moment.

Step 2. Calculate the degree of dispersion of each object in X; that is, calculate the similarity coefficient $r_{ij}$ between each pair of objects.

Step 3. Calculate $p_i$ and $\lambda_i$ for the ith row of the similarity coefficient matrix.

Step 4. If $\lambda_i \geq \lambda$, the object is considered an outlier, which indicates abnormal behavior, and an alarm is raised; otherwise, it belongs to normal user behavior, and the user database is updated.

From the perspective of time consumption, the cost lies mainly in the distance comparisons. Although the anomaly detection
technology of outlier mining consumes more time and space than the cluster-based anomaly detection technology, it also improves the algorithm’s performance and the detection rate for intrusion attacks.

4. Results and Discussion

The experimental datasets in this article are all real datasets from UCI, and the experimental results are averages over multiple experiments. The performance of outlier detection is judged mainly by the proportion of correct outliers among all outliers, and the evaluation function of the semisupervised clustering algorithm is used.

The known paired constraints are the initial set of constraints randomly generated by the system. The known constraints are subtracted from the evaluation index because, in a semisupervised clustering algorithm, the known supervision information cannot reflect the effect of the clustering algorithm. The experiments use the lymphography dataset and the glass dataset for comparison. The object distribution of the datasets is shown in Tables 1 and 2.

The experimental results of outlier detection are shown in Tables 3 and 4. The first column is the K value. The second column gives the number of isolated points obtained by analyzing the nearest neighbors of the data points; that is, data points with very few nearest neighbors are directly judged to be isolated points. The third column gives the number of isolated points obtained from the nearest neighbor sets of the isolated points, called derived isolated points. The fourth column gives the number of true isolated points among those detected. Finally, the last column is the accuracy of outlier detection.

The experimental results on the lymphography dataset are shown in Table 3. Since the number of real categories in the dataset is 4, the experiment starts training from K = 4. When K = 8, the nearest neighbor sets of some data points contain very few objects, so these points are determined to be outliers, and the data points in their nearest neighbor sets are analyzed. Thus, when K is 8, we get 12 outliers, including 4 correct outliers. When K = 12, although the accuracy of outlier detection in Table 3 does not improve, the number of outliers obtained from analyzing the characteristics of the classes decreases, and some incorrectly judged data points are removed.

From this perspective, the detection rate has increased by 7%. When K = 16, all 6 outliers were detected, and the detection rate reached 100%. The algorithm also has an apparent effect on the glass dataset (as shown in Table 4).

Two semisupervised clustering algorithms, C-Kmeans and Sine Cosine Algorithm-based sharing nearest neighbor (SCA-SNN), are evaluated in this study for outlier detection on both the lymphography and glass datasets. Furthermore, semisupervised clustering is performed on the “denoising” dataset after the outliers are detected. The experimental results obtained from these methods are also compared with other state-of-the-art methods, namely hierarchical clustering (HC) and principal component analysis (PCA), to determine the effectiveness of semisupervised clustering. The experimental results are shown in Figures 6–9.

Figure 6: Experimental results on the lymphography dataset. (Plot of similarity coefficient (Ri) versus binding force for C-Kmeans, PCA, SCA-SNN, and HC.)

Figure 6 presents the experimental outcomes of the four algorithms on the lymphography dataset before the outliers are found and without the denoising step. The experimental dataset used in Figure 7 is the “denoising” lymphography dataset, which contains only the second and third categories of the original lymphography dataset. In the comparison on this dataset (Figure 7), it can be seen that as the number of paired constraints increases, the performance of the SCA-SNN algorithm increases steadily relative to all other algorithms. After the outliers are removed, the C-Kmeans algorithm also provides relatively stable performance, and there is no significant fluctuation in the clustering results. But from the overall
clustering results, the performance of the SCA-SNN algorithm is better than that of the C-Kmeans, PCA, and HC algorithms.

Figure 7: Experimental results of the “denoising” lymphography dataset. (Plot of similarity coefficient (Ri) versus binding force for C-Kmeans, PCA, SCA-SNN, and HC.)

None of the algorithms achieves notable results on the original lymphography dataset. Although the experimental results improve as the number of paired constraints increases, when the number of constraints reaches 1000, the correct judgment rate of the C-Kmeans algorithm is only 0.48, and that of the SCA-SNN algorithm reaches only 0.58, which indicates that the data is concentrated. Furthermore, the outlier data had a great impact on the clustering results and weakened the guiding role of the paired constraints, so that no clustering algorithm achieved good results.

Figures 8 and 9 show the experimental results on the glass dataset. Figure 8 shows that the C-Kmeans algorithm is unstable because of the “noise” data. From the overall perspective of the clustering results, the clustering performance of the SCA-SNN algorithm is always better than that of the C-Kmeans, PCA, and HC algorithms.

Figure 8: Experimental results on the glass dataset. (Plot of similarity coefficient (Ri) versus binding force for C-Kmeans, PCA, SCA-SNN, and HC.)

Regardless of whether there are outliers in the dataset, the clustering effect of the SCA-SNN algorithm is better than that of the C-Kmeans algorithm and the other state-of-the-art algorithms, especially after the outliers are removed. On the resulting dataset, the SCA-SNN algorithm has better experimental results.

From the above four experimental results, the outlier detection-based SCA-SNN algorithm has the best experimental effect on the dataset without outliers, which shows that detecting outliers is a crucial step and validates the clustering performance of the outlier detection-based SCA-SNN algorithm. In many practical applications, the dataset contains some outliers, and these outliers may carry potentially valuable information. Therefore, mining outliers can effectively improve the performance of clustering

Figure 9: Experimental results of the “denoising” glass dataset (similarity coefficient Ri versus binding force, 100 to 800, for C-Kmeans, PCA, SCA-SNN, and HC).

and get the correct classification. It can also help people obtain more valuable information.

5. Conclusion

This paper proposes an outlier detection and semisupervised clustering algorithm based on nearest neighbor similarity. The proposed algorithm uses the C-Kmeans algorithm to train on the dataset, which yields a reasonable and accurate shared nearest neighbor set for the data, and it detects global outliers quickly and accurately based on the obtained results; it is also effective on local outliers. The algorithm effectively avoids insufficient preprocessing of noise points and the influence of inaccurate input parameters on the results. It also overcomes the heavy computation of methods such as the Jarvis-Patrick algorithm. In the process of semisupervised clustering, the acquired paired prior knowledge is expanded to maximize its guiding effect. By detecting outliers, the algorithm avoids the dependence on parameters and eliminates the influence of outliers on clustering. By combining and expanding prior knowledge, it gives the clustering process “rules to follow.” Experiments on real datasets show that combining outlier detection with semisupervised clustering gives the best clustering results. Furthermore, the experiments reveal that the outlier detection-based SCA-SNN algorithm has the best experimental effect on the dataset without outliers. This shows that the detection of outliers is crucial and fully validates the clustering performance of the outlier detection-based SCA-SNN algorithm.

With network security issues becoming increasingly prominent, research on intrusion detection technology has attracted more and more attention. An intrusion detection algorithm based on outlier data mining is given, building on an in-depth study of data mining-based intrusion detection technology. Outlier mining technology can perform anomaly detection. When the amount of abnormal data is much smaller than that of normal data, the detection result is better than that of anomaly detection technology based on clustering. In general, the statistical distribution of abnormal and normal behavior in network data meets the conditions for using outlier mining. Network security has always been a concern. However, with the further development of networks and the diversification of hacker attacks, many research problems and challenging issues remain to be solved urgently.

Data Availability

All data has been shared within the manuscript.

Conflicts of Interest

The authors declare that they have no conflicts of interest to report regarding this study.

Acknowledgments

The authors would like to acknowledge the support of Taif University Researchers Supporting Project number (TURSP-2020/239), Taif University, Taif, Saudi Arabia.