Fuzzy K-Mean Clustering To Preclude Cyber Security Risk: Problem Statement

This document discusses using fuzzy k-means clustering to classify cyber security logs into attack, unsure, and no attack categories. It first breaks the data into 3 clusters using fuzzy k-means clustering. It then manually labels a small sample of the data and trains a neural network classifier using the labeled data. This helps identify anomalies in cyber security logs and reduce false detections. The methodology combines artificial intelligence and analyst intuition to classify logs into the 3 categories.

Fuzzy K-Mean clustering to preclude Cyber Security Risk

Problem Statement:

Cyber threats are growing in prevalence and affect every network user. Numerous security
monitoring systems are employed to protect computer networks and resources from cyber-attacks,
and there is a pressing need for an efficient system to monitor the large network datasets that
this monitoring generates. In this work, large network datasets representing malware attacks are
used to build an expert system. The characteristics of attackers' IP addresses are extracted from
our integrated datasets to generate statistical data. A cyber security expert provides the weight
of each attribute and forms a scoring system by annotating the log history. A semi-supervised
method classifies cyber security logs into attack, unsure and no attack: it first breaks the data
into 3 clusters using Fuzzy K-Means (FKM), then a small sample of the data is labelled manually
(Analyst Intuition), and finally a multilayer perceptron (MLP) neural network classifier is
trained on the manually labelled data. This helps in finding anomalies in cyber security logs, a
task that generally produces a large number of false detections. The classification results are
encouraging in segregating the types of attacks. The approach also automates the integration of
datasets and implicitly sends the statistical data to the machine learning and data mining
algorithms, making a complete end-to-end process for identifying attack-related traffic in the
network datasets.

Literature Survey:

The sheer volume of big data presents a great challenge when we attempt to study patterns or
associations in the data. Advances in handling big data enable many industrial problems and
challenges to be addressed: industries and companies can now understand and process volumes of
data that were once beyond their reach. While many domains have benefited from big data
technologies, cyber security is one field that is just beginning to explore the use of big data
analytics. The ability to detect and deter cyber-attacks can make or break the functional success
of an enterprise [1]. Using big data, organizations may be able to rigorously detect threats,
create better defense mechanisms and improve security.

A Security Information and Event Management (SIEM) system [2] can analyze data from several log
files; however, such systems are limited in the amount of data they can handle. With systems such
as Hadoop [3], cyber security data can now be stored in a dedicated repository that accommodates
more than three months of data and supports combining and analyzing real-time data along with
historical data. An advanced persistent threat (APT) is a network attack in which an unauthorized
person gains access to a network and stays there undetected for a long period of time [4].
Because big data analytics deals with longer-term data, it could potentially help detect APTs
that manifest over time. Big data analytics plays an important role in detecting advanced threats
and insider threats [5]. Monitoring systems can potentially minimize false alarms by providing
smarter analytics. Data analytics can assist systems by merging collected internal data with
relevant external data to detect known patterns and stay ahead of malicious activities or
intruders.

Currently, 8% of major global companies [5] have adopted big data analytics for one or more use
cases related to security and fraud detection. Gartner predicted that within a year this would
increase to 25%, with a positive return on investment within six months of implementation. Data
analysis should be intelligent and timely, as anything that is delayed loses its value,
especially in the field of cyber security. Because hackers are well aware of the security and
fraud detection measures that enterprises employ, they are able to attack directly without any
reconnaissance phase. Hence, to stay ahead of hackers, enterprises can use big data analytics to
improve monitoring and detection systems with contextual data and apply smarter analytics. Data
correlation techniques can be applied across high-priority alerts and monitoring systems to
detect patterns and get a bigger picture of the state of security. Enterprises can also opt for
fast tuning of their rules and models, testing them against data streaming close to real time.
The Teradata report [6] states that traditional methods, which fall short in detecting and
preventing threats, can be enhanced with big data analytics, and the results of the survey
conducted by Teradata [6] likewise indicate that these shortcomings can be overcome by using big
data analytics. Many big data tools and techniques have emerged that can efficiently handle the
volume and complexity of various kinds of data, such as machine-generated and network-related
data. Hence, big data systems are part of a cyber defense strategy for every enterprise that must
meet the needs of complex and large-scale analytics.

A major concern with the cyber security monitoring process is that when multiple security
monitoring systems are employed and each system generates numerous log files (such as security
logs and network traffic logs), there is no well-established system that can identify the
relationships among these log files and integrate them. These log files are crucial for
identifying attack-related patterns and assist in early detection of APTs or other malicious
attacks. The work in [2] identifies the challenges in big data analysis, such as automating the
whole process of locating, identifying and understanding the data. A good database schema design
is required before analyzing the dataset. Similarly, mining requires data to be integrated,
cleaned and efficiently accessible, which involves effective mining algorithms and big data
computing environments. Labrinidis [7] also notes that significant research is required to
achieve automated integration of datasets as well as a suitable database design, even for simpler
analysis of a single dataset. It is also essential that effective mining techniques are used to
extract information from large datasets.

The objective of this research is to use an efficient expert system that draws on the expertise
of cyber security experts and allows them to input suitable weights for different attributes. The
cyber security expert also contributes to the scoring system based on the words in the log file.
We then adopt the Fuzzy k-Means (FKM) algorithm to create clusters of attackers and non-attackers
in order to segregate the attack-related traffic from the network datasets. Our Analyst Intuition
approach is inspired by Kalyan [8] and Chang [9]. Kalyan [8] used a semi-supervised approach for
huge, unbalanced and unlabeled data: the approach starts by labelling a sample of the data,
trains the system on it, and uses the trained system to test against the remaining data.
Likewise, Chang et al. [9] use a similar method, known as “Expectation Regulated Neural Network”,
for DDoS attacks.

In this study, we collected 3 days of data, amounting to 36 million log files and 36 gigabytes of
data. The 3 days of data include the logs from about 1200 computers. The log files were obtained
from the firewall server, the intrusion detection system and anti-virus logs. The total number of
malware instances is about 60. We apply data mining techniques to study the statistical data
obtained from the integrated datasets. These analytics help in distinguishing attack-related
traffic from normal traffic as well as in extracting attack patterns. The FKM clustering
algorithm was applied to the time-related and connection-related data obtained from the
integrated datasets to create attacker and non-attacker clusters. Several models were generated
by changing the key parameters. The testing step was repeated several times to determine the
accuracy and efficiency of the results, and the results obtained from the algorithms were
validated against each other to verify the attack-related traffic. The FKM algorithm created
three clusters in total: (i) cluster-1 consists of no attackers, (ii) cluster-2 consists of an
uncertain number of attackers, and (iii) cluster-3 consists of 364 non-attackers.

One of the issues in cyber security is that different network security systems and tools generate
log files in different formats, which complicates consolidation. This research demonstrates the
integration and analysis of datasets for identifying attack-related traffic, which can
potentially lead to easier threat detection in cases where attacks occur on multiple platforms.
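
The consolidation problem described above can be illustrated with a minimal Python sketch. It
assumes three made-up line layouts for firewall, intrusion detection and anti-virus logs (the
source does not give the real formats), normalizes them into one record shape, and derives simple
per-IP statistics. The parser names, field names and sample lines are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# All three layouts below are hypothetical; the source does not give the real formats.

def parse_firewall(line):
    # Assumed layout: "<iso-time> <action> <src-ip> -> <dst-ip>"
    ts, action, src, _arrow, dst = line.split()
    return {"source": "firewall", "time": datetime.fromisoformat(ts),
            "ip": src, "message": f"{action} to {dst}"}

def parse_ids(line):
    # Assumed layout: "<src-ip>|<iso-time>|<alert text>"
    ip, ts, message = line.split("|", 2)
    return {"source": "ids", "time": datetime.fromisoformat(ts),
            "ip": ip, "message": message}

def parse_antivirus(line):
    # Assumed layout: "<iso-time>,<host-ip>,<event text>"
    ts, ip, message = line.split(",", 2)
    return {"source": "antivirus", "time": datetime.fromisoformat(ts),
            "ip": ip, "message": message}

def per_ip_statistics(records):
    """Group normalized records by IP and derive simple count and time statistics."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["ip"]].append(record)
    stats = {}
    for ip, items in grouped.items():
        times = [r["time"] for r in items]
        stats[ip] = {
            "events": len(items),
            "sources": sorted({r["source"] for r in items}),
            "active_seconds": (max(times) - min(times)).total_seconds(),
        }
    return stats

# Made-up sample lines, one parser per log source.
records = [
    parse_firewall("2020-01-01T10:00:00 DENY 10.0.0.5 -> 10.0.0.9"),
    parse_ids("10.0.0.5|2020-01-01T10:00:30|possible worm propagation"),
    parse_antivirus("2020-01-01T10:01:00,10.0.0.5,Malware quarantined"),
    parse_firewall("2020-01-01T10:02:00 ALLOW 10.0.0.7 -> 10.0.0.9"),
]
stats = per_ip_statistics(records)
```

Keeping one small parser per source means a new log format only needs a new parser; the
statistics step stays unchanged because it only sees the common record shape.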

Methodology:

A semi-supervised method is used to classify cyber security logs into attack, unsure and no
attack: it first breaks the data into 3 clusters using Fuzzy K-Means (FKM), then a small sample
of the data is labelled manually (Analyst Intuition), and finally a multilayer perceptron (MLP)
neural network classifier is trained on the manually labelled data. This helps in finding
anomalies in cyber security logs, a task that generally produces a large number of false
detections. The combination of Artificial Intelligence (AI) and Analyst Intuition (AI) is also
known as AI2 [8][9]. The FKM clustering algorithm is used to create attacker and non-attacker
clusters on the time-related and connection-related data obtained from the integrated datasets.

The model is illustrated in Figure 1. It starts by extracting data from the application; the log
information is extracted for training and testing. The data is split into 3 clusters (no attack,
unsure and attack) using the FKM algorithm. A multilayer perceptron neural network is then
trained on 2/3 of the data, and the remaining 1/3 is used for testing. The original data is not
labelled. Words in the log files are given a weight, and a score is computed from those weights.
Excel is used to visualize the data and manually label it into the 3 classes: attack, unsure and
no attack. The model is then trained on these labels and used for classification. The
classification system takes in the expert's view to provide weightages according to the types of
attacks, as shown in Table I.

Figure 1: The model that detects anomalies from big data.

The total number of malware instances is about 60. Because network traffic data arrives at very
high speed, with more than 1000 log files generated every second, we use batch processing instead
of real-time processing.
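
The training stage described above (2/3 of the labelled data for training, 1/3 for testing, an
MLP as the classifier) can be sketched as follows. This is a minimal illustration in plain
Python, not the implementation used in this work: the one-hidden-layer network, its size, the
learning rate, the epoch count and the synthetic labelled rows are all assumptions made for the
example.

```python
import math
import random

def split_two_thirds(rows, seed=0):
    """Shuffle and split: 2/3 of the rows for training, the remaining 1/3 for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = (2 * len(rows)) // 3
    return rows[:cut], rows[cut:]

class TinyMLP:
    """One hidden tanh layer and a softmax output, trained by plain SGD (illustrative only)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
             for ws, b in zip(self.w1, self.b1)]
        z = [sum(w * hi for w, hi in zip(ws, h)) + b for ws, b in zip(self.w2, self.b2)]
        top = max(z)                      # subtract the max for numerical stability
        e = [math.exp(v - top) for v in z]
        total = sum(e)
        return h, [v / total for v in e]

    def train_step(self, x, y, lr):
        h, p = self.forward(x)
        # Gradient of cross-entropy w.r.t. the output pre-activations: p - one_hot(y).
        dz = [pj - (1.0 if j == y else 0.0) for j, pj in enumerate(p)]
        # Back-propagate through the output weights (before updating them) and tanh.
        dh = [sum(dz[j] * self.w2[j][i] for j in range(len(dz))) * (1.0 - h[i] ** 2)
              for i in range(len(h))]
        for j, g in enumerate(dz):
            for i, hi in enumerate(h):
                self.w2[j][i] -= lr * g * hi
            self.b2[j] -= lr * g
        for i, g in enumerate(dh):
            for d, xd in enumerate(x):
                self.w1[i][d] -= lr * g * xd
            self.b1[i] -= lr * g

    def predict(self, x):
        _, p = self.forward(x)
        return max(range(len(p)), key=p.__getitem__)

# Synthetic stand-in for the manually labelled rows: (word_found, score) -> class index,
# where 0 = no attack, 1 = unsure, 2 = attack. The values are invented for illustration.
rng = random.Random(1)
rows = []
for label, centre in enumerate([0.1, 0.5, 0.9]):
    for _ in range(30):
        rows.append(((1.0 if label else 0.0, centre + rng.uniform(-0.05, 0.05)), label))

train_rows, test_rows = split_two_thirds(rows)
model = TinyMLP(n_in=2, n_hidden=6, n_out=3)
for _ in range(200):                      # assumed epoch count
    for x, y in train_rows:
        model.train_step(x, y, lr=0.1)    # assumed learning rate
accuracy = sum(model.predict(x) == y for x, y in test_rows) / len(test_rows)
```

In practice the rows would come from the analyst's manual labels rather than a random generator,
and an established MLP implementation would normally replace the hand-written network.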

Data mining techniques are used to study the statistical data obtained from the integrated
datasets, which contain attacks such as Malware, Trojan, Passing off, Soft1026 and Virus. The
expert system allows cyber security experts to enter their inputs to form the scores. These
analytics help in distinguishing attack-related traffic from normal traffic as well as in
extracting attack patterns. The FKM clustering algorithm is applied to the time-related and
connection-related data obtained from the integrated datasets and forms 3 clusters: Strong,
Average and Mild, which correspond to the labels attack, unsure and no attack respectively. The
input to the algorithm consists of 2 attributes: the first is word-found and the second is
scoring. The corresponding algorithm is shown in Algorithm 1. Euclidean distance, as in K-means,
is used as the distance measure. Fuzzification is performed before clustering: the system first
looks for keywords such as worm and malware in the data and marks the feature as 1 when a keyword
is encountered. The expert weightage is then applied to form the score.
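
The two-attribute input and the clustering step described above can be sketched in Python as
follows. This is a minimal illustration under stated assumptions, not Algorithm 1 itself: the
keyword weights and sample log lines are hypothetical, the clustering is the standard fuzzy
c-means update with fuzzifier m = 2 and Euclidean distance, and the mapping of clusters to labels
by ascending centre score is an assumption made for the example.

```python
import math
import random
import re

# Hypothetical expert weights; in this work they are entered by the cyber security expert.
KEYWORD_WEIGHTS = {"worm": 0.9, "malware": 1.0, "trojan": 0.8, "virus": 0.7}

def featurize(log_line):
    """Return the two attributes (word_found, score) for one log line."""
    words = set(re.findall(r"[a-z0-9]+", log_line.lower()))
    hits = [w for w in KEYWORD_WEIGHTS if w in words]
    return (1.0 if hits else 0.0, sum((KEYWORD_WEIGHTS[w] for w in hits), 0.0))

def fuzzy_k_means(points, k=3, m=2.0, iters=50, seed=0):
    """Standard fuzzy c-means: returns the centres and the membership matrix."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centre j is the mean of all points weighted by membership**m.
        centers = []
        for j in range(k):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            centers.append(tuple(sum(w * p[d] for w, p in zip(weights, points)) / total
                                 for d in range(dim)))
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        for i, p in enumerate(points):
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            for j in range(k):
                u[i][j] = 1.0 / sum((dists[j] / dk) ** (2.0 / (m - 1.0)) for dk in dists)
    return centers, u

def label_points(centers, u):
    """Assumed mapping: lowest-score centre = no attack, middle = unsure, highest = attack."""
    order = sorted(range(len(centers)), key=lambda j: centers[j][1])
    names = dict(zip(order, ["no attack", "unsure", "attack"]))
    return [names[max(range(len(row)), key=row.__getitem__)] for row in u]

# Made-up log lines for illustration.
log_lines = [
    "user login ok", "scheduled backup finished", "dns query resolved",
    "virus signature matched", "trojan dropper blocked", "virus removed",
    "malware and worm detected", "worm and trojan and malware found",
    "malware with virus payload",
]
points = [featurize(line) for line in log_lines]
centers, memberships = fuzzy_k_means(points)
labels = label_points(centers, memberships)
```

Unlike hard k-means, every log line keeps a degree of membership in all three clusters; the label
taken here is simply the cluster with the largest membership.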

References:

[1] Harper, Jelani. "Enterprise Threats: Big Data and Cyber Security." 11 June 2013. Dataversity Education.

[2] Cardenas, Alvaro, Pratyusa Manadhata and Sreeranga Rajan. "Big Data Analytics for Security." IEEE Security and Privacy, 2013.

[3] Hadoop, Apache. Apache Hadoop - Apache Software Foundation. 2005.

[4] Rouse, Margaret. TechTarget. December 2016. Retrieved from: http://searchsecurity.techtarget.com/definition/advanced-persistentthreat-APT

[5] Gartner. Security Announcements. February 2014. Retrieved from Gartner website: http://www.gartner.com/

[6] Ponemon Institute. "Big Data Analytics in Cyber Defense." Ponemon Institute Research Report. 2013.

[7] Labrinidis, Alexandros, and H. V. Jagadish. "Challenges and opportunities with big data." Proceedings of the VLDB Endowment 5.12, 2012.

[8] Apache Mahout. The Apache Software Foundation. 2014. Retrieved from: https://mahout.apache.org/general/faq.html

[9] Veeramachaneni, Kalyan, and Ignacio Arnaldo. "AI2: Training a big data machine to defend." Retrieved from: https://people.csail.mit.edu/kalyan/AI2_Paper.pdf
