BUGGING AND DEBUGGING THE DATASET USING
IMPROVED HIDDEN MARKOV MODEL FOR
CYBERSECURITY APPLICATIONS
Gopika. E
M.tech (NIS),
Pondicherry University
Under the Guidance of
Mr. K. Palanivel
System Analyst (Sr. Scale),
Computer Centre,
Pondicherry University.
Abstract- The use of devices such as mobile phones, tablets, computers, and
laptops has increased with the population. Companies as well as citizens
store or record their information on these devices for later reference.
Along with this, the number of bugs has increased, and attackers attack
systems to steal information. This is where cybersecurity comes into the
picture. This paper aims to automatically fix bugs from the dataset using an
improved hidden Markov model, which utilizes Baum-Welch genetic algorithms,
for applications such as Mozilla Firefox.

Keywords- Cybersecurity, Deep Learning, Bugging, Debugging, Feature
Extraction.

I. RELATED WORKS

Nowadays, deep learning has become very popular in research and has been
successfully applied in various domains such as computer vision, speech
recognition, natural language processing, and malware detection.

Aleksandar et al. (2015) surveyed [1] the evaluation of computer intrusion
detection systems (IDSes), studying existing approaches and evaluation
goals. They organized their discussion around three key areas: workloads,
metrics, and measurement methodology. The IDS properties most commonly
considered in evaluation studies across IDS types include attack detection
and reporting speed, resistance to evasion techniques, resource consumption,
performance overhead, and workload processing capacity. Workloads for
evaluating IDSes come either in executable form (workload drivers, manual
generation, manual assembly, readily available exploit databases, and
vulnerability and attack injection) or in trace form (real-world production
traces, publicly available traces, testbed environments, and honeypots). The
metrics are classified as basic, composite, ROC curve, and
cost-based/information-theoretic. The survey was written based on the common
practices in evaluating computer intrusion detection systems.

Kutub et al. (2015) investigated [3] various cybersecurity threats,
frameworks, and security models. They focused on email security, firewalls,
and vulnerabilities. There are general recommendations on how to secure
passwords, but no authenticated protocol that protects the system
inherently. Password security therefore needs further study of techniques
and models to ensure that passwords are protected.

To investigate the relationship between object-oriented metrics and defect
prediction, Dharmendra et al. (2016) proposed an object-oriented Software
Bug Prediction System (SBPS) [16] and computed the accuracy of the proposed
model on the datasets. They took defect datasets from the Promise Software
Engineering Repository; some datasets gave similar prediction accuracy,
while others gave different accuracy at the prediction stage. The average
prediction accuracy of this work was 76.27%. In future, real-time datasets
should be analysed with the proposed model to maximize the average accuracy.
In software bug prediction, S. Delphine et al. built data models using three
methods, logistic regression, Naïve Bayes, and decision tree, which gave
lower accuracy for predicting software bugs. Multiple algorithms can be used
to predict bug occurrence. For better accuracy, instead of constructing
individual models, multiple decision trees are used to predict the final
outcome with the help of bagging. To reduce bias in the accuracy estimate,
K-fold validation is applied. In future, this model can be improved with an
artificial neural network for higher prediction accuracy.

Viruses can invade a computer through Wi-Fi. Much software exists to
identify them, but they are coded in a way that makes detection difficult.
Muhammad et al. (2017) proposed a novel deep feature extraction and
selection (D-FES) method [9] that can easily analyse where the target has
been attacked, with higher accuracy. D-FES
combines feature extraction with weighted feature selection, which also
helps against attacks from other system viruses. A stacked autoencoder (SAE)
has a high capacity to learn from the data of large Wi-Fi networks. Open
Wi-Fi is easily accessible, so it can easily be hacked and its data altered.
The method helps separate genuine data from false data so that the false
data can be removed. In future, the authors plan to extend D-FES to detect
any attack class, not only impersonation attacks, to identify unknown
attacks that exploit zero-day vulnerabilities, and to fit the distributed
nature of the IoT environment, which is characterized by limited computing
power, memory, and power supply. In malware detection, Ding et al. (2017)
represented [26] malware as opcode sequences and detected it using a deep
learning algorithm, the Deep Belief Network (DBN). They proposed using
unlabeled data to improve the accuracy of malware detection models, because
the DBN provides higher accuracy than the baseline models. DBNs are also
used as autoencoders to extract the feature vectors of executables.
Experiments show that, compared with the baseline models, the proposed
models obtain the same performance while using only a small number of
features. Future work is to investigate how the amount of unlabeled data
affects the performance of DBNs.

Matilda et al. (2018) proposed [13] a novel malware prediction model based
on a recurrent neural network. In conventional dynamic detection, the
malicious payload has already been executed by the time the attack is
detected. This model reduces dynamic detection time to less than 5 seconds
per file, which helps prevent attacks. Future work can build on these
results to integrate file-specific behavioural detection into endpoint
anti-virus systems across different operating systems. Qixue et al. (2018)
considered the risks caused by vulnerabilities in deep learning software
implementations by studying their impact on common deep learning
applications such as voice recognition and image classification. They wrote
the paper to alert researchers not to forget conventional threats and to
actively look for ways to detect flaws in the software implementations of
deep learning applications.

For efficient intrusion detection in a big data environment, Mohammed et al.
(2019) proposed [2] a hybrid of a deep CNN (convolutional neural network)
and a WDLSTM (Weight-Dropped Long Short-Term Memory) network. They use the
deep CNN to extract features from the IDS (Intrusion Detection System) data.
The WDLSTM then retains long-term dependencies among the extracted features
and addresses overfitting. The approach showed good results and performance
on the UNSW-NB15 dataset. In future, the proposed deep CNN and WDLSTM should
be analysed on more complex and bigger datasets from real IDSes. Suresh et
al. (2019) discuss [5] the issues in each area of bug bounty programs (BBPs)
and recommend practices to enhance BBP effectiveness. The work also informs
research and practice about issues and best practices in crowdsourcing
information security for timely discovery and remediation of
vulnerabilities. Sri et al. (2019) designed a tool based on policies and
standards defined by the US Department of Energy and NIST (National
Institute of Standards and Technology). They explain the architecture of
CyFEr (Cybersecurity vulnerability mitigation Framework through Empirical
Paradigm), which was compared with similar models and tested against cyber
injects from a real-world cyber-attack. Future work should focus on the
complexity necessary to ensure that the controls are layered based on both
their MILs and their relative weights/masses.

Waheed et al. (2019) proposed an approach [10] that automates the severity
assessment process and helps users by removing the severity assignment step
from bug reporting. They performed a cross-project evaluation on the history
data of the open-source products Eclipse and Mozilla. The evaluation results
suggest that the proposed approach outperforms the state-of-the-art
approaches.

Geunseok et al. (2020) suggest a technique [4] for bug localization and
repair. First, they collected information from bug reports and program
source code and gave it as input to an autoencoder. The output of the
autoencoder was the input to a CNN. The old and new bug reports and program
source code are then compared. They implemented the Seq-GAN algorithm for
program source code, which checks the source code line by line to find bugs.
They did this only for C programs; bug repair for other languages requires
further research. Mohamed et al. (2020) surveyed [15] deep learning
approaches for cybersecurity intrusion detection, the datasets used, and a
comparative study. The deep learning approaches cover deep discriminative
models and unsupervised models. They describe 35 cyber datasets and classify
them into seven categories. The three important performance indicators,
false alarm rate, accuracy, and detection rate, were compared for machine
learning methods on two datasets. Priyanka et al. (2020) discussed [29] deep
learning in cybersecurity and reviewed nearly 80 research papers published
between 2014 and 2019. Deep learning approaches such as Convolutional Neural
Network (CNN), Auto Encoder (AE), Deep Belief Network (DBN), Recurrent
Neural Network (RNN), Generative Adversarial Network (GAN), and Deep
Reinforcement Learning (DRL) are used to categorize the papers. Each
technique is discussed with its algorithms, platforms, datasets, and
potential benefits.

Rahul et al. (2021) proposed [24] the first deep learning based general
technique for bug localization in student programs. A novel tree
convolutional neural network predicts whether a program passes or fails, and
a state-of-the-art neural prediction attribution technique is then used to
locate the buggy lines of the program. They compared it with three
baselines: one static and two state-of-the-art dynamic bug localization
techniques. The technique was evaluated only on student programs written in
C. In future, it could be applied to arbitrary programs in the context of
regression testing and extended to other languages.
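The review above mentions false alarm rate, accuracy, and detection rate as
the three performance measures compared in the intrusion detection survey of
Mohamed et al. The source does not define them, so the sketch below assumes
the commonly used confusion-matrix definitions. The function name, the label
encoding (1 = attack, 0 = normal), and the sample labels are hypothetical
and only serve as an illustration.

```python
def detection_metrics(y_true, y_pred):
    """Compute accuracy, detection rate, and false alarm rate.

    Assumed label encoding (hypothetical): 1 = attack, 0 = normal.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed attacks
    total = tp + tn + fp + fn
    return {
        # Share of all samples that were classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of real attacks that were detected (recall).
        "detection_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of normal samples that were wrongly flagged.
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Made-up labels for eight samples, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(detection_metrics(y_true, y_pred))
```

A high detection rate with a low false alarm rate is the usual goal; accuracy
alone can be misleading when attacks are rare in the dataset.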
BASE PAPER STUDY

AUTHORS: Abdullah Moaid Mohammad Ai-Shehri, Mourad Elluomi

PUBLISHED: 28 April 2021

INTRODUCTION:

Systems for network security and computer security are available, including
IDSes, firewalls, and antivirus software, which detect, identify, and fix
unauthorized access. A malfunction in a program is known as a software bug.
Software bugs cause software vulnerabilities, which result in cyber-attacks.
Handling identified bugs is a costly and laborious task, which motivates
greater automation of the bug handling process.

BASIC COMPONENTS:

1. An introduction to cybersecurity, bugs in software, and the role of deep
   learning in cybersecurity applications.

2. Various existing deep learning and classification techniques in
   cybersecurity.

3. The proposed model with feature extraction, data minimization, and
   feature selection.

4. The performance analysis, illustrated with results and graphs; the work
   ends by presenting future work.

CONCLUSION:

This paper concludes that software bugs can be fixed automatically using
datasets from cybersecurity applications. It uses an improved hidden Markov
model for fixing the bugs. The model incorporates the Baum-Welch algorithm.

REFERENCE

[1] Mohammad Mehedi Hassan, Abdu Gumaei, Ahmed Alsanad, Majed Alrubaian,
Giancarlo Fortino, “A Hybrid Deep Learning Model for Efficient Intrusion
Detection in Big Data Environment”, Information Sciences (2019).
[2] Kutub Thakur, Meikang Qiu, Keke Gai, Md Liakat Ali, “An Investigation on
Cyber Security Threats and Security Models” (2015).
[3] Geunseok Yang, Kyeongsic Min, Byungjeong Lee, “Applying Deep Learning
Algorithm to Automatic Bug Localization and Repair” (2020).
[4] A. Milenkoski, M. Vieira, S. Kounev, A. Avritzer, and B. D. Payne,
“Evaluating Computer Intrusion Detection Systems: A Survey of Common
Practices” (2015).
[5] Suresh S Malladi, Hemang C Subramanian, “Bug Bounty Programs for
Cyber-Security: Practices, Issues and Recommendations” (2019).
[6] Tadas Limba, Tomas Plėta, Konstantin Agafonov, Martynas Damkus, “Cyber
security management model for critical infrastructure” (2017).
[7] Sri Nikhil Gupta Gourisetti, Michael Mylrea, Hirak Patangia,
“Cybersecurity vulnerability mitigation framework through empirical
paradigm: Enhanced prioritized gap analysis” (2019).
[8] Sun N, Zhang J, Rimba P, Gao S, Zhang LY, Xiang Y, “Data-driven
cybersecurity incident prediction: a survey” (2019).
[9] Muhamad Erza Aminanto, Rakyong Choi, Harry Chandra Tanuwidjaja, “Deep
Abstraction and Weighted Feature Selection for Wi-Fi Impersonation
Detection” (2017).
[10] Waheed Yousuf Ramay, Qasim Umer, Xu Cheng Yin, Chao Zhu, and Inam
Illahi, “Deep Neural Network-Based Severity Prediction of Bug Reports”
(2019).
[11] Mohamed Amine Ferrag, Leandros Maglaras, Sotiris Moschoyiannis, and
Helge Janicke, “Deep Learning for Cyber Security Intrusion Detection:
Approaches, Datasets, and Comparative Study”.
[12] Ashish Sureka, Pankaj Jalote, “Detecting Duplicate Bug Report Using
Character N-Gram-Based Features” (2010).
[13] Karan Aggarwal, Finbarr Timbers, Tanner Rutgers, Abram Hindle, Eleni
Stroulia, and Russel Greiner, “Detecting Duplicate Bug Reports with Software
Engineering Domain Knowledge” (2016).
[14] Matilda Rhode, Pete Burnap, Kevin Jones, “Early-stage malware
prediction using recurrent neural networks” (2018).
[15] A. Sampathkumar, Jaison Mulerikkal, M. Sivaram, “Glowworm swarm
optimization for effectual load balancing and routing strategies in wireless
sensor networks” (2020).
[16] Hung-Jen Liao, Chun-Hung Richard Lin, Ying-Chih Lin, Kuang-Yuan Tung,
“Intrusion detection system: A comprehensive review” (2012).
[17] Shivkumar Shivaji, E. James Whitehead, Ram Akella, Sunghun Kim,
“Reducing Features to Improve Code Change Based Bug Prediction”.
[18] Dharmendra Lal Gupta and Kavita Saxena, “Software bug prediction using
object-oriented metrics” (2016).
[19] S. Delphine Immaculate, M. Farida Begam, M. Floramary, “Software Bug
Prediction Using Supervised Machine Learning Algorithms” (2019).
[20] Guanjun Lin, Sheng Wen, Qing-Long Han, Jun Zhang, and Yang Xiang,
“Software Vulnerability Detection Using Deep Neural Networks: A Survey”
(2020).
[21] Awni Hammouri, Mustafa Hammad, Mohammad Alnabhan, Fatima Alsarayrah,
“Software Bug Prediction using Machine Learning Approach” (2018).
[22] Rahul Gupta, Aditya Kanade, Shirish Shevade, “Deep Learning for
Bug-Localization in Student Programs” (2019).
[23] Jianjun He, Ling Xu, Meng Yan, Xin Xia, Yan Lei, “Duplicate Bug Report
Detection Using Dual-Channel Convolutional Neural Networks” (2020).
[24] Yang Shi, Ye Mao, Tiffany Barnes, Min Chi, Thomas W. Price, “Exploring
How to Use Deep Learning Effectively through Semi-supervised Learning for
Automatic Bug Detection in Student Code” (2021).
[25] Yi Li, Shaohua Wang, Tien N. Nguyen, Son Van Nguyen, “Improving Bug
Detection via Context-Based Code Representation Learning and Attention-Based
Neural Networks” (2019).
[26] Ding Yuxin, Zhu Siyi, “Malware detection based on deep learning
algorithm” (2017).
[27] Qixue Xiao, Kang Li, Deyue Zhang, Weilin Xu, “Security risks in Deep
learning Implementations” (2018).
[28] Jayati Deshmukh, Annervaz K M, Sanjay Podder, Shubhashis Sengupta,
Neville Dubash, “Towards Accurate Duplicate Bug Retrieval using Deep
Learning Techniques” (2017).
[29] Priyanka Dixit, Sanjay Silakari, “Deep Learning Algorithms for
Cybersecurity Applications: A Technological and Status Review” (2020).
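
The conclusion names the Baum-Welch algorithm as part of the hidden Markov
model used for bug fixing. As an illustration only, the following is a
minimal sketch of plain Baum-Welch re-estimation for a discrete HMM. It is
not the genetic variant mentioned in the abstract and not the base paper's
implementation. The two hidden states, the three observation symbols, the
starting parameters, the sequence `obs`, and all function names are
hypothetical assumptions.

```python
import math


def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass; returns alpha, beta, scale factors."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    scale = [0.0] * T
    for t in range(T):
        for j in range(n):
            prior = pi[j] if t == 0 else sum(
                alpha[t - 1][i] * A[i][j] for i in range(n))
            alpha[t][j] = prior * B[j][obs[t]]
        scale[t] = sum(alpha[t])  # normalizer that prevents underflow
        alpha[t] = [a / scale[t] for a in alpha[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                for j in range(n)) / scale[t + 1]
    return alpha, beta, scale


def baum_welch_step(obs, pi, A, B):
    """One EM update; also returns the log-likelihood of the old model."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta, scale = forward_backward(obs, pi, A, B)
    # gamma[t][i]: posterior probability of being in state i at time t.
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    new_pi = gamma[0][:]
    new_A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        visits = sum(gamma[t][i] for t in range(T - 1))
        for j in range(n):
            # Expected number of i -> j transitions (sum of xi over time).
            moves = sum(
                alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                * beta[t + 1][j] / scale[t + 1] for t in range(T - 1))
            new_A[i][j] = moves / visits
    new_B = [[0.0] * m for _ in range(n)]
    for j in range(n):
        total = sum(gamma[t][j] for t in range(T))
        for k in range(m):
            new_B[j][k] = sum(
                gamma[t][j] for t in range(T) if obs[t] == k) / total
    loglik = sum(math.log(c) for c in scale)
    return new_pi, new_A, new_B, loglik


def train(obs, pi, A, B, iterations=15):
    """Repeat the EM update and record the log-likelihood per iteration."""
    history = []
    for _ in range(iterations):
        pi, A, B, loglik = baum_welch_step(obs, pi, A, B)
        history.append(loglik)
    return pi, A, B, history


# Hypothetical symbol sequence, e.g. encoded tokens from bug reports.
obs = [0, 1, 0, 1, 0, 0, 1, 2, 2, 1, 2, 2, 0, 1, 0, 2, 2, 2, 1, 0]
pi0 = [0.6, 0.4]
A0 = [[0.7, 0.3], [0.4, 0.6]]
B0 = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi, A, B, history = train(obs, pi0, A0, B0)
print("first and last log-likelihood:", history[0], history[-1])
```

Each update is an expectation-maximization step, so the recorded
log-likelihood should not decrease from one iteration to the next. The
scaling factors keep the forward and backward values in a safe numeric range
for longer sequences.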