Is Presentation
The proliferation of Internet users, cloud services, and IoT devices has vastly
expanded the attack surface, leading to a surge in adversaries seeking to exploit
vulnerabilities. State actors' involvement in the cyber-security domain has
further escalated the frequency of attacks and malware proliferation. McAfee
Labs reported a record high of 57.3 million newly identified malware samples in
Q3 of 2017. Traditional methods like manual code inspection and reverse
engineering are inadequate against the rapid evolution of adversarial
capabilities.
Solution:
Deep learning (DL) offers a promising solution to the
limitations of traditional security methods. It's a
branch of machine learning inspired by biological
neural networks, capable of learning from data for
decision-making. DL's rise in the past five years is
attributed to increased data availability, algorithmic
advancements, and GPU-enabled computing power.
DL excels in tasks like image classification, object
detection, and voice recognition. Recent applications
extend DL to security tasks such as malware analysis,
intrusion detection, and botnet detection. DL models
can learn from malware patterns to enhance antivirus
software and intrusion detection systems, including
identifying previously unknown malware like 0-day
threats.
Terminology:
Artificial intelligence (AI) involves creating intelligent
machines, striving to replicate or surpass human
intelligence. Machine learning (ML), a subset of AI,
enables computers to learn and make decisions
without explicit programming. ML encompasses
supervised learning, where models learn from
labeled data, and unsupervised learning, where
models uncover patterns from unlabeled data.
Artificial neural networks (ANN) are models inspired by the human brain,
comprised of neurons organized in layers. Each neuron processes information
and computes outputs, adjusting its weights and biases during training to
minimize error.
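The weight and bias adjustment described above can be illustrated with a single neuron. The following is a minimal sketch, not taken from the surveyed work: the sigmoid activation, the learning rate, the epoch count, and the toy logical-OR dataset are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    """Compute the neuron's output for one input vector."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_neuron(samples, epochs=2000, lr=0.5, seed=0):
    """Fit one sigmoid neuron with stochastic gradient descent.

    samples is a list of (inputs, target) pairs with targets 0 or 1.
    Hyperparameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in samples[0][0]]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # For cross-entropy loss with a sigmoid output, the gradient
            # with respect to the pre-activation is (output - target).
            error = predict(weights, bias, inputs) - target
            weights = [w - lr * error * x for w, x in zip(weights, inputs)]
            bias -= lr * error
    return weights, bias


# Toy dataset (assumption): the logical OR of two binary inputs.
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_neuron(OR_DATA)
```

After training, `predict(weights, bias, (0, 0))` falls below 0.5 and the other three inputs fall above it. A full network repeats the same update across many neurons and layers, using backpropagation to obtain each neuron's error term.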
Deep neural networks (DNN) contain multiple hidden layers, distinguishing them
from shallow neural networks. Deep learning (DL), a subset of machine learning,
leverages DNNs. Two prominent types of DNNs are Convolutional Neural
Networks (CNN) and Recurrent Neural Networks (RNN), widely researched and
utilized in various applications.
Malware Analysis:
Static malware analysis involves reverse engineering malware binaries to analyze instructions
without execution, but can be defeated by evasion techniques like obfuscation. Dynamic
malware analysis executes malware in a controlled environment to observe behavior, yet
sophisticated malware can evade analysis by recognizing sandbox environments. ML and DL offer
automated solutions to scalability issues, with DL showing promise in reducing false positive
rates.
DL-based techniques excel in classifying malware swiftly and accurately, surpassing human
analysts. They efficiently flag suspicious binaries for human verification and demonstrate
superior detection of new malware with similar characteristics to existing ones.
Kephart et al. (1995) introduced biologically inspired anti-virus techniques, including an NN-based
virus detector and a computer immune system for automatic virus analysis, achieving low false
positive rates (FPR). However, their method was limited to a small dataset of boot viruses and
required malware execution for detection.
Dahl et al. (2013) addressed the challenge of high input feature dimensionality for malware
classification by proposing random projections to reduce complexity. Their approach achieved a
low two-class error rate but still had a high test-error rate for classifying specific malware types.
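To make the idea of random projections concrete, here is a minimal sketch of projecting a sparse, high-dimensional feature vector onto a much smaller dense vector. The sparse matrix with entries in {-1, 0, +1}, the density, and the dimensions are illustrative assumptions, not the configuration reported by Dahl et al.

```python
import random


def make_projection(n_in, n_out, density=0.1, seed=0):
    """Build a sparse random projection matrix.

    Each of the n_out rows is stored as a dict mapping an input index
    to +1.0 or -1.0; absent indices are implicit zeros.
    """
    rng = random.Random(seed)
    matrix = []
    for _ in range(n_out):
        row = {}
        for j in range(n_in):
            if rng.random() < density:
                row[j] = rng.choice((-1.0, 1.0))
        matrix.append(row)
    return matrix


def project(matrix, features):
    """Project a sparse feature vector (dict of index -> value)."""
    return [
        sum(row.get(j, 0.0) * value for j, value in features.items())
        for row in matrix
    ]


# Hypothetical example: 100,000 possible features reduced to 16 values.
projection = make_projection(n_in=100_000, n_out=16, density=0.01, seed=42)
sample = {17: 1.0, 4242: 3.0, 99_999: 1.0}
reduced = project(projection, sample)
```

The projection is fixed once and reused for every sample, so the downstream classifier always sees inputs of the same small size regardless of how many raw features exist.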
Saxe et al. (2015) proposed a DNN-based malware detection system achieving a 95% detection
rate at 0.1% FPR, leveraging byte entropy histograms and PE import address table features. Their
approach outperformed previous methods and was deployed in Invincea Labs' cloud security
analytics platform.
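A byte entropy histogram can be sketched as follows. This is a simplified illustration of the general idea (slide a window over the binary, compute the window's entropy, and count byte values per entropy level); the window size, step, and 16x16 binning are assumptions and may differ from the exact feature extraction used by Saxe et al.

```python
import math
from collections import Counter


def window_entropy(window):
    """Shannon entropy of a byte sequence, in bits (0.0 to 8.0)."""
    total = len(window)
    counts = Counter(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def byte_entropy_histogram(data, window=1024, step=256, bins=16):
    """Build a bins x bins histogram of (window entropy, byte value) pairs.

    Rows index the entropy of the window a byte was seen in,
    columns index the byte value itself.
    """
    hist = [[0] * bins for _ in range(bins)]
    if not data:
        return hist
    last_start = max(len(data) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = data[start:start + window]
        entropy = window_entropy(chunk)
        # Map entropy in [0, 8] to a row; clamp the maximum into the last row.
        row = min(int(entropy / 8.0 * bins), bins - 1)
        for byte in chunk:
            hist[row][byte * bins // 256] += 1
    return hist
```

Flattening the histogram gives a fixed-length vector (256 values for 16 bins) regardless of file size, which is what makes it usable as input to a neural network.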
Pascanu et al. (2015) utilize echo state networks (ESNs) and RNNs to extract
malware features, achieving a true positive rate of 98.3% and a false positive
rate of 0.1%.
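Results of this kind pair a true positive rate with a fixed false positive rate. The sketch below shows one way such a figure can be computed from classifier scores: pick the score threshold that keeps benign false alarms within the allowed budget, then measure how much malware scores above it. The function name and the sample scores are hypothetical.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """Return (true positive rate, threshold) at a false positive budget.

    scores: classifier outputs, higher means more likely malicious.
    labels: 1 for malicious, 0 for benign.
    A sample is flagged when its score is strictly above the threshold.
    """
    benign = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    malicious = [s for s, y in zip(scores, labels) if y == 1]
    # Number of benign samples we may flag, rounded down.
    allowed_fp = int(max_fpr * len(benign))
    if allowed_fp < len(benign):
        threshold = benign[allowed_fp]
    else:
        threshold = float("-inf")
    detected = sum(1 for s in malicious if s > threshold)
    return detected / len(malicious), threshold


# Hypothetical scores: ten benign samples followed by four malicious ones.
scores = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.6, 0.95,
          0.5, 0.7, 0.8, 0.99]
labels = [0] * 10 + [1] * 4
rate, threshold = tpr_at_fpr(scores, labels, max_fpr=0.1)
```

Reporting the detection rate at a stated FPR matters because the two trade off: lowering the threshold detects more malware but also flags more benign files.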
Huang et al. (2016) propose MtNet, a DL architecture for malware binary
classification and categorization of malware families, showing improvement
over shallow learning models with a binary malware error rate of 0.358% and a
family error rate of 2.94%.
Intrusion Detection:
Intrusion detection systems (IDSs) utilize misuse-based
techniques, focusing on specific attack
patterns, and anomaly-based methods, detecting
deviations from normal network behavior. Misuse-based
approaches require frequent updates and
struggle with 0-day attacks, while anomaly-based
methods face high false positive rates and individual
training needs. DL techniques aid both types, with
neural networks learning from datasets to classify
instances as known attacks or normal behavior.
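The difference between the two detection styles can be shown with a deliberately simple, non-DL sketch. The signature strings, the requests-per-minute baseline, and the z-score threshold below are all hypothetical; a DL-based IDS would replace the hand-written rules and statistics with learned models.

```python
import statistics

# Hypothetical attack signatures for the misuse-based detector.
SIGNATURES = {
    "sql_injection": "' OR 1=1",
    "path_traversal": "../..",
}


def misuse_detect(payload):
    """Return the names of all known signatures found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


class AnomalyDetector:
    """Flag values that deviate strongly from a learned baseline."""

    def __init__(self, baseline, threshold=3.0):
        self.mean = statistics.mean(baseline)
        self.stdev = statistics.stdev(baseline)
        self.threshold = threshold

    def is_anomalous(self, value):
        # z-score: distance from the baseline mean in standard deviations.
        return abs(value - self.mean) / self.stdev > self.threshold


# Baseline of requests per minute observed during normal operation (assumed).
detector = AnomalyDetector([100, 102, 98, 101, 99, 103, 97])
```

The misuse detector returns nothing for an attack it has no signature for, which is the 0-day weakness noted above. The anomaly detector needs no signatures, but any unusual yet harmless spike also exceeds the threshold, which is the source of its false positives.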
Cannady et al. (Year) pioneered the use of NNs for misuse-based intrusion
detection, achieving a low error rate of 0.06% on simulated attack events.
However, their approach requires lengthy training/testing times, rendering it
impractical for real-time detection.
Palagiri (Year) proposed another NN-based approach for misuse-based intrusion
detection, utilizing a feedforward NN with backpropagation called "Meta
Neural." Despite achieving a prediction rate of 100%, their system exhibited a
high false-positive rate, making it unsuitable for real-world IDS deployments.
Hodo et al. (Year) propose an offline Intrusion Detection System (IDS) tailored for IoT
networks, employing Neural Networks (NNs) to identify Denial of Service (DoS) and
Distributed Denial of Service (DDoS) attacks. Their approach utilizes Multi-Layer
Perceptron (MLP) architecture trained on Internet packet traces, achieving a high
accuracy of 99.4% in binary classification between attack traffic and normal traffic.
However, deployment of this system necessitates a dedicated server to run the NN
models.
Tuor et al. (Year) introduce an online unsupervised Deep Learning (DL) methodology
designed for real-time anomaly detection in network activity using system logs.
Their technique involves training Deep Neural Networks (DNNs) and Recurrent
Neural Networks (RNNs) to discern anomalous user behaviors. Despite achieving a
notable average anomaly score in the 95.53 percentile, this approach lacks
consideration for individual user activity patterns and requires human intervention
for decision-making based on anomaly scores.
Kathareios et al. (Year) propose a real-time network anomaly detection system with
two learning stages. The first stage employs adaptive unsupervised anomaly detection
using a shallow autoencoder, while the second stage utilizes a custom nearest-
neighbor classifier to reduce false positives. Experiments on real-world data from the
NIIFI network in Hungary demonstrate high true positive rates (98.5%) and low false
positive rates (1.3%), reducing human intervention by 5x with detection latency within
a few seconds of attack onset.
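The two-stage structure can be sketched as a small filtering pipeline. This is an illustration under stated assumptions, not the authors' implementation: the autoencoder is not implemented here, so each event arrives with a precomputed reconstruction error, and the second stage is a plain 1-nearest-neighbor lookup against alerts an analyst has already labeled. All names and values are hypothetical.

```python
import math


def nearest_label(features, labeled_alerts):
    """Return the label of the closest previously reviewed alert."""
    closest = min(labeled_alerts, key=lambda item: math.dist(features, item[0]))
    return closest[1]


def two_stage_detect(events, score_threshold, labeled_alerts):
    """Filter events through an anomaly stage and a false-positive stage.

    events: list of (features, reconstruction_error) pairs, where the
        error stands in for the output of a stage-one autoencoder.
    labeled_alerts: list of (features, label) pairs, label being
        "benign" or "attack", from earlier analyst review.
    """
    alerts = []
    for features, reconstruction_error in events:
        # Stage 1: keep only events the anomaly model reconstructs poorly.
        if reconstruction_error <= score_threshold:
            continue
        # Stage 2: suppress anomalies that resemble known false positives.
        if labeled_alerts and nearest_label(features, labeled_alerts) == "benign":
            continue
        alerts.append(features)
    return alerts


# Hypothetical data: three events and two analyst-labeled past alerts.
events = [((1.0, 1.0), 0.1), ((9.0, 9.0), 0.9), ((5.0, 0.0), 0.8)]
reviewed = [((5.1, 0.2), "benign"), ((8.5, 9.5), "attack")]
raised = two_stage_detect(events, score_threshold=0.5, labeled_alerts=reviewed)
```

Here the first event is dropped by stage one, the third is anomalous but suppressed by stage two because it resembles a reviewed false positive, and only the second reaches the analyst.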
DL-based techniques offer significant enhancements to traditional IDS
deployments, but the surveyed work rarely evaluates latency and resource
costs, both of which are crucial for real-world applicability. Sommer et al.
(Year) argue that intrusion detection poses unique challenges for effective
machine learning adoption because network traffic is highly diverse and
evaluation is difficult, and that further research is needed to address
these obstacles.