Open Source Tools: UNIT II: Malware and Vulnerability
Adarsh Kumar
Professor, Systems, School of Computer Science, UPES, Dehradun,
Uttarakhand, India
[email protected]
Tree Structure
A sample decision tree could have the following structure:
Root Node: Check the IP reputation score.
Left Branch (Low Reputation Score): Port Number == 80 ⇒ Benign
Right Branch (High Reputation Score): Suspicious protocol (e.g., Telnet) ⇒ Malicious
Classification Example
Input 1: IP Reputation = Low, Port Number = 80 ⇒ Classified as Benign
Input 2: IP Reputation = High, Protocol = Telnet ⇒ Classified as Malicious
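The rules above can be mirrored directly in code. A minimal sketch of this toy tree, assuming the reputation score, port, and protocol have already been extracted (the handling of ports other than 80 is an assumption, since the slide only specifies the port-80 case):

```python
def classify_connection(ip_reputation, port, protocol):
    """Toy decision tree mirroring the structure above."""
    if ip_reputation == "Low":
        # Low-reputation branch: common web traffic on port 80 is treated as benign.
        # Other ports are unspecified on the slide; flagging them is an assumption.
        return "Benign" if port == 80 else "Suspicious"
    else:
        # High-reputation-score branch: suspicious protocols such as Telnet are flagged
        return "Malicious" if protocol == "Telnet" else "Benign"

print(classify_connection("Low", 80, "HTTP"))     # Benign    (Input 1)
print(classify_connection("High", 23, "Telnet"))  # Malicious (Input 2)
```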
Hyperplane
A hyperplane is a decision boundary in n-dimensional space.
In 2D, it is a line; in 3D, a plane.
The objective is to find a hyperplane that maximizes the margin between classes.
Support Vectors
Support vectors are data points closest to the hyperplane.
They are critical in defining the hyperplane’s position.
Removing non-support vectors does not affect the hyperplane.
Data Points:
Not Spam: (1, 2), (2, 3), (3, 3)
Spam: (5, 4), (6, 5), (5, 6)
Hyperplane: Separates Spam from Not Spam.
Support Vectors: Points closest to the hyperplane.
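These six points are linearly separable, so a linear SVM recovers the maximum-margin hyperplane and its support vectors. A minimal sketch with scikit-learn (the 0/1 encoding of Not Spam/Spam is a labelling choice):

```python
import numpy as np
from sklearn.svm import SVC

# Data points from the example: three Not Spam (0) and three Spam (1)
X = np.array([[1, 2], [2, 3], [3, 3], [5, 4], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.support_vectors_)   # the points closest to the hyperplane
print(clf.predict([[4, 4]]))  # classify a new point
```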
Genetic Algorithms
Initial Population
Fitness Evaluation
Fitness Function:
Fitness = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}
Example Fitness Values:
Individual 1: 0.8
Individual 2: 0.6
Individual 3: 0.9
Individual 4: 0.4
New Population:
1110
1010
0111
0001
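A minimal genetic-algorithm sketch tying these steps together, starting from the initial population above. Each 4-bit individual is assumed to enable/disable four hypothetical detection rules, and the per-rule scores in the fitness function are stand-ins; in practice, fitness would be the precision formula above measured on labelled data:

```python
import random

random.seed(0)

def fitness(individual):
    # Stand-in fitness: hypothetical per-rule scores. In practice this would be
    # TruePositives / (TruePositives + FalsePositives) on labelled traffic.
    rule_scores = [0.4, 0.3, 0.2, 0.1]
    return sum(s for bit, s in zip(individual, rule_scores) if bit == "1")

def select(population):
    # Roulette-wheel selection: probability proportional to fitness
    weights = [fitness(ind) for ind in population]
    return random.choices(population, weights=weights, k=2)

def crossover(a, b):
    point = random.randint(1, len(a) - 1)  # single-point crossover
    return a[:point] + b[point:]

def mutate(ind, rate=0.1):
    # Flip each bit with a small probability
    return "".join(b if random.random() > rate else ("1" if b == "0" else "0")
                   for b in ind)

population = ["1110", "1010", "0111", "0001"]
for generation in range(10):
    population = [mutate(crossover(*select(population)))
                  for _ in range(len(population))]

print(population, [round(fitness(i), 2) for i in population])
```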
Structure:
Input layer:
Receives input data (features).
Each neuron corresponds to a feature of the input.
Hidden layers:
Intermediate layers that transform inputs into outputs.
The number of hidden layers and neurons can vary based on the model.
Output layer:
Produces the final output (predictions).
The structure depends on the task (e.g., regression or classification).
Key concepts:
Neurons:
Basic units of a neural network that receive inputs, process them, and produce an output.
Activation functions:
Functions that determine the output of neurons (e.g., sigmoid, ReLU, tanh).
Introduce non-linearity into the model, enabling it to learn complex patterns.
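A short sketch of the three activation functions named above (NumPy implementations of their standard definitions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # squashes values into (0, 1)

def relu(x):
    return np.maximum(0.0, x)        # zero for negatives, identity for positives

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))   # [0.119 0.5   0.881]
print(relu(x))      # [0.  0.  2. ]
print(np.tanh(x))   # [-0.964  0.     0.964]
```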
Input Features:
Email Content:
Textual data processed using techniques like TF-IDF or word embeddings (e.g., Word2Vec, GloVe).
Example: An email body containing phrases like "Congratulations! You've won" may indicate spam.
Sender Information:
Analyzes the sender's domain (e.g., @example.com) and frequency of emails from that sender.
Example: If emails from the domain "unknown-sender.com" are frequent and marked as spam, it raises suspicion.
Subject Line:
Checks for the presence of specific keywords (e.g., "free", "urgent", "limited time offer").
Example: Subject "Get your free trial now!" would likely contribute to a spam classification.
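A minimal sketch of turning email text into TF-IDF features with scikit-learn (the three sample emails are illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

emails = [
    "Congratulations! You've won a free prize",  # likely spam
    "Get your free trial now!",                  # likely spam
    "Minutes from Monday's project meeting",     # likely not spam
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(emails)  # sparse matrix: rows = emails, columns = terms

print(vectorizer.get_feature_names_out())
print(X.shape)
```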
Decision Making:
Threshold for Classification:
A threshold (e.g., 0.5) is set to classify the output from the neural network.
If the predicted probability p > 0.5, classify as spam; otherwise, classify as not spam.
Example: If the model outputs a probability of 0.7, the email is classified as spam. Conversely, an output of 0.4 classifies it as not spam.
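The thresholding step in code, using the 0.5 cut-off from the example above:

```python
def classify_email(probability, threshold=0.5):
    # Binary decision on the network's output probability
    return "spam" if probability > threshold else "not spam"

print(classify_email(0.7))  # spam
print(classify_email(0.4))  # not spam
```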
Example Calculation
Given:
Learning Rate: 0.01
Batch Size: 32
Number of Layers: 3
Output accuracy: 85%
Loss calculation (mean squared error):
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
Calculating MSE:
Assume true values y and predicted values ŷ (100 values each):
y = [3, −0.5, 2, 7, 1.5, 4, 5.5, 6, 2.5, 3.5, ...]
ŷ = [2.5, 0.0, 2, 8, 1.4, 4.5, 5.0, 5.5, 3.0, 3.8, ...]
Calculate the squared difference (y_i − ŷ_i)^2 for each i.
For example:
(y_1 − ŷ_1)^2 = (3 − 2.5)^2 = (0.5)^2 = 0.25
(y_2 − ŷ_2)^2 = (−0.5 − 0.0)^2 = (−0.5)^2 = 0.25
(y_3 − ŷ_3)^2 = (2 − 2)^2 = 0^2 = 0
(y_4 − ŷ_4)^2 = (7 − 8)^2 = (−1)^2 = 1
Repeat this for all 100 values to get the squared differences:
Squared Differences = [0.25, 0.25, 0, 1, ...]
Sum the squared differences:
Total = \sum_{i=1}^{100} (y_i - \hat{y}_i)^2
Suppose after summing all values, we get Total = 15. Compute the MSE:
MSE = \frac{Total}{100} = \frac{15}{100} = 0.15
Interpretation:
The MSE value of 0.15 is the average squared difference between the predicted and true values. Lower MSE values suggest better model performance, indicating that the predictions are closer to the actual outcomes.
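The same computation in a few lines of NumPy, using only the first four of the 100 values shown above (extending the arrays to all 100 values would yield the 0.15 result):

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # first four true values from the example
y_pred = np.array([2.5, 0.0, 2.0, 8.0])   # first four predicted values

squared_diff = (y_true - y_pred) ** 2      # [0.25, 0.25, 0.0, 1.0]
mse = squared_diff.mean()                  # (0.25 + 0.25 + 0 + 1) / 4 = 0.375
print(mse)
```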
Given:
URL Length: 45
Special Characters: 5
Domain Age (in days): 30
Neural Network Output:
Probability of phishing: ŷ = 0.92
BCE = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]
Definitions:
n: Total number of samples.
y_i: Actual label (1 for phishing, 0 for non-phishing).
ŷ_i: Predicted probability of phishing.
Assumptions:
Let's assume we have a batch size of n = 2 with the following labels:
For the first sample: y_1 = 1 (phishing)
For the second sample: y_2 = 0 (not phishing)
Corresponding predicted probabilities:
ŷ_1 = 0.92 (for phishing)
ŷ_2 = 0.05 (for not phishing)
Calculating BCE:
Substitute the values into the BCE formula:
BCE = -\frac{1}{2} \left[ 1 \cdot \log(0.92) + (1 - 1) \cdot \log(1 - 0.92) + 0 \cdot \log(0.05) + (1 - 0) \cdot \log(1 - 0.05) \right]
BCE = -\frac{1}{2} \left[ \log(0.92) + \log(0.95) \right]
BCE \approx 0.0673
The value BCE ≈ 0.0673 represents the Binary Cross-Entropy (BCE) loss for the given classification problem. BCE is a commonly used loss function for binary classification tasks; it measures how well the predicted probabilities match the true labels.
In this case, there are two samples:
First sample (actual label y_1 = 1) with predicted probability ŷ_1 = 0.92 (indicating phishing).
Second sample (actual label y_2 = 0) with predicted probability ŷ_2 = 0.05 (indicating not phishing).
The BCE formula measures how close the predicted probabilities are to the true labels; a lower BCE value indicates a better fit between the model's predictions and the actual labels. The final value of BCE ≈ 0.0673 shows that the model's predictions for these two samples have a small error, meaning the model is producing fairly accurate probabilities.
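The same BCE calculation in NumPy, using the two samples above:

```python
import numpy as np

y = np.array([1.0, 0.0])        # actual labels: phishing, not phishing
y_hat = np.array([0.92, 0.05])  # predicted probabilities of phishing

bce = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
print(round(bce, 4))  # 0.0673
```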
Hyperparameter Tuning
Hyperparameters in Cybersecurity
This total time indicates the efficiency of the training process, which is critical when deploying models in real-time cybersecurity applications where quick response times are essential.
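A minimal hyperparameter-tuning sketch using scikit-learn's GridSearchCV; the model choice, parameter grid, and synthetic dataset below are illustrative assumptions, not the exact setup from the example above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a labelled security dataset
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Candidate hyperparameters: learning rate and network depth/width
param_grid = {
    "learning_rate_init": [0.001, 0.01],
    "hidden_layer_sizes": [(16,), (16, 16), (16, 16, 16)],
}

search = GridSearchCV(MLPClassifier(max_iter=500, random_state=0),
                      param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)  # best hyperparameter combination found
print(search.best_score_)   # its cross-validated accuracy
```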
K-Means for Malware Clustering
Malware samples are clustered based on their features (e.g., file size, code
structure, behavior).
K-Means can group different types of malware into clusters.
Centroids represent the average characteristics of malware within a cluster.
Malware samples with similar behavior are grouped together.
Helps to detect new strains of malware by comparing them to known clusters.
Clusters can be analyzed to understand common attack vectors.
Useful for developing targeted detection and response strategies.
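A minimal K-Means sketch along these lines with scikit-learn; the two-feature representation below (file size, count of suspicious API calls) is a hypothetical stand-in for real extracted malware features:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical malware features: [file size in KB, suspicious API call count]
X = np.array([[120, 3], [130, 4], [125, 2],      # small, low-activity samples
              [900, 45], [950, 50], [880, 40]])  # large, high-activity samples

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(kmeans.labels_)           # cluster assignment for each known sample
print(kmeans.cluster_centers_)  # centroids = average characteristics per cluster

# A new, unseen sample is compared to the known clusters via predict()
print(kmeans.predict([[910, 47]]))
```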