01_03_calculating-entropy-with-python.en

This video discusses the importance of entropy in traffic analysis for command and control data transmission, focusing on how to identify suitable fields for data storage. The fieldEntropy function is introduced, which calculates the entropy of data fields, specifically targeting string, bytes, and bytearray types while ensuring a minimum length for effective data transmission. The video demonstrates the calculation of entropy using both structured English text and random bytes, highlighting the differences in entropy values and their implications for data obfuscation.

Uploaded by

rasha.ziad.share


Hello and welcome
back to this course. In the previous video, we talked about performing some traffic
analysis
for command and control. Our goal there was to look at the various protocols,
types of traffic, packets and packet fields, and try to identify a good option for
command and control. We want something
that's rather common, has some space for storing
command and control data, and where the data that we're going to transmit
doesn't stick out. That third objective,
not sticking out, is what we're going to
be talking about a little bit more in this video. In that video, we talked
briefly about entropy, which is a measurement of
the randomness in data or the amount of information a particular piece
of data can encode. For what we're doing, we want the ability to have some
relatively high
entropy values. That's because data
that's relatively uniform, where it all looks the same, like English text,
has relatively
low entropy. If you're used to having low entropy data like
English text in a field, and suddenly you have
something that's got a bit more randomness to
it, like computer code, then that computer
code might stand out a little bit in
that particular field. As part of our traffic analysis, we looked at the entropy of
various fields using a
function called fieldEntropy. Today in this video, we're going to be
looking at what that fieldEntropy
function actually does. Let's take a look at it now. Our fieldEntropy function is
designed to calculate
the entropy of a field. In the previous video, we briefly mentioned that
we put some constraints on that field that we're going
to calculate entropy for. The first is that
we were looking for fields of certain types. When we're trying
to transmit data over command and
control infrastructure, we need some space to
actually send more than a single bit or a small
value in each packet. The less data we can
fit into each packet, the more packets
we have to send, and the more noticeable it is. One of our constraints
is that we want fields that store
certain types of data. Strings, bytes and
bytearrays are three different
Python data types that essentially come down
to a sequence of bytes. They might be
interpreted differently, but each one tells us that we might be able to store multiple bytes
of information
within a single field. This section of the
code here is testing if we have a field of
type string, bytes, or bytearray, and if so we want to convert
that to a bytearray, because a bytearray in Python
is essentially just a list of bytes, as opposed to
something like a string, which is interpreted
as text. With our list of
individual bytes, we can treat each byte as an independent occurrence and
calculate the entropy on it. Our other constraint
that we talked about is we wanted
a minimum length. If we can only store a single byte of
information in each packet, we're going to need
a lot of packets to achieve our goals. In this case, we've set our
minimum length to five, but we could use something
larger if we chose. If our length is greater
than the minimum length, then we're going to calculate the entropy of that
bytearray. To do so, we're going
to use a couple of Python libraries called
Pandas and SciPy. Pandas gives us the ability
to represent data as a series and then
count the number of occurrences of each
item in that series. We'll take our
bytearray of data, which might be a string, it might just be an
array of bytes, etc. We're going to store it
in a Series data type. Then we can call
value_counts to say, how many bytes with
value zero do we have? How many bytes with value
one do we have, etc. In the end we should have an array which says for
this particular value, we had this many occurrences. These counts aren't
quite probabilities, but they're pretty close: dividing each count by the total number of bytes gives the observed probability. When we're calculating entropy, we
use observed probabilities, and we might compare those
to expected probabilities. For example, if you're
flipping a coin, the probability should
be 50 percent that it is heads and 50 percent that it is
tails for every coin flip. That's the ideal case. Some people can force a coin to
flip a little
bit more heads than tails, etc., based on a few
different factors. But in theory, half heads, half tails. That's the theory. In
practice, if you
flip a coin 100 times, the odds aren't great that you'd get exactly 50
heads and 50 tails. You might have a 51, 49 split
or something like that. Those are our observed
probabilities in this particular case. When we're talking about the entropy of our
packet fields, we don't have our
expected probabilities. We could say that
there's a one in 256 chance of each
byte occurrence, and that's probably true. But what we do have is
observed probabilities. We say, out of the 200
bytes in our byte array, five of them are the letter A. That observed probability
is
5/200, or 1/40, and that's our observed
probability of A versus our theoretical probability
of 1 out of 256. Based on all of these
observed probabilities, we can calculate the entropy
or the amount of randomness or amount of
information that can be encoded in our byte array. Ideally, we should have observed
probabilities
pretty close to 1/256 for every single
potential byte value. That would give us a nice high entropy and
tell us that we can put any type of obfuscated
data in our field. However, if we have something
like English text or Spanish text or Latin text
or pick a language text, we have a much lower
entropy because there's a lot of structure to languages, so the amount of
randomness
in a string is much lower. If I have, say, the phrase
H-E-L-L-O W-O-R-L in English, you know what the next
letter is going to be. There's no randomness to it. Therefore, there's less
information encoded in that string than if we have something that's
completely random. To demonstrate this, I've
got a few lines here that we're going to use our
fieldEntropy function to calculate the randomness of. We're going to look at
the string "Hello world." Then we're going to look
at an equal-length string, 12 bytes, but they're
going to be random. We'll import from
the random library the randbytes
function to do that. Same calculation in both cases. One of them's English text,
one of them's completely
random bytes. If we run our entropy function
here, give it a moment, we see that the entropy
of our random bytes is higher than the entropy
of our English text. If we run this repeatedly (I got lucky in this case, because
this is what I wanted
to demonstrate), in most cases our entropies are going to be the same
from run to run for both of these, because "Hello
world." doesn't change and this is the entropy of our 12 random bytes
where they're all unique. However, down here we get
a different entropy, and the reason why is that in
this random string, there are repeated
bytes somewhere. Here we go. We see
a \xfc and a \xfc. Because there's a
repetition there, our observed probabilities are
no longer evenly spread (one value now accounts for 2 of the 12 bytes instead of 1 of 12), and our entropy goes
down a little bit. We can run this repeatedly as many
times as we want, and we see here that our
entropies are the same in the first and the third runs, meaning that we
got one occurrence of each byte that we see, which is about what we'd expect. This
demonstrates the
fieldEntropy function, which is just designed
to determine if a particular field
meets some criteria, the right type of value stored
in it, and enough length. Then also to calculate the entropy of the
data in that field to see what we might be able
to store in it. Thank you.
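
The fieldEntropy code itself is not reproduced in this transcript, so what follows is only a minimal sketch of the steps as narrated: the type check, the conversion to a bytearray, the minimum length of five, and an entropy calculation over observed byte frequencies. The name field_entropy, the UTF-8 encoding of strings, the None return for rejected fields, and the choice of bits (log base 2) are all assumptions. The sketch also swaps the pandas value_counts and SciPy calls mentioned in the video for collections.Counter and math.log2 from the standard library, so it runs without extra packages.

```python
import math
import random
from collections import Counter

MIN_LENGTH = 5  # minimum field length mentioned in the video


def field_entropy(value, min_length=MIN_LENGTH):
    """Entropy in bits of a str/bytes/bytearray field, or None if rejected.

    Hypothetical stand-in for the course's fieldEntropy function.
    """
    # Constraint 1: only fields that can hold several bytes qualify.
    if isinstance(value, str):
        data = bytearray(value, "utf-8")  # encoding is an assumption
    elif isinstance(value, (bytes, bytearray)):
        data = bytearray(value)
    else:
        return None

    # Constraint 2: the field must be longer than the minimum length.
    if len(data) <= min_length:
        return None

    # Count occurrences of each byte value (stand-in for Series.value_counts).
    counts = Counter(data)
    total = len(data)

    # Observed probability of each value is count / total;
    # Shannon entropy is -sum(p * log2(p)) over the observed values.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    text = "Hello world."                # 12 bytes of English text
    noise = random.randbytes(len(text))  # 12 random bytes (Python 3.9+)
    print("text :", field_entropy(text))
    print("noise:", noise, field_entropy(noise))
```

With these assumptions, "Hello world." comes out at about 3.02 bits, while 12 bytes that are all distinct give log2(12), about 3.58 bits. That ceiling is why runs with no repeated byte print the same value every time, and why a run with a repeated byte prints something slightly lower. Using a different logarithm base would rescale both numbers but not change which one is higher.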
