01_04_detecting-encodings-with-python.en

This video discusses encoding schemes, particularly URL encoding and Base64 encoding, which are used to transmit data that doesn't conform to specific protocol rules. It explains how these encoding methods can be utilized for obfuscation in cybersecurity, making it harder for unauthorized users to identify sensitive information in network traffic. The video also introduces a helper function to check if data is likely encoded, demonstrating the process with examples and potential pitfalls in identifying encoded data.

Uploaded by

rasha.ziad.share
Copyright
© All Rights Reserved

Hello and welcome back to this course.

In the past few videos, we've been talking about identifying
a good network protocol, and fields within its packets, for
command and control. In the first of the three videos, we talked about the code
that we use to accomplish this. In the previous video, we talked
about entropy, one of our measures of suitability, and now, in this video, we're
going to talk about encoding schemes. So encoding schemes were originally
designed to allow data that doesn't follow the rules of a particular protocol
to be transmitted over that protocol. So this could mean that in some cases, we
have protocols that can only carry printable data. And so,
if you have unprintable characters and you want to send them over
that particular protocol, you need to convert the unprintable
characters to printable ones. Another case is where you have
protocols that have reserved or special characters. So for example, in a URL, a
question mark is a reserved character, and so if you want to use a question
mark somewhere in the URL and don't want it interpreted
as that reserved character, you need to encode it. In a moment,
we'll talk about URL encoding, or percent encoding, which is
designed to do exactly that. And so, these are the original
purposes for various encoding schemes. However, they are also commonly applied,
especially in offensive cybersecurity, for
obfuscation. So for example, if you're sending
a username and a password or other sensitive data over the network,
then it's easy for anyone monitoring that network traffic to
do a keyword search for username or password, find the packet
they want, and see that data. However, if that username or
password is encoded, then that keyword search won't match
unless you know to reverse the encoding. And so,
we're talking about encoding schemes here because if we're going to use a network
protocol for command and control and put our data in a particular field, we might
want to have the option to encode that data. And if so,
it would be useful to choose a field where encoded data is
not unusual, if possible. And so, in this video, we're going to
talk about two encoding schemes, URL encoding and Base64 encoding. Our main
function here, or the helper function, is called check encoding, so
we'll give it some data, and it will tell us whether or
not that data is likely to be encoded. And so, our first test is: if
the length of the data is zero, then return false, because zero-length
data can be successfully decoded by any scheme, so
it would be confusing. If we have non-zero-length data,
we're going to check for URL encoding and Base64 encoding. If we find that it
matches our rules for those, then we'll return either URL or
Base64 respectively. And that will go back to the traffic
analyzer script we looked at a couple of videos ago, which includes that
information in its output, as we saw. So let's talk about URL encoding first. So
with our URL encoding, we're going to focus on things
that are completely encoded. Often in a URL, the only characters
that are encoded are the ones that break the rules,
the ones that are reserved characters. So you might have something
that's mostly printable, with the occasional encoded character. And so, we
certainly could use that approach for command and control by randomly encoding
characters to break up text matching, and we could easily modify this code
to look for those opportunities. However, in this case,
we're just going to look for something that's completely URL encoded. And so, URL
encoding gets its other name, percent encoding, from how it encodes data.
Each character that's encoded in
the string is written as a percent sign followed by the hexadecimal representation
of the corresponding ASCII character. So for example,
a space, which has a hex value of 20, would be represented as %20
in percent encoding. And so, for
our check URL encoding function here, we're going to look for
things that match a rule that says it should be a percent sign followed
by two hexadecimal digits, followed by potentially more of
the same percent, hex, hex pattern. And we're going to use Python's re
library to do that because it lets us match the string using
regular expressions, and here is the regular expression
that we'll be using. So starting in the middle here,
let's take a look. We've got our percent sign that we
want to match, and then we have this section in square brackets;
square brackets mean any of these. And so, this particular section
says if it is a digit 0-9, or capital A through F, or lowercase
a through f, then match that character, because those are the allowable
hex digits. After that we have
this two in curly braces, and what this means is match exactly
two of whatever's previous. So we have a percent sign,
something that matches a hex character, and we want two of those, which would
match something like %20, which is our URL encoding for a space. And
so, all of this is wrapped
up in a set of parentheses, saying treat this all as one unit, so we only want to
match if we see percent,
hex, hex, percent, hex, hex. And then, we want one or more of them, so if we
can't match at least one,
we want to return false. And so, then, we pass in our data, and if
the entire string of data that we pass in matches this, so it's percent, hex,
hex, percent, hex, hex, et cetera, then we return true, saying, yes,
it is URL encoded; otherwise, we return false, saying, well,
it doesn't match our rules. So it's entirely possible that it
is a field that uses URL encoding, but with only some characters encoded,
the ones that are reserved. And because we're using full match for
this, we won't match, but we could modify this to allow
partial URL encoding if we chose. The other and more difficult one that
we want to test for is Base64 encoding. Base64 encoding gets its
name from the fact that it uses 64 characters as the alphabet for
its encoded data. Those are the alphanumeric characters,
so capital A to Z, lowercase a to z, and 0-9, and
then a couple of special characters. And so, if you add that up,
26 letters, double that, add 10 for 0-9, and
then add 2, you get 64. The simple way to test for Base64 encoding is to
try to decode it and
see if it fails. Python has a base64 library from
which we can import b64decode. And so, if we call b64decode on the data
and it decodes to plain text, we'll return true,
meaning that it could be Base64 encoded. If something goes wrong,
that means that it wasn't a valid Base64 encoding,
and so we'll return false. And so, as we're going to see
in our main function when we run this in a moment,
this is a bit of a shaky way of testing. The reason why is that we
don't know the data that's stored within our Base64-encoded data. So all we're
testing for
is whether it decodes as Base64, which essentially just means that
its length is a multiple of four characters and
it's limited to the 64-character
alphabet that I just mentioned, possibly ending with one or two equals signs,
which are used for padding in Base64. And so, down here in our main function, we
have three messages that we're
going to check the encoding of. We'll use Hello World,
which is actually Base64 encoded; we'll use a URL-encoded string, so
you see the percent, hex, hex, et cetera; and then we'll use the string FFFFFFFF,
so eight Fs. For each of these, we'll call check
encoding, and we'll print out the results. So now I'll type python CheckEncoding.py,
hit Enter, and we see our three results. So Hello World
Base64 encodes to this string here, and on testing that, it determines, yes, it does
successfully decode to something. When we use our regular expression
to test for URL encoding, this matches because we
have our percent sign, two characters that are valid hex
characters, percent, two valid hex characters, et cetera. And so, those are both good because
they mean that in the correct case, or the positive case,
we successfully identify something that's Base64 encoded and
something that's URL encoded. However, we also get some false positives.
FFFFFFFF, essentially F eight times, is technically
a valid Base64-encoded string. However, it's not a particularly
useful one if you decode it to the plain text, because it's just
the same thing repeated, and so this probably wasn't actually
intended to be Base64 encoded. It's probably padding or
something else. However, we match it as a valid Base64-encoded
string because it is decodable. And so, without knowledge about
the plain text that goes into it and what's considered a valid plain text, which we
don't necessarily have when we're analyzing packet fields,
we can't be 100% certain whether a successful
decoding actually means that this field carries encoded data or if it's just that it
happens to be decodable. But identifying fields where, say,
the majority of values are decodable indicates that we might have
a field that actually is encoded, which would be useful for
command and control. And so, again, this is just one of the two
helper functions that we're looking at in relation to the traffic analyzer
script from a couple of videos ago, for identifying fields in
network packets that might be useful for command and control infrastructure. Thank
you.
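
As a companion to the walkthrough of the URL-encoding check, here is a minimal sketch of that rule in Python. It assumes the regular expression read out in the video is `(%[0-9A-Fa-f]{2})+` applied with `re.fullmatch`. The function names `check_url_encoding` and `has_partial_url_encoding` are guesses at naming, and the second function is a hypothetical variant for the partial-encoding case that the video mentions but does not implement.

```python
import re

# One or more "%XX" groups, where XX is exactly two hexadecimal digits.
FULLY_ENCODED = re.compile(r"(%[0-9A-Fa-f]{2})+")

# Hypothetical looser rule: a single "%XX" group anywhere in the string.
ANY_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")


def check_url_encoding(data: str) -> bool:
    """Return True only if the whole string is a run of %XX groups."""
    return FULLY_ENCODED.fullmatch(data) is not None


def has_partial_url_encoding(data: str) -> bool:
    """Return True if at least one %XX group appears anywhere in the string."""
    return ANY_ENCODED.search(data) is not None


if __name__ == "__main__":
    for sample in ["%48%65%6C%6C%6F", "hello%20world", "%2", ""]:
        print(repr(sample),
              check_url_encoding(sample),
              has_partial_url_encoding(sample))
```

With the full-match rule, `hello%20world` is rejected even though it contains an encoded space, which is the trade-off described in the video. The looser search accepts it.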

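
The Base64 test and its false positive can be sketched the same way. This assumes the check simply calls `base64.b64decode` and treats any exception as "not Base64". The name `check_b64_encoding` and the `strict` parameter are illustrative and not taken from the course script. By default `b64decode` silently discards characters outside the Base64 alphabet, so the sketch also shows `validate=True` as one possible way to tighten the test.

```python
import base64
import binascii


def check_b64_encoding(data: str, strict: bool = False) -> bool:
    """Return True if `data` decodes as Base64 without raising an error.

    With strict=False (b64decode's default behaviour), characters outside
    the Base64 alphabet are discarded before decoding. With strict=True,
    any such character makes the decode fail.
    """
    try:
        base64.b64decode(data, validate=strict)
        return True
    except (binascii.Error, ValueError):
        return False


if __name__ == "__main__":
    encoded = base64.b64encode(b"Hello World").decode("ascii")
    for sample in [encoded, "FFFFFFFF", "Hello World", "%48%65"]:
        print(repr(sample),
              check_b64_encoding(sample),
              check_b64_encoding(sample, strict=True))
```

`FFFFFFFF` passes in both modes, which is the false positive discussed in the video: it is decodable, but nothing suggests it was meant as Base64. The lenient mode also accepts `%48%65`, because the percent signs are dropped before decoding, so in this sketch it matters that the URL check runs before the Base64 check.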