Breaking The Windows Script Encoder by MR Brownstone
The Windows Script Encoder (screnc.exe) is a Microsoft tool that can be used to encode
your scripts (i.e. JScript, ASP pages, VBScript). Yes: encode, not encrypt. The purpose of
this tool is to prevent people from looking at, or modifying, your scripts. Microsoft
recommends using the Script Encoder to obfuscate your ASP pages, so that if your
server is compromised, the hacker will be unable to find out how your ASP applications
work.
You can download the Windows Script Encoder at
http://www.microsoft.com/downloads/details.aspx?FamilyID=e7877f67-c447-4873-b1b0-21f0626a6329&displaylang=en
The documentation already says the following:
Note that this encoding only prevents casual viewing of your code; it will not prevent the
determined hacker from seeing what you've done and how.
(By the way, because of this text, I did not deem it necessary to inform Microsoft of this
article).
Also, an encoded script is protected against tampering and modifications:
After encoding, if you change even one character in the encoded text, the integrity of the
entire script is lost and it can no longer be used.
So we can make the following observations:
• Anyone using this tool will be convinced that it's safe to hard-code all usernames,
passwords, and "secret" algorithms into their ASP-pages. And any "determined
hacker" will be able to get to them anyway.
Okay. So even Microsoft says this can be broken. Can't be difficult then. It wasn't.
Writing this article took me at least twice the time I needed for breaking it. But I think
this can be a very nice exercise for anyone who wants to learn more about analysing
codes like this, with known plaintext, known ciphertext, and unknown key and algorithm.
(Actually, a COM object that can do the encoding is shipped with IE 5.0, so reverse
engineering this will reveal the algorithm, but that's no fun, is it?)
So, how does this work?
The Script Encoder works in a very simple way. It takes two parameters: the filename of
the file containing the script, and the name of the output file, containing the encoded
script.
What part of the file will be encoded depends on the filename extension, as well as on the
presence of a so-called "encoding marker". This encoding marker allows you to exclude
part of your script from being encoded. This can be very handy for JavaScripts, because
the encoded scripts will only work on MSIE 5.0 or higher (of course this is not an issue
for ASP and VB scripts that run on a web server!).
Say, you've got this HTML page with a script you want to hide from prying eyes:
<HTML>
<HEAD>
<TITLE>Page with secret information</TITLE>
<SCRIPT LANGUAGE="JScript">
<!--//
//**Start Encode**
alert ("this code should be kept secret!!!!");
//-->
</SCRIPT>
</HEAD>
<BODY>
This page contains secret information.
</BODY>
</HTML>
After running the Script Encoder, the page looks like this:
<HTML>
<HEAD>
<TITLE>Page with secret information</TITLE>
<SCRIPT LANGUAGE="JScript.Encode">
<!--//
//**Start Encode**#@~^QwAAAA==@#@&P~,l^+DDPvEY4kdP1W[n,/tK;V9P4
~V+aY,/nm.nD"Z"eE#p@#@&&JOO@*@#@&qhAAAA==^#~@&
</SCRIPT>
</HEAD>
<BODY>
This page contains secret information.
</BODY>
</HTML>
As you can see, the <script language="..."> has been changed into "JScript.Encode". The
Script Encoder uses the Scripting.Encoder COM-object to do the actual encoding. The
decoding will be done by the script interpreter itself (so we cannot simply call a
Scripting.Decoder, because that doesn't exist).
Okay, let's play!
Plaintext      Encoded
Hoi            #@~^FQAAAA==@#@&CGb@#@&zz O@*@#@&WwIAAA==^#~@
Hai            #@~^FQAAAA==@#@&CCb@#@&zz O@*@#@&TQIAAA==^#~@
HaiHai HaiHai  #@~^IgAAAA==@#@&CCbCmk@#@&CmrCmk@#@&JzRR@*@#@&mgUAAA==^#~@
Cute. As you can see, @#@& appears to be a newline (@# = CR, @& = LF), and the
position of a character does (sometimes...) matter (the first time HaiHai becomes
CCbCmk and the second time it's CmrCmk).
Let's just encode a line with a lot of A's:
//**Start Encode**#@~^lgAAAA==@#@&b)zbzbbzbz)bzb)bzb))zbbz)bzbbz))bzbzb)b))zb)bz)bzb))zbb))zb)bz)zb)zbzbbzbz)bzb)bzb))zbbz)bzbbz))bzbzb)b))zb)bz)bzb))zbb))zb)bz)zb)zb@#@&zJO@*@#@&vyIAAA==^#~@
The algorithm
After staring at this for some time, I discovered that part of the encoded string was
repeating (actually, the entire string repeats itself after 64 characters). Also, it seems
that the character 'A' has three different representations: b, z, and ). If you encode a
string of B's you'll see the same pattern, but with different characters.
This means the encoding will look something like this:
d7i P~, "Ze JEr a:[ ^yf ]Yu ['L BvE `cv #b* eMC _Q3 ~SB OR R c z&J !TZ Fq8 +y &f2
c*W *Xl v+ G{F %0R ,1O )l= iIp @!@!@! 'x{ @*@*@* g_Q @$@$@$ b)z A$~ Z/;
f9G 23A sow M!V Cu_ q(& 9Bx |Fn SJd H\t 1Hg r6} nKh p}5 I]" ?j U KP: ji` .#j q (po
5eI }t\ $,] -w' TDY 7?% {m| =|# lCm 48( m^1 N[9 +n 0W6 oLT t44 krb L%N 3V0
Vs^ :hs xU WGK w2a ;5$ D .M /dk YOD E;! \-7 hAS 6aX XzH y". `P uk- 8N) U=?
So what is this? It's the encoded representation of the ASCII characters 9, and 32 through
126. That is 1 + (126 - 32 + 1) = 96 characters, and every character has got three
different representations, so this sums up to 3 * 96 = 288 characters.
You'll see that the < , > and @ characters are escaped too, resulting in the following
table:
Esc Org
@# \r
@& \n
@! <
@* >
@$ @
I've removed the @!, @* and @$ from the encoded text too and replaced them with
question marks, so the table will stay nice. This is what you get as a hex dump:
unsigned char encoding[288] = { 0x64,0x37,0x69, 0x50,0x7E,0x2C, 0x22,0x5A,0x65,
0x4A,0x45,0x72, 0x61,0x3A,0x5B, 0x5E,0x79,0x66, 0x5D,0x59,0x75,
0x5B,0x27,0x4C, 0x42,0x76,0x45, 0x60,0x63,0x76, 0x23,0x62,0x2A, 0x65,0x4D,0x43,
0x5F,0x51,0x33, 0x7E,0x53,0x42, 0x4F,0x52,0x20, 0x52,0x20,0x63, 0x7A,0x26,0x4A,
0x21,0x54,0x5A, 0x46,0x71,0x38, 0x20,0x2B,0x79, 0x26,0x66,0x32, 0x63,0x2A,0x57,
0x2A,0x58,0x6C, 0x76,0x7F,0x2B, 0x47,0x7B,0x46, 0x25,0x30,0x52, 0x2C,0x31,0x4F,
0x29,0x6C,0x3D, 0x69,0x49,0x70, 0x3F,0x3F,0x3F, 0x27,0x78,0x7B, 0x3F,0x3F,0x3F,
0x67,0x5F,0x51, 0x3F,0x3F,0x3F, 0x62,0x29,0x7A, 0x41,0x24,0x7E, 0x5A,0x2F,0x3B,
0x66,0x39,0x47, 0x32,0x33,0x41, 0x73,0x6F,0x77, 0x4D,0x21,0x56, 0x43,0x75,0x5F,
0x71,0x28,0x26, 0x39,0x42,0x78, 0x7C,0x46,0x6E, 0x53,0x4A,0x64, 0x48,0x5C,0x74,
0x31,0x48,0x67, 0x72,0x36,0x7D, 0x6E,0x4B,0x68, 0x70,0x7D,0x35, 0x49,0x5D,0x22,
0x3F,0x6A,0x55, 0x4B,0x50,0x3A, 0x6A,0x69,0x60, 0x2E,0x23,0x6A,
0x7F,0x09,0x71, 0x28,0x70,0x6F, 0x35,0x65,0x49, 0x7D,0x74,0x5C, 0x24,0x2C,0x5D,
0x2D,0x77,0x27, 0x54,0x44,0x59, 0x37,0x3F,0x25, 0x7B,0x6D,0x7C, 0x3D,0x7C,0x23,
0x6C,0x43,0x6D, 0x34,0x38,0x28, 0x6D,0x5E,0x31, 0x4E,0x5B,0x39,
0x2B,0x6E,0x7F, 0x30,0x57,0x36, 0x6F,0x4C,0x54, 0x74,0x34,0x34, 0x6B,0x72,0x62,
0x4C,0x25,0x4E, 0x33,0x56,0x30, 0x56,0x73,0x5E, 0x3A,0x68,0x73, 0x78,0x55,0x09,
0x57,0x47,0x4B, 0x77,0x32,0x61, 0x3B,0x35,0x24, 0x44,0x2E,0x4D, 0x2F,0x64,0x6B,
0x59,0x4F,0x44, 0x45,0x3B,0x21, 0x5C,0x2D,0x37, 0x68,0x41,0x53, 0x36,0x61,0x58,
0x58,0x7A,0x48, 0x79,0x22,0x2E, 0x09,0x60,0x50, 0x75,0x6B,0x2D, 0x38,0x4E,0x29,
0x55,0x3D,0x3F } ;
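So, encoding character c at position i goes as follows:
• encoded character = encoding[c*3 + pick_encoding[i%64]];
Here pick_encoding is the 64-entry table that tells which of the three representations
is used at each position.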
Because the table starts at 9 and then goes to 32, you'll have to do some corrections. But
we'll get to that later, as we are not really interested in encoding after all. We want to be
able to do some decoding!
The decoding tables
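The pick_encoding table will stay the same. This is because each character (except for
the escaped ones, of course) will be in the same place as the original. Then, we could just
look up the encoded character in the table. For instance, an 'A' in encoded text (hex
0x41) occurs in three places in the 'encoding' table: in the first column of the row for
'B', in the second column of the row for 'w', and in the third column of the row for 'E'.
Which of the three it decodes to depends on the pick_encoding value for its position.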
The length
The encoded data starts with a header. Encoding scripts of different lengths gives the
following headers:
Length  Hex dump                              ASCII
2       23 40 7E 5E 41 67 41 41-41 41 3D 3D   #@~^AgAAAA==
3       23 40 7E 5E 41 77 41 41-41 41 3D 3D   #@~^AwAAAA==
4       23 40 7E 5E 42 41 41 41-41 41 3D 3D   #@~^BAAAAA==
5       23 40 7E 5E 42 51 41 41-41 41 3D 3D   #@~^BQAAAA==
6       23 40 7E 5E 42 67 41 41-41 41 3D 3D   #@~^BgAAAA==
7       23 40 7E 5E 42 77 41 41-41 41 3D 3D   #@~^BwAAAA==
8       23 40 7E 5E 43 41 41 41-41 41 3D 3D   #@~^CAAAAA==
9       23 40 7E 5E 43 51 41 41-41 41 3D 3D   #@~^CQAAAA==
32      23 40 7E 5E 49 41 41 41-41 41 3D 3D   #@~^IAAAAA==
48      23 40 7E 5E 4D 41 41 41-41 41 3D 3D   #@~^MAAAAA==
80      23 40 7E 5E 55 41 41 41-41 41 3D 3D   #@~^UAAAAA==
96      23 40 7E 5E 59 41 41 41-41 41 3D 3D   #@~^YAAAAA==
The length seems to be encoded in the 5th to 10th byte, and 41 appears to represent
zero. The first byte of the length seems to increase by one when the length
increases by 4. Also, the second byte alternates between 41, 51, 67, and 77.
If you look at length 166, this value is 0x70, where it should be 0x41 + (166/4) = 0x6a.
So something goes wrong, and it can be narrowed down to length 104, where it suddenly
jumps from 0x5a to 0x61. This puzzled me for a long time, until I realised that 0x5a = 'Z'
and 0x61 = 'a'. And yes, the length turns out to be Base64 encoded indeed :)
The checksum
At the end of the encoded data is apparently some kind of checksum. I did not look into
this any further.
The decoder program
The further working of the decoder program, which can be downloaded from the scrdec
home page, is left as an exercise to the reader. It's implemented as a "Turing-like" state
machine. The decoder will treat .js and .vbs files as fully encoded, while .htm(l) and .asp
files are seen as files that contain script amongst other things - like HTML code.
The decoder simply takes two arguments: input filename (encoded), and output filename
(decoded).
There is one thing lacking in the decoder: the value of the <SCRIPT LANGUAGE="...">
attribute is not changed back into its original form. You'd better use a tool like sed for
that.
Conclusion
It's not just sad that Microsoft made a tool like this. They've probably asked Bill Gates'
little nephew to write this code. The really bad part is that Microsoft actually
recommends that people use this piece of crap, and because of that, people will rely on it,
even though the documentation hints that it's unsafe. (Nobody reads the docs anyway...)
Security by obscurity is a bad, bad idea. Instead of encouraging that approach, Microsoft
should educate programmers to find other ways to store their passwords and sensitive
data, and tell them that an algorithm or any other piece of code that needs to be 'hidden',
is just bad design.