
Funky File Formats

This document summarizes Ange Albertini's 2014 talk "Funky File Formats". The talk discusses how file formats can be manipulated in unconventional ways, such as encrypting a file in one format so that the result is a valid file in another format, or combining multiple file formats into a single "polyglot" file. Specific examples are given of files that behave unexpectedly, such as a JPEG that becomes a PNG when encrypted with AES and a PDF when decrypted with Triple DES. The talk explores the flexible and sometimes ambiguous boundaries between file formats.

Funky File Formats

Ange Albertini
2014/12 - 31C3


Ange Albertini
reverse engineering &
visual documentations

@angealbertini
[email protected]
http://www.corkami.com

So, this talk is about files. What are the usual file categories?

It depends on whether you're a newbie, a user, a dev, a hacker...

...but in general, valid files aren't very sexy!

However, the frontier between valid and corrupted is neither straight nor clear!

Here is a valid file


f76f5dafdcf0818c457e6ffb50ea61a67196dcd4 *ccc.jpg

(ok, maybe not a standard file)

This is a JPEG picture...

...that's also a Java file.

If you encrypt it with AES...

you get a PNG picture.

If you decrypt it with Triple DES...

...you get a PDF document.

If you encrypt the original file with AES again, but with a different key...

...you get a Flash Video


...that, oh well, never mind, I could go on for hours...

ccc.jpg at a glance:
JPG || JAR (ZIP + CLASS)
AES (key 1) → PNG
3DES decryption → PDF
AES (key 2) → FLV

So, as you can see, I'm just a normal guy (who likes to play with binary).

I also like to explain binary visually: pics.corkami.com / prints.corkami.com

Let's talk about...

Identification
How do you identify a cow?

By its head?

By its body?

By sound?

in practice...

early file-type identifier

Obvious:
PE\0\0  \x7FELF  BPG\xFB
\x89PNG\x0D\x0A\x1A\x0A
dex\n035\0  Rar!\x1A\7\0  BZ
GIF89a  BM  RIFF

Not obvious:
GZip: 1F 8B
JPG: FF D8

Not obvious, but l33tsp34k ^_^:
CAFEBABE: Java / universal (old) Mach-O
D0CF11E0: Office
FEEDFACE: Mach-O
FEEDFACF: Mach-O (64-bit)

Egocentric
MZ (DOS header): Mark Zbikowski
PK\3\4 (ZIP): Philip Katz
BPG\xFB: Fabrice Bellard

Specific logic
TIFF:
II: Intel (little-endian) byte order
MM: Motorola (big-endian) byte order
Flash:
FWS: ShockWave Flash (flat, uncompressed)
CWS: zlib-compressed
ZWS: LZMA-compressed
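The identification logic above can be sketched as a small first-match-wins lookup, the same way a standard tool checks the magic and commits to one path. This is a minimal sketch assuming an illustrative subset of the listed signatures (the `identify` name and the labels are hypothetical); `PE\0\0` is left out because it sits at the offset given by the DOS header, not at offset 0, and Mach-O is left out because its byte order on disk varies.

```python
# Illustrative subset of the slide's signatures, checked at offset 0.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\x7fELF", "ELF"),
    (b"GIF89a", "GIF"),
    (b"Rar!\x1a\x07\x00", "RAR"),
    (b"dex\n035\x00", "Android DEX"),
    (b"PK\x03\x04", "ZIP"),
    (b"\x1f\x8b", "GZip"),
    (b"\xff\xd8", "JPEG"),
    (b"\xd0\xcf\x11\xe0", "Office (OLE)"),
    (b"\xca\xfe\xba\xbe", "Java class / universal Mach-O"),
    (b"MZ", "DOS / PE executable"),
]


def identify(data: bytes) -> str:
    """Return a label based only on the leading bytes of `data`."""
    # TIFF: the first two letters announce the byte order of the whole file,
    # followed by the number 42 written in that byte order.
    if data[:4] == b"II*\x00":
        return "TIFF (little-endian)"
    if data[:4] == b"MM\x00*":
        return "TIFF (big-endian)"
    # Flash: the first letter tells how the rest of the file is compressed.
    swf = {b"FWS": "SWF (uncompressed)", b"CWS": "SWF (zlib)", b"ZWS": "SWF (LZMA)"}
    if data[:3] in swf:
        return swf[data[:3]]
    # Like a standard tool: the first matching signature wins, no second look.
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


print(identify(b"\x89PNG\r\n\x1a\n" + bytes(8)))  # PNG
```

Because the loop stops at the first hit, a polyglot is reported as a single type: exactly the behavior the rest of the talk abuses.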

Magic signatures, enforced at offset 0

not enforcing signature at offset 0: ZIP, 7z, RAR, HTML


actually enforcing signature at offset 0: bzip2, GZip

File formats not enforcing signature at offset 0


(ZIP is used in many formats: APK, ODT, DOCX, JAR)

ZIP actually enforces finishing near the end of the file.
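Why ZIP ignores what comes before it: readers look for the End Of Central Directory (EOCD) record by scanning backwards from the end of the file. The sketch below assumes the standard 22-byte EOCD layout and a strict rule that the archive comment must end exactly at the end of the file (the `find_eocd` name is hypothetical; real tools may be more lenient).

```python
import struct

EOCD_SIG = b"PK\x05\x06"   # End Of Central Directory signature
EOCD_SIZE = 22             # size of the EOCD record without its comment
MAX_COMMENT = 0xFFFF       # the comment length is a 16-bit field


def find_eocd(data: bytes) -> int:
    """Return the offset of the EOCD record, searching backwards.

    The EOCD must sit within the last 22 + 65535 bytes and its comment must
    end exactly at the end of the file. Bytes before the archive are never
    examined, which is why the format does not enforce anything at offset 0.
    """
    start = max(0, len(data) - EOCD_SIZE - MAX_COMMENT)
    pos = data.rfind(EOCD_SIG, start)
    while pos != -1:
        if pos + EOCD_SIZE <= len(data):
            # The comment length is the last field of the fixed record.
            (comment_len,) = struct.unpack_from("<H", data, pos + 20)
            if pos + EOCD_SIZE + comment_len == len(data):
                return pos
        pos = data.rfind(EOCD_SIG, start, pos)
    raise ValueError("no End Of Central Directory record found")


print(find_eocd(b"junk" + b"PK\x05\x06" + bytes(18)))  # 4
```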

TAR: Tape Archive


Disk images: ISO, Master Boot Record
TGA (image)
(Console) ROMs
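Some of these formats do carry a marker, just not at offset 0. This sketch checks three well-known fixed offsets (POSIX `ustar` at 257, ISO 9660 `CD001` at 0x8001, the MBR boot signature `55 AA` at 510). It is a quick check under those assumptions, not a validator, and `identify_late_magic` is a hypothetical name.

```python
# (format name, offset, marker) for formats whose marker is NOT at offset 0.
CHECKS = [
    ("TAR (POSIX ustar)", 257, b"ustar"),
    ("ISO 9660 disk image", 0x8001, b"CD001"),
    ("Master Boot Record", 510, b"\x55\xaa"),
]


def identify_late_magic(data: bytes) -> list[str]:
    """Return every format whose marker is found at its fixed offset."""
    return [name for name, offset, magic in CHECKS
            if data[offset:offset + len(magic)] == magic]


# Nothing stops a single buffer from satisfying all three checks at once.
buf = bytearray(0x8010)
buf[257:262] = b"ustar"
buf[510:512] = b"\x55\xaa"
buf[0x8001:0x8006] = b"CD001"
print(identify_late_magic(bytes(buf)))
```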

Hardware-bound formats: code/data at offset 0


the header is often (optionally) located later in the memory space

a good magic signature:
enforced at offset 0
unique
no magic, no excuse

Standard tool: checks the magic, chooses a path, never returns...

Another common
yet important property
(useful for abuses)

It's a complete cow (you can see its whole body), with something next to it:
appending something doesn't invalidate the start.

Remember:
there's nothing to parse
after the terminator.

PE
PDF

HTML
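"Nothing to parse after the terminator" can be seen in a chunk walker that stops at the end marker and simply hands back whatever follows. The slide names PE, PDF and HTML; this sketch uses PNG instead (an assumption, chosen because its IEND chunk is an explicit terminator). `tiny_png` and `split_at_iend` are hypothetical helpers.

```python
import struct
import zlib

PNG_SIG = b"\x89PNG\r\n\x1a\n"


def chunk(ctype: bytes, payload: bytes) -> bytes:
    """Serialize a PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


def tiny_png() -> bytes:
    """A valid 1x1 8-bit grayscale PNG, built only for the demonstration."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x80")  # filter type 0, then one gray pixel
    return PNG_SIG + chunk(b"IHDR", ihdr) + chunk(b"IDAT", idat) + chunk(b"IEND", b"")


def split_at_iend(data: bytes) -> tuple[bytes, bytes]:
    """Walk the chunks up to IEND; return (png_part, appended_bytes)."""
    if not data.startswith(PNG_SIG):
        raise ValueError("not a PNG")
    pos = len(PNG_SIG)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack_from(">I4s", data, pos)
        pos += 12 + length  # 4-byte length + 4-byte type + data + 4-byte CRC
        if ctype == b"IEND":
            return data[:pos], data[pos:]
    raise ValueError("no IEND chunk: truncated PNG")


png_part, extra = split_at_iend(tiny_png() + b"anything appended")
print(png_part == tiny_png(), extra)  # True b'anything appended'
```

A decoder built this way never looks at `extra`, so appending data leaves the start of the file untouched.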

formats not enforced at offset 0


+ tolerating appended data
= polyglots

by concatenation
ZIP

a JAR(JAR) || BINK polyglot


JAR = ZIP(CLASS)
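A polyglot by concatenation needs no clever trick: put a format that tolerates appended data first, then a ZIP, which readers locate from the end. This minimal sketch uses Python's standard `zipfile` module; the front bytes are a placeholder standing in for the slide's Bink video, and `concat_polyglot` is a hypothetical name.

```python
import io
import zipfile


def concat_polyglot(front: bytes, members: dict[str, bytes]) -> bytes:
    """Return `front` followed by a ZIP archive holding `members`.

    `front` stands for any file that tolerates appended data; ZIP readers
    locate the archive from the end of the file, so they skip over it.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for name, content in members.items():
            archive.writestr(name, content)
    return front + buf.getvalue()


poly = concat_polyglot(b"<front file bytes>", {"hello.txt": b"hi"})
with zipfile.ZipFile(io.BytesIO(poly)) as archive:
    print(archive.read("hello.txt"))  # b'hi'
```

`zipfile` compensates for the prepended bytes by comparing where the central directory actually is with where the EOCD says it should be; `unzip` does the same and prints the "extra bytes" warning seen in the listings below.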

host/parasite polyglots

If a cow keeps a frog in its mouth, it can also speak 2 languages!


(the outer leaves space for an inner)

OK, I know... here is a more realistic analogy:

...if our cow swallows a microSD, it's still a valid cow!


Even if it contains foreign data, that is tolerated by the system.
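A host that leaves space for an inner file can be illustrated with PNG ancillary chunks: decoders skip ancillary chunks they do not know, so foreign data stored in one is tolerated. In this sketch the chunk name `paRa` and the helper names are hypothetical; the letter-case rules for chunk names come from the PNG specification.

```python
import struct
import zlib


def chunk(ctype: bytes, payload: bytes) -> bytes:
    """Serialize a PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


def add_parasite(png: bytes, payload: bytes, ctype: bytes = b"paRa") -> bytes:
    """Insert `payload` as an ancillary, private chunk right after IHDR.

    Lowercase 1st letter = ancillary (unknown ones may be skipped),
    lowercase 2nd letter = private, uppercase 3rd letter = reserved bit clear.
    """
    if png[12:16] != b"IHDR":
        raise ValueError("IHDR must be the first chunk")
    ihdr_end = 8 + 4 + 4 + 13 + 4  # signature, length, type, IHDR data, CRC
    return png[:ihdr_end] + chunk(ctype, payload) + png[ihdr_end:]


def chunk_types(png: bytes) -> list[bytes]:
    """List chunk types in file order (CRCs are not checked)."""
    pos, types = 8, []
    while pos + 8 <= len(png):
        length, ctype = struct.unpack_from(">I4s", png, pos)
        types.append(ctype)
        pos += 12 + length
    return types


host = (b"\x89PNG\r\n\x1a\n"
        + chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
        + chunk(b"IDAT", zlib.compress(b"\x00\x80"))
        + chunk(b"IEND", b""))
print(chunk_types(add_parasite(host, b"frog")))
# [b'IHDR', b'paRa', b'IDAT', b'IEND']
```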

2 infection chains in one file:

the PDF part is stored in a Java buffer

a JavaScript || GIF polyglot (useful for pwning - also in BMP flavor)

Such parasites already exist in the wild


(they just use unallocated space)

PoC||GTFO 0x2: MBR || PDF || ZIP

by Travis Goodspeed

PoC||GTFO 0x3: JPG || AFSK || AES(PNG) || PDF || ZIP

PoC||GTFO 0x4: TrueCrypt || PDF || ZIP

by Alex Inführ

PoC||GTFO 0x5: Flash || ISO || PDF || ZIP

PoC||GTFO 0x6: TAR || PDF || ZIP

$ tar -tvf pocorgtfo06.pdf
-rw-r--r-- Manul/Laphroaig      0 2014-10-06 21:33 %PDF-1.5
-rw-r--r-- Manul/Laphroaig 525849 2014-10-06 21:33 1.png
-rw-r--r-- Manul/Laphroaig 273658 2014-10-06 21:33 2.bmp

$ unzip -l pocorgtfo06.pdf
Archive:  pocorgtfo06.pdf
warning [pocorgtfo06.pdf]:  10672929 extra bytes at...
  (attempting to process anyway)
  Length      Date    Time    Name
---------  ---------- -----   ----
     4095  11/24/2014 23:44   64k.txt
   818941  08/18/2014 23:28   acsac13_zaddach.pdf
     4564  10/05/2014 00:06   burn.txt
   342232  11/24/2014 23:44   davinci.tgz.dvs
     3785  11/24/2014 23:44   davinci.txt
     5111  09/28/2014 21:05   declare.txt
        0  08/23/2014 19:21   ecb2/
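The TAR || ZIP part of such a file can be reproduced with Python's standard `tarfile` and `zipfile` modules: TAR starts at offset 0 and stops at its end-of-archive zero blocks, while ZIP is found from the end. This sketch leaves out the PDF layer, and the member names and contents are placeholders.

```python
import io
import tarfile
import zipfile


def tar_bytes(files: dict[str, bytes]) -> bytes:
    """Build an uncompressed TAR archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def zip_bytes(files: dict[str, bytes]) -> bytes:
    """Build a ZIP archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buf.getvalue()


# TAR readers stop at the zero blocks; ZIP readers start from the end.
polyglot = tar_bytes({"1.png": b"tar member"}) + zip_bytes({"64k.txt": b"zip member"})

with tarfile.open(fileobj=io.BytesIO(polyglot), mode="r:") as tar:
    print(tar.getnames())  # ['1.png']
with zipfile.ZipFile(io.BytesIO(polyglot)) as archive:
    print(archive.namelist())  # ['64k.txt']
```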

unicode //

a Java || JavaScript polyglot (at source level)

a Java || JavaScript polyglot (at binary level)

Java = JavaScript
Yes, your management was right all along ;)

Extreme files bypass filters

A farmer was denied a permit to build a horse shelter...

...so he built a giant table & chairs, which don't need a permit.

a mini PDF (Adobe-only, 36 bytes), skipped by scanners yet valid!

a PE with 64K sections (all executed): crashes many programs, evades scanning

Parsing

This is how a user sees a cow.

This is how a dev sees a cow

This is how another dev sees a cow !


(this one: Brazilian beef cuts; the previous one: French beef cuts)

Same data, different parsers


it would have been too easy ;)

commented line

missing trailer keyword

a schizophrenic PDF: 3 different trailers, seen by 3 different readers
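"Same data, different parsers" can be shown with a toy: three hypothetical readers, each following a plausible rule, pick three different trailers from the same bytes, including one hidden in a commented line. This is an illustration under that assumption only; real PDF readers are far more complex and are not modeled here.

```python
# Toy "format": a few text lines, some starting with the keyword 'trailer'.
DOC = (b"%TOY-FORMAT\n"
       b"trailer <</Title (first)>>\n"
       b"trailer <</Title (second)>>\n"
       b"% trailer <</Title (third)>>\n")


def forward_reader(data: bytes) -> bytes:
    """Scan from the top; the first 'trailer' line wins."""
    for line in data.splitlines():
        if line.startswith(b"trailer"):
            return line
    raise ValueError("no trailer")


def backward_reader(data: bytes) -> bytes:
    """Scan from the bottom; '%' comment lines are skipped, last trailer wins."""
    for line in reversed(data.splitlines()):
        if line.startswith(b"trailer"):
            return line
    raise ValueError("no trailer")


def naive_reader(data: bytes) -> bytes:
    """Ignore comments entirely: take the last 'trailer' keyword anywhere."""
    return data[data.rindex(b"trailer"):].splitlines()[0]


for reader in (forward_reader, backward_reader, naive_reader):
    print(reader.__name__, reader(DOC))
```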

a schizophrenic PDF (screen vs. printer)

PDF viewer

PDF slides

a (generated) PDF || PE || JAR [JAVA+ZIP] || HTML polyglot...

...which is also a schizophrenic PDF

$ du -h stringme
141 stringme
$ strings stringme
Segmentation fault (core dumped)
Extra problem: parsers can be present in unexpected places
http://lcamtuf.blogspot.de/2014/10/psa-dont-run-strings-on-untrusted-files.html (CVE-2014-8485)

metadata
Who's the owner?

A hidden cow just looks like another cow...

so cattle is branded.

But brandings can be faked!


or patched into another symbol
attribution is hard

and in a pure PoC||GTFO fashion,


@munin forged a branding iron!

an encrypted file is not always encrypted


encrypt(file) is not always random
encrypt(file) can be valid

DATA[123456789AB
CDEF]END

?
TEXT\nthis is a t
ext\n
We want to encrypt a DATA file into a TEXT file.
DATA tolerates appended data after its END marker.
TEXT accepts /* */ comment chunks (think parasite in a host).

DATA[123456789AB
CDEF]END

<random>

If we encrypt, we get a random result: we can't control AES output and input together.

AES works with blocks


File encryption applies AES via a mode of operation

Electronic Codebook (ECB):

penguin = bad

choose the IV to control both first blocks (P1 & C1)

DATA[123456789AB
CDEF]END

+IV1

TEXT <something we control>

<random rest>

Decrypt the wanted first block with pure AES, then XOR it with the first source block: this determines the IV that controls the output block.
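Written out, this is standard CBC algebra rather than a formula taken from the slides: each ciphertext block is the encryption of the plaintext block XORed with the previous ciphertext block, with the IV playing the role of C_0. Solving the first block's equation for the IV shows that it needs the block cipher's decryption function D_K applied to the wanted C_1:

$$C_i = E_K(P_i \oplus C_{i-1}), \qquad C_0 = IV$$

$$C_1 = E_K(P_1 \oplus IV) \;\Longrightarrow\; IV = D_K(C_1) \oplus P_1$$

Only the first block is controlled this way; every later C_i depends on C_{i-1} and P_i, which is why the rest of the output looks random.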

DATA[123456789AB
CDEF]END

+IV2

TEXT/*
<ignored random rest>

We can't control the rest of the garbage, so let's put a comment start in the first block.

DATA[123456789AB
CDEF]END

TEXT/*
<ignored random rest>
*/\nthis is a t
ext\n
If we close the comment and append the target file's data to the encrypted file,
then this file is valid and equivalent to our initial target.

DATA[123456789AB
CDEF]END
<pre-decrypted ignored random>

+IV2

TEXT/*
<ignored random rest>
*/\nthis is a t
ext\n
...then we decrypt that file: we get the original source file,
followed by some random data, which will be ignored since it's appended data.

DATA[123456789AB
CDEF]END
<pre-decrypted ignored random>

+IV2

TEXT/*
<ignored random rest>
*/\nthis is a t
ext\n
Since AES-CBC only depends on previous blocks,
this DATA file will indeed encrypt to a TEXT file.
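The whole construction (choose the IV, encrypt, append the closing comment and target data, then pre-decrypt) fits in a short sketch. Python's standard library has no AES, so this sketch swaps in a toy 4-round Feistel cipher built on HMAC-SHA256; `E`, `D`, `angecrypt` and the other names are hypothetical, and the toy cipher is neither AES nor secure. The CBC logic and the IV trick do not depend on which block cipher is used, but this code will not reproduce the slide's IV or bytes.

```python
import hashlib
import hmac

BLOCK = 16
ROUNDS = 4


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key: bytes, i: int, half: bytes) -> bytes:
    """Feistel round function: 8 bytes of HMAC-SHA256(key, round || half)."""
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:8]


def E(key: bytes, block: bytes) -> bytes:
    """Toy invertible 16-byte block cipher standing in for AES (NOT secure)."""
    left, right = block[:8], block[8:]
    for i in range(ROUNDS):
        left, right = right, _xor(left, _f(key, i, right))
    return left + right


def D(key: bytes, block: bytes) -> bytes:
    """Inverse of E: undo the Feistel rounds in reverse order."""
    left, right = block[:8], block[8:]
    for i in reversed(range(ROUNDS)):
        left, right = _xor(right, _f(key, i, left)), left
    return left + right


def pad(data: bytes) -> bytes:
    """Zero-pad to whole blocks (as with -nopad, padding is done by hand)."""
    return data + b"\0" * (-len(data) % BLOCK)


def cbc_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    out, prev = b"", iv
    for i in range(0, len(data), BLOCK):
        prev = E(key, _xor(data[i:i + BLOCK], prev))
        out += prev
    return out


def cbc_decrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    out, prev = b"", iv
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        out += _xor(D(key, block), prev)
        prev = block
    return out


def angecrypt(key: bytes, source: bytes, head: bytes, tail: bytes):
    """Return (iv, new_source) such that new_source starts with the padded
    source and cbc_encrypt(key, iv, new_source) is
    pad(head) + <random blocks> + pad(tail)."""
    if not 0 < len(head) <= BLOCK:
        raise ValueError("the controlled head must fit in one block")
    src = pad(source)
    iv = _xor(D(key, pad(head)), src[:BLOCK])       # IV = D_K(C1) xor P1
    wanted = cbc_encrypt(key, iv, src) + pad(tail)  # close comment + real data
    return iv, cbc_decrypt(key, iv, wanted)         # source + pre-decrypted junk


key = b"OurEncryptionKey"
iv, new_source = angecrypt(key, b"DATA[123456789ABCDEF]END",
                           b"TEXT/*", b"*/\nthis is a text\nEND")
result = cbc_encrypt(key, iv, new_source)
print(new_source[:24])                # b'DATA[123456789ABCDEF]END'
print(result[:6])                     # b'TEXT/*'
print(result[32:].rstrip(b"\0"))      # b'*/\nthis is a text\nEND'
```

The last step works because CBC decryption followed by CBC encryption with the same key and IV returns the original ciphertext, so the pre-decrypted file encrypts back to exactly the wanted bytes.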

AngeCryption PoC layout

00: 4441 5441 5b31 3233 3435 3637 3839 4142  DATA[123456789AB
10: 4344 4546 5d45 4e44 0000 0000 0000 0000  CDEF]END........
20: f6fe 17cf 0802 7449 58de cdf2 f9c4 45ce  ......tIX.....E.
30: 2e8e 6996 5854 824c c09c 1b7d 4898 a29e  ..i.XT.L...}H...

openssl enc -aes-128-cbc -nopad -K `echo OurEncryptionKey|xxd -p` -iv A37A69F13417F5AB3CC4A1546B97FD76

00: 5445 5854 2f2a 0000 0000 0000 0000 0000  TEXT/*..........
10: 3f81 11a9 2540 ded5 096a 83c9 f191 d8bb  ?...%@...j......
20: 2a2f 0a74 6869 7320 6973 2061 2074 6578  */.this is a tex
30: 740a 454e 4400 0000 0000 0000 0000 0000  t.END...........

You can even try it at home :)

Chimera
(if you skip identified bodies, you'll miss other files)

a JPEG || ZIP || PDF Chimera

image data

a chimera defeats sequential parsing with optimization

a picture of a cat
(BMP! uncompressed! OMG)

BMP lets us define bit masks for each color:


32 bits: 0000000000000000rrrrrggggggbbbbb (no alpha)
16 bits of free space!
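Following the mask line above (color in the low 16 bits as 5-6-5, the top 16 bits unused by the BMP decoder), one 32-bit pixel can hold a color and a 16-bit audio sample at the same time. This sketch assumes that layout; the function names are hypothetical, and it only shows how a BMP decoder and a 32-bit PCM player read the same dword differently.

```python
# Masks from the slide: 0000000000000000 rrrrr gggggg bbbbb
R_MASK, G_MASK, B_MASK = 0xF800, 0x07E0, 0x001F


def pack_pixel(r5: int, g6: int, b5: int, sample: int) -> int:
    """Put a 5-6-5 color in the low 16 bits and a signed 16-bit sample above."""
    color = (r5 << 11) | (g6 << 5) | b5
    return ((sample & 0xFFFF) << 16) | color


def bmp_view(pixel: int) -> tuple[int, int, int]:
    """What a BMP decoder sees: only the bits selected by the color masks."""
    return (pixel & R_MASK) >> 11, (pixel & G_MASK) >> 5, pixel & B_MASK


def pcm_view(pixel: int) -> int:
    """What a 32-bit signed PCM player sees: the whole dword as one sample."""
    return pixel - (1 << 32) if pixel & 0x80000000 else pixel


p = pack_pixel(31, 0, 5, -2)
print(bmp_view(p), pcm_view(p) >> 16)  # (31, 0, 5) -2
```

In the PCM view the color bits only add a variation smaller than 2^16, far below the sample stored in the upper half.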

let's play the picture!


no, seriously :)

Consider the BMP as raw 32-bit PCM

1. store sound in the lower 16 bits:
sound ignored by BMP
image data too low to be audible
2. store a picture encoded as sound,
viewable as a spectrogram
http://wiki.yobi.be/wiki/BMP_PCM_polyglot

an RGB BMP || raw (3-channel spectrogram) polyglot by @doegox

Cerbero
same type of heads, one body

an RGB picture...
RGB picture data = byte triplets for R, G, B colors

...with an unused palette


palette picture data = each byte is an index in the palette
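The two readings of one body can be sketched directly: the same bytes give one pixel per triplet in the RGB view and one pixel per byte in the palette view. This sketch follows the slide's "R, G, B" wording (BMP files actually store blue first), and the grayscale palette and function names are hypothetical.

```python
def rgb_view(data: bytes) -> list[tuple[int, int, int]]:
    """RGB reading: every byte triplet is one (R, G, B) pixel."""
    usable = len(data) - len(data) % 3
    return [(data[i], data[i + 1], data[i + 2]) for i in range(0, usable, 3)]


def palette_view(data: bytes, palette: list[tuple[int, int, int]]) -> list[tuple[int, int, int]]:
    """Palette reading: every byte is an index into a 256-entry palette."""
    return [palette[b] for b in data]


body = bytes([200, 30, 30, 30, 200, 30])   # one body...
gray = [(i, i, i) for i in range(256)]     # ...hypothetical palette
print(rgb_view(body))                      # [(200, 30, 30), (30, 200, 30)]
print(len(palette_view(body, gray)))       # 6
```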

in theory, it could be used:

How to make a pic-ception


adjust each RGB value to the closest palette index
store a second picture with the same data.
(original idea by @reversity)

We get another picture of the same type from the same data!
BTW, that's a barcode inception:
a DataMatrix barcode inside a QR code, both valid
https://www.iseclab.org/people/atrox/qrinception.pdf

Hash collisions

This is the actual SHA-1 with only 4 of its 5 constants modified


This doesn't give a collision in the actual SHA-1

2 colliding blocks: mostly random and unpredictable


At most three consecutive bytes without a difference.
Typically, in every dword, only the middle two bytes have no differences.

Abusing JPEG's multiple unused APPx (FF Ex) markers
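Why APPx segments make good containers: after the SOI marker, a JPEG is a sequence of `FF xx` markers, each followed by a big-endian 16-bit length that counts itself, and APP0 to APP15 (`FF E0` to `FF EF`) may carry arbitrary bytes. This sketch builds and lists such segments up to Start Of Scan; it assumes a well-formed header (no `FF` fill bytes), and `app_segment` and `list_segments` are hypothetical names.

```python
import struct


def app_segment(n: int, payload: bytes) -> bytes:
    """Build an APPn segment (n in 0..15) carrying arbitrary bytes."""
    if not 0 <= n <= 15 or len(payload) > 0xFFFF - 2:
        raise ValueError("bad APP number or payload too large")
    return bytes([0xFF, 0xE0 + n]) + struct.pack(">H", len(payload) + 2) + payload


def list_segments(jpeg: bytes) -> list[tuple[str, int]]:
    """List (marker name, payload length) pairs up to Start Of Scan."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    pos, found = 2, []
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        (length,) = struct.unpack_from(">H", jpeg, pos + 2)
        name = f"APP{marker - 0xE0}" if 0xE0 <= marker <= 0xEF else f"FF{marker:02X}"
        found.append((name, length - 2))
        if marker == 0xDA:  # SOS: entropy-coded image data follows
            break
        pos += 2 + length
    return found


header = (b"\xff\xd8" + app_segment(0, b"JFIF\0") + app_segment(15, b"hidden")
          + b"\xff\xda" + struct.pack(">H", 6) + bytes(4))
print(list_segments(header))  # [('APP0', 5), ('APP15', 6), ('FFDA', 4)]
```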

Much better! (images chosen at random)

a polyglot collision (multiple use for a single backdoor)

Pwnie Award for the best song! Err... what is it pwning exactly?

Even songs should have a nice PoC


(never forget to load your PDFs in your favorite NES emulator)

Do you remember this?

A Super NES & Megadrive ROM


(and PDF at the same time)

Conclusion

Ange's recipes :)
Never forget to:
open your PDFs in a hex editor
open your pictures in a sound player
run your documents in a console emulator
encrypt/decrypt with any cipher
double-check what you printed

Security advice:

DON'T *
It's easy to blame others: new insecure paths appear every day.

Research advice:

DO *
PoC||GTFO! Stop the marketing! Cheap blamers, blatant marketers?

F.F.F. conclusion
many abuses of the specs
specs often are wrong or misleading

few parsers, even fewer dissectors


standard tools evolve the wrong way
try to repair corrupted files outside the specs
standard and recovery mode

For technical details, check my previous talks.

ACK
@doegox @pdfkungfoo @veorq @reversity
@travisgoodspeed @sergeybratus qkumba
@internot @gynvael @munin
@solardiz @0xabadidea @ashutoshmehra
lytron @JacobTorrey @thicenl
and anybody who gave me feedback!

Bonus
after the talk, we tried some PoCs on professional
(very expensive!) forensic software:
polyglot files:
a single file format found + no warning whatsoever

schizophrenic files:
no warning, yet different tabs of the same software showing
different content :D

BIG FAIL, yet we trust them for court cases?

*this is a valid... TAR & Adobe PDF:
[ASCII-art banner: "PoC or GTFO" (with "Albertini")]
%PDF-1.
trailer<</Root<</Pages<<>>>>>>

The initial abstract of this talk:


ASCII-only, PDF/TAR polyglot

Solar Designer gave a great keynote: it's actually a real game to play!
But one has to load and play through the game, which is not so accessible!
http://openwall.com/presentations/ZeroNights2014-Is-Infosec-A-Game/

a PDF:
containing the game as ZIP
hand-written
with walkthrough screenshots
(in original resolution)
a lightweight title
while maintaining compatibility
a good way to distribute as a single file!

$ unzip -t ZeroNights2014-Is-Infosec-A-Game.pdf
Archive:  ZeroNights2014-Is-Infosec-A-Game.pdf
warning [ZeroNights2014-Is-Infosec-A-Game.pdf]:  6381506 extra bytes
  (attempting to process anyway)
    testing: ZN14GAME/                OK
    testing: ZN14GAME/COMMON/         OK
...

Quine
prints its own source

a PE quine (in assembler, no linker)

Most quines aren't very sexy


Using a compiler is cheap :p

Quine Relay
A prints B's source
B prints A's source

a PE ELF quine relay


(no linker)

a 50-language quine relay
https://github.com/mame/quine-relay

other AngeCryption PoCs (PDF, PNG, JPG)

A bit of everything

@angealbertini
corkami.com
Damn, that's the second time those alien bastards shot up my ride!
