Phrack Issue 68 #1
|=-----------------------------------------------------------------------=|
|=-------------=[ The Art of Exploitation ]=-----------------=|
|=-----------------------------------------------------------------------=|
|=-------------------------=[ Exploiting VLC ]=--------------------------=|
|=------------=[ A case study on jemalloc heap overflows ]=--------------=|
|=-----------------------------------------------------------------------=|
|=------------------------=[ huku | argp ]=------------------------=|
|=--------------------=[ {huku,argp}@grhack.net ]=---------------------=|
|=-----------------------------------------------------------------------=|
1 - Introduction
1.1 - Assumptions
2 - Notes on jemalloc magazines
2.1 - Your heap reversed
2.2 - Your reversed heap reversed again
2.3 - Sum up of jemalloc magazine facts
3 - 'MP4_ReadBox_skcr()' heap overflow vulnerability
3.1 - MP4 file format structure
3.2 - Vulnerability details
3.3 - Old is gold; 'unlink()' style ftw
3.4 - Controlling 'p_root' data
3.5 - MP4 exploitation sum up
4 - Real Media 'DemuxAudioSipr()' heap overflow vulnerability
4.1 - VLC as a transcoder
4.2 - RMF? What's that?
4.3 - Vulnerability details
4.4 - 'p_blocks' all over the place
4.5 - RMF summary
5 - Building a reliable exploit
5.1 - Overall process
5.2 - Detecting 'p_root' address candidates
6 - Demonstration
7 - Limitations
8 - Final words
9 - References
10 - T3h l337 c0d3z
--[ 1 - Introduction
This phile was at first meant to be part of our jemalloc research also
presented in this Phrack issue. Nevertheless, the Phrack staff honored us
by asking if we were willing to write a separate text with an in-depth
analysis of all that voodoo we had to perform. Readers might agree that
VLC is not the most exotic target one can come up with, but we decided
not to disclose any 0day vulnerabilities and keep it going with a list
of already published material, found by carefully looking for advisories
tagged as 'heap based overflows' (we could have googled for 'potential
DoS' as well, since it usually means ring0 access ;). Keep in mind that
we wouldn't like to present a vulnerability that would be trivial to
exploit. We were looking for a target application with a large codebase;
VLC and Firefox were our primary candidates. We finally decided to deal
with the former. The result was a local exploit that does not require
the user to give any addresses; it can figure out everything by itself.
1 - We assume that the attacker has local access to a server running VLC.
The VLC instance must have at least one of its several control interfaces
enabled (HTTP via --extraintf, RC via --rc-host or --rc-unix), which
will be used to issue media playback requests to the target and make
the whole process interactive; that is, VLC should be running in 'daemon'
mode. Most people will probably think that those control interfaces can
also be used to perform a remote attack; they are right. Although the
MP4 vulnerability exploited in this article cannot be used for remote
exploitation, developing a reliable remote exploit is indeed feasible;
in fact, it's just a matter of modifying the attached code.
2 - VLC cannot be run as root, so don't expect uid 0 shells. We will
only try to stress the fact that some people will go to great lengths
to have your ass on their plate. Hacking is all about information;
the more information you have, the easier it is to elevate to root.
3 - We assume our target is an x86 machine running FreeBSD-8.2-RELEASE,
the exact same version we used during our main jemalloc research.
4 - Last but not least, we assume you have read and understood our
jemalloc analysis. We don't expect you to be a jemalloc ninja, but
studying our work the way you do your morning newspaper will not get
you anywhere either ;)
--[ 2 - Notes on jemalloc magazines
#ifdef MALLOC_MAG
static __thread mag_rack_t *mag_rack;
#endif
...
...
}
bin_mags->curmag = mag;
mag_load(mag);
}
ret = mag_alloc(mag);
...
return (ret);
}
mag->rounds[i] = round;
}
...
mag->nrounds = i;
...
}
mag_alloc(mag_t *mag) {
if (mag->nrounds == 0)
return (NULL);
mag->nrounds--; /* (1) */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>

/* Regions shared between 'main()' and the thread. */
void *allocs[10];

void start_allocs(void) {
    int i;

    printf("Allocating regions\n");
    for(i = 0; i < 10; i++) {
        allocs[i] = malloc(192);
        printf("%p\n", allocs[i]);
    }
    return;
}

void free_allocs(void) {
    int i;

    printf("Freeing the regions\n");
    for(i = 0; i < 10; i++)
        free(allocs[i]);
    return;
}

void free_allocs_rev(void) {
    int i;

    printf("Freeing the regions in reverse order\n");
    for(i = 10 - 1; i >= 0; i--)
        free(allocs[i]);
    return;
}

/* Frees the regions allocated by 'main()' and then allocates them again. */
void *thread_runner(void *p) {
    int rev = *(int *)p;

    sleep(1);
    if(rev)
        free_allocs_rev();
    else
        free_allocs();
    start_allocs();
    return NULL;
}

int main(int argc, char *argv[]) {
    pthread_t tid;
    int rev = 0;

    /* A non-zero 'argv[1]' asks the thread to free in reverse order. */
    if(argc > 1)
        rev = atoi(argv[1]);
    start_allocs();
    pthread_create(&tid, NULL, thread_runner, (void *)&rev);
    pthread_join(tid, NULL);
    return 0;
}
Once the new thread calls 'start_allocs()', the exact same regions that
were previously freed will eventually be returned to the caller. The
order in which they are returned depends on the way they were freed
in the first place. Let's run our test program above by passing it the
value 0 in 'argv[1]'; this will ask the thread to free the regions in
the normal way.
As you can see, the calls to 'malloc()' performed by the thread, return
the regions in reverse order; this is very similar to what the previous
section explained. Now let's free the regions allocated by 'main()'
by calling 'free_allocs_rev()':
Interestingly, the regions are returned in the same order as they were
allocated. You can think of that as the 'rounds[]' array in 'mag_load()'
being reversed; the allocations are freed in the reverse order and
placed in 'rounds[]' but 'mag_alloc()' gives out regions in reverse
order too... Reverse + reverse = obverse ;)
So why is this important? Regions of a commonly used size (e.g. 64) are
usually allocated by a program before 'pthread_create()' is called. Once a
thread is created and '__isthreaded' is set to true, freeing those regions
may result in some thread becoming their master. Future allocations from
the thread in question may return regions in the normal way rather than
in decreasing memory addresses as shown in the previous section. This is
a very important observation that an exploit coder must keep in mind
while targeting FreeBSD applications or any program utilizing jemalloc.
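The ordering behavior described above can be reproduced with a toy model.
The sketch below is not jemalloc code: it assumes, for illustration only,
that a magazine behaves like a plain LIFO stack of freed regions (the
'rounds[]' array). The 'Magazine' class, the 'replay()' helper and the
region addresses are all hypothetical.

```python
class Magazine:
    """Toy stand-in for a jemalloc magazine (ASSUMPTION: plain LIFO stack)."""

    def __init__(self):
        self.rounds = []

    def free(self, region):
        # A freed region is stored in the next free slot of 'rounds[]'.
        self.rounds.append(region)

    def alloc(self):
        # Like 'mag_alloc()': hand out the most recently stored round.
        if not self.rounds:
            return None
        return self.rounds.pop()


def replay(free_order):
    """Free regions in 'free_order', then allocate the same number again."""
    mag = Magazine()
    for region in free_order:
        mag.free(region)
    return [mag.alloc() for _ in free_order]


# Hypothetical addresses of ten consecutive 192-byte regions.
regions = [0x28306000 + i * 192 for i in range(10)]

normal = replay(regions)                     # freed in allocation order
reverse = replay(list(reversed(regions)))    # freed in reverse order
```

In this model, 'normal' comes back in decreasing addresses while 'reverse'
comes back in the original, increasing order, which is the effect the two
test runs are meant to show.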
To sum up:
1 - While in glibc and dlmalloc you were used to seeing new memory
regions getting allocated in higher addresses, this is not the case with
jemalloc. If magazines are enabled, continuous allocations may return
regions in decreasing memory order. It's quite easy for anyone to verify
by pure observation.
2 - Don't take 1 for granted. Depending on the order the allocations were
performed, even if thread magazines are enabled, memory regions may end
up being returned in the normal order. This, for example, can happen when
memory regions that were allocated before the first thread is spawned,
are eventually freed by one of the threads.
--[ 3 - 'MP4_ReadBox_skcr()' heap overflow vulnerability
For each box type, a dispatch table is used to call the appropriate
function that handles its contents. For 'skcr' boxes, 'MP4_ReadBox_skcr()'
is responsible for doing the dirty work.
/* modules/demux/mp4/libmp4.c:2248 */
static int MP4_ReadBox_skcr(..., MP4_Box_t *p_box) {
    /* Allocates room for an 'frma' structure, not for an 'skcr' one. */
    MP4_READBOX_ENTER(MP4_Box_data_frma_t);

    /* Three 4-byte stores into that allocation. */
    MP4_GET4BYTES(p_box->data.p_skcr->i_init);
    MP4_GET4BYTES(p_box->data.p_skcr->i_encr);
    MP4_GET4BYTES(p_box->data.p_skcr->i_decr);
    ...
}
The very first thing to note is the size of the victim structure (the
one being overflowed). 'MP4_Box_data_frma_t' has a size of 4 bytes, so
it is handled by jemalloc's bin for this specific size class (depending on
the variant, 4 may or may not be the smallest bin size). As a consequence,
the 8 bytes written outside its bounds can only influence neighboring
allocations of equal size, namely 4. Exploit developers know that the
heap has to be specially prepared before triggering an overflow. For
this specific vulnerability, the attacker has to force VLC to place
4-byte allocations of interest next to the victim structure. A careful
look at libmp4.h reveals the following two box types, which seem
to be interesting:
typedef struct {
char *psz_text;
} MP4_Box_data_0xa9xxx_t;
...
Obviously, both structures are 4 bytes long and thus good target
candidates. 'MP4_Box_data_0xa9xxx_t' holds a pointer to a string we
control, and 'MP4_Box_data_cmov_t' a pointer to some 'MP4_Box_t' whose
type and contents may be partially influenced by the attacker. Let's focus
on the 'cmov' box first and why that 'p_moov' pointer is interesting. What
can we do if we eventually manage to place a 'cmov' box next to the victim
'frma' structure?
/* modules/demux/mp4/libmp4.c:2882 */
MP4_Box_t *MP4_BoxGetRoot(...) {
...
/* If parsing is successful... */
if(i_result) {
MP4_Box_t *p_moov;
MP4_Box_t *p_cmov;
...
p_moov = p_cmov->data.p_cmov->p_moov; /* (1) */
...
p_moov->p_father = p_root; /* (2) */
...
}
}
return p_root;
}
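The effect of lines (1) and (2) can be illustrated with a toy model. This
is only a sketch and not VLC code: memory is a Python dictionary, the
helper name is hypothetical, and both the offset of 'p_father' inside
'MP4_Box_t' and the addresses are made-up values chosen for illustration.

```python
# ASSUMPTION: hypothetical offset of 'p_father' within 'MP4_Box_t'.
P_FATHER_OFFSET = 0x20


def link_moov_to_root(memory, p_moov, p_root):
    """Model the store 'p_moov->p_father = p_root' on a dict-based memory."""
    memory[p_moov + P_FATHER_OFFSET] = p_root


memory = {}
chosen_address = 0x28400100   # hypothetical location that receives the store
p_root = 0x28712340           # hypothetical address of the root box

# If 'p_moov' is under our control, the store lands at an address we choose.
link_moov_to_root(memory, chosen_address - P_FATHER_OFFSET, p_root)
```

Whoever controls 'p_moov' decides where the value of 'p_root' is stored;
what gets stored is always 'p_root' itself, which is why its contents
matter later on.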
For now, let's forget about 'p_root' and find a way of overwriting the
'p_moov' field of a 'cmov' box. First we need to perform several 4-byte
allocations to stabilize the heap and make sure that the 8 bytes to
be written to the adjacent regions will not end up in neighboring
run/chunk metadata. Such a situation may cause a segmentation fault on
the next call to 'malloc()'; that's something we would definitely like
to avoid. The tool for performing user controlled allocations is called
'MP4_ReadBox_0xa9xxx()', the function responsible for parsing boxes of
type 'MP4_Box_data_0xa9xxx_t'. A careful look at its code reveals that
we can allocate a string of any size we please; 'AAA\0' is exactly what
we need right now ;)
Now recall that in certain cases, when the target application is
threaded and has 'opt_mag' enabled, jemalloc will return memory regions
in descending memory addresses which is the case with VLC during the
MP4 parsing process. Extra threads are created and used to pre-process
the files, download album artwork and so on. What we really need to do
is force the heap to be shaped as shown below:
...[SKCR][JUNK][CMOV][AAA\0][AAA\0][AAA\0]...[AAA\0]...
-                                                     +
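The layout above can be checked with a small simulation. This is a sketch
under simplifying assumptions: the run is modeled as a flat byte array of
4-byte regions with no allocator metadata, the region labels are only
placeholders for their contents, and the three payload values are
arbitrary.

```python
import struct

REGION_SIZE = 4

# Lowest address on the left, as in the diagram.
layout = ["SKCR", "JUNK", "CMOV", "AAA0", "AAA1"]
heap = bytearray("".join(layout).encode("ascii"))


def region(heap, index):
    """Return the current contents of the 4-byte region at 'index'."""
    start = index * REGION_SIZE
    return bytes(heap[start:start + REGION_SIZE])


# Three 4-byte stores (little-endian, x86) starting at the victim region.
payload = struct.pack("<III", 0x11111111, 0x22222222, 0x33333333)
heap[0:len(payload)] = payload

# Regions whose contents changed because of the 12-byte write.
clobbered = [name for i, name in enumerate(layout)
             if region(heap, i) != name.encode("ascii")]
```

Only the victim and its two higher neighbors change: the second store
lands in the junk region and the third one in the region holding the
'cmov' data, while the '0xa9xxx' strings stay intact.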
Long nights of auditing VLC revealed that there's no easy way for us to
control the memory contents pointed to by 'p_root'. Although we had begun
to feel lost, we came up with a very daring idea which, although it sounds
dangerous at first, we were quite confident would eventually work
fine: Why not somehow 'free()' the 'p_root' region? Releasing 'p_root'
memory and performing several 64-byte (= sizeof(MP4_Box_t)) allocations
will force jemalloc to give 'p_root' back to us. '0xa9xxx' boxes can be
used to perform user controlled allocations, so they are theoretically
ideal for what we need. Suppose 'p_root' is freed; then a series of
'0xa9xxx' boxes that contain 64-byte opcodes will result in 'p_root'
eventually holding our shellcode payload... Right?
Two questions now arise. First, how can one know the address of 'p_root'
in order to free it? This is a good question, but it's something we will
be dealing with later. Second, each '0xa9xxx' box results in two 64-byte
allocations; one for the 'MP4_Box_t' structure to hold the box itself and
one for the string that will contain our shellcode. How can we guarantee
that 'p_root' will be given back by jemalloc for the string allocation and
thus not for the 'MP4_Box_t'? This is where 'chpl' boxes come in handy:
/* modules/demux/mp4/libmp4.c:2413 */
static int MP4_ReadBox_chpl(..., MP4_Box_t *p_box) {
MP4_READBOX_ENTER(MP4_Box_data_chpl_t);
...[SKCR][A9XXX][A9XXX]...
-                        +
/* modules/demux/mp4/libmp4.c:788 */
static int MP4_ReadBox_stts(..., MP4_Box_t *p_box) {
MP4_READBOX_ENTER(MP4_Box_data_stts_t);
...
MP4_GET4BYTES(p_box->data.p_stts->i_entry_count);
p_box->data.p_stts->i_sample_count =
calloc(p_box->data.p_stts->i_entry_count, sizeof(uint32_t)); /* (1) */
...
To sum up, for this first part of the exploitation process the attacker
must perform the following steps:
1 - Overwrite 'p_moov'
1c - Allocate and fill an 'skcr' box. The 'skcr' handling code will
allocate an 'frma' structure (4 bytes) and write 12 bytes in its region
thus eventually overwriting 'cmov->p_moov'
2c - Add an 'skcr' that will overwrite the adjacent '0xa9xxx' boxes. The
overwritten values should be the address of 'p_root' and that of a random
64-byte allocation, in this specific order.
2d - Add an invalid 'stts' box that will force the parsing process
to fail, the 'cmov' and its children (two '0xa9xxx' and one 'skcr')
to be freed, the 'psz_text' members of 'MP4_Box_data_0xa9xxx_t' to be
passed to 'free()' and consequently, 'p_root' to be released.
2e - Add several 'chpl' boxes. Each one will result in 254 64-byte
allocations with user controlled contents. Pray that jemalloc will give
'p_root' back to you (it most likely will).
--[ 4 - Real Media 'DemuxAudioSipr()' heap overflow vulnerability
Apart from being a full-blown media player, VLC can also work as a
transcoder. Transcoding is the process of receiving numerous inputs,
applying certain transformations to the input data and then sending the
result to a set of outputs. This is, for example, what happens when you
rip a DVD and convert it to an .avi stored on your hard disk. In its
simplest use, transcoding may be used to duplicate an input stream to both
your soundcard and an alternate output, for example, an RTP/HTTP
network stream, so that other users can listen to the music you're
currently listening to; a mechanism invaluable for your favorite pr0n. For
more information and some neat examples of using VLC in more advanced
scenarios, you can have a look at the VideoLan wiki and especially at [5].
Trying to find a way to leak memory from VLC, we carefully studied several
examples from the wiki page at [5] and then started feeding VLC with a
bunch of mysterious options; we even discovered a FreeBSD kernel 0day while
doing so. After messing with the command line arguments for a couple of
minutes we settled down to the following:
The sound information, the one you hear when playing a file, is split into
packets, each one carrying the track ID of the track whose data it
contains (as we have already mentioned, track data may be interleaved, so
the file parser has to know which packet belongs to which track). The sipr
codec goes further by allowing a packet to contain subpackets. When a
packet with multiple subpackets is encountered, its contents are buffered
until all subpackets have been processed. It's only then that the data
is sent to your audio card or to any pending output streams ;)
Every time a new packet is encountered in the input stream, VLC will check
the track it belongs to and figure out the audio codec for the track in
question. Depending on this information, the appropriate audio demuxer is
called. For sipr packets, 'DemuxAudioSipr()' is the function responsible
for this task.
/* modules/demux/real.c:788 */
static void DemuxAudioSipr(..., real_track_t *tk, ...) {
...
tk->p_sipr_packet = p_block;
}
/* (2) */
memcpy(p_block->p_buffer + tk->i_sipr_subpacket_count * tk->i_frame_size,
p_sys->buffer, tk->i_frame_size);
...
/* Checks that all subpackets for this packet have been processed, if not
* returns to the demuxer.
*/
if(++tk->i_sipr_subpacket_count < tk->i_subpacket_h)
return;
...
struct block_t {
    block_t      *p_next;
    uint32_t     i_flags;
    mtime_t      i_pts;
    mtime_t      i_dts;
    mtime_t      i_length;
    unsigned     i_nb_samples;
    int          i_rate;
    size_t       i_buffer;
    uint8_t      *p_buffer;
    block_free_t pf_release;
};
     p_buffer
     .-----.
     |     |
     |     v
.---------.----------------------.
| block_t | ... audio data ...   |
'---------'----------------------'
struct block_sys_t {
block_t self;
size_t i_allocated_buffer;
uint8_t p_allocated_buffer[];
};
...
#define BLOCK_ALIGN 16
...
#define BLOCK_PADDING 32
...
/* (1) */
const size_t i_alloc = sizeof(*p_sys) + BLOCK_ALIGN +
(2 * BLOCK_PADDING) + ALIGN(i_size);
p_sys = malloc(i_alloc);
...
return &p_sys->self;
}
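The size computation at (1) can be worked through for a 40-byte request
(two 20-byte subpackets, the configuration our exploit uses). The sketch
below rests on several assumptions that are not taken from the listing:
'sizeof(block_sys_t)' is taken to be 56 bytes on 32-bit x86 (4-byte
pointers and 'size_t', 8-byte 'mtime_t' without extra padding), 'ALIGN()'
is assumed to round up to a multiple of 'BLOCK_ALIGN', and size classes
above 128 bytes are assumed to be spaced 64 bytes apart. The function
names are hypothetical.

```python
BLOCK_ALIGN = 16
BLOCK_PADDING = 32

# ASSUMPTION: sizeof(block_sys_t) on 32-bit x86.
SIZEOF_BLOCK_SYS_T = 56


def align(n, boundary=BLOCK_ALIGN):
    """Round 'n' up to a multiple of 'boundary' (a power of two)."""
    return (n + boundary - 1) & ~(boundary - 1)


def block_alloc_size(i_size):
    """Mirror the 'i_alloc' expression at (1)."""
    return SIZEOF_BLOCK_SYS_T + BLOCK_ALIGN + 2 * BLOCK_PADDING + align(i_size)


def cacheline_bin(size, spacing=64):
    """ASSUMPTION: size classes above 128 bytes are 'spacing' bytes apart."""
    return align(size, spacing)


i_alloc = block_alloc_size(2 * 20)   # 56 + 16 + 64 + 48
bin_size = cacheline_bin(i_alloc)
```

Under these assumptions the request comes out at 184 bytes and is served
from the 192-byte bin, which agrees with the bin-192 figure we give for
such packets.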
1 - We know that if a packet has, for example, two subpackets, then
its 'p_block' will be alive until all subpackets have been processed;
when it is no longer needed, it will be freed, resulting in a small
hole in the heap. Obviously, the lifetime of a 'p_block' is directly
related to the number of its subpackets.
1 - Use the RMF metadata (MDPR chunks) to define two tracks. Both
tracks must use the sipr audio codec. Each packet of the first must
have 2 subpackets and each packet of the second 0 subpackets for the
vulnerability to be triggered.
2 - Force VLC to play the first subpacket of a packet of the first track. A
new 'block_t' will be allocated. In the diagram below, 't1s1' stands for
'track 1 subpacket 1'.
.---------.-------.
| block_t | t1s1 |
'---------'-------'
3 - Force VLC to play the packet of the second track; the one that has
0 subpackets. A new 'block_t' will eventually be allocated. We have
to specially prepare the heap so that the new block is placed directly
behind the one initialized at step 2.
.---------.------..---------.------.
| block_t | t2s0 || block_t | t1s1 |
'---------'------''---------'------'
An overflow will take place thus overwriting the block header of the
block allocated in the previous step. We are interested in overwriting
the 'p_buffer' to make it point to a memory region of our choice and
'i_buffer' to the number of bytes we want to be leaked.
4 - Feed VLC with the second subpacket for the first track. Since the
first subpacket was processed at step 2, the old 'block_t' will be
used. If everything goes fine, its 'p_buffer' will point where we set
it to and 'i_buffer' will contain a size of our choice. The 'memcpy()'
call at (2) in 'DemuxAudioSipr()' will write 'i_frame_size' bytes at our
chosen address thus trashing the memory a bit, but when 'es_out_Send()'
is called, 'i_buffer' bytes starting at the address 'p_buffer' points to
will be sent to the soundcard or any output stream requested by the user!
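Steps 2 to 4 can be replayed on a toy model to see which bytes end up in
the output. This is a sketch, not VLC code: memory is a flat byte array
whose offsets stand in for addresses, the 'Block' class keeps only the two
header fields of interest, and the helper names, offsets and sizes are
hypothetical.

```python
FRAME_SIZE = 20


class Block:
    """Only the two 'block_t' header fields that matter here."""

    def __init__(self, p_buffer, i_buffer):
        self.p_buffer = p_buffer
        self.i_buffer = i_buffer


def demux_subpacket(memory, block, count, data):
    """Model the 'memcpy()' at (2): copy one frame into the block buffer."""
    start = block.p_buffer + count * FRAME_SIZE
    memory[start:start + FRAME_SIZE] = data


def send(memory, block):
    """Model the output stage: emit 'i_buffer' bytes from 'p_buffer'."""
    return bytes(memory[block.p_buffer:block.p_buffer + block.i_buffer])


memory = bytearray(256)
memory[128:192] = bytes(range(64))            # stand-in for data of interest
target = Block(p_buffer=16, i_buffer=2 * FRAME_SIZE)

demux_subpacket(memory, target, 0, b"A" * FRAME_SIZE)   # step 2
target.p_buffer, target.i_buffer = 128, 64              # step 3: header overwritten
demux_subpacket(memory, target, 1, b"B" * FRAME_SIZE)   # step 4
leaked = send(memory, target)
```

In this model the first 20 and the last 24 bytes of 'leaked' are the
original data, while bytes 20 to 39 hold the second subpacket; this is the
'trashing' mentioned in step 4.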
Careful readers will probably have noticed that we took for granted that
the victim block will be allocated right before the target. Such a result
can easily be achieved. The technique we use in our exploit is very
similar to one of the techniques used in browser exploitation. Briefly,
we create several tracks (more than 2000) holding packets of 2 subpackets
of 20 bytes each so that all packets end up being allocated in bin-192. We
then force the release of two consecutive allocations thus creating two
adjacent holes in the heap. Then, by following what was said so far,
we can achieve a reliable information disclosure. Our tests show that
we can repeat this process around 40 times before VLC crashes (yet this
is only statistics, beautiful Greek statistics ;p).
It's now time to sum up the information leak process. For a successful
information disclosure, the attacker must perform the following steps:
3 - Play the first subpacket of the target track. The hole in the higher
address will be assigned to the new block.
4 - Play the packet of the victim track. The new block will be given the
lower heap hole and the overflow will reach the block allocated at step 3.
5 - Play the second subpacket of the target track. The memory we want
to read will be trashed by 20 bytes (= frame size) and then returned in
the output stream.
--[ 5 - Building a reliable exploit
1 - Forces VLC to play an innocent MP4 file so that the target plugin
is loaded.
2 - Parses the ELF headers of the VLC binary in order to locate the
absolute address of its .got section.
3 - Uses a specially crafted RMF file to leak 65k starting at the address
of the binary's .got.
4 - The second entry in the .got table points to the linkmap; a linked
list that keeps track of the loaded libraries populated by the loader
on each call to 'dlopen()'. Each entry holds the name of a library, the
address it's mapped at and so on. We proceed by leaking 1MB of data
starting from the address of the first linkmap entry.
7 - The relocation entries, the string table and the symbol table
indicated by the .dynamic section of the MP4 plugin can be properly
combined to figure out what .got entry corresponds to what symbol name. We
choose to overwrite the .got entry for 'memset()' (more on this later).
The absolute address of the 'memset()' .got entry is then calculated
and used as the value that will be written in 'p_moov'.
9 - A final MP4 file is created. The MP4 file frees all 'p_root'
candidates, uses 'chpl' boxes containing the shellcode to force jemalloc
to give the original 'p_root' region back and lands VLC on the 'unlink()'
style pointer exchange. The address of 'p_root', which now contains user
supplied data, is written in the .got entry of 'memset()'.
So why did we choose to hook 'memset()'? Turns out that once the MP4 file
parsing has finished and the 'unlink()' style code has been triggered,
VLC calls 'MP4_BoxDumpStructure()' to print the layout of the MP4 file
(this is always done by default; no verbose flags required). Since we
have corrupted the boxes, 'MP4_BoxDumpStructure()' may access invalid
memory and thus segfault. To avoid such a side effect, we have to hook
the first external function call. As shown below, this call corresponds to
'memset()' which suits us just fine ;)
if( !i_level )
{
...
}
else
{
...
memset(str, ' ', sizeof(str));
}
...
p_child = p_box->p_first;
while(p_child)
{
__MP4_BoxDumpStructure(..., p_child, ...);
p_child = p_child->p_next;
}
}
At first we thought that this would be the easiest part of the exploitation
process; it turned out that it was actually the most difficult. Our first
idea was to play an MP4 file several times and then leak memory in the
hope that certain 'MP4_Box_t' signatures may be present somewhere in the
heap. Unfortunately, the 64-byte allocations used by the MP4 plugin, are
later used by the RMF parser thus destroying any useful evidence. After
long nights and lots of tests, we came up with the following technique
which turned out to be successful:
4 - We leak 65k starting from the binary's .bss section. The bins array
of the main arena lies somewhere around. We analyze the data and locate
the address of bin-64.
--[ 6 - Demonstration
An art of exploitation paper serves nothing without the proper show off ;)
This section was specially prepared to be hard sex for your eyes. We were
very careful and, in fact, we spent many hours trying to figure out the
leetest shellcode to use, but we couldn't come up with something more
perfect than 'int3'.
Let's run the exploit. The actual output may differ since the logs shown
below do not correspond to the latest version of our code (oh and by
the way, we are not fucking Python experts).
The exploit output informs us that the .got entry for memset() lies
at 0x284edea8. Let's verify...
(gdb) quit
A debugging session is active.
Quit anyway? (y or n) y
If you decide not to use gdb (that's what real men do), the following
message will pop up upon successful exploitation.
Trace/BPT trap: 5 (core dumped)
--[ 7 - Limitations
2 - For some reason we are not aware of, requesting a memory leak of
more than 8MB returns no data at all. Maybe this is related to the output
filters splitting the 'p_blocks' into smaller parts, or maybe not ;p This
is a very important limitation, since smaller leaked data chunks mean
more requests for leaked memory which in turn implies more memory being
trashed. Consequently, more data we shouldn't touch may be modified
resulting in an unexpected crash of VLC.
4 - The exploit assumes that 64-byte regions usually lie between 0x28700000
and 0x28e00000 and tries to locate them. Sometimes the heap extends
beyond that range. We have to find a way to figure this out, get the
topmost heap address and explore the whole region. Doing that in a
reliable way requires problem 2 to be solved first.
5 - In section 5.2 we analyzed how the 'p_root' candidates are located. The
process described in the aforementioned section takes into account only the
bins of the first arena, but VLC, being a multithreaded application,
initializes more than one. We believe it's possible to detect those extra
arenas, locate their bin-64 addresses and take them into account as well.
Alternatively, one may leak and analyze the TLS data of each thread thus
locating their magazine racks, their magazines and the 'rounds[]' array
corresponding to 64-byte regions.
6 - In step 6 of section 5.2 we said that all regions of the detected runs
will eventually be freed by our special MP4 file in hope that 'p_root' will
lie somewhere within them. Although we do our best to fill heap holes, this
process may result in a segmentation fault due to the fact that regions
already freed are freed for a second time. It is possible to avoid this by
having a look at the target runs' region bitmap and freeing only those
regions that seem to be allocated. We didn't have the time to implement
this but we believe it's trivial (take a look at the comments in the
exploit's 'main.py').
If you manage to solve any of these problems, please let us know; don't
be a greedy pussy ;)
--[ 8 - Final words
Exploit development is definitely a hard task; do you think that the money
offered by [censored] is worth the trouble?
In this article, which is short compared to our efforts during the exploit
development, we tried to give as much detail as possible. Unfortunately
there's no way for us to present every minor detail; a deeper look into
VLC's source code is required. All that jemalloc stuff was fun but
tiresome. We think it's about time we take some time off :) We would like
to thank the Phrack staff for being the Phrack staff, our grhack.net
colleagues and all our friends that still keep it real. Our work is
dedicated to all those 'producers' of the security ecosystem that keep
their mouth shut and put their brains to work. Love, peace and lots of #.
--[ 9 - References
[7] RealAudio
http://en.wikipedia.org/wiki/RealAudio
--[ EOF