Format Strings

G. Lettieri
18 October 2023

1 Introduction
We now introduce a class of vulnerabilities and attack vectors involving format
strings.
“Format strings” are the control strings that are passed to the printf()
family of functions and contain the output template for the functions. These
functions are vulnerable whenever the attacker can control the format string
itself.
These vulnerabilities can be very powerful in the hands of a skilled attacker.
In the worst case, the attacker will be able to perform arbitrary memory reads
and even arbitrary memory writes. That is, the attacker may be able to read words
from memory addresses of their choice, or overwrite memory locations of their
choice with values of their choice.
It should be clear how these powers allow an attacker to completely defeat
stack canaries, e.g., by reading the canary from memory, or by overwriting the
global canary, or by overwriting a return address without touching the canary.

2 Format string bugs

The attack vectors come from the way variadic functions are implemented in
C. Variadic functions are declared by ending the list of their arguments with
“...”. For example, printf() can be declared as
int printf(const char *fmt, ...);

Basically, the C compiler handles variadic functions by simply not checking the
number and types of the arguments that are passed to the function in the “...”
position. All the arguments found in the call site are put in their place in the
registers or on the stack. If the called function needs one of these arguments,
it reads the expected location for that argument. The function has no way of
knowing if the argument was actually passed by the caller, or if the argument
type was the correct one: it will read whatever the expected argument location
currently contains, and interpret it as a value of the expected type. Correct
functionality depends entirely on the conventions between the caller and the
called function. The programmer must follow these conventions, making sure
to pass all the arguments that are actually needed in each call.
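
As a minimal sketch of this convention, consider a user-defined variadic function. The name sum_ints and its count parameter are illustrative assumptions, not something taken from a library. The callee has no way to verify that count arguments were really passed: it simply reads count argument slots.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: sums `count` int arguments.
 * Nothing checks that the caller really passed `count` ints;
 * the function reads that many argument slots and trusts the caller. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* interprets the next slot as an int */
    va_end(ap);
    return total;
}
```

A call such as sum_ints(3, 10, 20, 30) follows the convention. A call such as sum_ints(3, 10) also compiles, but its behaviour is undefined, because the function reads two arguments that were never passed.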
In the printf() family of functions, the convention is that each format
specifier takes an additional argument. For example, in
printf("a is %d and b is %d\n", a, b);

the first “%d” will read the first argument (a) after the format string, interpret
it as an integer, and print its decimal value; the second “%d” will read the next
argument (b). On 32b systems, the first argument is on the stack, just below
the pointer to the format string; the second argument is below the first one,
and so on. On 64b systems (with the System V calling convention used on Linux)
the first 6 integer or pointer arguments (including the pointer to the format
string) are passed in registers, and any additional arguments are pushed
on the stack.
Now consider a call like this
printf("a is %d and b is %d\n", a);

where there are two “%d”s, but only one additional argument. This code will
compile. At runtime, the printf() function will read and print the value of
a correctly, but then it will also print whatever is stored under a on the stack
(32b), or the current contents of the rdx register (64b) [1].
Finally, consider a statement like this
printf(buf);

where the contents of buf are controlled by the attacker. The programmer
simply wanted to print a string, but printf() interprets every “%” character
inside buf as the start of a format specifier. Each one of these format specifiers
needs a corresponding argument, and printf() will read the registers or the memory
locations where that argument should have been, under the attacker’s control. The
correct way to print a string is either puts(buf) (which also appends a newline)
or printf("%s", buf).
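
The safe pattern can be sketched as follows. The helper name render_untrusted is an assumption made for this illustration, and it writes into a buffer with snprintf() instead of printing, so that the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies untrusted text into `out` as plain data.
 * Returns the length of the full text (snprintf semantics). */
int render_untrusted(char *out, size_t outsz, const char *untrusted)
{
    assert(out != NULL && untrusted != NULL);
    /* The format string is a constant. The untrusted text is only an
     * argument, so any '%' inside it is copied literally. */
    return snprintf(out, outsz, "%s", untrusted);
}
```

Even if the untrusted text is something like "%x %n %s", no format specifier inside it is ever interpreted.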

3 Exploiting format string bugs

Now let us play the role of the attacker and assume that we can control a format
string used by a victim program.
Probably the best way to think about what we can do is to think of
printf() as a new machine with its own programming language. The format
string is the program and the instructions are normal characters and format
specifiers. The printf() machine has its own instruction pointer, pointing
to the next character/format specifier to “execute”. This pointer moves only
forward without jumps in either direction: there are no loops and no conditional
branches.
[1] Modern compilers can be instructed to recognize “printf-like” functions and will issue
a warning if the conventions are not followed.

The instructions update the machine state, which includes its instruction
pointer and:
1. an argument pointer, pointing to the argument to be used by the next
format specifier;
2. an output counter, containing the number of characters that have been
output so far.
The machine also produces output: the characters sent to the standard output.
For example, any ordinary character, such as “a”, can be seen as an instruction
to print the character itself. As a side effect, the instruction pointer moves
past the character in the string and the output counter is incremented by one,
while the argument pointer doesn’t change. As another example, a “%d” specifier
reads the argument pointed to by the argument pointer and moves the
argument pointer to the next position, interprets and outputs the argument
as an integer, and increments the output counter by the number of output
characters; finally, the instruction pointer moves past the “%d” in the string.
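
The machine model can be made concrete with a toy simulator. This is only a sketch: the struct and function names are invented for the illustration, and it models just ordinary characters, “%%” and “%d”. Unlike the real printf(), it refuses to run past the arguments it was given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model of the machine state; all names are illustrative. */
struct pf_machine {
    size_t ip;    /* instruction pointer: index into the format string */
    size_t argp;  /* argument pointer: index of the next argument */
    size_t out;   /* output counter: characters produced so far */
};

/* Runs `fmt` over `args`. Returns 0 on success, -1 on an unsupported
 * specifier or a missing argument (the real printf() has no such check
 * and would read whatever the argument location contains). */
int pf_run(struct pf_machine *m, const char *fmt, const int *args, size_t nargs)
{
    assert(m != NULL && fmt != NULL);
    m->ip = m->argp = m->out = 0;
    while (fmt[m->ip] != '\0') {
        if (fmt[m->ip] != '%') {            /* ordinary character */
            m->ip += 1;
            m->out += 1;
        } else if (fmt[m->ip + 1] == '%') { /* "%%" prints one '%' */
            m->ip += 2;
            m->out += 1;
        } else if (fmt[m->ip + 1] == 'd') { /* "%d" consumes one argument */
            if (m->argp >= nargs)
                return -1;
            /* snprintf with a NULL buffer only computes the output length */
            m->out += (size_t)snprintf(NULL, 0, "%d", args[m->argp]);
            m->argp += 1;
            m->ip += 2;
        } else {
            return -1;
        }
    }
    return 0;
}
```

Running it on "a is %d and b is %d\n" with the arguments 7 and -123 leaves the argument pointer at 2 and the output counter at 21.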
Surprisingly, the printf() machine can also write to memory: see the man
page for the little-known “%n” format specifier. The argument to this specifier
must be a pointer to an integer variable. printf() will execute it by writing
the current output counter into the variable. For example, assume that cnt1
and cnt2 are two int variables; then, the following statement
printf("AAAAA%nBBB%nCCCC", &cnt1, &cnt2);

will assign 5 to cnt1 and 8 to cnt2.
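
The same statement can be wrapped in a small function to check the result. The wrapper name count_markers is an assumption of this sketch, and snprintf() is used in place of printf() so that nothing is sent to standard output. Note that some C libraries restrict or reject “%n”, so this may not run everywhere.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper around the statement above.
 * Returns the total number of characters produced. */
int count_markers(int *cnt1, int *cnt2)
{
    char buf[32];

    assert(cnt1 != NULL && cnt2 != NULL);
    /* Each %n stores the number of characters produced so far. */
    return snprintf(buf, sizeof buf, "AAAAA%nBBB%nCCCC", cnt1, cnt2);
}
```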

3.1 Stack reads

The simplest way to exploit a format string vulnerability is to leak information
from the stack of the process under attack. On 32b systems, a sequence of %x
specifiers will cause printf() to print successive lines from the stack. On
64b systems, the first 5 %lx will print the contents of rsi, rdx, rcx, r8,
and r9, and any additional %lx will start printing successive stack lines. By
studying the binary, or simply by observing the output, the attacker may be
able to determine which of these lines contains the stack canary. On 32b systems
the canary can be read with %x, but on 64b you need %lx, because %x will only
read 4 bytes on both systems.
The only real difficulty for the attacker comes from space limitations in the
controlled buffer, since the argument pointer is only moved forward by a format
specifier. Any format specifier will move the argument pointer by at least one
stack line (which is 4 bytes in 32b systems and 8 bytes in 64b systems), since
arguments are always aligned to stack lines. Assume that the buffer size is s:
with two-character specifiers such as %x, each of which advances by one line,
the attacker can only move the argument pointer by ⌊s/2⌋ lines, which may not
be enough to reach the canary’s position.
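
The ⌊s/2⌋ bound can be illustrated with a short sketch. The helper name is hypothetical, and s is taken to be the number of usable characters, with one extra byte available for the terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fills `buf` (room for `s` characters plus a
 * terminating NUL) with as many "%x" specifiers as fit.
 * Returns the number of specifiers, i.e. how many stack lines the
 * argument pointer would advance. */
size_t fill_with_specifiers(char *buf, size_t s)
{
    size_t count = s / 2;   /* each "%x" costs two characters */

    assert(buf != NULL);
    for (size_t i = 0; i < count; i++) {
        buf[2 * i] = '%';
        buf[2 * i + 1] = 'x';
    }
    buf[2 * count] = '\0';
    return count;
}
```

With s = 9, only four specifiers fit, so the argument pointer advances by four lines.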

3.2 Random access to arguments

Another little-known fact is that the format string can also access its arguments
in random order using the “%n$” syntax, which selects the nth argument
directly. For example,
printf("%4$d %1$d %3$d %2$d\n", 10, 20, 30, 40);

will print “40 10 30 20”.
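
A small sketch that formats the same example into a buffer, so the result can be checked. The function name is an assumption of this illustration. The “%n$” syntax is a POSIX extension rather than part of ISO C, so a compiler in pedantic mode may warn about it.

```c
#include <assert.h>
#include <stdio.h>

/* Formats the example above into `out` instead of printing it.
 * Returns the number of characters produced. */
int format_reordered(char *out, size_t outsz)
{
    assert(out != NULL);
    return snprintf(out, outsz, "%4$d %1$d %3$d %2$d\n", 10, 20, 30, 40);
}
```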

In some cases, this syntax can be used to easily overcome the space limitations
that we have mentioned above. If we know that the canary is n stack-lines
below the pointer to the format string, “%n$x” will print it directly on 32b
systems, while “%(n + 5)$lx” will do the same on 64b ones (with n and n + 5
replaced by their actual decimal values).
This was possible in old versions of glibc, or even in modern versions if
some compile options have not been enabled (see _FORTIFY_SOURCE below).
According to the POSIX standard, however, random access and (the normal) sequential
argument access are mutually exclusive (i.e., the same format string cannot
contain both forms), and more importantly, once all the argument numbers have
been collected, there can be no gaps left. This means that a format string
like “%n$x” with n > 1 is non-standard, since it references the nth argument
without also referencing all the arguments from the 1st to the (n − 1)th. We
can understand why the standard imposes this no-gaps requirement: To jump to
the nth argument, printf() must know how many stack lines (and registers)
are occupied by the arguments up to the (n − 1)th. However, arguments can
occupy a variable number of stack lines, depending on their type. For example,
long long occupies two lines on 32b systems, while long double takes three
lines on 32b systems and 2 lines on 64b systems. To implement random access
arguments, the printf() function should scan the format string a first time,
without producing any output, to collect all the argument types. Then it should
start the normal scan, using the types collected in the first scan to compute the
correct stack line of each argument. For this algorithm to work, however, the
first scan must eventually see all the arguments from the 1st to the highest
referenced number. This is how musl libc works, for example.
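
The no-gaps check performed by such a first scan can be sketched as follows. This is a simplified illustration, not the code of musl or glibc: the function name and the MAX_POS_ARGS limit are assumptions, and only the “%N$” prefix of each specifier is parsed (flags, widths, and “*N$” are ignored).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define MAX_POS_ARGS 64   /* arbitrary limit chosen for this sketch */

/* Returns 1 if every argument from 1 to the highest "%N$" number in `fmt`
 * is referenced at least once, 0 if there is a gap or a number is out of
 * range. */
int positional_args_have_no_gaps(const char *fmt)
{
    unsigned char seen[MAX_POS_ARGS + 1] = {0};
    size_t highest = 0;

    assert(fmt != NULL);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        if (p[1] == '%') {      /* "%%" is a literal percent sign */
            p++;
            continue;
        }
        size_t n = 0;
        const char *q = p + 1;
        while (isdigit((unsigned char)*q)) {
            if (n <= MAX_POS_ARGS)          /* stop growing once too large */
                n = n * 10 + (size_t)(*q - '0');
            q++;
        }
        if (*q != '$')
            continue;           /* digits were a width, not a position */
        if (n == 0 || n > MAX_POS_ARGS)
            return 0;
        seen[n] = 1;
        if (n > highest)
            highest = n;
    }
    for (size_t i = 1; i <= highest; i++)
        if (!seen[i])
            return 0;
    return 1;
}
```

For instance, "%2$d %1$d" passes the check, while "%3$x" fails it because arguments 1 and 2 are never referenced.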
We can see that, if the no-gaps rule is enforced, random access arguments
cannot be used to overcome the space limitations in the buffer. When glibc
allows this behaviour, though, it simply assumes that all non-referenced arguments
occupy one stack-line each.
However, there may be limits on the maximum number of arguments, so you
will usually not be able to use this feature to read memory very far down the
stack, or especially at addresses lower than the top of the stack.

3.3 Arbitrary memory reads

The above limitation can be overcome if the attacker can control both the
printf() program (i.e., the format string) and at least some of its arguments.
This may be the case, for example, if the format string controlled by the attacker
is itself on the stack and can be accessed by the argument pointer.
Suppose that there are o stack-lines between the line read by the first
argument of printf() after the format string (included) and the first line of
the copy of the format string (excluded). In 32b systems, arguments number 1 to o
will read from these o stack-lines, while argument number o + 1 will read from
the first line of the format string. In 64b systems, arguments 1–5 will read from
the usual registers, arguments 6 to o + 5 will read from the o stack-lines, and
argument o + 6 will read from the first line of the format string. The attacker
can therefore put both
the instructions and their arguments in the same format string “program”.
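
The index arithmetic above is small enough to write down directly. The helper name first_fmt_arg is an assumption made for this sketch.

```c
#include <assert.h>

/* Hypothetical helper: number of the first printf() argument that reads
 * from the format string itself, given the `o` stack lines in between.
 * `is_64bit` selects the 64b case, where 5 register arguments come first. */
unsigned first_fmt_arg(unsigned o, int is_64bit)
{
    assert(o < 1000000u);   /* keeps the addition far from overflow */
    return is_64bit ? o + 6 : o + 1;
}
```

With o = 2, the format string is reached by argument 3 on 32b systems and by argument 8 on 64b systems.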
This is rather useless for instructions like “%x”, but consider the “%s” instruction
instead. Normally, this prints a string, but when reinterpreted as an
instruction for our printf() machine, it prints the contents of memory starting
from the address specified by its argument and stopping at the first null
byte. If the attacker can choose the address that the instruction will use, it is
an arbitrary memory read instruction.
For example, suppose that o is 2 and the victim program is a 32b one.
To read bytes from address 0x11223344 the attacker can prepare the string
“\x44\x33\x22\x11%c%c%s”. The purpose of the two “%c” instructions is
to move the argument pointer until it points to the beginning of the format
string, so that the “%s” instruction can take the 0x11223344 address as an
argument. Note that we also need the buffer to be stack-line aligned, which
may not always be the case. This just means that you may need some padding
bytes at the beginning before writing the address.
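
The layout of this string can be sketched with a small builder. The function name and its interface are assumptions of this illustration. It covers only the 32b, stack-line aligned case described above, and it refuses addresses that contain a null byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical builder: the target address in little-endian byte order,
 * then `o` "%c" instructions, then "%s".
 * Returns the string length, or 0 if `out` is too small or the address
 * contains a null byte (which would stop printf() early). */
size_t build_read_string(char *out, size_t outsz, uint32_t addr, unsigned o)
{
    size_t need = 4 + 2 * (size_t)o + 2 + 1;   /* address, %c..., %s, NUL */
    size_t pos = 0;

    assert(out != NULL);
    if (outsz < need)
        return 0;
    for (int i = 0; i < 4; i++) {
        unsigned char byte = (unsigned char)(addr >> (8 * i));
        if (byte == 0)
            return 0;
        out[pos++] = (char)byte;
    }
    for (unsigned i = 0; i < o; i++) {
        memcpy(out + pos, "%c", 2);
        pos += 2;
    }
    memcpy(out + pos, "%s", 3);   /* copies the terminating NUL as well */
    return pos + 2;
}
```

Called with the address 0x11223344 and o = 2, it produces the ten-byte string from the example.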
A problem may arise if there are no null bytes to stop printf() before it
reaches some unreadable addresses, which may cause the process to be terminated.
We can easily overcome this limitation by using a “%.ms” instruction,
which will always read (and print) at most m bytes.
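
The effect of the precision can be checked with a short sketch. The helper name is hypothetical, and “%.*s” is used so that m can be passed as an argument instead of being written into the format string.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats at most `m` bytes of `src` into `out`.
 * With a precision, "%s" stops after `m` bytes even if no null byte has
 * been found, so `src` need not be null-terminated within them. */
int format_at_most(char *out, size_t outsz, const char *src, int m)
{
    assert(out != NULL && src != NULL && m >= 0);
    return snprintf(out, outsz, "%.*s", m, src);
}
```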
Null bytes in the address, however, can be a problem, since the null byte
is a halt instruction for printf(). For example, in the format string above
a null byte in the address would stop printf() before it could even see
the first “%c” instruction. However, if null bytes are otherwise allowed in the
format string, this is not really a problem: the address can be placed after the
instructions. For example, suppose we want to read address 0x44002211, the
program is 32b, and o is 1, with the format string stack-line aligned. Then,
we can send the string “%c%c%c%s\x11\x22\x00\x44”. Note that we added
an extra “%c” to move the argument pointer one step further. If random access
is available, this is even easier: “%3$s\x11\x22\x00\x44”. If null bytes are
not allowed anywhere, but the address only contains null bytes in the most
significant positions, the attacker can still succeed by placing the non-null bytes
of the address at the very end of the string and exploiting any null bytes that
might accidentally follow the string in memory.

3.4 Arbitrary memory writes

The ultimate power comes from the ability to overwrite arbitrary memory words
with arbitrary values. This can be accomplished by using the “%n” instruction,
taking the address from the format string itself, and by precisely controlling the
output counter.
Controlling the output counter is less difficult than it may seem, since an
instruction like “%mc” (with m ≥ 1) will always increment the output counter by
exactly m.
If there are also other instructions in the format string, you must be careful to
control the number of bytes that they output. This can be done by adding width
specifiers to each one of them, but be aware of the exact semantics: “%ms” will
always output at least m bytes, while “%.ms” will always output at most m
bytes. If you want exactly m bytes, you need both: “%m.ms”.
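
These three behaviours can be verified with a short sketch. The helper names are assumptions, and the “*” form of width and precision is used so that m can be passed as an argument. snprintf() with a NULL buffer only computes how many characters would be output.

```c
#include <assert.h>
#include <stdio.h>

/* Each helper returns how many characters the instruction would output
 * for string `s`. */
int len_min_width(const char *s, int m)       /* "%ms": at least m */
{
    assert(s != NULL && m >= 0);
    return snprintf(NULL, 0, "%*s", m, s);
}

int len_max_precision(const char *s, int m)   /* "%.ms": at most m */
{
    assert(s != NULL && m >= 0);
    return snprintf(NULL, 0, "%.*s", m, s);
}

int len_exact(const char *s, int m)           /* "%m.ms": exactly m */
{
    assert(s != NULL && m >= 0);
    return snprintf(NULL, 0, "%*.*s", m, m, s);
}
```

With m = 5, a two-character string yields 5, 2, and 5 characters respectively, while an eight-character string yields 8, 5, and 5.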
Another possible difficulty comes from the fact that, if you want to write a
very large value (say, the address of a function), you may have to output an
impractically or impossibly large number of bytes. This difficulty can be overcome
by using the “%hn” instruction, which truncates the counter to a short (2 bytes),
or even “%hhn”, which truncates it to a char. If you use the latter instruction 4
times on consecutive addresses, for example, you can write any 32-bit value one
byte at a time, always incrementing the output counter by a maximum of 255
bytes. Note that, if the LSB of the counter is c and you need a value v < c, you
cannot subtract from the counter, but you can increment it by 256 − c + v bytes
and the LSB will become v.
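
The padding rule amounts to a subtraction modulo 256, which can be sketched as follows. The helper name padding_for_byte is an assumption of this illustration.

```c
#include <assert.h>

/* Hypothetical helper: how many characters must be output so that the LSB
 * of the output counter goes from `c` to `v` (both in 0..255).
 * A result of 0 means that no padding instruction is needed. */
unsigned padding_for_byte(unsigned c, unsigned v)
{
    assert(c <= 255u && v <= 255u);
    /* Equals v - c when v >= c, and 256 - c + v when v < c. */
    return (v + 256u - c) % 256u;
}
```

For example, going from an LSB of 0x55 to 0x22 requires 256 − 85 + 34 = 205 additional characters.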
As an example, suppose that you want to write the value 0x44552233 and
the LSB of the output counter starts at 32. You can send
"%36c%hhn%17c%hhn%205c%hhn%17c%hhn"

The first instruction sets the counter to 32 + 36 = 68 = 0x44 and the second
instruction writes it to memory; the third instruction sets the counter to
68 + 17 = 85 = 0x55; the fourth instruction writes the new counter to memory;
the fifth instruction sets the counter to 85 + 205 = 290 = 0x122 and the
sixth instruction writes its LSB, i.e., 0x22, to memory; finally, the seventh
instruction sets the counter to 290 + 17 = 307 = 0x133 and the eighth instruction
writes the final 0x33.
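
The counter arithmetic can be checked on a local variable, with all arguments passed explicitly. This is only a sketch: the function name is invented, an initial “%32c” stands in for the 32 characters assumed to be already output, and the four bytes are combined in little-endian order, so the first write targets the most significant byte. Some C libraries restrict or reject “%n”, so this may not run everywhere.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Runs the eight instructions above against a local 4-byte array and
 * returns the value they assemble. */
uint32_t write_bytes_with_hhn(void)
{
    signed char b[4] = {0, 0, 0, 0};   /* b[0] is the least significant byte */
    char sink[512];                    /* 307 characters are produced */
    int n;

    n = snprintf(sink, sizeof sink,
                 "%32c" "%36c%hhn%17c%hhn%205c%hhn%17c%hhn",
                 ' ',
                 ' ', &b[3],    /* counter 68  -> 0x44 */
                 ' ', &b[2],    /* counter 85  -> 0x55 */
                 ' ', &b[1],    /* counter 290 -> 0x22 */
                 ' ', &b[0]);   /* counter 307 -> 0x33 */
    assert(n == 307);
    return (uint32_t)(unsigned char)b[0]
         | (uint32_t)(unsigned char)b[1] << 8
         | (uint32_t)(unsigned char)b[2] << 16
         | (uint32_t)(unsigned char)b[3] << 24;
}
```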
Of course, the above format string is incomplete, since we need to provide
arguments for all of the “%hhn” instructions. Since we are moving the argument
pointer sequentially, we also need to provide a dummy argument to each “%mc”.
For example, suppose that o is zero, the format string is stack-line aligned,
the system is 32b, and we want to write 0x44552233 to memory address
0x01020304. We can complete the above format string by prefixing it with
the following
"AAAA\x07\x03\x02\x01BBBB\x06\x03\x02\x01"
"CCCC\x05\x03\x02\x01DDDD\x04\x03\x02\x01"

The “AAAA”, “BBBB”, and so on, serve as dummy arguments for the “%mc” instructions
and to re-align the next argument to the stack line. The other arguments
are the addresses of all the bytes of the target memory location. Since the
format string writes 0x44 first and the target is little-endian, the addresses
start from that of the most significant byte (0x01020307) and end with that of
the least significant one (0x01020304).
Note that printf() will also process this part of the string as a program
before reaching the part that will reuse this same string for the arguments.
As a program, this part of the string only prints bytes, since it contains no
format specifications. However, it does increment the output counter, which
will end up being 32. For this reason we assumed an initial counter of 32 in the
calculations above.

4 Mitigations
The gcc compiler and glibc library include a number of mitigations for this type
of attack. The mitigations are enabled when the _FORTIFY_SOURCE macro is
defined and the optimization level is at least one (-O or higher). The macro
can be set to either 1 or 2, with the latter enabling stricter checks that may
break some programs. It is often the case that _FORTIFY_SOURCE has already
been defined for you, so you only need to enable optimizations to include these
mitigations in your programs.
This option enables several checks, both at compile time and at run time,
that try to limit or prevent the effects of certain types of bugs. As far as
format string bugs are concerned, these are the most relevant changes when
_FORTIFY_SOURCE is set to 2:

• glibc will abort the process if a format string with random access arguments
does not use all the arguments;
• glibc will abort the process if a format string containing a “%n” operator
is read from writeable memory.

You can see how the most advanced uses of format string bugs, and in particular
the arbitrary memory write exploits, are made much more difficult
when these checks are in place.
Modern compilers also issue warnings when they see printf()-family functions
being used in possibly insecure ways. In gcc you can enable these warnings
with the -Wformat-security compile option, which takes effect together with
-Wformat.
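
The same compile-time checks can be extended to user-defined “printf-like” functions. The following is a minimal sketch, assuming gcc or clang, which both provide the format function attribute; the wrapper name log_line and its buffer size are invented for the illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical logging wrapper. The attribute tells the compiler that
 * argument 1 is a printf-style format string and that the values to check
 * start at argument 2, so calls are checked like calls to printf(). */
__attribute__((format(printf, 1, 2)))
int log_line(const char *fmt, ...)
{
    char line[128];
    va_list ap;
    int n;

    assert(fmt != NULL);
    va_start(ap, fmt);
    n = vsnprintf(line, sizeof line, fmt, ap);
    va_end(ap);
    return n;   /* length of the formatted message */
}
```

With the attribute in place, a call such as log_line(buf) with a non-constant buf and no further arguments should be reported when the warning options above are enabled, just like printf(buf).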
