Fast Collision Attack On MD5
Tao Xie¹˒², Fanbao Liu², Dengguo Feng³
¹ The Center for Soft-Computing and Cryptology, NUDT, Changsha, China
² School of Computer, NUDT, Changsha, 410073, Hunan, China
³ State Key Lab of Information Security, Chinese Academy of Sciences, Beijing, 100190, China
[email protected]
[email protected]†
Abstract. In 2010 we presented the first single-block collision attack on MD5, with a complexity of $2^{47}$ MD5 compressions, and posed the challenge of finding a completely new one. Last year, Stevens presented a single-block collision attack in response to our challenge, with a complexity of $2^{50}$ MD5 compressions. We greatly appreciate Stevens's hard work. However, it is a pity that he did not find even a better solution than our original one, let alone a completely new one or the optimal solution that we kept in reserve in the hope that someone would find it, whose collision complexity is about $2^{41}$ MD5 compressions. In this paper, we propose a method for choosing the optimal input difference for generating MD5 collision pairs. First, we divide the sufficient conditions into two classes, strong conditions and weak conditions, according to how difficult they are to satisfy. Second, we prove that, under specific conditions, strong conditions exist in only 24 steps (one and a half rounds), by exploiting two weaknesses of the MD5 compression function: difference inheriting and message expanding. Third, there should be no difference scaling after state word q25, so that each differential path has the least number of strong conditions; in this way we deduce the distribution of strong conditions for each input difference pattern. Finally, we choose the input difference with the fewest strong conditions and the most free message words. We implement the most efficient 2-block MD5 collision attack, which needs only about $2^{18}$ MD5 compressions to find a collision pair, and show a single-block collision attack with complexity $2^{41}$.
Keywords: Hash Function; MD5; Differential Cryptanalysis; Collision Attack; Single-Block Collision
1 Introduction
A hash function, which maps an input message of arbitrary length to an output of fixed length, is a one-way cryptographic primitive. Hash functions are mainly used to generate digital fingerprints, and are widely applied in Random Number Generation (RNG), message integrity checking, password shadowing, challenge-and-response protocols, Message Authentication Codes (MAC), digital signatures, digital certificates, etc.
The most widely used hash functions are the MD4-family iterated hash functions [6, 1], derived from MD4 [9], which was designed by Rivest in 1990. The family includes MD4, MD5 [10], SHA [7, 3], SHA-2 [8], etc. The first of them to be widely used in practice is MD5 [10], designed by Rivest in 1992 as a strengthened version of MD4.
In 2010 we presented the first single-block collision attack on MD5, with a complexity of $2^{47}$ MD5 compressions but without disclosing the details, and posed the challenge of finding a completely new one [14]. In 2012, Stevens presented a single-block collision attack in answer to our challenge, with a complexity of $2^{50}$ MD5 compressions [12]. The input difference pattern of Stevens's attack can easily be derived from ours.
In this paper, we propose a method for choosing the optimal input difference for generating MD5 collision pairs. First, we divide the sufficient conditions into two classes, strong conditions and weak conditions, according to how difficult they are to satisfy. Second, we prove that, under specific conditions, strong conditions exist in only 24 steps (one and a half rounds), by exploiting two weaknesses of the MD5 compression function: difference inheriting and message expanding. Third, there should be no difference scaling after state word q25, so that each differential path has the least number of strong conditions; in this way we deduce the distribution of strong conditions for each input difference pattern. Finally, we choose the input difference with the fewest strong conditions and the most free message words. We further apply a divide-and-conquer strategy that splits the MD5 collision search into stages, so that the complexities of the stages add rather than multiply. We also propose a scheme named group satisfaction, which deterministically satisfies the strong conditions of the first three steps in the last tunnel under the divide-and-conquer strategy, and randomly satisfies the other strong conditions using the remaining free bits of the tunnel, thereby greatly reducing the complexity of the MD5 collision search. Hence, we should construct differential paths with as many free bits as possible to support the divide-and-conquer strategy and the tunnel technique. The details of this method will appear in our full paper [15]. Applying the above methods, we implement the most efficient MD5 collision attack, which needs only about $2^{18}$ MD5 compressions to find a collision pair. These methods are also applicable to other hash functions with the MD (Merkle-Damgård) construction.

† Corresponding author.
We also show how to find suitable input differences for a single-block collision attack on MD5. Moreover, we compare Stevens' work [12] with ours and find that his response may not achieve the original target of our challenge, which is why we decided to give him half of the award.
2 Preliminaries
MD5 [10] is a typical Merkle-Damgård hash function: it takes a variable-length message M as input and outputs a 128-bit hash value MD5(M).
Before hashing, the input message M is pre-processed in the following three stages:
1. M is padded with a '1' bit followed by '0' bits until its length is congruent to 448 modulo 512, and then the length of M is appended as a 64-bit value, so that the length of the padded message M' is an exact multiple of 512 bits.
2. The padded message M' is divided into 512-bit blocks $(M_0, M_1, \dots, M_{|M'|/512-1})$.
3. Each block $M_i$ is further divided into sixteen 32-bit words $(m_0, m_1, \dots, m_{15})$.
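The three pre-processing stages above can be sketched in Python as follows. This is a minimal sketch that assumes the byte-oriented conventions of RFC 1321 [10]: the bit length is appended as a little-endian 64-bit integer, and each 512-bit block is read as sixteen little-endian 32-bit words. The function names `md5_pad` and `split_blocks` are illustrative only.

```python
import struct


def md5_pad(message: bytes) -> bytes:
    """Stage 1: append a '1' bit (byte 0x80), then '0' bits up to 448 mod 512,
    then the message length in bits as a 64-bit little-endian integer."""
    bit_length = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF  # length taken mod 2^64
    zeros = (55 - len(message)) % 64  # zero bytes so that the length is 56 (mod 64) bytes
    return message + b"\x80" + b"\x00" * zeros + struct.pack("<Q", bit_length)


def split_blocks(padded: bytes) -> list:
    """Stages 2 and 3: 512-bit blocks M_i, each as sixteen 32-bit words m_0..m_15
    (little-endian, as in RFC 1321)."""
    if len(padded) % 64:
        raise ValueError("padded length must be a multiple of 512 bits")
    return [list(struct.unpack("<16I", padded[k:k + 64]))
            for k in range(0, len(padded), 64)]


if __name__ == "__main__":
    blocks = split_blocks(md5_pad(b"abc"))
    print(len(blocks), hex(blocks[0][0]), blocks[0][14])
```

A three-byte message fits in one block: its first word holds the bytes of "abc" followed by the 0x80 padding byte, and word $m_{14}$ holds the bit length 24.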
Compression Function of MD5. Each block is processed by the MD5 compression function (CF). CF takes $M_i$ and a 128-bit chaining variable $H_i$ as input, and outputs $H_{i+1}$. The initial chaining variable $H_0$ is set to the constants a0 = 0x67452301, b0 = 0xefcdab89, c0 = 0x98badcfe, d0 = 0x10325476. The iterated procedure of MD5 is

$$H_{i+1} = CF(H_i, M_i), \quad 0 \le i \le n-1, \qquad (1)$$

where n is the number of blocks and $H_n$ is exactly MD5(M).
CF consists of 64 steps. Steps 1-16, 17-32, 33-48 and 49-64 are called rounds r1, r2, r3 and r4, respectively. Let $q_i$ ($1 \le i \le 64$) denote the 32-bit state word computed in step i, and let $q_{i,j}$ denote the j-th bit ($0 \le j \le 31$) of $q_i$. With the initial chaining variables $q_{-3} = a_0$, $q_0 = b_0$, $q_{-1} = c_0$ and $q_{-2} = d_0$, each $q_i$ ($1 \le i \le 64$) is updated by (2).
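$$q_i = q_{i-1} + \big(q_{i-4} + f_i(q_{i-1}, q_{i-2}, q_{i-3}) + w_i + t_i\big) \lll s_i \qquad (2)$$

Each update uses modular addition + (modulo $2^{32}$), left rotation ≪, a round-dependent Boolean function $f_i$, a message word $w_i$ and a step-dependent constant $t_i$.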
The details of fi are shown in (3).
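$$f_i = \begin{cases} F(B, C, D) = (B \wedge C) \vee (\neg B \wedge D), & i \in r_1,\\ G(B, C, D) = (B \wedge D) \vee (C \wedge \neg D), & i \in r_2,\\ H(B, C, D) = B \oplus C \oplus D, & i \in r_3,\\ I(B, C, D) = C \oplus (B \vee \neg D), & i \in r_4. \end{cases} \qquad (3)$$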
where ⊕, ∧, ∨ and ¬ denote the logic operations XOR, AND, OR and NOT, respectively. B, C and D are 32-bit
state words.
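A direct Python transcription of the four Boolean functions in (3) is given below as an illustrative sketch; the function names follow (3), while `MASK` is our own helper. Python integers are unbounded and `~x` is negative, so the NOT inside I is masked back to 32 bits; in F and G the AND with a non-negative operand already keeps the result within 32 bits.

```python
MASK = 0xFFFFFFFF  # state words are 32 bits wide


def F(b, c, d):
    """Round r1: bitwise 'if b then c else d'."""
    return (b & c) | (~b & d)


def G(b, c, d):
    """Round r2: bitwise 'if d then b else c'."""
    return (b & d) | (c & ~d)


def H(b, c, d):
    """Round r3: bitwise parity."""
    return b ^ c ^ d


def I(b, c, d):
    """Round r4: c XOR (b OR NOT d), with NOT d masked to 32 bits."""
    return c ^ ((b | ~d) & MASK)


if __name__ == "__main__":
    c, d = 0x12345678, 0x9ABCDEF0
    # F selects c where b is 1 and d where b is 0.
    print(hex(F(MASK, c, d)), hex(F(0, c, d)))
```

Reading F and G as bitwise selectors, and H as parity, is what makes the bit-level sufficient conditions of a differential path easy to state.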
Each message word $w_i$ is one of $(m_0, m_1, \dots, m_{15})$; the assignment of message words to steps is called message expanding and is given by (4).

$$w_i = \begin{cases} m_{i-1}, & i \in r_1,\\ m_{(5i-4) \bmod 16}, & i \in r_2,\\ m_{(3i+2) \bmod 16}, & i \in r_3,\\ m_{7(i-1) \bmod 16}, & i \in r_4. \end{cases} \qquad (4)$$
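The message expanding of (4) can be checked with a small helper; `message_index` is a hypothetical name for a function that returns the index j with $w_i = m_j$. Printing one row per round should reproduce the message order of RFC 1321 [10]; in particular, steps 17 to 26 use exactly m0, m1, m4, m5, m6, m9, m10, m11, m14 and m15, the words that appear in the proof of Proposition 1.

```python
def message_index(i):
    """Return j such that w_i = m_j for step i (1..64), following (4)."""
    if not 1 <= i <= 64:
        raise ValueError("step index must lie in 1..64")
    if i <= 16:                  # round r1
        return i - 1
    if i <= 32:                  # round r2
        return (5 * i - 4) % 16
    if i <= 48:                  # round r3
        return (3 * i + 2) % 16
    return (7 * (i - 1)) % 16    # round r4


if __name__ == "__main__":
    for r in range(4):
        steps = range(16 * r + 1, 16 * r + 17)
        print(f"r{r + 1}:", [message_index(i) for i in steps])
```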
≪ $s_i$ denotes left rotation by $s_i$ bits, and ⋙ denotes the corresponding right rotation. The rotation amounts are given in (6).

$$(s_i, s_{i+1}, s_{i+2}, s_{i+3}) = \begin{cases} (7, 12, 17, 22), & i = 1, 5, 9, 13,\\ (5, 9, 14, 20), & i = 17, 21, 25, 29,\\ (4, 11, 16, 23), & i = 33, 37, 41, 45,\\ (6, 10, 15, 21), & i = 49, 53, 57, 61. \end{cases} \qquad (6)$$
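After all 64 steps have been computed, the chaining variables are updated by adding the last four state words to them, which completes one call to the compression function: $H_{i+1} = (a_0 + q_{61},\, b_0 + q_{64},\, c_0 + q_{63},\, d_0 + q_{62})$, where $(a_0, b_0, c_0, d_0)$ here denote the words of $H_i$.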
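Putting (1) to (6) together, the following Python sketch implements the compression function in the $q_i$ notation used here and checks it against Python's `hashlib`. It is a minimal illustration, not the authors' code. The constants $t_i = \lfloor 2^{32} |\sin i| \rfloor$ are taken from RFC 1321 [10], since they are not listed in this section, and the names `compress`, `md5` and `f_and_w` are our own.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)  # (a0, b0, c0, d0)


# Boolean functions of (3); Python's ~ is negative, so I is masked explicitly.
def F(b, c, d):
    return (b & c) | (~b & d)


def G(b, c, d):
    return (b & d) | (c & ~d)


def H(b, c, d):
    return b ^ c ^ d


def I(b, c, d):
    return c ^ ((b | ~d) & MASK)


# Rotation amounts of (6), one 4-tuple per round.
ROT = [(7, 12, 17, 22), (5, 9, 14, 20), (4, 11, 16, 23), (6, 10, 15, 21)]
# Step constants t_i = floor(2^32 * |sin(i)|), i = 1..64 (from RFC 1321).
T = [int(abs(math.sin(i)) * 2**32) & MASK for i in range(1, 65)]


def rotl(x, s):
    return ((x << s) | (x >> (32 - s))) & MASK


def f_and_w(i, m):
    """Boolean function f_i and message word w_i of step i, per (3) and (4)."""
    if i <= 16:
        return F, m[i - 1]
    if i <= 32:
        return G, m[(5 * i - 4) % 16]
    if i <= 48:
        return H, m[(3 * i + 2) % 16]
    return I, m[(7 * (i - 1)) % 16]


def compress(h, m):
    """One call CF(H_i, M_i): 64 state updates (2), then the feed-forward."""
    a0, b0, c0, d0 = h
    q = {-3: a0, -2: d0, -1: c0, 0: b0}
    for i in range(1, 65):
        f, w = f_and_w(i, m)
        s = ROT[(i - 1) // 16][(i - 1) % 4]
        x = (q[i - 4] + f(q[i - 1], q[i - 2], q[i - 3]) + w + T[i - 1]) & MASK
        q[i] = (q[i - 1] + rotl(x, s)) & MASK
    return ((a0 + q[61]) & MASK, (b0 + q[64]) & MASK,
            (c0 + q[63]) & MASK, (d0 + q[62]) & MASK)


def md5(message):
    """Pad as in RFC 1321, then iterate (1) over the 512-bit blocks."""
    padded = message + b"\x80" + b"\x00" * ((55 - len(message)) % 64)
    padded += struct.pack("<Q", (8 * len(message)) & 0xFFFFFFFFFFFFFFFF)
    h = IV
    for k in range(0, len(padded), 64):
        h = compress(h, struct.unpack("<16I", padded[k:k + 64]))
    return struct.pack("<4I", *h)


if __name__ == "__main__":
    for msg in (b"", b"abc", b"a" * 100):
        assert md5(msg) == hashlib.md5(msg).digest()
    print("compress() agrees with hashlib.md5")
```

Keeping every $q_i$ in a dictionary indexed from −3 to 64 is wasteful for hashing, but it mirrors the notation used for differential paths below, where individual state words and their bits $q_{i,j}$ are inspected.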
3.1 Differences
Definition 1. Let $\mathbb{Z}_2$ be the binary field, $\mathbb{Z}_2^n$ the n-dimensional vector space over $\mathbb{Z}_2$, and $X, X' \in \mathbb{Z}_2^n$. The bitwise XOR (bitwise addition modulo 2) of X and X' is called the XOR difference, denoted $\triangle^{\oplus}X$. The integer subtraction modulo $2^n$ of X' from X is called the modular difference, denoted $\Delta X$. The bitwise integer subtraction $X_i - X'_i \in \{-1, 0, 1\}$ is called the signed difference, denoted $\triangle^{\pm}X$.
For simplicity, let n = 10, X = 1001000101 and X' = 0000111010. Then $\triangle^{\oplus}X$, $\Delta X$ and $\triangle^{\pm}X$ are computed as in (7), (8) and (9), respectively.

$$\triangle^{\oplus}X = X \oplus X' = (X_{n-1} \oplus X'_{n-1}) \cdots (X_0 \oplus X'_0) = 1001111111 \qquad (7)$$

$$\Delta X = (X - X') \bmod 2^n = \Big(\sum_{i=0}^{n-1} 2^i X_i - \sum_{i=0}^{n-1} 2^i X'_i\Big) \bmod 2^n = 1000001011 \qquad (8)$$

$$\triangle^{\pm}X = (X_{n-1} - X'_{n-1}) \cdots (X_0 - X'_0) = 1\,0\,0\,1\,\bar{1}\,\bar{1}\,\bar{1}\,1\,\bar{1}\,1 \qquad (9)$$

where $\bar{1}$ denotes a digit equal to −1. We omit the '0's of the signed difference $\triangle^{\pm}X$ and list the positions of the non-zero digits together with their signs ('+' is left implicit, '−' is written). For example, the signed difference in (9) is written as [9, 6, −5, −4, −3, 2, −1, 0].
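The three kinds of difference in Definition 1 and the running example of (7) to (9) can be reproduced with a few lines of Python. In this sketch (all helper names are ours) the signed difference is returned MSB first, as in (9), and `modular_from_signed` and `xor_from_signed` evaluate the maps used in Lemma 1.

```python
def xor_difference(x, x_prime):
    """XOR difference of (7)."""
    return x ^ x_prime


def modular_difference(x, x_prime, n):
    """Modular difference of (8): (X - X') mod 2^n."""
    return (x - x_prime) % (1 << n)


def signed_difference(x, x_prime, n):
    """Signed difference of (9): digits X_i - X'_i, from bit n-1 down to bit 0."""
    return [((x >> i) & 1) - ((x_prime >> i) & 1) for i in reversed(range(n))]


def compact(digits):
    """Non-zero digits as signed bit positions, MSB first, e.g. ['9', '-5', '0']."""
    n = len(digits)
    return [("-" if d < 0 else "") + str(n - 1 - k)
            for k, d in enumerate(digits) if d]


def modular_from_signed(digits):
    """Lemma 1: sum of 2^i * (X_i - X'_i), reduced mod 2^n."""
    n = len(digits)
    return sum(d * 2 ** (n - 1 - k) for k, d in enumerate(digits)) % 2 ** n


def xor_from_signed(digits):
    """Lemma 1: the XOR difference keeps |X_i - X'_i| at every position."""
    return int("".join(str(abs(d)) for d in digits), 2)


if __name__ == "__main__":
    X, X_PRIME, N = 0b1001000101, 0b0000111010, 10
    signed = signed_difference(X, X_PRIME, N)
    print(format(xor_difference(X, X_PRIME), "010b"))         # (7)
    print(format(modular_difference(X, X_PRIME, N), "010b"))  # (8)
    print(signed, compact(signed))                             # (9)
    assert modular_from_signed(signed) == modular_difference(X, X_PRIME, N)
    assert xor_from_signed(signed) == xor_difference(X, X_PRIME)
```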
Definition 2. The Hamming weight $\omega(\Delta X)$ denotes the number of non-zero terms of $\Delta X$ (the number of 1 bits in its binary expansion), and $\delta_i$ denotes the i-th such term, indexed starting from the least significant bit (LSB) of $\Delta X$. Let $\nu_i$ denote the number of signed differences of $\delta_i$.
For example, for the modular difference $\Delta X = 2^0 + 2^5 + 2^{29}$ we have $\omega(\Delta X) = 3$, $\delta_1 = 2^0$, $\delta_2 = 2^5$ and $\delta_3 = 2^{29}$. The values of $\nu_i$ are computed as follows:

$$\nu_i = \begin{cases} n - \log_2 \delta_{\omega(\Delta X)} + 1, & i = \omega(\Delta X),\\ \log_2 \delta_{i+1} - \log_2 \delta_i, & i \neq \omega(\Delta X). \end{cases} \qquad (10)$$
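Equation (10) can be evaluated mechanically. The sketch below assumes that the terms $\delta_i$ are the set bits of $\Delta X$ read from the LSB upward, as in the example above, and uses base-2 logarithms; `nu_values` is a hypothetical helper name. For $\Delta X = 2^0 + 2^5 + 2^{29}$ and n = 32 it gives $(\nu_1, \nu_2, \nu_3) = (5, 24, 4)$.

```python
import math


def nu_values(delta, n=32):
    """nu_i of (10) for a modular difference 0 < delta < 2^n.

    Assumption: the terms delta_1 < ... < delta_omega are the set bits of
    delta, read from the LSB upward (as in the example of Definition 2)."""
    logs = [k for k in range(n) if (delta >> k) & 1]  # log2(delta_i)
    if not logs:
        raise ValueError("delta must be non-zero")
    nus = [logs[i + 1] - logs[i] for i in range(len(logs) - 1)]
    nus.append(n - logs[-1] + 1)  # the case i = omega(Delta X)
    return nus


if __name__ == "__main__":
    nus = nu_values(2**0 + 2**5 + 2**29)
    print(nus, "product:", math.prod(nus))
```

For the MSB difference $2^{31}$ alone, (10) gives $\nu_1 = 2$, matching the two signed differences [31] and [−31].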
Lemma 1. Let $X, X' \in \mathbb{Z}_2^n$ be a pair of n-dimensional vectors. A signed difference $\triangle^{\pm}X$ determines a unique XOR difference $\triangle^{\oplus}X$ and a unique modular difference $\Delta X$ [2].
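Proof. Each digit $\triangle^{\pm}X_i = X_i - X'_i$ takes one of three values: 0, 1 and −1. If $\triangle^{\pm}X_i = 0$, then $X_i = X'_i$ and the bit contributes nothing ($0 \cdot 2^i$) to the modular difference $\Delta X$. If $\triangle^{\pm}X_i = 1$, then $X_i = 1$ and $X'_i = 0$, and the corresponding contribution to $\Delta X$ is $1 \cdot 2^i$. Similarly, $\triangle^{\pm}X_i = -1$ contributes $-2^i = -1 \cdot 2^i$ to $\Delta X$. Hence

$$\triangle^{\pm}X = (X_{n-1} - X'_{n-1}) \cdots (X_0 - X'_0) = \triangle^{\pm}X_{n-1} \cdots \triangle^{\pm}X_0 \;\Rightarrow\; \sum_{i=0}^{n-1} 2^i\, \triangle^{\pm}X_i \bmod 2^n = \Delta X. \qquad (12)$$

Likewise, $\triangle^{\oplus}X_i = |\triangle^{\pm}X_i|$ for every i, so $\triangle^{\oplus}X$ is determined as well. □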
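Lemma 2. Let $X, X' \in \mathbb{Z}_2^n$ be a pair of n-dimensional vectors. A signed difference $\triangle^{\pm}X$ can be determined uniquely by a modular difference $\Delta X$ and an XOR difference $\triangle^{\oplus}X$.
Proof.

$$\begin{aligned} \Delta X &= (X - X') \bmod 2^n = \Big(\sum_{i=0}^{n-1} 2^i X_i - \sum_{i=0}^{n-1} 2^i X'_i\Big) \bmod 2^n \\ &= \sum_{i=0}^{n-1} 2^i (X_i - X'_i) \bmod 2^n = \sum_{i=0}^{n-1} 2^i (X_i - X'_i)\,|X_i - X'_i| \bmod 2^n \\ &= \sum_{i=0}^{n-1} 2^i\, \triangle^{\pm}X_i\, \triangle^{\oplus}X_i \bmod 2^n \;\Rightarrow\; \triangle^{\pm}X \text{ is determined.} \end{aligned} \qquad (13)$$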
Theorem 1. Let $X, X' \in \mathbb{Z}_2^n$ be a pair of n-dimensional vectors. A signed difference $\triangle^{\pm}X$ can be determined uniquely by a modular difference $\Delta X$ and an XOR difference $\triangle^{\oplus}X$, and vice versa. Hence, a signed difference $\triangle^{\pm}X$ is equivalent to a modular difference $\Delta X$ combined with an XOR difference $\triangle^{\oplus}X$.
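Definition 3 (Difference scaling). If $\Delta q_i = \Delta q'_i$ and further $\omega(\triangle^{\pm}q'_i) > \omega(\triangle^{\pm}q_i)$, then we call $(\Delta q'_i, \triangle^{\pm}q'_i)$ a scaling case of $(\Delta q_i, \triangle^{\pm}q_i)$. The process from $\triangle^{\pm}q_i$ to $\triangle^{\pm}q'_i$ is called a difference scaling.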
Theorem 2. $\Delta q_i$ may have as many as $\prod_{j=1}^{\omega(\Delta q_i)} \nu_j - 1$ ways of difference scaling.
Proof. Each $\Delta q_i$ may have $\prod_{j=1}^{\omega(\Delta q_i)} \nu_j$ different signed differences. Excluding the signed difference itself, $\Delta q_i$ may have as many as $\prod_{j=1}^{\omega(\Delta q_i)} \nu_j - 1$ different ways of difference scaling.
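For small word sizes, Definition 3 and Theorem 2 can be explored by brute force: enumerate every digit vector in $\{-1, 0, 1\}^n$ whose value modulo $2^n$ equals a given modular difference. The Python sketch below (helper names are ours; exhaustive search limits it to roughly n ≤ 12) lists these signed differences and the scaling cases of a chosen base. For the single term $\Delta X = 2^2$ with n = 4 it finds three signed differences, [2], [−2, 3] and [−2, −3], i.e. $\nu_1 = 4 - 2 + 1 = 3$ as in (10) and two scaling cases as in Theorem 2. We have only checked this single-term case against the theorem.

```python
from itertools import product


def value(digits, n):
    """Modular value sum_j d_j * 2^j mod 2^n of a signed difference (d_0, ..., d_{n-1})."""
    return sum(d << j for j, d in enumerate(digits)) % (1 << n)


def weight(digits):
    """omega of a signed difference: the number of non-zero digits."""
    return sum(1 for d in digits if d != 0)


def signed_representations(delta, n):
    """Every signed difference with modular difference delta (exhaustive, small n only)."""
    return [digits for digits in product((-1, 0, 1), repeat=n)
            if value(digits, n) == delta % (1 << n)]


def scaling_cases(base, n):
    """Signed differences with the same modular difference as base but larger weight."""
    return [r for r in signed_representations(value(base, n), n)
            if weight(r) > weight(base)]


def positions(digits):
    """Compact notation of Section 3.1: signed bit positions, LSB first."""
    return [("-" if d < 0 else "") + str(j) for j, d in enumerate(digits) if d != 0]


if __name__ == "__main__":
    base = (0, 0, 1, 0)  # +2^2 in a 4-bit word, digits listed from bit 0
    for r in signed_representations(value(base, 4), 4):
        print(positions(r))
    print(len(scaling_cases(base, 4)), "scaling cases")
```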
Proposition 1. The message modification technique cannot be applied to the state words after q26 in the second round r2 of MD5.
Proof. If the differential path before state word $q_i$ (i < 27) is left unchanged when a message modification is applied (at least the values of $q_{i-1}$ to $q_{i-4}$ are kept unchanged), we call it a successful application of message modification. The only way to achieve this is to modify the free bits of the input message so that the differential path is held unchanged; hence, enough free bits are necessary for message modification.
The input words m0, m1, m4, m5, m6, m9, m10, m11, m14 and m15 are used in the state updates from q17 to q26 in round r2, so these words cannot be used again as free words after state q26. Otherwise, the differential path before q26 may be destroyed and the message modification fails.
For example, consider $w_{27} = m_3$ of $q_{27}$. Since m4 to m6 have already been used in the second round r2, message modification cannot be successfully employed in q27. Similarly, the remaining state words of r2 cannot employ the message modification technique.
Weak conditions and strong conditions. With random input messages, each sufficient condition is satisfied with probability 1/2. We therefore divide these conditions into two classes, weak conditions and strong conditions, in an idealized way¹, according to how difficult they are to satisfy.
Definition 4. In an idealized setting, the sufficient conditions before state q25 are called weak conditions, and the sufficient conditions after state q25 are called strong conditions.
Because of the limitation on applying the message modification technique, the sufficient conditions after state q27 can only be satisfied randomly. The sufficient conditions of states q25 and q26 may be satisfied deterministically, depending on the number of free bits. Hence, the conditions after state q25 are, to some extent, hard to satisfy.
Proof. Using the properties of the round function H, we have $\Pr(H(b, c, d) = \neg H(\neg b, c, d)) = 1$, and $\omega(2^{31}) = 1$.
Hence, the following equation holds
Similarly, since Pr(H(b, c, d) = H(¬b, ¬c, d)) = 1 and Pr(H(b, c, d) = ¬H(¬b, ¬c, ¬d)) = 1, we have
Proposition 3. Four consecutive state differences $2^{31}$ propagate with probability 1 when there is no input difference.
Proof. Let $q_j$ to $q_{j+3}$ ($j \ge 29$) be four consecutive states whose differences are all $2^{31}$. By the state update in r3, for $33 \le i \le 48$ we have
Proposition 4. Consecutive state differences $2^{31}$ can be generated and propagated with probability 1/2 by fixing certain input differences.
Proof. If all 5 sufficient conditions are satisfied, then four consecutive state differences $2^{31}$ are generated with probability 1, by Proposition 2. Since four of these conditions concern input differences, which can be fixed with probability 1, we need only consider the condition $\omega(\triangle^{\pm}q_i) = 1$, which holds with probability $\Pr(\omega(\triangle^{\pm}q_i) = 1) = 1/2$. Hence, with this input difference pattern, four consecutive state differences can be generated with probability 1/2.
Four consecutive state differences $2^{31}$ then propagate with probability 1, by Proposition 3.
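Propositions 2 and 3 rest on two facts: a modular difference of $2^{31}$ only flips the MSB, and flipping the MSB of one or of all three inputs of H flips the MSB of its output, while flipping two inputs cancels. The Python sketch below checks these facts, and the probability-1 propagation of Proposition 3, on random r3 steps. It is an empirical illustration rather than a proof; the rotation amounts are drawn from (6), while the words, constants and seed are arbitrary choices of ours, and `r3_step` is a hypothetical helper name.

```python
import random

MASK = 0xFFFFFFFF
MSB = 1 << 31  # the difference 2^31: adding it mod 2^32 only flips bit 31


def H(b, c, d):
    return b ^ c ^ d


def rotl(x, s):
    return ((x << s) | (x >> (32 - s))) & MASK


def r3_step(q1, q2, q3, q4, w, t, s):
    """Update (2) with f_i = H; q1..q4 stand for q_{i-1}..q_{i-4}."""
    return (q1 + rotl((q4 + H(q1, q2, q3) + w + t) & MASK, s)) & MASK


rng = random.Random(2013)
for _ in range(10_000):
    b, c, d = (rng.getrandbits(32) for _ in range(3))
    assert (b + MSB) & MASK == b ^ MSB           # modular and XOR views agree at the MSB
    assert H(b ^ MSB, c, d) == H(b, c, d) ^ MSB  # one flipped input: output MSB flips
    assert H(b ^ MSB, c ^ MSB, d) == H(b, c, d)  # two flipped inputs cancel
    assert H(b ^ MSB, c ^ MSB, d ^ MSB) == H(b, c, d) ^ MSB
    # Proposition 3: q_{i-1}..q_{i-4} all carry 2^31, and there is no message difference.
    q = [rng.getrandbits(32) for _ in range(4)]
    w, t = rng.getrandbits(32), rng.getrandbits(32)
    s = rng.choice((4, 11, 16, 23))  # the r3 rotation amounts in (6)
    out = r3_step(*q, w, t, s)
    out_prime = r3_step(*(x ^ MSB for x in q), w, t, s)
    assert out ^ out_prime == MSB  # q_i again has difference exactly 2^31
print("all MSB-propagation checks passed")
```

Inside the rotation, the two MSB flips (from $q_{i-4}$ and from H) add up to $2^{32} \equiv 0$, so only the flip carried by $q_{i-1}$ survives; this is why the assertion holds on every trial rather than with some probability.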
¹ We assume that, before state q25, the message modification techniques can be employed unconditionally, with sufficient free bits.
The implementation strategy of a practical MD5 collision attack. We need only focus on choosing specific input difference patterns and evaluating the complexity of r4, and of r2 after step q25. By counting the strong conditions of these input differences, we can launch a successful practical collision attack if the number of strong conditions is less than 64.
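Table 1: The number of strong conditions and free words of some input differences

| Input difference | Strong conditions | Free words |
|---|---|---|
| $\Delta m_5 = 2^{10}$ | 26 | 5 |
| $\Delta m_8 = 2^{25}$ | 29 | 8 |
| $\Delta m_{11} = 2^{21}$ | 27 | 11 |
| $\Delta m_{14} = 2^{16}$ | 37 | 14 |
| $\Delta m_5 = 2^{31}$ | 30 | 5 |
| $\Delta m_8 = 2^{31}$ | 25 | 8 |
| $\Delta m_{11} = 2^{31}$ | 38 | 11 |
| $\Delta m_5 = 2^{31}$, $\Delta m_{11} = 2^{31}$ | 25 | 5 |
| $\Delta m_{14} = 2^{31}$ | 43 | 14 |
| $\Delta m_8 = 2^{31}$, $\Delta m_{14} = 2^{31}$ | 29 | 8 |
| $\Delta m_4 = 2^{20}$, $\Delta m_7 = 2^{31}$, $\Delta m_{13} = 2^{31}$ | 40 | 4 |
| $\Delta m_{13} = 2^{7}$, $\Delta m_0 = 2^{31}$, $\Delta m_6 = 2^{31}$ | 37 | 0 |
| $\Delta m_6 = 2^{8}$, $\Delta m_9 = 2^{31}$, $\Delta m_{15} = 2^{31}$ | 35 | 6 |
| $\Delta m_9 = 2^{27}$, $\Delta m_{12} = 2^{31}$, $\Delta m_2 = 2^{31}$ | 29 | 2 |
| $\Delta m_{11} = 2^{15}$, $\Delta m_{14} = 2^{31}$, $\Delta m_4 = 2^{31}$ | 28 | 4 |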
At CRYPTO 2009, Stevens et al. presented a "fastest" collision attack on MD5, which needs about $2^{16}$ MD5 compressions [13]. That attack is also based on relaxing sufficient conditions: it removes about ten sufficient conditions on the chaining variable. In [5], it is shown that such an attack is infeasible in practice.
Input Difference Choosing. Considering both strong conditions and free words, we choose the input differences ($\Delta M_0 = (\Delta m_8 = 2^{31})$, and $\Delta M_1 = 0$ or $\Delta M_1 = (\Delta m_8 = 2^{31})$) to generate an MD5 collision with two blocks. Starting from the initial chaining variable with no difference, $\Delta H_0 = 0$, the first block with input difference $\Delta M_0$ is used to generate the difference $\Delta H_1 = (\Delta q_{-3} = 2^{31}, \Delta q_{-2} = 2^{31}, \Delta q_{-1} = 2^{31}, \Delta q_0 = 2^{31})$. With this chaining-variable difference $\Delta H_1$, the second block with input difference $\Delta M_1$ is used to generate $\Delta H_2 = 0$, hence an exact collision.
An overall analysis is as follows.
1. The first block M0.
a. The target of M0 is to generate the difference $\Delta H_1 = (2^{31}, 2^{31}, 2^{31}, 2^{31})$. The MSB differences in r3 are inherited from r2, so the number of strong conditions in r3 is 0; there are 16 strong conditions in r4², and the number of strong conditions after q25 in r2 is 7. Therefore, the total number of strong conditions is 23.
b. The words before m8 are all free words; hence, there are enough free words.
2. The second block M1.
a. If $H_1$ satisfies the dBB conditions, then the second block has difference $\Delta M_1 = 0$ for dBB collision generation. There are no conditions in r3, 16 conditions in r4 and 4 strong conditions after q25 in r2; hence, there are 20 strong conditions in total. Since each state word $q_i$ ($i \in r_1$) has only one condition, there are enough free bits.
² The two conditions $q_{-2,31} = q_{-1,31} = q_{0,31}$ on $H_1$ are relaxed.
b. If $H_1$ does not satisfy the dBB conditions, then the second block has difference $\Delta M_1 = (\Delta m_8 = 2^{31})$. The differential path after q25 of such paths is the same as that of the first block; hence, the number of strong conditions is 23.
Group Satisfaction Scheme. Based on both the tunnel technique and advanced message modification, we deterministically satisfy all conditions of the first three steps in the last tunnel, or in the last phase of the divide-and-conquer strategy, so as to achieve full collision-search speed.
Advantage. Most previous collision search algorithms [4, 11] satisfy the sufficient conditions of the last tunnel randomly, whereas we satisfy all conditions of the first three steps in the last tunnel, which greatly increases the collision probability and improves search efficiency.
Algorithm Implementation. We divide the search for a first block that produces the dBB conditions into five stages, as follows.
– Stage 1: Set the conditions of q3 to q16 to be true by basic message modification, and set the non-conditional bits randomly. Compute the values of m6 to m15.
– Stage 2:
1. Set the conditions of q17 to be true and set the non-conditional bits randomly. Check whether the carrier conditions of q17 are satisfied; if not, go to step 1.
2. Set the conditions of q18 to be true and set the non-conditional bits randomly. Check whether the carrier conditions of q18 are satisfied; if not, go to step 1. Compute m6 from q18, recompute q7 and check its conditions; if they are not satisfied, go to Stage 1.
3. Compute q19 and check its conditions; if they are not satisfied, apply advanced message modification. Check the carrier condition; if it is not satisfied, go to Stage 1.
– Stage 3:
a. Set the conditions of q20 to be true and set the non-conditional bits randomly. Check whether the carrier conditions of q20 are satisfied; if not, go to step a.
b. Compute q21 and check its conditions; if they are not satisfied, apply advanced message modification. Check the carrier condition; if it is not satisfied, go to step a.
c. Compute q22 and check its conditions; if they are not satisfied, apply advanced message modification. Check the carrier condition and q22,29 and q22,29; if any is not satisfied, go to step a.
d. Compute q23 and check its conditions; if they are not satisfied, go to step a. Compute message words m2 to m4, m7 to m9 and m12 to m13 from the corresponding state words in r1.
– Stage 4:
i. Satisfy the conditions of q24 by a brute-force search over the q4 tunnel. If all free bits are used up, go to step a.
– Stage 5:
x. Satisfy the conditions of q25 and q26 using the group satisfaction scheme in the q9 tunnel. If this fails, go to step i.
y. Compute q27 to q64 one by one and check their conditions; if any is not satisfied, go to step x.
Complexity Analysis. In fact, the differential path of the first block has as many as 110 weak conditions from q17 in r2 to the final step q64, and 39 strong conditions remain after advanced message modification is applied. The dBB collision of the second block has 28 strong conditions to be satisfied randomly. However, thanks to the divide-and-conquer strategy, the practical complexity is much lower than these numbers suggest. The details are as follows.
The complexities of the five stages are:
– Stage 1 is satisfied deterministically, with a constant complexity C1;
– Stage 2 (steps 1 to 3) has 5 conditions to be satisfied randomly, with a complexity of less than $2^5$ MD5 compressions;
– Stage 3 (steps a to d) has 12 conditions to be satisfied randomly, with a complexity of less than $2^{12}$ MD5 compressions;
– Stage 4 is satisfied deterministically, with a constant complexity C2;
– Stage 5 (steps x to y) has 20 conditions to be satisfied randomly, with a complexity of about $2^{18}$ MD5 compressions.
Since the stages are independent of each other, the complexity $C_{b1}$ of searching for the first block is obtained by addition, rather than by the multiplication $2^{5+12+18}$:

$$C_{b1} = C_1 + 2^5 + 2^{12} + C_2 + 2^{18} \approx 2^{18} \text{ MD5 compressions.}$$
The complexity $C_{b2}$ of searching for the second block can be analyzed similarly and is about $2^{18}$ MD5 compressions. Therefore, the total complexity $C_{coll}$ of the 2-block collision attack with ($\Delta M_0 = (\Delta m_8 = 2^{31})$, $\Delta M_1 = 0$) is

$$C_{coll} = C_{b1} + C_{b2} = 2^{18} + 2^{18} \approx 2^{19} \text{ MD5 compressions.} \qquad (21)$$
The other six differential paths of the second block have input difference $\Delta M_1 = (\Delta m_8 = 2^{31})$; they are similar to the path of the first block and have the same number of strong conditions. Hence, the complexity of each of these paths is also $2^{18}$ MD5 compressions.
Therefore, the average complexity of the collision attack with the 2-block input differences ($\Delta M_0 = (\Delta m_8 = 2^{31})$, and $\Delta M_1 = 0$ or $\Delta M_1 = (\Delta m_8 = 2^{31})$) is about $2^{19}$ MD5 compressions.
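The additive behaviour of the staged search can be illustrated with a toy model. The sketch below assumes that each stage has a number of independent conditions, each holding with probability 1/2, and that a failed stage is retried on its own without redoing earlier stages; it abstracts away the actual tunnels and message modification. The condition counts are small hypothetical stand-ins so that the simulation runs quickly, and `staged_search` is our own name.

```python
import math
import random


def staged_search(condition_counts, rng):
    """Toy model of a divide-and-conquer search.

    Stage k passes when all condition_counts[k] random conditions hold
    (probability 2^-c per trial); a failed stage is retried on its own,
    without redoing earlier stages. Returns the total number of trials."""
    trials = 0
    for c in condition_counts:
        while True:
            trials += 1
            if rng.getrandbits(c) == 0:  # each of the c bits models one 1/2-condition
                break
    return trials


# Hypothetical small stage sizes so that the simulation is fast.
counts = [3, 5, 8]
rng = random.Random(1)
runs = 200
average = sum(staged_search(counts, rng) for _ in range(runs)) / runs

additive = sum(2 ** c for c in counts)  # 8 + 32 + 256 = 296
multiplicative = 2 ** sum(counts)       # 2^16 = 65536
print(f"average trials per run: {average:.1f}")
print(f"additive estimate: {additive}, multiplicative estimate: {multiplicative}")

# The same comparison with the stage costs listed above (constants C1, C2 ignored).
print(f"log2(2^5 + 2^12 + 2^18) = {math.log2(2**5 + 2**12 + 2**18):.2f} vs 5 + 12 + 18 = 35")
```

The simulated average should land close to the additive estimate rather than the multiplicative one. With the stage costs $2^5$, $2^{12}$ and $2^{18}$ the sum is about $2^{18.02}$, compared with $2^{35}$ for the product.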
Further Optimization. In fact, if we apply the q14 tunnel and the group satisfaction scheme in q26, the complexity of the MD5 collision attack can be reduced to $2^{18}$ MD5 compressions, because the strong condition of q27 can then also be satisfied deterministically. A collision pair is shown in Table 2.
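Table 3: Possible input differences for a single-block collision attack

| Input difference $\Delta w_i$ | Input difference $\Delta w_{i+5}$ | #r4 |
|---|---|---|
| m0 | m5 | 3 |
| m7 | m12 | 4 |
| m14 | m3 | 5 |
| m5 | m10 | 6 |
| m12 | m1 | 7 |
| m3 | m8 | 8 |
| m10 | m15 | 9 |
| m1 | m6 | 10 |
| m8 | m13 | 11 |
| m15 | m4 | 12 |
| m6 | m11 | 13 |
| m13 | m2 | 14 |
| m4 | m9 | 15 |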
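Table 4: Partial differential path with optimization based on input difference $\Delta M = (\Delta m_5 = 2^{10}, \Delta m_{10} = 2^{31})$

| i | Modular difference $\Delta$ | Signed difference $\triangle^{\pm}$ | # |
|---|---|---|---|
| 23 | $\Delta q_{23} = 2^{3} + 2^{24} + 2^{31}$ | $q_{23}$ [3, 24, 31] | |
| 24 | $\Delta q_{24} = 2^{7} + 2^{19} + 2^{27} - 2^{29} + 2^{31}$ | $q_{24}$ [7, 19, 27, −29, 31] | |
| 25 | $\Delta q_{25} = -2^{2} - 2^{5} - 2^{10} + 2^{22} + 2^{31}$ | $q_{25}$ [−2, −5, −10, 22, 31] | 11 |
| 26 | $\Delta q_{26} = 2^{1} - 2^{6} + 2^{31}$ | $q_{26}$ [1, −6, 31] | 11 |
| 27 | $\Delta q_{27} = 2^{17} + 2^{31}$ | $q_{27}$ [17, 31] | 10 |
| 28 | $\Delta q_{28} = 2^{7} + 2^{15} + 2^{27} + 2^{31}$ | $q_{28}$ [7, 15, 27, 31] | 7 |
| 29 | $\Delta q_{29} = -2^{10} + 2^{31}$ | $q_{29}$ [−10, 31] | 4 |
| 30 | $\Delta q_{30} = -2^{15} + 2^{31}$ | $q_{30}$ [−15, 31] | 4 |
| 31 | $\Delta q_{31} = -2^{15} + 2^{31}$ | $q_{31}$ [−15, 31] | 4 |
| 32 | $\Delta q_{32} = -2^{27} + 2^{31}$ | $q_{32}$ [27, ∗31] | 1 |
| 33 | $\Delta q_{33} = -2^{27} + 2^{31}$ | $q_{33}$ [27, ∗31] | 2 |
| 34 | $\Delta q_{34} = 2^{31}$ | $q_{34}$ [∗31] | 1 |
| 35 | $\Delta q_{35} = 0$ | $q_{35}$ | 2 |
| 36 | $\Delta q_{36} = 0$ | $q_{36}$ | 0 |
| 37 | $\Delta q_{37} = 2^{31}$ | $q_{37}$ [∗31] | 1 |
| 38 | $\Delta q_{38} = 2^{31}$ | $q_{38}$ [∗31] | 0 |
| 39 | $\Delta q_{39} = 2^{31}$ | $q_{39}$ [∗31] | 0 |
| 40 | $\Delta q_{40} = 2^{31}$ | $q_{40}$ [∗31] | 0 |
| ··· | ··· | ··· | 0 |
| 48 | $\Delta q_{48} = 2^{31}$ | $q_{48}$ [ˆ31] | 1 |
| 49 | $\Delta q_{49} = 2^{31}$ | $q_{49}$ [ˆ31] | 1 |
| 50 | $\Delta q_{50} = 2^{31}$ | $q_{50}$ [ˆ31] | 1 |
| 51 | $\Delta q_{51} = 2^{31}$ | $q_{51}$ [∗31] | 1 |
| 52 | $\Delta q_{52} = 0$ | $q_{52}$ | 1 |
| 53 | $\Delta q_{53} = 0$ | $q_{53}$ | 1 |
| 54 | $\Delta q_{54} = 0$ | $q_{54}$ | 0 |
| 55 | $\Delta q_{55} = 0$ | $q_{55}$ | 0 |
| ··· | ··· | ··· | 0 |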
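Table 5: Partial differential path of $\Delta M = (\Delta m_5 = 2^{10}, \Delta m_{10} = 2^{31}, \Delta m_{14} = 2^{31})$

| i | Modular difference $\Delta$ | Signed difference $\triangle^{\pm}$ | # |
|---|---|---|---|
| 23 | $\Delta q_{23} = 2^{4} + 2^{24} + 2^{31}$ | $q_{23}$ [4, 24, 31] | |
| 24 | $\Delta q_{24} = 2^{19} + 2^{27} + 2^{31}$ | $q_{24}$ [19, 27, 31] | |
| 25 | $\Delta q_{25} = -2^{2} - 2^{5} - 2^{10} + 2^{22} + 2^{31}$ | $q_{25}$ [−2, −5, −10, −22, 31] | 11 |
| 26 | $\Delta q_{26} = 2^{1} - 2^{6} + 2^{18} + 2^{31}$ | $q_{26}$ [1, −6, 18, 31] | 8 |
| 27 | $\Delta q_{27} = 2^{31}$ | $q_{27}$ [31] | 9 |
| 28 | $\Delta q_{28} = 2^{7} + 2^{15} + 2^{31}$ | $q_{28}$ [7, 15, 31] | 8 |
| 29 | $\Delta q_{29} = -2^{10} + 2^{27} + 2^{31}$ | $q_{29}$ [−10, 27, 31] | 4 |
| 30 | $\Delta q_{30} = -2^{15} + 2^{31}$ | $q_{30}$ [−15, 31] | 4 |
| 31 | $\Delta q_{31} = -2^{15} + 2^{31}$ | $q_{31}$ [−15, 31] | 4 |
| 32 | $\Delta q_{32} = -2^{27} + 2^{31}$ | $q_{32}$ [27, ∗31] | 1 |
| 33 | $\Delta q_{33} = -2^{27} + 2^{31}$ | $q_{33}$ [27, ∗31] | 2 |
| 34 | $\Delta q_{34} = 2^{31}$ | $q_{34}$ [∗31] | 0 |
| 35 | $\Delta q_{35} = 0$ | $q_{35}$ | 2 |
| 36 | $\Delta q_{36} = 0$ | $q_{36}$ | 0 |
| 37 | $\Delta q_{37} = 2^{31}$ | $q_{37}$ [∗31] | 1 |
| 38 | $\Delta q_{38} = 2^{31}$ | $q_{38}$ [∗31] | 0 |
| 39 | $\Delta q_{39} = 2^{31}$ | $q_{39}$ [∗31] | 0 |
| 40 | $\Delta q_{40} = 2^{31}$ | $q_{40}$ [∗31] | 0 |
| ··· | ··· | ··· | 0 |
| 48 | $\Delta q_{48} = 2^{31}$ | $q_{48}$ [ˆ31] | 1 |
| 49 | $\Delta q_{49} = 2^{31}$ | $q_{49}$ [ˆ31] | 1 |
| 50 | $\Delta q_{50} = 2^{31}$ | $q_{50}$ [ˆ31] | 1 |
| 51 | $\Delta q_{51} = 2^{31}$ | $q_{51}$ [∗31] | 1 |
| 52 | $\Delta q_{52} = 0$ | $q_{52}$ | 1 |
| 53 | $\Delta q_{53} = 0$ | $q_{53}$ | 1 |
| 54 | $\Delta q_{54} = 0$ | $q_{54}$ | 0 |
| 55 | $\Delta q_{55} = 0$ | $q_{55}$ | 0 |
| ··· | ··· | ··· | 0 |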
Moreover, we can insert another word difference $\Delta m_{14} = 2^{31}$ into the input difference $\Delta M = (\Delta m_5 = 2^{10}, \Delta m_{10} = 2^{31})$, forming $\Delta M = (\Delta m_5 = 2^{10}, \Delta m_{10} = 2^{31}, \Delta m_{14} = 2^{31})$, to obtain even fewer strong conditions (with a complexity of about $2^{41}$ MD5 compressions per collision pair); its partial differential path is shown in Table 5. We did not publish this input difference and kept it in reserve until someone else could also find it for the challenge.
In [12], Stevens presented a collision attack with input difference ($\Delta m_8 = 2^{25}$, $\Delta m_{13} = 2^{31}$) for our challenge, with a complexity of $2^{50}$ MD5 compressions³ to generate a collision pair. We point out that the complexity can be further reduced to $2^{46}$ MD5 compressions by inserting a word difference $\Delta m_7 = 2^{31}$ to form the input difference $\Delta M = (\Delta m_7 = 2^{31}, \Delta m_8 = 2^{25}, \Delta m_{13} = 2^{31})$; the partial differential path is shown in Table 6.
In fact, we greatly appreciate Stevens' single-block collision attack on MD5. However, it is not a completely new one compared with our first one [14]; furthermore, it has a worse complexity. It is a pity that he did not find the input difference with optimal complexity that we kept in reserve ($\Delta M = (\Delta m_5 = 2^{10}, \Delta m_{10} = 2^{31}, \Delta m_{14} = 2^{31})$), nor $\Delta M = (\Delta m_7 = 2^{31}, \Delta m_8 = 2^{25}, \Delta m_{13} = 2^{31})$. In this sense, the result presented by Stevens fails, to some extent, to meet our challenge. Considering the difficulty of finding a practical single-block collision, we decided to give him half of the award ($5000), which was paid in 2012.
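Table 6: Partial differential path based on input difference $\Delta M = (\Delta m_7 = 2^{31}, \Delta m_8 = 2^{25}, \Delta m_{13} = 2^{31})$

| i | Modular difference $\Delta$ | Signed difference $\triangle^{\pm}$ | # |
|---|---|---|---|
| 23 | $\Delta q_{23} = 2^{11} + 2^{13} + 2^{31}$ | $q_{23}$ [11, 13, 31] | |
| 24 | $\Delta q_{24} = 2^{5} + 2^{18} + 2^{25} + 2^{31}$ | $q_{24}$ [5, 18, 25, 31] | |
| 25 | $\Delta q_{25} = -2^{0} + 2^{12} + 2^{16} - 2^{21} + 2^{31}$ | $q_{25}$ [−0, 12, 16, −21, 31] | 11 |
| 26 | $\Delta q_{26} = -2^{6} - 2^{27} + 2^{31}$ | $q_{26}$ [−6, −27, 31] | 11 |
| 27 | $\Delta q_{27} = -2^{6} - 2^{20} + 2^{25} + 2^{31}$ | $q_{27}$ [−6, −20, 25, 31] | 9 |
| 28 | $\Delta q_{28} = 2^{4}$ | $q_{28}$ [4] | 8 |
| 29 | $\Delta q_{29} = -2^{16} + 2^{20} - 2^{25}$ | $q_{29}$ [−16, 20, −25] | 4 |
| 30 | $\Delta q_{30} = -2^{4} + 2^{20} - 2^{25}$ | $q_{30}$ [−4, 20, −25] | 4 |
| 31 | $\Delta q_{31} = -2^{4}$ | $q_{31}$ [−4] | 4 |
| 32 | $\Delta q_{32} = 0$ | $q_{32}$ | 1 |
| 33 | $\Delta q_{33} = 2^{20}$ | $q_{33}$ [20] | 2 |
| 34 | $\Delta q_{34} = 2^{20}$ | $q_{34}$ [20] | 0 |
| 35 | $\Delta q_{35} = 0$ | $q_{35}$ | 2 |
| 36 | $\Delta q_{36} = 0$ | $q_{36}$ | 0 |
| 37 | $\Delta q_{37} = 0$ | $q_{37}$ 0 | 1 |
| 38 | $\Delta q_{38} = 2^{31}$ | $q_{38}$ [∗31] | 0 |
| 39 | $\Delta q_{39} = 2^{31}$ | $q_{39}$ [∗31] | 0 |
| 40 | $\Delta q_{40} = 2^{31}$ | $q_{40}$ [∗31] | 0 |
| 41 | $\Delta q_{41} = 2^{31}$ | $q_{41}$ [∗31] | 0 |
| ··· | ··· | ··· | 0 |
| 48 | $\Delta q_{48} = 2^{31}$ | $q_{48}$ [ˆ31] | 1 |
| 49 | $\Delta q_{49} = 2^{31}$ | $q_{49}$ [ˆ31] | 1 |
| 50 | $\Delta q_{50} = 2^{31}$ | $q_{50}$ [ˆ31] | 1 |
| 51 | $\Delta q_{51} = 2^{31}$ | $q_{51}$ [ˆ31] | 1 |
| 52 | $\Delta q_{52} = 2^{31}$ | $q_{52}$ [ˆ31] | 1 |
| 53 | $\Delta q_{53} = 2^{31}$ | $q_{53}$ [ˆ31] | 1 |
| 54 | $\Delta q_{54} = 2^{31}$ | $q_{54}$ [ˆ31] | 1 |
| 55 | $\Delta q_{55} = 2^{31}$ | $q_{55}$ [ˆ31] | 1 |
| 56 | $\Delta q_{56} = 2^{31}$ | $q_{56}$ [∗31] | 1 |
| 57 | $\Delta q_{57} = 0$ | $q_{57}$ | 1 |
| 58 | $\Delta q_{58} = 0$ | $q_{58}$ | 1 |
| 59 | $\Delta q_{59} = 0$ | $q_{59}$ | 0 |
| 60 | $\Delta q_{60} = 0$ | $q_{60}$ | 0 |
| ··· | ··· | ··· | 0 |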
³ The complexity is computed under the condition that the q14 tunnel is used, where the conditions of q26 can be satisfied using the group satisfaction scheme.
5 Conclusion
In this paper, we show how to choose suitable input differences for MD5 collision attacks and analyze their complexities. We respond to Stevens' answer to our challenge of a completely new single-block MD5 collision in three respects. First, Stevens' single-block MD5 collision is not a completely new one, since it can simply be derived from our original one. Second, Stevens' single-block MD5 collision is considerably inferior to our original one in computational complexity. Third, Stevens did not find the optimal solution that we kept in reserve, hoping that someone else would also find it, whose collision complexity is about $2^{41}$ MD5 compressions. We regret that Stevens did not find even a better solution than our original one, let alone the optimal one that we kept in reserve. However, we greatly appreciate Stevens' single-block collision attack on MD5 and his hard work.
Acknowledgments
Part of this work is supported by MOST of China through the 973 program under contract 2007CB311202, and by
National Science Foundation of China through the 61070228 project.
References
1. Damgård, I.: A Design Principle for Hash Functions. In: Brassard, G. (ed.) Advances in Cryptology – CRYPTO '89 Proceedings, Lecture Notes in Computer Science, vol. 435, pp. 416–427. Springer, Berlin/Heidelberg (1990)
2. Daum, M.: Cryptanalysis of Hash Functions of the MD4-Family. Ph.D. thesis, Ruhr-Universität Bochum (2005)
3. Eastlake, D.E., Jones, P.: US Secure Hash Algorithm 1 (SHA1). RFC 3174, Internet Engineering Task Force (Sep 2001), http://www.rfc-editor.org/rfc/rfc3174.txt
4. Klima, V.: Tunnels in Hash Functions: MD5 Collisions Within a Minute. Cryptology ePrint Archive, Report 2006/105 (2006), http://eprint.iacr.org/
5. Liu, F.: A Note on the Fastest Collision Attack on MD5. To appear in International Journal of Security and its Applications (spring 2013)
6. Merkle, R.: One Way Hash Functions and DES. In: Brassard, G. (ed.) Advances in Cryptology – CRYPTO '89 Proceedings, Lecture Notes in Computer Science, vol. 435, pp. 428–446. Springer, Berlin/Heidelberg (1990)
7. National Institute of Standards and Technology: FIPS 180, Secure Hash Standard, Federal Information Processing Standard (FIPS), Publication 180 (May 1993). Available from http://csrc.nist.gov
8. National Institute of Standards and Technology: FIPS 180-2, Secure Hash Standard, Federal Information Processing Standard (FIPS), Publication 180-2. Tech. rep., Department of Commerce (Aug 2002), http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdf
9. Rivest, R.: The MD4 Message-Digest Algorithm. RFC 1320 (Apr 1992), http://www.ietf.org/rfc/rfc1320.txt
10. Rivest, R.: The MD5 Message-Digest Algorithm. RFC 1321 (Apr 1992), http://www.ietf.org/rfc/rfc1321.txt
11. Stevens, M.: On Collisions for MD5. Master's thesis, TU Eindhoven, Faculty of Mathematics and Computer Science (2007)
12. Stevens, M.: Single-Block Collision Attack on MD5. Cryptology ePrint Archive, Report 2012/40 (2012), http://eprint.iacr.org/
13. Stevens, M., Sotirov, A., Appelbaum, J., Lenstra, A., Molnar, D., Osvik, D., de Weger, B.: Short Chosen-Prefix Collisions for MD5 and the Creation of a Rogue CA Certificate. In: Halevi, S. (ed.) Advances in Cryptology – CRYPTO 2009, Lecture Notes in Computer Science, vol. 5677, pp. 55–69. Springer, Berlin/Heidelberg (2009)
14. Xie, T., Feng, D.: Construct MD5 Collisions Using Just A Single Block Of Message. Cryptology ePrint Archive, Report 2010/643 (2010), http://eprint.iacr.org/
15. Xie, T., Liu, F., Feng, D.: Differential Cryptanalysis on MD5 (2013)