Serpent
8-bit microcontrollers
Vincent Journot
Abstract
Serpent [1] is one of the finalists for the AES (Advanced Encryption Standard), which
will replace DES, the Data Encryption Standard [2].
More information:
• about Serpent is available at http://www.cl.cam.ac.uk/users/rja14
• about AES (and DES) is available at http://csrc.nist.gov/encryption/aes
1.2 The key schedule
Serpent generates 33 128-bit subkeys from the 256-bit key. To do so, it divides the key
into eight 32-bit words $w_{-8}, \dots, w_{-1}$, then generates 132 words $w_0, \dots, w_{131}$
using the following recurrence:
$$w_i := (w_{i-8} \oplus w_{i-5} \oplus w_{i-3} \oplus w_{i-1} \oplus \Phi \oplus i) \lll 11$$
where
$\Phi$ stands for the first thirty-two bits of the fractional part of the golden ratio
$\frac{\sqrt{5}+1}{2}$, i.e. 0x9e3779b9 in hexadecimal;
$\lll 11$ stands for a rotation by 11 bits to the left.
XORing the golden ratio constant into every word ensures an even distribution of key
bits throughout the rounds and eliminates weak keys. It also rules out a complementation
property: complementing both the plaintext and the key does not yield the complement
of the ciphertext.
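Each subkey is obtained by applying an S-box to a group of four consecutive words
$w_{4i}, \dots, w_{4i+3}$. For the subkey of the first round we use S-box 3, for the second
round S-box 2, and so on, going down modulo 8. Over the 33 subkeys, each of the eight
S-boxes is thus used four times, except for S-box 3, which is used five times.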
$$\{k_0, k_1, k_2, k_3\} = S_3(w_0, w_1, w_2, w_3)$$
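$$
\begin{aligned}
X_0, X_1, X_2, X_3 &:= S_i(B_i \oplus K_i)\\
X_0 &:= X_0 \lll 13\\
X_2 &:= X_2 \lll 3\\
X_1 &:= X_1 \oplus X_0 \oplus X_2\\
X_3 &:= X_3 \oplus X_2 \oplus (X_0 \ll 3)\\
X_1 &:= X_1 \lll 1\\
X_3 &:= X_3 \lll 7\\
X_0 &:= X_0 \oplus X_1 \oplus X_3\\
X_2 &:= X_2 \oplus X_3 \oplus (X_1 \ll 7)\\
X_0 &:= X_0 \lll 5\\
X_2 &:= X_2 \lll 22\\
B_{i+1} &:= X_0, X_1, X_2, X_3
\end{aligned}
$$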
The S-boxes
S-box 0
R3 ^= R0    R4 = R1
R1 &= R3    R4 ^= R2
R1 ^= R0    R0 |= R3
R0 ^= R4    R4 ^= R3
R3 ^= R2    R2 |= R1
R2 ^= R4    R4 = ~R4
R4 |= R1    R1 ^= R3
R1 ^= R4    R3 |= R0
R1 ^= R3    R4 ^= R3
The listing is read row by row, left column first.
R3 ^= R0 means R3 := R3 ⊕ R0;
R4 = ~R4 means that R4 is replaced by its bit-wise complement;
| stands for bit-wise “or”;
& stands for bit-wise “and”.
The program:
• reorders the bits of both the key and the plaintext to obtain a little-endian
representation (done by the ORDER2 subroutine);
• calls the main ENCRYPT subroutine, which can be divided into two parts:
  – the ROUNDS subroutine, which consists of four loops of eight rounds each;
  – the ROUND32 subroutine, whose role is obvious from its name;
• reorders the bits of both the key and the plaintext to obtain a little-endian
representation (done by the ORDER2 subroutine).
Each round i consists of:
• the generation of four new words $w_{4i}, w_{4i+1}, w_{4i+2}, w_{4i+3}$ through four calls
to the SKEY subroutine;
• a loop, repeated four times, that works on a quarter of the subkey and of the text
block $B_i$ at a time. This loop applies S-box $S_{(3-i) \bmod 8}$ to the subkey, XORs the
result with $B_i$ (XORB subroutine), and then applies S-box $S_{i \bmod 8}$.
2.2 Performance
Although the version of Serpent that requires about 70 000 machine cycles is far slower
than DES and Triple DES, one must keep in mind that the algorithm was designed to take
advantage of the available technology, i.e. it computes with 32-bit registers. An
implementation on 8-bit registers slows it down considerably. The most striking example
is the rotation of a 32-bit word left by one bit. It requires only one instruction on
32-bit registers. On an 8-bit microcontroller, however, it needs 14 single-cycle
instructions, as shown below, i.e. 18 machine cycles once the 4 cycles for calling the
subroutine are added.
4
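ROTL1: MACRO one     ; rotate the 32-bit word at one..one+3 left by 1 bit
MOV A,one            ; load the most significant byte...
RLC A                ; ...only to copy its top bit into the carry
MOV A,one+3          ; least significant byte
RLC A                ; shift left; the old top bit of the word enters bit 0
MOV one+3,A
MOV A,one+2
RLC A                ; the carry brings in the bit shifted out of the byte below
MOV one+2,A
MOV A,one+1
RLC A
MOV one+1,A
MOV A,one
RLC A
MOV one,A
ENDM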
I ran some tests to compare Serpent with the other candidates and with DES, using the
implementations provided in the otheraes folder. The simulator used (an unregistered
version) is provided on the floppy disk. The numbers for DES correspond to the
encryption of a 64-bit block in 10 ms.
Candidate     RAM used (bytes)   Code size (bytes)   Machine cycles
MARS          does not seem practical to implement on 8-bit microcontrollers
RC6           77                 706                 58056
RIJNDAEL      54                 826                 3977
SERPENT       56                 2021 (7130)         70339 (60300)
TWOFISH       52                 2063                19819
DES           -                  -                   6000
TRIPLE DES    -                  -                   18000
Here are the figures for the implementations of the five finalists. The figures given for
DES and Triple DES are for the encryption of a 128-bit block, i.e. two encryptions.
Serpent is about three to four times as slow as Triple DES (60 300 to 70 339 cycles
against 18 000), but it is much more secure. If we reduced the number of rounds from 32
to 16, which remains secure enough, the cycle count should roughly halve, and Serpent
would then be less than twice as slow as Triple DES.
Serpent 1.3 requires 56 bytes of RAM, which is nearly optimal. Indeed, we need:
• 16 bytes to keep the $B_i$ block;
The last two bytes are used to store temporary values such as indices. The program uses
very few registers. Using the stack and the register banks would artificially reduce the
number of RAM bytes required, since the register banks are not taken into account in the
calculation. This explains the slight differences between Rijndael, Serpent and Twofish.
Remark
If you test serpent.asm, you should get the following results:
• plaintext = 00000000000000000000000000000000
• key = 0800000000000000000000000000000000000000000000000000000000000
• ciphertext = EC9D6557EED58E6CF89A746BBD
• plaintext = 00000000000000000000000000000000
• key = 0000000000000000000000000000000000000000000000000000000000008
• ciphertext = BD6B749AF86C8ED5EE57659DEC
References
[1] Ross Anderson, Eli Biham, Lars Knudsen. “Serpent: A Proposal for the Advanced
Encryption Standard.” Available at http://csrc.nist.gov/encryption/aes.