Design and Implementation of a Crypto Processor and Its Application to Security System

HoWon Kim¹, YongJe Choi¹ and MooSeop Kim¹
¹Department of Information Security Basic,
Electronics and Telecommunications Research Institute(ETRI)
161 Gajeong-Dong YuSeong-Gu, DaeJeon, 305-350, KOREA
Tel : +82-42-860-6228, / FAX : +82-42-860-5611
e-mail :[email protected]

Abstract: This paper presents the design and implementation of a crypto processor, a special-purpose microprocessor optimized for the execution of cryptography algorithms. This crypto processor can be used for various security applications such as storage devices, embedded systems, network routers, etc. The crypto processor consists of a 32-bit RISC processor block and a coprocessor block dedicated to the SEED and triple-DES (data encryption standard) symmetric key crypto (cryptography) algorithms. The crypto processor has been designed and fabricated as a single VLSI chip using 0.5 µm CMOS technology. To test and demonstrate the capabilities of this chip, a custom board providing real-time data security for a data storage device has been developed. Testing results show that the crypto processor operates correctly at a working frequency of 30 MHz and a bandwidth of 240 Mbps.

1. Introduction
The expansion of worldwide communication networks such as the internet and the increased dependency on digitized information in our society make information more vulnerable to abuse. If there are security problems in these information systems, users will fear that their sensitive information may be monitored and business secrets stolen. For these reasons, it is important to make information systems secure by protecting data and resources from malicious acts; crypto algorithms are the core of such security systems [1].

By encoding a message using crypto algorithms, users can make information transmitted over communication systems almost impossible to read, even if such information is intercepted for malicious purposes. It is fairly easy to implement crypto algorithms in software, but such algorithms are typically too slow for real-time applications, such as storage devices, embedded systems, network routers, etc. For this reason, it becomes necessary to implement crypto algorithms in hardware. In our crypto processor implementation, the dedicated crypto block of the crypto processor permits fast execution of encryption, decryption, and key scheduling operations for the triple-DES [14,12] and SEED [13] private key crypto algorithms. Also, the 32-bit RISC processor block can execute other crypto algorithms such as RSA and ECC (the Elliptic Curve Cryptography algorithm) and control the dedicated crypto block and I/O buffers.

This paper is organized as follows. In Section 2, the architecture of the crypto processor is briefly described; this includes the dedicated crypto block for SEED and triple-DES and the 32-bit RISC processor. In Section 3, the detailed VLSI design methodology of the crypto processor is described. In Section 4, the simulation and verification of the crypto processor design is reported. Section 5 presents the application of the crypto processor as a means of providing real time data security for a storage device. Finally, concluding remarks are presented in Section 6.

2. The Crypto Processor Architecture
2.1 The architecture of the Crypto Processor
The block diagram of our crypto processor is shown in Fig. 1. This single chip crypto processor has a crypto controller and a dedicated crypto block for the triple-DES and SEED algorithms. The 32-bit RISC type crypto controller controls the dedicated crypto block and performs the interface operations with external devices such as memory and an I/O bus interface controller. It can also execute various crypto algorithms such as RSA and ECC and other application programs such as a user authentication program and an IC card interface program.

The dedicated crypto block executes encryption, decryption and key scheduling operations for the SEED and triple-DES algorithms. The 128-bit plain text data streams entered into the 128-bit input register are encrypted with a proper key and control signals based on the SEED algorithm. After plain text data streams are encrypted, the 128-bit cipher texts are output to the 128-bit output register. The decryption process is the same as the encryption process except for the control signals. For the DES algorithm, 64-bit plain text data streams and 64-bit key values with 8-bit parity bits are necessary for encryption and decryption. Our crypto processor supports four operation modes: ECB (Electronic CodeBook), CBC (Cipher Block Chaining), OFB (Output FeedBack) and CFB (Cipher FeedBack) for the SEED and triple-DES algorithms.

Figure 1. Block diagram of the Crypto processor
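As a rough illustration of how a non-feedback mode (ECB) differs from a feedback mode (CBC), the sketch below runs both over a toy 128-bit block function. The function `toy_encrypt_block` is a placeholder assumed purely for illustration (a keyed XOR followed by a byte rotation); it is not SEED or triple-DES and provides no security. All helper names are hypothetical.

```python
BLOCK = 16  # 128-bit block, matching the SEED block size


def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    """Placeholder block function (NOT SEED/DES): XOR with key, rotate left one byte."""
    x = bytes(b ^ k for b, k in zip(block, key))
    return x[1:] + x[:1]


def toy_decrypt_block(block: bytes, key: bytes) -> bytes:
    """Inverse of toy_encrypt_block: rotate right one byte, XOR with key."""
    x = block[-1:] + block[:-1]
    return bytes(b ^ k for b, k in zip(x, key))


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))


def split_blocks(data: bytes) -> list:
    assert len(data) % BLOCK == 0, "sketch handles whole blocks only"
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(data: bytes, key: bytes) -> bytes:
    # Each block is processed independently of all others.
    return b"".join(toy_encrypt_block(b, key) for b in split_blocks(data))


def cbc_encrypt(data: bytes, key: bytes, iv: bytes) -> bytes:
    # Each block depends on the previous ciphertext block (feedback).
    out, prev = [], iv
    for b in split_blocks(data):
        prev = toy_encrypt_block(xor_bytes(b, prev), key)
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(data: bytes, key: bytes, iv: bytes) -> bytes:
    out, prev = [], iv
    for c in split_blocks(data):
        out.append(xor_bytes(toy_decrypt_block(c, key), prev))
        prev = c
    return b"".join(out)
```

In ECB, two equal plaintext blocks give equal ciphertext blocks and can be processed in any order; in CBC, block i cannot start until block i-1 has finished, which is the dependency that limits the benefit of deep pipelining in feedback modes.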
2.2 The dedicated crypto block for the SEED algorithm
The SEED algorithm [13] is a block cipher that operates on 128-bit blocks of data and uses a 128-bit key. It has a 16-round Feistel structure. A Feistel structure takes a block of length n and divides it into two halves of length n/2, a left and a right block. It is an iterated block cipher in which the output of the i-th round is determined from the output of the (i-1)-th round [11]. The SEED algorithm uses two 8 × 8 S-boxes (for substitution), permutations, rotations, and basic modulo-arithmetic operations such as modulo-2 addition (exclusive OR) and modulo-$2^{32}$ addition. As with other Feistel ciphers, the SEED algorithm has an F function, which takes a 64-bit data value and 64-bit key values as shown in Fig. 3.

Figure 3. Block diagram of the F Function of the SEED algorithm

To implement the SEED algorithm, we have instantiated one stage and iterated the data through this stage 16 times. We could also have used 16 or more pipeline stages. In that case, however, we would have obtained high performance in a non-feedback mode such as ECB, but no performance gain and considerable hardware redundancy in feedback modes such as CBC, OFB, and CFB. Because we wanted to design a crypto processor with equally high performance for various modes, we have selected the iterated method. The key values for encryption and decryption are pre-computed and stored in internal buffers. These stored key values are used for encryption or decryption of the data sequences that follow.

2.3 The dedicated crypto block for a triple-DES algorithm
DES (Data Encryption Standard) [10], an encryption algorithm developed in the 1970s by the National Bureau of Standards and IBM Corporation, uses a 56-bit key. In the DES algorithm, there are 16 rounds of identical operations such as non-linear substitutions and permutations. In each round, 48-bit subkeys are generated, and substitutions using S-boxes, bitwise shifts, and XOR (exclusive-OR) operations are performed. The 56-bit key length is relatively small by today's standards. For increased security, the DES operation can be performed three consecutive times, which expands the effective key length to 112 bits [11]. Using DES in this manner is referred to as triple-DES.

Fig. 4 shows one round of the DES algorithm. The left and right halves of each 64-bit input data operand are treated as separate 32-bit data operands, $L_{i-1}$ and $R_{i-1}$. The 32-bit right half of the data is passed on as the next left half ($L_i = R_{i-1}$), and the 32-bit left half is processed in the following manner: $R_i = L_{i-1} \oplus F(R_{i-1}, K_i)$. As shown in Fig. 4, the F function of the DES algorithm is composed of an expansion permutation table (block E), modulo-2 addition with the i-th round key ($K_i$), substitution with the S-box, and permutation with the P table (block P). Because one round of the DES algorithm is simpler than one round of the SEED algorithm, we have made 4 rounds of the DES algorithm executable in one clock cycle. Most of the latency in one round of the DES algorithm is due to the S-box operation.

Figure 4. One round of the DES algorithm

2.4 The 32-bit RISC processor block
The block diagram of the 32-bit RISC type crypto controller is shown in Fig. 5 [3]. This controller controls the operation of the dedicated crypto block during encryption, decryption and key scheduling, and also performs the operations required to interface with external devices such as the input FIFO, output FIFO, memory, and system I/O bus (address and data bus). Since the crypto controller block is fully programmable, it can execute various crypto algorithms, protocols and application programs with a high degree of freedom. The crypto controller is a 32-bit processor with a RISC architecture and a 3-stage pipeline. It has features (such as a barrel shifter, a Booth multiplier block, a register file, and a 16-bit and 32-bit data memory architecture) that enable it to achieve high performance and savings in memory when executing crypto algorithms.

The code running on the crypto controller generates the control signals for the dedicated crypto block using a memory-mapped method. The crypto controller generates control signals for the key and initial vector (which are required to execute the SEED and triple-DES algorithms), an algorithm selection signal, and a mode selection signal. It also performs other miscellaneous tasks such as done signal generation for the encryption or decryption operations. Then, when the plain text data becomes available, the dedicated crypto block receives the data and encrypts it with the proper mode and algorithm. When the encryption operations are done, the encrypted cipher texts are output to an output register and the corresponding control signals are set. Our crypto controller is fully compatible with ARM7™ [3] and is described using Verilog HDL.
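The Feistel round equations of Section 2.3 can be sketched in a few lines. The code below is a generic Feistel network over 64-bit blocks with a toy round function `toy_f` and arbitrary round keys, both assumptions made for illustration; it does not reproduce the DES expansion, S-box, or P tables. It shows why a single datapath can serve both directions: running the same rounds with the subkeys in reverse order undoes the encryption.

```python
MASK32 = 0xFFFFFFFF


def toy_f(r: int, k: int) -> int:
    """Hypothetical round function (NOT the DES F function)."""
    x = (r ^ k) & MASK32
    x = ((x << 3) | (x >> 29)) & MASK32    # rotate left by 3 bits
    return (x * 0x9E3779B1 + k) & MASK32   # arbitrary mixing step


def feistel(block64: int, round_keys) -> int:
    """Apply one Feistel round per key, then swap the halves."""
    left = (block64 >> 32) & MASK32
    right = block64 & MASK32
    for k in round_keys:
        # L_i = R_{i-1};  R_i = L_{i-1} XOR F(R_{i-1}, K_i)
        left, right = right, left ^ toy_f(right, k)
    return (right << 32) | left            # final swap of halves


def encrypt(block64: int, round_keys) -> int:
    return feistel(block64, round_keys)


def decrypt(block64: int, round_keys) -> int:
    # Same structure, subkeys applied in reverse order.
    return feistel(block64, list(reversed(round_keys)))
```

Note that `toy_f` never needs to be inverted, which is the property that lets a Feistel cipher use non-invertible substitutions in its round function.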
Figure 5. Block diagram of the 32-bit RISC controller block

Table 2: Main features of the crypto processor.
Technology              0.5 µm CMOS
Package Type            PQFP
Gate Counts             200K (with I/O pads)
Chip Size               8.1 mm × 8.1 mm
Bandwidth               240 Mbps (SEED), 160 Mbps (triple-DES)
Operating Frequency     30 MHz
The number of I/O pins  176 pins
VDD and VSS             5 V and 0 V
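The bandwidth figures in Table 2 are consistent with a simple estimate: block size times clock frequency, divided by the clock cycles spent per block. The sketch below assumes one SEED round per clock cycle (16 cycles per 128-bit block) and, for triple-DES, 3 × 16 = 48 DES rounds at 4 rounds per cycle (12 cycles per 64-bit block), with no extra cycles for I/O or control. These cycle counts are inferred from the descriptions in Sections 2.2 and 2.3 and are not stated explicitly in the paper.

```python
def throughput_mbps(block_bits: int, cycles_per_block: int, clock_hz: float) -> float:
    """Estimated throughput in Mbps for an iterated (non-pipelined) cipher core."""
    return block_bits * clock_hz / cycles_per_block / 1e6


CLOCK_HZ = 30e6  # operating frequency from Table 2

# SEED: 128-bit block, 16 rounds, assumed one round per clock cycle.
seed_mbps = throughput_mbps(128, 16, CLOCK_HZ)

# Triple-DES: 64-bit block, 3 x 16 rounds, assumed four rounds per clock cycle.
tdes_cycles = (3 * 16) // 4
tdes_mbps = throughput_mbps(64, tdes_cycles, CLOCK_HZ)

print(f"SEED:       {seed_mbps:.0f} Mbps")
print(f"triple-DES: {tdes_mbps:.0f} Mbps")
```

Under these assumptions the estimate gives 240 Mbps for SEED and 160 Mbps for triple-DES, matching the reported values.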

3. The VLSI Implementation of the Crypto Processor
Our crypto processor was modeled using Verilog HDL (Hardware Description Language) and implemented as an ASIC chip. Modeling the processor using Verilog HDL facilitates quick prototyping and modification of the target design while considering various possible trade-offs in different implementations of the crypto algorithms with differing speed and area characteristics. Next, the crypto processor's HDL model was simulated using the ModelSim HDL compiler and simulator [9]. Then, Synopsys Design Analyzer and Compiler [12] was used to synthesize the HDL models into gate level designs, and the SDF files were simulated using Cadence's SimWave [5]. Because the SDF file includes fairly accurate delay and load information, the simulation results are comparable to actual measurement results after the circuit is fabricated in silicon. The target process technology is Hynix's 0.5 µm CMOS technology.

4. The Simulation and Verification of the Crypto Processor
Simulation was used to validate the Verilog HDL model of the crypto processor. After validation, the HDL model was synthesized into a gate level design with a target CMOS process technology library. Static timing analysis is, however, required in combination with formal verification to achieve complete ASIC verification. Thus, we have also performed static timing analysis from the SDF files. After simulation and verification of our design, we have laid out and fabricated the crypto processor using Hynix's 0.5 µm CMOS technology. Fig. 6 shows a photograph of the crypto processor, and Table 2 summarizes the main features of the crypto processor. Note that a photograph of the layout is not presented as the circuit was synthesized using a standard cell library.

Figure 6: Photograph of the crypto processor.

To validate the usability of the 32-bit RISC type crypto controller in our crypto processor for various security systems, we have implemented the ECDSA [8] and ECDH [6] protocols. The ECC algorithm we have implemented is defined over the field $GF(2^{163})$, which is a SEC-2 recommendation [7], with this field being defined by the field polynomial $F(x) = x^{163} + x^7 + x^6 + x^3 + 1$. The timing results are shown in Table 3. As shown in Table 3, most of the latency was due to the scalar multiplication $kG$ in Algorithm 1. The latency of the ECDSA signature verification algorithm is asymptotically twice the latency of the signature generation algorithm. The latencies of the modular reduction and inversion processes are also negligible when compared to scalar multiplication.

We have also implemented the ECDH key agreement protocol for the crypto controller. To obtain a common key for the two participants Alice and Bob, Alice secretly chooses a random integer $k_A$ and computes the point $k_A G$, which she sends to Bob. Likewise, Bob secretly chooses a random integer $k_B$, computes $k_B G$, and sends it to Alice. The common key is $P = k_A k_B G$. As shown in Table 3, the performance of the crypto controller in the crypto processor is suitable for embedded system applications, where high flexibility and performance are a must.

Algorithm 1. ECDSA Signature Generation Algorithm
To sign a message m, a signer A does the following:
1. Select a random integer $k$ from $[1, n-1]$
2. Compute $kG = (x_1, y_1)$ and $r = x_1 \bmod n$
3. Compute $k^{-1} \bmod n$
4. Compute $e = \text{SHA-1}(m)$
5. Compute $s = k^{-1}(e + dr) \bmod n$
6. If $s = 0$ then go to step 1
7. A's signature for the message m is $(r, s)$
Here, $G$ is a base point on $E(GF(2^m))$, and $d$, a random integer from $[1, n-1]$, is A's private key.

Table 3: Performance of the ECDSA and ECDH algorithms when executed on the crypto controller.
Method                          Timing
Scalar Multiplication           1.004 sec
ECDSA signature generation      1.032 sec
ECDSA signature verification    2.255 sec
EC Diffie-Hellman               1.920 sec
SHA-1 (for 163-bit data size)   11.24 sec
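Arithmetic in $GF(2^{163})$ underlies the elliptic curve operations discussed in this section. The following is a minimal sketch, assuming a polynomial-basis representation in which a Python integer holds the coefficient bits (bit i is the coefficient of $x^i$). The function names are hypothetical, and this is not the authors' ARM7 implementation. Addition is XOR; multiplication is shift-and-add with reduction by the field polynomial $F(x) = x^{163} + x^7 + x^6 + x^3 + 1$; inversion uses Fermat's little theorem for simplicity.

```python
M = 163
# F(x) = x^163 + x^7 + x^6 + x^3 + 1, stored as a bit mask.
F_POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf_add(a: int, b: int) -> int:
    """Addition in GF(2^163) is coefficient-wise XOR."""
    return a ^ b


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication of reduced elements, modulo F(x)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:          # degree reached 163: subtract (XOR) F(x)
            a ^= F_POLY
    return result


def gf_inv(a: int) -> int:
    """Inverse via a^(2^163 - 2); simple but slow compared with optimized methods."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^163)")
    result = 1
    power = gf_mul(a, a)    # a^(2^1)
    for _ in range(M - 1):  # 2^163 - 2 = 2^1 + 2^2 + ... + 2^162
        result = gf_mul(result, power)
        power = gf_mul(power, power)
    return result
```

For example, multiplying $x^{162}$ by $x$ gives $x^{163}$, which reduces to $x^7 + x^6 + x^3 + 1$ (0xC9 in this representation).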
5. A Crypto Processor Application: Real-time Data Security for a Storage Device
To evaluate the usability of the crypto processor, we have developed an RTDS (Real Time Data Security) system for storage devices. The RTDS system is composed of control and monitoring software with a GUI (Graphical User Interface) environment, a device driver, and an RTDS board. Fig. 7 shows the block diagram of the RTDS system, and Fig. 8 shows a photograph of the RTDS board with the crypto processor. The main operations of the RTDS system are described as follows.
• A user process wants to write data into the secure area of a hard disk (a).
• The CPU reads data from a certain area of the memory and sends it to the hard disk via the I/O bus (b).
• The device driver, which is a part of the RTDS system, catches the hard disk write event and forwards the data to the crypto processor (c).
• In the crypto processor, an encryption task is performed in real-time (d).
• The crypto processor, which has completed its encryption task, sends the encrypted data to the hard disk (e).
• The hard disk receives the encrypted data and completes the write procedure (f).

Figure 7: Block diagram of the Real Time Data Security System for storage devices.

The RTDS board, shown in Fig. 8, is mainly composed of a PCI interface controller, an SRAM buffer, an IC card interface controller, and a crypto processor. An Altera FPGA chip is used for the PCI interface controller, and the ASIC chip, located in the upper right part of the board, is the crypto processor. The performance of the crypto processor and the PCI interface controller is high (240 Mbps and 1056 Mbps, respectively), and the average access time of the hard disk (a Quantum FireBall 15 device) is low (12 msec in our system). Therefore, the RTDS system operates in real-time.

Figure 8: Photograph of the RTDS board.

6. Concluding Remarks
In this paper, we have presented the design and implementation of a crypto processor composed of a 32-bit RISC processor and a coprocessor block dedicated to the triple-DES and SEED algorithms. The dedicated block of the crypto processor accelerates private key crypto algorithms, and the programmability of the crypto controller makes possible fast execution of various crypto algorithms (such as RSA, ECC, etc.) and security applications. The crypto processor was implemented as an ASIC chip using Hynix's 0.5 µm CMOS technology. Simulations, formal verification, and static timing analysis were used to fully verify the ASIC design before fabrication. The fabricated chip was found to have a 30 MHz operating frequency and a data rate of 240 Mbps for all modes of operation (ECB, CBC, OFB, CFB) of the SEED algorithm. The crypto processor was evaluated by constructing an RTDS (Real-Time Data Security) system for storage devices. This application board was used to thoroughly test and verify the functionality of the crypto processor. The crypto processor in the RTDS system performs data encryption and decryption in real-time. The high performance and high flexibility of the crypto processor design make it applicable to various security applications such as storage devices, embedded systems, network routers, firewalls, etc.

References
[1] Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone, Handbook of Applied Cryptography, CRC Press Inc., Florida, 1996.
[2] Analog Devices, VMS115 IPSec Coprocessor Data Sheet, Rev. 2.0, January 1999.
[3] ARM Corp., ARM7 Data Sheet, 1996.
[4] H. B. Bakoglu, Circuits, Interconnections, and Packaging for VLSI, Addison-Wesley Publishers Ltd., 1990.
[5] Cadence Corp., SimWave, May 1999.
[6] Certicom Corp., SEC 1: Elliptic Curve Cryptography, September 2000.
[7] Certicom Corp., SEC 2: Recommended Elliptic Curve Domain Parameters, September 2000.
[8] Don B. Johnson, Alfred J. Menezes, and Scott Vanstone, Elliptic Curve Digital Signature Algorithm (ECDSA), available at http://www.certicom.com
[9] Modeltech Corp., Modelsim Compiler, May 1999.
[10] National Institute of Standards and Technology, FIPS Publication 46-2: Data Encryption Standard, MD, USA, December 1993.
[11] Bruce Schneier, Applied Cryptography (2nd ed.), John Wiley and Sons, Inc., New York, 1996.
[12] Synopsys Corp., Design Compiler Reference Manual, February 1998.
[13] TTA, 128-bit Symmetric Block Cipher (SEED), Telecommunications Technology Association (TTA), Seoul, Korea, June 1999.
