Design and Implementation of A Crypto Processor and Its Application To Security System
Abstract: This paper presents the design and implementation of a crypto processor, a special-purpose microprocessor optimized for the execution of cryptography algorithms. This crypto processor can be used for various security applications such as storage devices, embedded systems, and network routers. The crypto processor consists of a 32-bit RISC processor block and a coprocessor block dedicated to the SEED and triple-DES (data encryption standard) symmetric key crypto (cryptography) algorithms. The crypto processor has been designed and fabricated as a single VLSI chip using 0.5 μm CMOS technology. To test and demonstrate the capabilities of this chip, a custom board providing real-time data security for a data storage device has been developed. Testing results show that the crypto processor operates correctly at a working frequency of 30 MHz and a bandwidth of 240 Mbps.

1. Introduction
The expansion of worldwide communication networks such as the Internet and the increased dependency on digitized information in our society make information more vulnerable to abuse. If there are security problems in these information systems, users will fear that their sensitive information may be monitored and business secrets stolen. For these reasons, it is important to make information systems secure by protecting data and resources from malicious acts; crypto algorithms are the core of such security systems [1].
By encoding a message using crypto algorithms, users can make information transmitted over communication systems almost impossible to read, even if such information is intercepted for malicious purposes. It is fairly easy to implement crypto algorithms in software, but such implementations are typically too slow for real-time applications such as storage devices, embedded systems, and network routers. For this reason, it becomes necessary to implement crypto algorithms in hardware. In our crypto processor implementation, the dedicated crypto block permits fast execution of encryption, decryption, and key scheduling operations for the triple-DES [14,12] and SEED [13] private key crypto algorithms. Also, the 32-bit RISC processor block can execute other crypto algorithms such as RSA and ECC (Elliptic Curve Cryptography) and control the dedicated crypto block and I/O buffers.
This paper is organized as follows. In Section 2, the architecture of the crypto processor is briefly described; this includes the dedicated crypto block for SEED and triple-DES and the 32-bit RISC processor. In Section 3, the detailed VLSI design methodology of the crypto processor is described. In Section 4, the simulation and verification of the crypto processor design is reported. Section 5 presents the application of the crypto processor as a means of providing real-time data security for a storage device. Finally, concluding remarks are presented in Section 6.

2. The Crypto Processor Architecture
2.1 The architecture of the Crypto Processor
The block diagram of our crypto processor is shown in Fig. 1. This single-chip crypto processor has a crypto controller and a dedicated crypto block for the triple-DES and SEED algorithms. The 32-bit RISC-type crypto controller controls the dedicated crypto block and performs the interface operations with external devices such as memory and an I/O bus interface controller. It can also execute various crypto algorithms such as RSA and ECC and other application programs such as a user authentication program and an IC card interface program.
The dedicated crypto block executes encryption, decryption, and key scheduling operations for the SEED and triple-DES algorithms. For the SEED algorithm, the 128-bit plaintext data streams entered into the 128-bit input register are encrypted with the proper key and control signals. After the plaintext data streams are encrypted, the 128-bit ciphertexts are output to the 128-bit output register. The decryption process is the same as the encryption process except for the control signals. For the DES algorithm, 64-bit plaintext data streams and 64-bit key values, each including 8 parity bits, are necessary for encryption and decryption. Our crypto processor supports four operation modes for the SEED and triple-DES algorithms: ECB (Electronic CodeBook), CBC (Cipher Block Chaining), OFB (Output FeedBack), and CFB (Cipher FeedBack).

Figure 1. Block diagram of the Crypto processor
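The operation modes named in Section 2.1 differ only in how successive blocks are chained around the block cipher. The following is a minimal software sketch of ECB, CBC, and OFB, not a description of the chip's datapath. The functions toy_encrypt_block and toy_decrypt_block are hypothetical placeholders (an insecure XOR-and-rotate), standing in for SEED or triple-DES only so that the example is self-contained; all names are assumptions introduced for illustration.

```python
BLOCK = 16  # bytes per block; SEED operates on 128-bit blocks


def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    """Placeholder block cipher (NOT SEED, NOT secure): XOR with key, rotate left one byte."""
    x = bytes(b ^ k for b, k in zip(block, key))
    return x[1:] + x[:1]


def toy_decrypt_block(block: bytes, key: bytes) -> bytes:
    """Inverse of toy_encrypt_block: rotate right one byte, then XOR with key."""
    x = block[-1:] + block[:-1]
    return bytes(b ^ k for b, k in zip(x, key))


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split_blocks(data: bytes) -> list:
    assert len(data) % BLOCK == 0, "sketch assumes already-padded input"
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(data: bytes, key: bytes) -> bytes:
    # ECB: every block is encrypted independently.
    return b"".join(toy_encrypt_block(b, key) for b in split_blocks(data))


def cbc_encrypt(data: bytes, key: bytes, iv: bytes) -> bytes:
    # CBC: each plaintext block is XORed with the previous ciphertext block first.
    out, prev = [], iv
    for block in split_blocks(data):
        prev = toy_encrypt_block(xor(block, prev), key)
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(data: bytes, key: bytes, iv: bytes) -> bytes:
    out, prev = [], iv
    for block in split_blocks(data):
        out.append(xor(toy_decrypt_block(block, key), prev))
        prev = block
    return b"".join(out)


def ofb_xcrypt(data: bytes, key: bytes, iv: bytes) -> bytes:
    # OFB: the cipher generates a keystream; encryption and decryption are identical.
    out, state = [], iv
    for block in split_blocks(data):
        state = toy_encrypt_block(state, key)
        out.append(xor(block, state))
    return b"".join(out)
```

With two identical plaintext blocks, ECB yields two identical ciphertext blocks, whereas CBC does not; this is the usual reason chaining modes are preferred for bulk data such as disk sectors.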
2.2 The dedicated crypto block for the SEED algorithm
The SEED algorithm [13] is a block cipher that operates on 128-bit blocks of data and uses a 128-bit key. It has a 16-round Feistel structure. A Feistel structure takes a block of length n and divides it into two halves of length n/2, a left and a right block. It is an iterated block cipher in which the output of the i-th round is determined from the output of the (i-1)-th round [11]. The SEED algorithm uses two 8 × 8 S-boxes (for substitution), permutations, rotations, and basic modulo-arithmetic operations such as modulo-2 addition (exclusive OR) and modulo-2^32 addition. As with other Feistel ciphers, the SEED algorithm has an F function, which takes a 64-bit data value and 64-bit key values as shown in Fig. 3.

In each round of the DES algorithm, the 32-bit right half of the data is passed to the next left half (L_i = R_{i-1}), and the 32-bit left half is processed in the following manner: R_i = L_{i-1} ⊕ F(R_{i-1}, K_i). As shown in Fig. 4, the F function of the DES algorithm is composed of an expansion permutation table (block E), modulo-2 addition with the i-th round key (K_i), substitution with the S-box, and permutation with the P table (block P). Because one round of the DES algorithm is simpler than one round of the SEED algorithm, we have made 4 rounds of the DES algorithm executable in one clock cycle. Most of the latency in one round of the DES algorithm is due to the S-box operation.
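The Feistel round relations L_i = R_{i-1} and R_i = L_{i-1} ⊕ F(R_{i-1}, K_i) can be illustrated with a short software sketch on a 64-bit block. The round function toy_f below is a hypothetical stand-in, not the DES or SEED F function, and the key values in the usage note are arbitrary. The point of the sketch is structural: the same routine decrypts when the round keys are supplied in reverse order, which is consistent with a datapath in which decryption differs from encryption only in the control signals.

```python
MASK32 = 0xFFFFFFFF


def toy_f(right: int, round_key: int) -> int:
    """Hypothetical 32-bit round function (NOT the DES or SEED F function)."""
    x = (right ^ round_key) & MASK32
    x = (x * 0x9E3779B1 + 0x7F4A7C15) & MASK32
    return ((x << 7) | (x >> 25)) & MASK32  # rotate left by 7 bits


def feistel(block64: int, round_keys) -> int:
    """Run a Feistel network over a 64-bit block, one round per key.

    Each round computes L_i = R_{i-1} and R_i = L_{i-1} XOR F(R_{i-1}, K_i).
    The halves are swapped once at the end so that calling this function
    again with the keys reversed undoes the transformation.
    """
    left, right = (block64 >> 32) & MASK32, block64 & MASK32
    for key in round_keys:
        left, right = right, left ^ toy_f(right, key)
    return (right << 32) | left
```

For example, with keys = [0x1000 + i for i in range(16)] and ct = feistel(pt, keys), the call feistel(ct, keys[::-1]) returns pt for any 64-bit pt. Note that toy_f need not be invertible; the XOR structure alone guarantees that each round can be undone.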
3. The VLSI Implementation of the Crypto processor
Our crypto processor was modeled using Verilog HDL (Hardware Description Language) and implemented as an ASIC chip. Modeling the processor using Verilog HDL facilitates quick prototyping and modification of the target design while considering various possible trade-offs in different implementations of the crypto algorithms with differing speed and area characteristics. Next, the crypto processor's HDL model was simulated using the ModelSim HDL compiler and simulator [9]. Then, Synopsys Design Analyzer and Compiler [12] was used to synthesize the HDL models into gate level designs, and the SDF files were simulated using Cadence's SimWave [5]. Because the SDF file includes fairly accurate delay and load information, the simulation results are comparable to actual measurement results after the circuit is fabricated in silicon. The target process technology is Hynix's 0.5 μm CMOS technology.

4. The Simulation and Verification of the Crypto Processor
Simulation was used to validate the Verilog HDL model of the crypto processor. After validation, the HDL model was synthesized into a gate level design with a target CMOS process technology library.
Static timing analysis is, however, required in combination with formal verification to achieve complete ASIC verification. Thus, we have also performed static timing analysis from the SDF files. After simulation and verification of our design, we laid out and fabricated the crypto processor using Hynix's 0.5 μm CMOS technology. Fig. 6 shows a photograph of the crypto processor, and Table 2 summarizes the main features of the crypto processor. Note that a photograph of the layout is not presented as the circuit was synthesized using a standard cell library.

Figure 6: Photograph of the crypto processor.

To validate the usability of the 32-bit RISC type crypto controller in our crypto processor for various security systems, we have implemented the ECDSA [8] and ECDH [6] protocols. The ECC algorithm we have implemented is defined over the field GF(2^163), which is a SEC 2 recommendation [7], with this field being defined by the field polynomial F(x) = x^163 + x^7 + x^6 + x^3 + 1. The timing results are shown in Table 3. As shown in Table 3, most of the latency was due to the scalar multiplication kG in Algorithm 1. The latency of the ECDSA signature verification algorithm is approximately twice the latency of the signature generation algorithm, because verification requires two scalar multiplications. The latencies of the modular reduction and inversion processes are also negligible when compared to scalar multiplication.
We have also implemented the ECDH key agreement protocol for the crypto controller. To obtain a common key for the two participants Alice and Bob, Alice secretly chooses a random integer k_A and computes the point k_A G, which she sends to Bob. Likewise, Bob secretly chooses a random integer k_B, computes k_B G, and sends it to Alice. The common key is P = k_A k_B G. As shown in Table 3, the performance of the crypto controller in the crypto processor is suitable for embedded system applications, where high flexibility and performance are a must.

Algorithm 1. ECDSA Signature Generation Algorithm
To sign a message m, a signer A does the following:
1. Select a random integer k from [1, n - 1]
2. Compute kG = (x1, y1) and r = x1 mod n
3. Compute k^-1 mod n
4. Compute e = SHA-1(m)
5. Compute s = k^-1 (e + dr) mod n
6. If s = 0 then go to step 1
7. A's signature for the message m is (r, s)
where G is a base point on E(GF(2^m)), and d, a random integer from [1, n - 1], is A's private key.

Table 3: Performance of the ECDSA and ECDH algorithms when executed on the crypto controller.
Method                           Timing
Scalar multiplication            1.004 sec
ECDSA signature generation       1.032 sec
ECDSA signature verification     2.255 sec
EC Diffie-Hellman                1.920 sec
SHA-1 (for 163-bit data size)    11.24 sec
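The ECC implementation described above works in GF(2^163) with the field polynomial F(x) = x^163 + x^7 + x^6 + x^3 + 1. As an illustration of the underlying arithmetic, the following sketch shows field multiplication and inversion, with field elements encoded as Python integers whose bit i holds the coefficient of x^i. It is a plain shift-and-add formulation with Fermat inversion, chosen for brevity; the routines actually run on the crypto controller are not given in this paper, and the function names are assumptions.

```python
M = 163
# Field polynomial x^163 + x^7 + x^6 + x^3 + 1, bit i = coefficient of x^i.
F_POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^163).

    Addition of polynomials over GF(2) is XOR, so the product is built by
    shift-and-XOR; a is reduced by F_POLY whenever its degree reaches 163.
    Both inputs are assumed to be already reduced (below 2^163).
    """
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= F_POLY
    return result


def gf_inv(a: int) -> int:
    """Invert a nonzero element using a^(2^163 - 2) = product of a^(2^i), i = 1..162."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse in GF(2^163)")
    result, square = 1, a
    for _ in range(M - 1):
        square = gf_mul(square, square)
        result = gf_mul(result, square)
    return result
```

As a small check of the reduction step, multiplying x^162 by x gives x^163, which reduces to x^7 + x^6 + x^3 + 1, so gf_mul(1 << 162, 2) returns 0xC9.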
5. A Crypto Processor Application: Real-time Data Security for a Storage Device
To evaluate the usability of the crypto processor, we have developed an RTDS (Real Time Data Security) system for storage devices. The RTDS system is composed of control and monitoring software with a GUI (Graphical User Interface) environment, a device driver, and an RTDS board. Fig. 7 shows the block diagram of the RTDS system, and Fig. 8 shows a photograph of the RTDS board with the crypto processor. The main operations of the RTDS system are described as follows.
- A user process wants to write data into the secure area of a hard disk (a).
- The CPU reads data from a certain area of the memory and sends it to the hard disk via the I/O bus (b).
- The device driver, which is a part of the RTDS system, catches the hard disk write event and forwards the data to the crypto processor (c).
- In the crypto processor, an encryption task is performed in real-time (d).
- The crypto processor, which has completed its encryption task, sends the encrypted data to the hard disk (e).
- The hard disk receives the encrypted data and completes the write procedure (f).

Figure 7: Block diagram of the Real Time Data Security System for storage devices.

The RTDS board, shown in Fig. 8, is mainly composed of a PCI interface controller, an SRAM buffer, an IC card interface controller, and a crypto processor. An Altera FPGA chip is used for the PCI interface controller, and the ASIC chip, located in the upper right part of the board, is the crypto processor. The throughput of the crypto processor and the PCI interface controller is high (240 Mbps and 1056 Mbps, respectively), and the average access time of the hard disk (a Quantum FireBall 15 device) is 12 msec in our system. Therefore, the RTDS system operates in real-time.

6. Concluding Remarks
In this paper, we have presented the design and implementation of a crypto processor composed of a 32-bit RISC processor and a coprocessor block dedicated to the triple-DES and SEED algorithms. The dedicated block of the crypto processor accelerates private key crypto algorithms, and the programmability of the crypto controller makes possible fast execution of various crypto algorithms (such as RSA and ECC) and security applications. The crypto processor was implemented as an ASIC chip using Hynix's 0.5 μm CMOS technology. Simulations, formal verification, and static timing analysis were used to fully verify the ASIC design before fabrication. The fabricated chip was found to have a 30 MHz operating frequency and a data rate of 240 Mbps for all modes of operation (ECB, CBC, OFB, CFB) of the SEED algorithm. The crypto processor was evaluated by constructing an RTDS (Real-Time Data Security) system for storage devices. This application board was used to thoroughly test and verify the functionality of the crypto processor. The crypto processor in the RTDS system performs data encryption and decryption in real-time. The high performance and high flexibility of the crypto processor design make it applicable to various security applications such as storage devices, embedded systems, network routers, and firewalls.

References
[1] Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone, Handbook of Applied Cryptography, CRC Press Inc., Florida, 1996.
[2] Analog Devices, VMS115 IPSec Coprocessor Data Sheet, Rev. 2.0, January 1999.
[3] ARM Corp., ARM7 Data Sheet, 1996.
[4] H.B. Bakoglu, Circuits, Interconnections, and Packaging for VLSI, Addison-Wesley Publishers Ltd., 1990.
[5] Cadence Corp., SimWave, May 1999.
[6] Certicom Corp., SEC 1: Elliptic Curve Cryptography, September 2000.
[7] Certicom Corp., SEC 2: Recommended Elliptic Curve Domain Parameters, September 2000.
[8] Don B. Johnson, Alfred J. Menezes, and Scott Vanstone, Elliptic Curve Digital Signature Algorithm (ECDSA), available at http://www.certicom.com
[9] Modeltech Corp., ModelSim Compiler, May 1999.
[10] National Institute of Standards and Technology, FIPS Publication 46-2: Data Encryption Standard, MD, USA, December 1993.
[11] Bruce Schneier, Applied Cryptography (2nd ed.), John Wiley and Sons, Inc., New York, 1996.
[12] Synopsys Corp., Design Compiler Reference Manual, February 1998.
[13] TTA, 128-bit Symmetric Block Cipher (SEED), Telecommunications Technology Association (TTA), Seoul, Korea, June 1999.