High Speed VLSI Implementation of The Hyperbolic Tangent Sigmoid Function
Karl Leboeuf, Ashkan Hosseinzadeh Namin, Roberto Muscedere, Huapeng Wu, and Majid Ahmadi
Department of Electrical and Computer Engineering, University of Windsor
401 Sunset Avenue, Windsor, Ontario N9B 3P4 Canada
Email: {leboeu3, hossei1, rmusced, hwu, ahmadi}@uwindsor.ca

Abstract
The hyperbolic tangent function is commonly used as the activation function in artificial neural networks. In this work, two different hardware implementations of the hyperbolic tangent function are proposed. Because the function is exponential in nature, both methods approximate it rather than compute it directly. The first method uses a lookup table (LUT) to approximate the function, while the second reduces the size of the table by using range addressable decoding in place of the classic decoding scheme. Hardware synthesis results show that the proposed methods are significantly faster and use less area than other similar methods with the same amount of error.
Figure 1. The hyperbolic tangent function tanh(x)
1. Introduction
Artificial neural networks (ANNs) are currently employed for many diverse purposes, ranging from image classification to motor control [2, 6]. Since ANN systems are computationally intensive, software implementations suffer from long execution times; hardware implementations can eliminate this issue. One of the challenges in designing a hardware-based ANN system is the implementation of the activation function. Several different activation functions are available, including, but not limited to, the sigmoid, hyperbolic tangent, and step functions [2, 6]. An important property of an activation function is the existence of a continuous derivative, which is desirable when performing backpropagation-based learning. These functions threshold the output of every artificial neuron, so increasing the speed of the activation function improves the performance of the entire system. The hyperbolic tangent function is among the most widely used activation functions in ANNs. As shown in Fig. 1, it produces a sigmoid curve, that is, a curve with an "S" shape, and its output varies only slightly outside the range (−2, 2). Currently, there are several different approaches to the hardware implementation of the activation function: piecewise linear (PWL) approximation, lookup tables (LUTs), and hybrid methods have all been widely used for this purpose [1, 5, 7]. With current hardware synthesizers, LUTs are not only faster but also occupy less area than PWL approximation methods. In this work, range addressable lookup tables (RALUTs) are proposed as a solution that offers advantages over a simple LUT implementation in terms of both speed and area utilization. RALUTs, like regular LUTs, are a type of non-volatile, read-only memory. In a LUT, every output corresponds to a unique address, whereas in a RALUT every output corresponds to a range of addresses, as shown in Fig. 2.
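As a point of reference, the thresholding role of the activation function can be illustrated in software. This is a minimal sketch; the weights and inputs below are arbitrary values, not taken from this work:

```python
import math

def neuron_output(inputs, weights, bias):
    """A single artificial neuron: a weighted sum of the inputs,
    thresholded by the hyperbolic tangent activation function."""
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    return math.tanh(s)

# tanh saturates quickly: its output barely changes outside (-2, 2).
print(neuron_output([1.0, 1.0], [0.5, 0.5], 0.0))  # tanh(1.0), about 0.7616
print(math.tanh(2.0), math.tanh(4.0))              # about 0.9640 and 0.9993
```

The saturation visible in the last line is what makes table-based approximation attractive: most of the input range maps to nearly constant outputs.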
This type of table was originally proposed in [4] to implement highly nonlinear, discontinuous functions, and it will be shown to be suitable for implementing the hyperbolic tangent activation function as well. Depending on the desired accuracy, entire ranges of inputs share the same output, which can be implemented more efficiently using a RALUT than a regular LUT.
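The difference between the two addressing schemes can be sketched in software. The breakpoints and stored values below are hypothetical placeholders for illustration, not the actual table contents of this work:

```python
import math
from bisect import bisect_right

# Classic LUT: one stored output per input address (here, 128 addresses
# covering [-4, 4) in steps of 1/16).
lut = [math.tanh(i / 16.0) for i in range(-64, 64)]

def lut_tanh(x):
    addr = max(-64, min(63, int(round(x * 16.0)))) + 64
    return lut[addr]

# RALUT: one stored output per *range* of addresses.  In practice the
# breakpoints and values are chosen so the maximum error stays below a
# target; these eight are only illustrative.
breakpoints = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
values      = [-1.0, -0.9, -0.6, -0.2, 0.2, 0.6, 0.9, 1.0]

def ralut_tanh(x):
    # Every input falling between two breakpoints maps to the same
    # output, so only 8 values are stored instead of one per address.
    return values[bisect_right(breakpoints, x)]
```

Because tanh is nearly flat outside (−2, 2), long runs of LUT addresses hold the same quantized value; that redundancy is exactly what the range addressing removes.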
In this work, we first present a LUT implementation of the hyperbolic tangent function. Hardware implementation results show that the LUT implementation is significantly faster and smaller than a recently published PWL approximation method with the same level of accuracy. A RALUT implementation is presented afterwards, which yields further improvements over the LUT in both area and speed. The hardware designs were implemented in a digital CMOS 0.18 µm process, the same technology node used by the PWL implementation in our comparison. The rest of this paper is organized as follows. Section 2 briefly reviews previous work on hyperbolic tangent function implementation. Section 3 discusses the LUT-based approach, while Section 4 examines the RALUT approach. In Section 5, a complexity comparison between several different methods is presented. Finally, Section 6 offers some concluding remarks.
2. Previous Work

Hardware implementations of the hyperbolic tangent function generally fall into three categories: lookup tables (LUTs), piecewise linear (PWL) approximation, and hybrid methods, which are essentially a combination of the former two. Following is a brief overview of each of these methods.
3. LUT Implementation

The major advantage of a LUT is its high operating speed, particularly when compared to PWL approximation, which requires multipliers in its design. There are two different ways to implement a lookup table in hardware. The first is to use a ROM; the main drawback of this method is that no further optimization can be done once the exact input/output bit patterns are known. The second method is to use a logic synthesizer to implement the table as a purely combinational circuit. This works well because the synthesizer is very effective at optimizing away large amounts of redundant logic. In our implementation, we wrote MATLAB code to determine the number of input and output bits, as well as the output bit patterns themselves, for a table with a specified maximum error. For a maximum error of 0.04, 9 bits were used for both the input and output, whereas 10 bits were required to keep the maximum error below 0.02. Once the input/output characteristics of the table were determined, HDL code employing them was written, and a hardware design was synthesized using Synopsys Design Compiler. Virtual Silicon standard library cells for a TSMC CMOS 0.18 µm process were used for this design, and synthesis parameters were chosen to maximize operating speed. Hardware implementation results for maximum errors of 0.04 and 0.02 are summarized in the second row of Tables 1 and 2, respectively.

4. RALUT Implementation

Figure 5. Range Addressable Lookup Table Approximation of tanh(x) with Eight Points

A range addressable lookup table, originally proposed in [4] to accurately approximate nonlinear, discontinuous functions, shares many aspects with the classic LUT, with a few notable differences. In a LUT, every data point stored by the table corresponds to a unique address; in a RALUT, every data point corresponds to a range of addresses. This alternate addressing approach allows for a large reduction in stored data points, particularly where the output remains constant over a range of inputs. The hyperbolic tangent function is an example: its output changes only slightly outside the range (−2, 2), so rather than storing every individual point, a single point can represent an entire range. To implement the hyperbolic tangent function, we first wrote MATLAB code to select the minimum number of data points while keeping the maximum error beneath a specified threshold. We were able to represent the activation function with 61 points using 9 bits for the inputs and outputs, with an error below 0.04. Using a 10-bit representation, only 127 points were needed to maintain a maximum error below 0.02. The number of points required for these levels of maximum error using classic LUTs was 512 and 1024, respectively. This large reduction in stored values is what allows the RALUT approach to achieve better results than a LUT implementation of the same function. Both sets of data points were then encoded in HDL, and the designs were synthesized with Synopsys Design Compiler using CMOS 0.18 µm technology. Design parameters were chosen to maximize operating speed. Implementation results are shown in the last row of Tables 1 and 2.
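The table-generation step, which the authors performed in MATLAB, can be approximated in Python. The input range [−4, 4) and the signed fixed-point quantization below are assumptions, so the resulting point counts are illustrative rather than the paper's exact figures of 61 and 127:

```python
import math

IN_BITS = 9                      # assumed 9-bit input/output width
N = 1 << IN_BITS                 # 512 table entries
STEP = 8.0 / N                   # assumed input range [-4.0, 4.0)

def quantize(y, bits=IN_BITS):
    """Round an output in [-1, 1] to a signed fixed-point code."""
    scale = (1 << (bits - 1)) - 1
    return round(y * scale) / scale

# Classic LUT: one quantized tanh sample per input code.
xs = [-4.0 + i * STEP for i in range(N)]
lut = [quantize(math.tanh(x)) for x in xs]
max_err = max(abs(l - math.tanh(x)) for l, x in zip(lut, xs))

# RALUT reduction: merge runs of consecutive addresses that store the
# same quantized output into a single (first, last, value) entry.
ranges, start = [], 0
for i in range(1, N):
    if lut[i] != lut[start]:
        ranges.append((start, i - 1, lut[start]))
        start = i
ranges.append((start, N - 1, lut[start]))

print(f"LUT: {N} entries, max error at sample points {max_err:.5f}")
print(f"RALUT: {len(ranges)} stored ranges")
```

The merged runs occur mostly in the saturated tails of the function, where many consecutive codes quantize to the same output, so the RALUT stores far fewer entries than the full 512-entry LUT.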
Table 1. Complexity comparison of different implementations for 0.04 maximum error

Architecture     Max-Error   Avg-Error   Area           Delay     Area × Delay
Scheme-2 [3]     0.0220      0.0041      83559.17 µm²   1293 ns   1.080 × 10^8 µm²·ns
Proposed-LUT     0.0180      0.0020      17864.24 µm²   2.45 ns   4.376 × 10^4 µm²·ns
Proposed-RALUT   0.0178      0.0057      11871.53 µm²   2.12 ns   2.516 × 10^4 µm²·ns

Table 2. Complexity comparison of different implementations for 0.02 maximum error

6. Conclusion

The hyperbolic tangent function is commonly used as the activation function in artificial neural networks. In this work, two different hardware implementations of this function were proposed. The first method uses a classic LUT to approximate the function, while the second uses a RALUT. Hardware synthesis results show that the proposed methods are significantly faster and occupy less area than other similar methods with the same amount of error. On average, the RALUT method improved speed by 13% and reduced area by 26% compared to the LUT method. The proposed designs can be used in the hardware implementation of ANNs.

References

[1] K. Basterretxea, J. Tarela, and I. D. Campo. Approximation of sigmoid function and the derivative for hardware implementation of artificial neurons. IEE Proceedings - Circuits, Devices and Systems, 151(1):18–24, February 2004.
[2] S. Haykin. Neural Networks: A Comprehensive Foundation. Prentice Hall, 1998.
[3] C. Lin and J. Wang. A digital circuit design of hyperbolic tangent sigmoid function for neural networks. IEEE International Symposium on Circuits and Systems (ISCAS), pages 856–859, May 2008.
[4] R. Muscedere, V. Dimitrov, G. Jullien, and W. Miller. Efficient techniques for binary-to-multidigit multidimensional logarithmic number system conversion using range-addressable look-up tables. IEEE Transactions on Computers, 54(3):257–271, March 2005.
[5] F. Piazza, A. Uncini, and M. Zenobi. Neural networks with digital LUT activation functions. IEE Proceedings - Circuits, Devices and Systems, 151(1):18–24, February 2004.
[6] D. Rumelhart, J. McClelland, and the PDP Research Group. Parallel Distributed Processing, Vol. 1: Foundations. The MIT Press, 1987.
[7] S. Vassiliadis, M. Zhang, and J. Delgado-Frias. Elementary function generators for neural-network emulators. IEEE Transactions on Neural Networks, 11(6):1438–1449, November 2000.