
High Speed VLSI Implementation of the Hyperbolic Tangent Sigmoid Function

Karl Leboeuf, Ashkan Hosseinzadeh Namin, Roberto Muscedere, Huapeng Wu, and Majid Ahmadi
Department of Electrical and Computer Engineering, University of Windsor
401 Sunset Avenue, Windsor, Ontario N9B 3P4, Canada
Email: {leboeu3, hossei1, rmusced, hwu, ahmadi}@uwindsor.ca

Abstract
The hyperbolic tangent function is commonly used as the activation function in artificial neural networks. In this work, two different hardware implementations of the hyperbolic tangent function are proposed. Both methods approximate the function rather than computing it directly, owing to its exponential nature. The first method uses a lookup table to approximate the function, while the second reduces the size of the table by using range addressable decoding in place of the classic decoding scheme. Hardware synthesis results show that the proposed methods are significantly faster and use less area than other similar methods with the same amount of error.
Figure 1. The Hyperbolic Tangent Activation Function

1. Introduction
Artificial neural networks (ANNs) are currently employed for many diverse purposes, ranging from image classification to motor control [2, 6]. Since ANN systems are computationally intensive, software implementations suffer from long execution times; hardware implementations can eliminate this issue. One of the challenges in designing a hardware-based ANN system is the implementation of the activation function. Several activation functions are available, including, but not limited to, the sigmoid, hyperbolic tangent, and step functions [2, 6]. An important property of an activation function is that it have a continuous derivative, which is desirable for backpropagation-based learning. These functions threshold the output of every artificial neuron, so increasing the speed of the activation function improves the performance of the entire system.

The hyperbolic tangent function is among the most widely used activation functions in ANNs. As shown in Fig. 1, it produces a sigmoid curve, that is, a curve with an S shape, and its output varies little outside the range of (-2, 2).

Currently, there are several approaches to the hardware implementation of the activation function. Piecewise linear (PWL) approximation, lookup tables (LUTs), and hybrid methods have been widely used for this purpose [1, 5, 7]. With current hardware synthesizers, LUTs are not only faster but also occupy less area than piecewise linear approximation methods. In this work, range addressable lookup tables are proposed as a solution that improves on a simple LUT implementation in both speed and area utilization.

Range addressable lookup tables (RALUTs), like regular LUTs, are a type of non-volatile, read-only memory. In a LUT, every output corresponds to a unique address, whereas in a RALUT every output corresponds to a range of addresses, as shown in Fig. 2. This type of table was originally proposed in [4] to implement highly nonlinear, discontinuous functions, and it will be shown to be suitable for implementing the hyperbolic tangent activation function. Depending on the desired accuracy, ranges of inputs share the same output, which can be implemented more efficiently using a RALUT than a regular LUT.

In this work we first present a LUT implementation of the hyperbolic tangent function. Hardware implementation results show that the LUT implementation is significantly faster and smaller than a recently published PWL approximation method with the same level of accuracy. A RALUT implementation is presented afterwards, which yields further improvements over the LUT in both area and speed. The hardware designs were implemented in a 0.18 µm digital CMOS process, the same technology node used by the PWL implementation in our comparison.

The rest of this paper is organized as follows. Section 2 briefly reviews previous work on hyperbolic tangent function implementation. Section 3 discusses the LUT-based approach, while section 4 examines the RALUT approach. In section 5, a complexity comparison between several different methods is presented. Finally, section 6 offers some concluding remarks.

2. A Brief Review of Different Hyperbolic Tangent Function Implementations

Efficient implementation of the activation function is an important part of designing an ANN system in hardware. The activation function is typically unsuitable for direct implementation since it is formed of an infinite exponential series; in practice, approximations of the function are used rather than the function itself. Currently, there are three main approaches to approximating and implementing the hyperbolic tangent function in hardware: lookup table (LUT) approximation, piecewise linear (PWL) approximation, and hybrid methods, which are essentially a combination of the former two. A brief overview of each follows.

2.1. Piecewise Linear Approximation

Piecewise linear schemes use a series of linear segments to approximate a function [1]. The number and location of the segments are chosen so that error, processing time, and area are minimized. This approach usually requires several clock cycles and the use of multipliers, which are expensive in terms of area. A piecewise linear approximation of the hyperbolic tangent function with five segments is shown in Fig. 3.
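As a software model of the idea, the following Python sketch evaluates a five-segment PWL approximation of tanh(x). The segment boundaries here are illustrative chords between assumed breakpoints, not the optimized set a real hardware design would derive.

import math

# Illustrative five-segment PWL approximation of tanh(x) on [-4, 4].
# These breakpoints are assumptions for this sketch, not an optimized set.
BREAKPOINTS = [-4.0, -2.0, -0.5, 0.5, 2.0, 4.0]

# Precompute slope/intercept per segment by interpolating tanh between
# consecutive breakpoints (chord approximation).
SEGMENTS = []
for x0, x1 in zip(BREAKPOINTS, BREAKPOINTS[1:]):
    y0, y1 = math.tanh(x0), math.tanh(x1)
    slope = (y1 - y0) / (x1 - x0)
    SEGMENTS.append((x1, slope, y0 - slope * x0))

def pwl_tanh(x: float) -> float:
    """Five-segment PWL approximation; saturates outside [-4, 4]."""
    if x <= BREAKPOINTS[0]:
        return -1.0
    if x >= BREAKPOINTS[-1]:
        return 1.0
    for x1, slope, intercept in SEGMENTS:
        if x <= x1:
            return slope * x + intercept  # one multiply, one add per output
    raise AssertionError("unreachable")

print(pwl_tanh(0.25), math.tanh(0.25))

Each evaluation costs a multiply and an add on top of the segment search; the multiplier is what makes PWL schemes expensive in silicon.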
Figure 3. Piecewise Linear Approximation of tanh(x) with Five Segments

Figure 2. LUT and RALUT Addressing Schemes

2.2. Lookup Table Approximation

In this method, the function is approximated by a limited number of points [5]. The points are uniformly distributed across the entire input range. The achievable accuracy is directly related to the number of bits used to represent the address (input) and the output, so care must be taken to use enough bits to keep the error small. A LUT approximation of the hyperbolic tangent function with eight points (a three-bit input representation) is shown in Fig. 4.
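A minimal Python model of this addressing scheme, assuming the 3-bit input and the (-4, 4) range suggested by Fig. 4 (both are assumptions of this sketch):

import math

IN_BITS = 3                       # 3-bit address -> 8 table entries (Fig. 4)
X_MIN, X_MAX = -4.0, 4.0          # assumed input range for this sketch
STEP = (X_MAX - X_MIN) / (1 << IN_BITS)

# One tanh sample per address, uniformly spaced over the input range:
# the whole function reduces to a small ROM indexed by the quantized input.
TABLE = [math.tanh(X_MIN + (i + 0.5) * STEP) for i in range(1 << IN_BITS)]

def lut_tanh(x: float) -> float:
    """Approximate tanh(x) by direct table lookup: one unique address per entry."""
    addr = int((x - X_MIN) / STEP)                 # quantize input to an address
    addr = max(0, min(addr, (1 << IN_BITS) - 1))   # clamp out-of-range inputs
    return TABLE[addr]

print(lut_tanh(1.3), math.tanh(1.3))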


2.3. Hybrid Methods

Hybrid methods use a combination of lookup tables and other hardware to generate the result of a function [7]. They typically take several clock cycles; however, they do not employ multipliers, which significantly increases their speed.
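The paper does not detail the generators of [7]. Purely as an illustration of the general idea, the Python sketch below refines a coarse LUT with a correction term whose "multiply" is snapped to a power of two, so in hardware it reduces to a shift and an add rather than a multiplier; all widths and ranges here are assumptions.

import math

COARSE_BITS = 4                    # assumed coarse table size for this sketch
X_MIN, X_MAX = -4.0, 4.0
STEP = (X_MAX - X_MIN) / (1 << COARSE_BITS)
BASE = [math.tanh(X_MIN + i * STEP) for i in range((1 << COARSE_BITS) + 1)]

def hybrid_tanh(x: float) -> float:
    """Coarse LUT plus a multiplier-free correction (one possible hybrid)."""
    if x <= X_MIN:
        return -1.0
    if x >= X_MAX:
        return 1.0
    i = int((x - X_MIN) / STEP)
    frac = (x - X_MIN) - i * STEP             # offset within the segment
    slope = (BASE[i + 1] - BASE[i]) / STEP    # local chord slope (> 0 for tanh)
    slope = 2.0 ** round(math.log2(slope))    # snap to a power of two:
                                              # the multiply becomes a shift
    return BASE[i] + slope * frac             # base lookup + shifted correction

print(hybrid_tanh(0.7), math.tanh(0.7))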

3. Lookup Table Implementation of the Hyperbolic Tangent Function

The major advantage of using a LUT is its high operating speed, particularly compared to PWL approximation, which uses multipliers in its design. There are two different ways to implement a lookup table in hardware. The first is to use a ROM; the main drawback of this method is that no further optimization can be done once the exact input/output bit patterns are known. The second is to use a logic synthesizer to implement the table as a purely combinational circuit. This works well because the synthesizer is very effective at optimizing away large amounts of logic.

In our implementation, we wrote MATLAB code to determine the number of input and output bits, as well as the output bit patterns themselves, for a table with a specified maximum error. For a maximum error of 0.04, 9 bits were used for both the input and output, whereas 10 bits were required to keep the maximum error below 0.02. Once the input/output characteristics of the table were determined, HDL code employing them was written, and a hardware design was synthesized using Synopsys Design Compiler. Virtual Silicon standard library cells for a TSMC 0.18 µm CMOS process were used for this design, and synthesis parameters were chosen to maximize operating speed. Hardware implementation results for maximum errors of 0.04 and 0.02 are summarized in the second row of Tables 1 and 2, respectively.
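The MATLAB generator itself is not listed in the paper. The following Python sketch reproduces the described procedure under stated assumptions: a (-8, 8) input range, tables sampled at segment left edges, and outputs rounded to the same width as the inputs. Exact error figures depend on conventions the paper does not spell out.

import math

X_MIN, X_MAX = -8.0, 8.0   # input range assumed from the comparison designs

def build_lut(in_bits: int, out_bits: int) -> list[float]:
    """Uniform tanh table: 2**in_bits entries, outputs rounded to fixed point."""
    step = (X_MAX - X_MIN) / (1 << in_bits)
    scale = 1 << (out_bits - 1)               # signed fixed point in [-1, 1)
    return [min(round(math.tanh(X_MIN + i * step) * scale), scale - 1) / scale
            for i in range(1 << in_bits)]

def max_error(in_bits: int, out_bits: int, probes: int = 100_000) -> float:
    """Worst-case |tanh(x) - table(x)| over a dense sweep of the input range."""
    table = build_lut(in_bits, out_bits)
    step = (X_MAX - X_MIN) / (1 << in_bits)
    worst = 0.0
    for k in range(probes):
        x = X_MIN + (X_MAX - X_MIN) * k / (probes - 1)
        addr = min(int((x - X_MIN) / step), (1 << in_bits) - 1)
        worst = max(worst, abs(math.tanh(x) - table[addr]))
    return worst

# Find the smallest width meeting each error budget; the resulting bit
# patterns would then be emitted as HDL and synthesized combinationally.
for target in (0.04, 0.02):
    bits = next(b for b in range(4, 16) if max_error(b, b) < target)
    print(f"max error < {target}: {bits} bits in and out")

Under these assumptions the search settles on 9 bits for the 0.04 budget and 10 bits for 0.02, consistent with the widths reported above.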

Figure 4. Lookup Table Approximation of tanh(x) with Eight Points

4. Range Addressable Lookup Table Implementation of the Hyperbolic Tangent Function

A range addressable lookup table, originally proposed in [4] to accurately approximate nonlinear, discontinuous functions, shares many aspects with the classic LUT, with a few notable differences. In a LUT, every data point stored by the table corresponds to a unique address; in a RALUT, every data point corresponds to a range of addresses. This alternate addressing approach allows for a large reduction in data points, particularly where the output remains constant over a range of inputs. The hyperbolic tangent function is an example: its output changes only slightly outside the range of (-2, 2). Rather than storing every individual point, a single point is used to represent an entire range.

Figure 5. Range Addressable Lookup Table Approximation of tanh(x) with Eight Points

To implement the hyperbolic tangent function, we first wrote MATLAB code to select the minimum number of data points while keeping the maximum error beneath a specified threshold. We were able to represent the activation function with 61 points using 9 bits for the inputs and outputs, with an error below 0.04. Using a 10-bit representation, only 127 points were needed to maintain a maximum error below 0.02. The number of points required for these levels of maximum error using classic LUTs were 512 and 1024, respectively. This large reduction in stored values is what allows the RALUT approach to achieve better results than a LUT implementation of the same function. Both sets of data points were passed on to HDL code, and the designs were synthesized with Synopsys Design Compiler using 0.18 µm CMOS technology. Design parameters were chosen to maximize operating speed. Implementation results are shown in the last row of Tables 1 and 2.
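The point-selection code is again MATLAB and not shown in the paper. One plausible greedy version is sketched below in Python: scan the quantized input grid left to right, and extend the current range while a single stored midpoint value can still represent every address in it. The (-8, 8) range is an assumption, output rounding is ignored, and the point count it reports need not match the paper's 61 exactly.

import math

X_MIN, X_MAX = -8.0, 8.0

def ralut_points(in_bits: int, max_err: float) -> list:
    """Greedily merge consecutive addresses into ranges sharing one output.

    Relies on tanh being monotonic: a range's output spread is set by its
    endpoints, and one midpoint value covers the whole range to within
    max_err as long as that spread stays below 2 * max_err.
    Returns (range_start_x, range_end_x, stored_output) triples.
    """
    step = (X_MAX - X_MIN) / (1 << in_bits)
    xs = [X_MIN + i * step for i in range(1 << in_bits)]
    ranges = []
    start = 0
    for i in range(1, len(xs) + 1):
        lo = math.tanh(xs[start])
        # Close the current range when adding the next address would push
        # its output spread past what one midpoint value can cover.
        if i == len(xs) or math.tanh(xs[i]) - lo > 2 * max_err:
            hi = math.tanh(xs[i - 1])
            ranges.append((xs[start], xs[i - 1], (lo + hi) / 2))
            start = i
    return ranges

points = ralut_points(9, 0.04)
print(len(points), "stored points versus 512 for a classic 9-bit LUT")

In the saturated tails the greedy scan merges hundreds of addresses into a handful of ranges, which is exactly where the RALUT's saving over the classic LUT comes from.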

Architecture      Max-Error   Avg-Error   Area            Delay     Area x Delay
Scheme-1 [3]      0.0430      0.0078      32069.83 µm²    903 ns    2.895 × 10^7 µm²·ns
Proposed-LUT      0.0365      0.0040      9045.94 µm²     2.15 ns   1.944 × 10^4 µm²·ns
Proposed-RALUT    0.0357      0.0089      7090.40 µm²     1.85 ns   1.311 × 10^4 µm²·ns

Table 1. Complexity comparison of different implementations for 0.04 maximum error

Architecture      Max-Error   Avg-Error   Area            Delay     Area x Delay
Scheme-2 [3]      0.0220      0.0041      83559.17 µm²    1293 ns   1.080 × 10^8 µm²·ns
Proposed-LUT      0.0180      0.0020      17864.24 µm²    2.45 ns   4.376 × 10^4 µm²·ns
Proposed-RALUT    0.0178      0.0057      11871.53 µm²    2.12 ns   2.516 × 10^4 µm²·ns

Table 2. Complexity comparison of different implementations for 0.02 maximum error


5. Comparison of Different Hardware Implementations

Comparisons of hardware implementations for maximum errors of 0.04 and 0.02 are shown in Tables 1 and 2. In Table 1, the first row gives results for Scheme-1, an isosceles-triangular approximation of the hyperbolic tangent function. In Table 2, the same row gives results for Scheme-2, a PWL approximation of the function. Both Scheme-1 and Scheme-2 were implemented in the same 0.18 µm CMOS technology as our designs, and all designs accept an input in the range of (-8, 8).

Our proposed RALUT design improves on the LUT implementation in both cases. With a maximum error of 0.04, the RALUT was 13% faster and occupied 21.6% less area than the classic LUT approach. When the maximum error threshold was reduced to 0.02, the RALUT maintained a speed improvement of 13.5%, and area was further reduced by 33.5% compared to the LUT.

As the tables show, the LUT and RALUT designs are significantly faster than the work recently presented in [3]. This is largely because our approach uses combinational logic exclusively, allowing results to appear after a single clock cycle, whereas the other designs need multiple clock cycles. The area-delay product, calculated as a metric of overall design efficiency, is shown in the last column of Tables 1 and 2; for example, the proposed RALUT in Table 1 yields 7090.40 µm² × 1.85 ns ≈ 1.311 × 10^4 µm²·ns.

6. Conclusion

The hyperbolic tangent function is commonly used as the activation function in artificial neural networks. In this work, two different hardware implementations of this function were proposed. The first method uses a classic LUT to approximate the function, while the second uses a RALUT. Hardware synthesis results show that the proposed methods are significantly faster and use less area than other similar methods with the same amount of error. On average, the speed was improved by 13% and the area reduced by 26% when using the second method compared to the first in implementing the hyperbolic tangent function. The proposed designs can be used in the hardware implementation of ANNs.

References

[1] K. Basterretxea, J. Tarela, and I. D. Campo. Approximation of sigmoid function and the derivative for hardware implementation of artificial neurons. IEE Proceedings - Circuits, Devices and Systems, 151(1):18-24, February 2004.
[2] S. Haykin. Neural Networks: A Comprehensive Foundation. Prentice Hall, July 1998.
[3] C. Lin and J. Wang. A digital circuit design of hyperbolic tangent sigmoid function for neural networks. IEEE International Symposium on Circuits and Systems (ISCAS), pages 856-859, May 2008.
[4] R. Muscedere, V. Dimitrov, G. Jullien, and W. Miller. Efficient techniques for binary-to-multidigit multidimensional logarithmic number system conversion using range-addressable look-up tables. IEEE Transactions on Computers, 54(3):257-271, March 2005.
[5] F. Piazza, A. Uncini, and M. Zenobi. Neural networks with digital LUT activation functions. IEE Proceedings - Circuits, Devices and Systems, 151(1):18-24, February 2004.
[6] D. Rumelhart, J. McClelland, and the PDP Research Group. Parallel Distributed Processing, Vol. 1: Foundations. The MIT Press, July 1987.
[7] S. Vassiliadis, M. Zhang, and J. Delgado-Frias. Elementary function generators for neural-network emulators. IEEE Transactions on Neural Networks, 11(6):1438-1449, November 2000.
