Approximation of The Sigmoid Function and Its Derivative Using A Minimax Approach
Lehigh University
Lehigh Preserve
Theses and Dissertations
2002
Recommended Citation: Schlessman, Jason, "Approximation of the sigmoid function and its derivative using a minimax approach" (2002). Theses and Dissertations, Paper 752.
Schlessman, Jason
Approximation of the Sigmoid Function and Its Derivative Using a Minimax Approach
January 2003
APPROXIMATION OF THE SIGMOID FUNCTION AND ITS DERIVATIVE USING A MINIMAX APPROACH
by
Jason Schlessman
A Thesis
Presented to the Graduate and Research Committee
of Lehigh University
in Candidacy for the Degree of
Master of Science
in
Computer Engineering
Lehigh University
August 2002
Acknowledgments
Michael Schulte, my thesis advisor, for his constant support and motivation.
Meghanad Wagh, my undergraduate advisor, for his inspirational teaching and commitment.
Terry Boult, my professor and independent research advisor, for his guidance throughout my research.
George Walters, a fellow graduate student, for providing his time, patience, and assistance.
Sharon and Warren Schlessman, my parents, for their endless support of my education at Lehigh.
Megg Mass, Jim Weber, and Steve Haworth, for helping me move towards my academic career.
Shalma Terry, for her love, support, and inspiration throughout all facets of my life.
Table of Contents

Acknowledgments iii
Table of Contents iv
List of Tables vi
2.2 CORDIC Approximations 12
3 Proposed Technique 23
4 Results 51
5 Conclusions 55
Bibliography 59
List of Tables
3.2 Sample Maximum Minimax Error Values for the Sigmoid's Derivative 31
3.13 Sigmoid Interval Differentiating Bits for 7-9 Accurate Bits . 47
3.15 Sigmoid Derivative Interval Differentiating Bits for 7-9 Accurate Bits 49
3.16 Sigmoid Derivative Interval Differentiating Bits for 10-11 Accurate Bits 50
List of Figures
Abstract

Function approximation is essential to a range of applications, including signal processing, multimedia, and neural networks. The sigmoid function and its derivative are of particular interest: the sigmoid function is used as a learning function and a threshold determinant for neural networks. As demands on neural network speed and accuracy increase, so do the demands placed on sigmoid approximation hardware. Previous choices for sigmoid approximation are presented, as well as a discussion of the mathematical properties of the sigmoid function and its derivative. In addition, a novel approximation technique based on minimax theory is proposed. This approach was developed with reduction of maximum error and simplicity of approximation as important design criteria. Designs for the sigmoid and its derivative are shown for seven to eleven fractional bits of accuracy. The designs were synthesized for two target technologies. For both technologies, the designs increase consistently in both area and delay with the number of accurate bits.
Introduction to Function
Approximation
This chapter discusses the approximation of functions using digital hardware. It begins with a motivation for function approximation. This is followed by some basic design blocks used for implementations, with examples of each technique. The chapter concludes with an outline of the remainder of this thesis.
1.1 Overview of Function Approximation
Function approximation entails the design of a system which, upon receiving an input value x, outputs a value y ≈ f(x). This area
of digital system research holds potential benefits based on the application domain.
Such systems find use across general-purpose and embedded computing, including multimedia. In many cases, speed is the optimization criterion. For applications involving mobile devices, area and power consumption are important design criteria. The question then remains: how should necessary function values be approximated? One possibility is to treat function approximation as a direct mapping. That is, a specific output value is directly associated with each input value. This approach is realized in digital hardware as a lookup table. In
cases involving a small number of possible inputs, this approach is the ideal one. One advantage is that, given a direct mapping within the lookup table, function accuracy is optimal. Also, these tables are optimized for speed, as memory units
tend to have low delay. Needs for larger input and output word sizes, however,
are present both from a push towards single and double precision word sizes and
the sensitivity of specific applications and their algorithms. For example, some
computer graphics applications rely upon input word sizes of 24 and 32 bits and
mandate corresponding accuracy. For each bit the input word length is increased,
the size of the lookup table increases to twice its former size. As input lengths are
increased past a size of 12 bits, table sizes become very large. As considerations
of device size are increasingly essential in many applications, lookup tables alone
are clearly insufficient in solving all approximation needs. For these reasons, arithmetic devices are also employed, typically with optimizations for area or speed [2]. Regardless of the optimizations employed, these devices and the algorithms that use them tend to be slower than lookup tables. Additionally, there is potential for implicit error dependent upon the approximation method employed in designing the arithmetic devices. By using both lookup tables and arithmetic devices, a number of approaches exist for resolving the conflicting design constraints demanded by function approximation.
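The size argument above can be made concrete with a short sketch. The Python below (the [0, 8) input range and the word sizes are illustrative assumptions, not a design from this thesis) builds a direct-mapping sigmoid table and shows that each additional input bit doubles the entry count:

```python
import math

def build_table(frac_bits):
    # Direct mapping: one precomputed sigmoid value per representable
    # input on [0, 8) with frac_bits fractional bits.
    step = 2.0 ** -frac_bits
    return [1.0 / (1.0 + math.exp(-i * step)) for i in range(8 * 2 ** frac_bits)]

for n in (8, 9, 10):
    # each extra fractional bit doubles the table size
    print(n, len(build_table(n)))
```

Past roughly 12 input bits this exponential growth makes a pure lookup table impractical, which motivates the hybrid approaches discussed next.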
1.3 Iterative Approaches

One class of approximation techniques comprises iterative methods; that is, methods which repeat themselves for a specified number of cycles. One such algorithm, which is widely used, is the CORDIC algorithm [3]. It is based on the following equations, which are solved iteratively to reach a desired result:
x_{i+1} = x_i - m s_i y_i 2^{-S(m,i)}    (1.1)

y_{i+1} = y_i + s_i x_i 2^{-S(m,i)}    (1.2)

z_{i+1} = z_i - s_i α_{S(m,i)}    (1.3)
where s_i is the direction of rotation, S(m, i) is the shift sequence, and m is a coordinate system parameter.
The CORDIC algorithm has two operational modes, rotation and vectoring, which determine whether the angle accumulator z or the coordinate y is driven toward zero.
One advantage of CORDIC and other iterative methods is their relative sim-
plicity, lending them well to a digital implementation. The CORDIC algorithm, for
example, typically uses only an adder for calculation and memory for storing arc
tangent values. Another advantage of this algorithm is that it allows for scalable
output accuracy; for a desired level of accuracy, the method is applied iteratively
until the level is achieved. This, however, leads to a major drawback of CORDIC
and many other iterative approaches: due to their iterative nature, for a given iteration i, the (i-1)th iteration result is needed. This dramatically reduces the overall throughput attainable.
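The serial dependency is visible directly in a software model of circular rotation-mode CORDIC (a sketch for sine and cosine; a sigmoid unit would additionally need the hyperbolic mode for e^-x, which is not shown):

```python
import math

def cordic_rotate(theta, iters=32):
    # Circular rotation mode: rotate (x, y) toward angle theta.
    # Note every iteration i consumes the result of iteration i-1 --
    # the serial dependency discussed in the text.
    gain = 1.0
    for i in range(iters):
        gain *= math.sqrt(1 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, theta      # pre-scale by 1/gain
    for i in range(iters):
        d = 1.0 if z >= 0 else -1.0       # direction of rotation s_i
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)     # stored arc tangent value
    return x, y                           # (cos(theta), sin(theta))
```

Only shifts, adds, and a small arc-tangent table are required, but the loop cannot be parallelized within one operand.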
1.4 Piecewise Approaches
Within this class, for a given function f(x), an interval [a, b] is selected over which
f(x) is to be approximated. Within [a, b], n subintervals [a_i, b_i) are chosen along with an approximation function f_i(x) for each subinterval. The f_i(x)'s are chosen such that they approximate the portion of f(x) within [a_i, b_i) more accurately than a single approximation function over the entire interval. The f_i(x)'s can
be of several forms: piecewise linear approximations use functions of the form f(x) = c_0 + c_1 x, piecewise quadratic approximations use functions of the form f(x) = c_0 + c_1 x + c_2 x^2, and so on. Naturally, choosing a higher degree of polynomial increases accuracy at the cost of additional arithmetic and memory. For low precisions, these implementations are often either linear or quadratic, while implementations with high-degree polynomials are popular for high precision. Compared to iterative approaches, they tend to require fewer cycles to perform the approximations and typically achieve lower latency.
One widely used polynomial technique involves the use of Taylor series approxi-
mations [4]. The Taylor series approximates a function f (x) in the following manner:
f(x) ≈ Σ_{i=0}^{t} f^{(i)}(x_0) (x - x_0)^i / i!    (1.4)
where x_0 is a point at which the function's value is known. This approach is typically implemented with a lookup table of coefficients, a multiplier, and an adder. With this approach, a variable level of accuracy can be attained depending on
the approximation intervals chosen and the number of terms in the approximation.
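Equation 1.4 can be illustrated with f(x) = e^x, a convenient case because every derivative equals e^{x0}, so a single stored value per expansion point suffices (a sketch; the expansion point and term counts are illustrative):

```python
import math

def taylor_exp(x, x0, terms):
    # Evaluate the Taylor polynomial of e^x about x0 (Equation 1.4).
    # f^(i)(x0) = e^{x0} for every i, mimicking one table lookup.
    f_x0 = math.exp(x0)
    dx, total, fact = x - x0, 0.0, 1.0
    for i in range(terms):
        if i > 0:
            fact *= i                      # running factorial i!
        total += f_x0 * dx ** i / fact
    return total
```

Either adding terms or choosing an expansion point closer to x reduces the error, which is the accuracy/table-size trade-off noted above.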
One drawback to this technique is the lookup table size or number of terms in the approximation required for high accuracy.
This thesis examines techniques for approximating the sigmoid function and its derivative, focusing on a proposed minimax technique. Chapter 2 presents sigmoid function theory, as well as some previous research applying the approaches outlined in this first chapter. Advantages and disadvantages of these techniques relevant to the
sigmoid function are also presented. Chapter 3 focuses on the proposed minimax ap-
proximation method for the sigmoid and its derivative, describing minimax function
approximation and its application to the sigmoid. Additionally, this chapter con-
tains a presentation of the associated hardware used with this approach. Chapter 4
presents area and delay estimates for synthesized implementations of the approach
chapter also offers some future research directions based on the material presented
in this thesis.
Chapter 2
The sigmoid function and its derivative are used in a number of applications, in-
cluding neural networks. Typically, within a neural network, the sigmoid serves as an activation function, specifying the point at which the network should switch toward a true state, while its derivative is used when training the network. A number of the methods specified in the previous chapter
have been applied to the approximation of the sigmoid function and its derivative
with varying levels of success. In this chapter, the mathematical nature of the
sigmoid function and its derivative is presented. This chapter also includes a discussion of those techniques for the sigmoid utilized in previous research, along with their resulting levels of accuracy.
[Figure 2.1: The sigmoid function sig(x), ranging over (0, 1), and its derivative sig'(x), peaking at 0.25, plotted on (-8, 8).]
These techniques include the CORDIC algorithm, pseudo and standard piecewise approximations, and symmetric table methods. The sigmoid function is defined as

sig(x) = 1 / (1 + e^{-x})    (2.1)

and its derivative is

sig'(x) = e^{-x} / (1 + e^{-x})^2 = sig(x) (1 - sig(x))    (2.2)
As can be seen in Figure 2.1, these functions have the following limits:
lim_{x -> ∞} sig(x) = 1    (2.3)

and

lim_{x -> -∞} sig(x) = 0    (2.4)

and

lim_{x -> ±∞} sig'(x) = 0    (2.5)
Based on these defined limits and given a desired level of accuracy, it is feasible to place
a bound on the inputs. That is, a threshold can be defined such that all inputs
past the threshold cause the limit value to be output. Typically, this is used to
define an interval of interest, (a, b), dependent upon the desired accuracy. This
interval is applicable, since the functions are within a relatively small distance from
the limit values past this interval. Additionally, the symmetric properties of the
sigmoid and its derivative allow for potential benefits and optimizations. In the
case of both the sigmoid and its derivative, the inherent symmetric properties make it sufficient to approximate the functions for nonnegative inputs only. For either function, a small amount of combinational logic for detecting a negative input and converting the output appropriately is required. With these points in mind, a number of the previously used approaches for approximating the sigmoid are now examined.
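The input bound mentioned above can be computed directly: for k accurate fractional bits, the output may saturate to the limit value once 1 - sig(x) < 2^-k. A small sketch (the closed form follows from solving sig(t) = 1 - 2^-k):

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

def saturation_threshold(k):
    # solve 1 - sig(t) = 2^-k  =>  t = ln(2^k - 1)
    return math.log(2.0 ** k - 1.0)

for k in range(7, 12):
    t = saturation_threshold(k)
    # beyond t, outputting the limit value 1 stays within 2^-k of sig(x)
    print(k, round(t, 3))
```

For k = 11 the threshold is about 7.62, consistent with the interval of interest (-8, 8) used later in this thesis.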
The CORDIC algorithm, as described in Section 1.3, has been used in practical implementations of trigonometric, exponential, reciprocal, and square root approximations. This algorithm, however, suffers from the
speed drawbacks mentioned in Section 1.3, as well as practical issues that are partic-
ular to the sigmoid function. Due to the nature of the CORDIC algorithm, at least
two distinct CORDIC operations are required to approximate the sigmoid. Initially,
it would seem reasonable to use two CORDIC functional units: one for computing e^{-x} and a second for computing the reciprocal of 1 + e^{-x}. Pipelining potentially allows for effective throughput
increases. The amount of hardware needed for this sort of implementation, however,
is quite complex, considering the amount of steering logic and staging necessary. For
these reasons, although the CORDIC algorithm shows a high degree of success for
other functions such as sines and cosines, the CORDIC algorithm is not well suited to the sigmoid. An architecture proposed in earlier work represents the performance inefficiencies expected when
using the CORDIC algorithm for sigmoid approximations. Two CORDIC units, one
for exponential calculation and one for division, are pipelined while a ROM-based
with respect to other methods, utilizing a large amount of area while yielding only limited accuracy. Another approach combines a standard CORDIC and a modified CORDIC, known as the flat CORDIC algorithm.
With this method, two units, one for exponential and one for division, are again
employed for the function calculation. The overall performance gain is due to the
use of the flat CORDIC algorithm, yielding a smaller device area and lower latency.
In this case, larger levels of accuracy are achieved; however, a dramatic increase in
device area occurs with levels of accuracy greater than twelve bits.
Piecewise approximations have also been applied to the sigmoid function and its derivative, with varying degrees of success owing largely to the nature of the functions. Since the sigmoid and its derivative are fairly consistent, particularly as input values extend away from the origin, piecewise approximations over wide intervals can
be utilized, which reduces device latency and area. One disadvantage to these
approaches involves the nature of the hardware typically used in these implementations. Logic is needed to determine the interval in which the input value lies, using either ROM lookup tables for function coefficients or combinational selection logic. These techniques also use multipliers and adders for calculating the final approximation value. An additional difficulty with these methods can occur when using ROM ta-
bles for coefficient values. These tables can grow to unruly sizes, depending on the
desired accuracy. Taking these considerations into account, however, useful results have been attained. In one such design, the input interval is split into thirteen equally-sized intervals with
an approximation function for each interval. These functions then represent the
sigmoid function and are realizable with conventional digital hardware. In this design, emphasis is placed on reducing latency and area. Rather than calculating the functions with multipliers and adders or storing their values in a lookup table, the input values are manipulated via inverters and NAND gates to approximate the sigmoid. Although this negates any need for multipliers or large tables, accuracy suffers; in some cases, this approach yields less than four bits of accuracy. A related piecewise approach is found in [1]. The authors propose to split an entire domain of inputs within (-8, 8)
into segments with corresponding outputs. Both seven and fifteen-segment piecewise
approximations are proposed, with the fifteen segments offering greater accuracy at
the expense of area and potentially delay. The outputs are specified using shifted versions of the inputs, with variations of the bits shifted into the result, depending on the segment in which the input lies.
The method is presented in Table 2.1 for the proposed seven-segment piecewise approximation; for one of the segments, for example, the constant 0.110010000000000_2 is output.

Input Range   Binary Input        Binary Output        Output Range
[a, b)        xxxx.xxxxxxxxxxx    x.xxxxxxxxxxxxxxx    [a, b)

Table 2.1: Input and Output Format for the Seven-Segment Approximation
The authors report results which are favorable in terms of area and delay; however, little consideration is given to accuracy. The approach outlined in the paper yields an accuracy of seven bits in the fifteen-segment case, and five bits in the seven-segment case. Removing the lookup tables typically used in piecewise approaches also increases the necessary selection logic. Since no implicit selection is available through table addressing, additional multiplexors and combinational logic are required to select the appropriate outputs. Depending upon the desired accuracy, the necessary selection logic can grow substantially.
A purer form of piecewise linear approximation for the sigmoid is discussed in [8] and in [10]. In this collection of work, three distinct design alternatives using first-order piecewise approximations are described by the authors with respect to relative advantages regarding accuracy, delay, and device area. These alternatives all approximate the sigmoid with a function H(x) whose coefficients are defined for each of the segments. H(x) is then further defined by the following
equation:

H(x) = A x + B    (2.8)

where A and B are the coefficients for the segment containing the input x.
Three first-order methods are considered by the authors for their relative advantages. The first approach uses a multiplier, an adder, and a lookup table. The segment in which a given input lies is determined, and the input and the segment's A coefficient are then multiplied. This product is then added to the B value associated with the input's segment. While straightforward, the complexity of its implementation could potentially lead to large area and delay.
The second of these approaches reduces the delay of the first approach. It removes the general multiplication by restricting the linear coefficient to a power of two: the segment's A coefficient is replaced with a value A', where A' = 2^{-n} and n is an integer. The value of A' is chosen such that it is the nearest power of two to the original A value. In addition, B is calculated based on A' rather than A, making it distinct from the previous technique. The following equation
results:

H(x) = 2^{-n} x + B    (2.9)

The multiplication thus reduces to shifting the input right by the appropriate number of places. This reduces the overall computational complexity
of the device, reducing device latency and area. This technique has the potential for increased approximation error, however, since A' is constrained to be an integer power of two. Also, although the multiplication unit is effectively eliminated, shift logic is still required.
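In fixed point, the power-of-two multiplication is literally a wired shift. The sketch below is illustrative only: the Q-format and the B value are assumptions, with A' = 2^-5 chosen as the nearest power of two to the [3, 4) coefficient 0.029439 of Table 3.9:

```python
FRAC = 11  # assumed fixed-point format: 11 fractional bits

def pwl_pow2(x_fix, n, b_fix):
    # H(x) = 2^-n * x + B: the multiply is just a right shift by n
    return (x_fix >> n) + b_fix

# hypothetical segment on [3, 4): A' = 2^-5, B re-tuned for A' (assumed value)
x_fix = int(3.0 * 2 ** FRAC)
b_fix = int(0.8626 * 2 ** FRAC)
y = pwl_pow2(x_fix, 5, b_fix) / 2 ** FRAC   # close to sig(3) ~ 0.9526
```

The snapping of A' to a power of two and the re-tuned B are exactly where the extra approximation error of this approach comes from.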
The third first-order approach takes the most aggressive stance on delay reduc-
tion. This approach eliminates the adder used with the first two approaches. This
is accomplished by determining the linear coefficient A' in the same manner as the second approach, and then choosing B such that, for the shifted input u within a given segment, no changing bits of u overlap with nonzero bits of B. The changing bits of u are those bits within a segment which distinguish
inputs from one another. For example, using the authors' internal number system,
the segment (-2, -1] has values of the following form: 1001.xxxxxxxxxx where x
represents a changing bit of the input value. While it would appear that a large
number of changing bits need be considered in this example, the function's A' value
will effectively shift a number of the changing bits out of the product considered
when determining the B value. In this example, the A' value used is 2^{-3}, eliminating three of the changing bits. B is then chosen such that its seven least significant bits
are zero. This method requires a sophisticated system of logic for coefficient determination, as well as shifting units and lookup tables. While it effectively reduces delay, its reliance on lookup tables, as discussed in Chapter 1, can lead to large device area.
These designs yield estimated area and delay which scale appropriately with the
expectations discussed in this section. Latency was effectively reduced with each
successive first-order design approach. Regarding accuracy, the authors report this in terms of maximum error of their sigmoid generator devices. For all of the first-order approaches, at most six accurate bits are guaranteed; a second-order technique is described which also guarantees at most six bits. Again, as with other work discussed in this chapter, the issues of area and latency are well addressed, while accuracy remains limited.
Symmetric table methods employ a combination of arithmetic devices and lookup tables, structured so as to reduce the overall area and latency. Symmetric table methods have seen high levels of success in reducing these factors for a number of functions, including the sigmoid function and its derivative. Since these methods typically base themselves upon an effective method for function approximation, such as Taylor polynomials, they are also capable of high accuracy. Since the sigmoid has
inverse symmetry about the y-axis, and its derivative has symmetry about the y-axis, both functions hold potential for reaping the benefits available with symmetric table methods.
The symmetric table addition method (STAM) is described in [11]. This method provides a compromise between the issues of area and delay, using tables and addition as the starting point for the function's approximation. The input x is divided into m+1 partitions, which are then utilized as inputs to m lookup tables containing coefficient values. The partitions are addressed within the lookup tables such that each lookup table provides a function value a_i(x_0, x_{i+1}). Based on this addressing, each table receives the high-order partition x_0 along with a lower-order partition x_{i+1}. All lower-order partitions are conditionally inverted before being passed to the lookup table. The outputs of the lookup tables are summed, with the inverters reducing the overall complexity of the design. The lookup tables are minimized by taking advantage of the function's symmetry; exploiting this symmetry, the inverters allow for the lookup table sizes to be reduced to half their
[Figure 2.2: STAM block diagram. The input x is split into partitions x0, x1, ..., xm with lengths n0, ..., nm; the lower-order partitions pass through XOR (conditional inversion) stages, the partitions address the lookup tables, and the table outputs p1, ..., pm are summed in a multi-operand adder to produce f(x).]
original size. This is further enhanced in situations where all entries have identi-
cal trailing or leading bits, as these bits need not be stored in the lookup tables.
The STAM method yields decreased table size with minimal increase in delay with
subsequent addition of lookup tables. Once past five tables, the reduction in individ-
ual table sizes tends to be outweighed by the area consumption of multiple tables.
At this point, the hardware complexity tends to outweigh the table size reduction
benefits.
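The flavor of these table-and-add methods can be conveyed by a simplified two-table (bipartite) sketch. This is not the STAM method of [11] itself, and the 9-bit input format with a 3/3/3 partitioning is an assumption for illustration:

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

def dsig(x):
    s = sig(x)
    return s * (1.0 - s)

# 9-bit input u encodes x = u * 2^-6 on [0, 8);
# fields: x0 = u[8:6] (high), x1 = u[5:3], x2 = u[2:0] (low)
T0 = {}   # addressed by (x0, x1): function value at the x2 midpoint
T1 = {}   # addressed by (x0, x2): slope correction for x2
for u in range(512):
    x0, x1, x2 = u >> 6, (u >> 3) & 7, u & 7
    T0[(x0, x1)] = sig(((x0 << 6) + (x1 << 3) + 3.5) * 2 ** -6)
    T1[(x0, x2)] = dsig(((x0 << 6) + 28 + 3.5) * 2 ** -6) * ((x2 - 3.5) * 2 ** -6)

def sig_bipartite(u):
    # two small table reads and one addition replace one big table read
    return T0[(u >> 6, (u >> 3) & 7)] + T1[(u >> 6, u & 7)]
```

The two 64-entry tables replace a single 512-entry direct-mapped table while staying within seven accurate fractional bits on this range; STAM generalizes this idea to m tables with symmetry-based halving.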
In Figure 2.2, an example of a sigmoid approximation unit utilizing the STAM method is shown. As can be seen in this figure, the input argument x is split into
m +1 partitions with corresponding lengths ni. All partitions beyond the second
partition are inverted based on their most significant bit. The partition values are then used to address the lookup tables. The STAM method has been applied to the sigmoid and its derivative, as described in [12]. That paper explores five configurations
for sigmoid approximation using the STAM method, with two to six lookup tables.
For 13, 16, and 24-bit operand devices, the authors give results in terms of memory requirements. In these cases, a decrease in memory occurs for each successive table addition up to five tables. Error values are also given for 16-bit operands. The configurations yield a full 16 bits of accuracy. These results are clearly more favorable than the others listed in this chapter.
Chapter 3
Proposed Technique
Given the research previously conducted for approximating the sigmoid and its derivative in digital hardware, the question remains: what is an area and delay efficient implementation, potentially not dependent upon lookup tables, which also allows for accurate results? The technique proposed in this chapter, that of a first-order minimax approximation, aims to answer this question. Minimax function approximation is first presented, followed by a proposed approach for its use with the sigmoid and its derivative. The chapter concludes with a hardware implementation for approximating the sigmoid and its derivative using the proposed approach.
3.1 Minimax Function Approximation
While Taylor polynomials are adequate for a number of applications, methods exist that can improve accuracy by lowering the maximum error across a given input segment, as opposed to a Taylor polynomial, which is most accurate near its point of expansion. For the purposes of reducing error, the approximation error is considered as a function of x:

E(x) = f(x) - (c_0 + c_1 x)    (3.1)

To reduce the maximum error over the interval [a, b], the errors at the endpoints are considered, and the desired approximation function is chosen such that its endpoint errors are equal:

E(a) = E(b)    (3.2)

This equation is then solved for c_1 in terms of a and b:

c_1 = (f(b) - f(a)) / (b - a)    (3.3)

Next, the point at which the error is maximum, x_max, is determined by setting the derivative of E(x) to zero.
Thus:

E'(x_max) = f'(x_max) - c_1 = 0    (3.4)
This equation is then solved for x_max in terms of c_1, which can be further expressed in terms of a and b. Given c_1 and x_max, c_0 can now be determined based on the linear minimax polynomial requirement that the error at the endpoints be equal to the negation of the error at x_max. Thus, the following equation is solved for c_0 in terms of a and b:

E(a) = -E(x_max)    (3.5)

To illustrate the benefit of the minimax approach, consider the approximation of f(x) = e^x over the interval [-1, 1] using first-order minimax and Taylor polynomials. The Taylor polynomial for this approximation is 1 + x,
while the minimax equivalent is 1.2643 + 1.1752x. Plots of the original function,
along with the Taylor and minimax polynomials, are shown in Figure 3.1. As noted
previously, the approximation error is minimal for the Taylor polynomial around the point of expansion, while the minimax polynomial maintains a more consistent error across the interval. For this example, the respective maximum errors for the Taylor and minimax polynomials are 0.718 and 0.279, giving an improvement of over 60%.
As a complete example of the derivation, consider the function f(x) = 1/x over the interval [a, b]. The following
[Figure 3.1: f(x) = e^x with its Taylor approximation 1 + x and its minimax approximation 1.2643 + 1.1752x on [-1, 1].]
equations result from setting the endpoint errors equal:

E(a) = 1/a - (c_0 + c_1 a)    (3.6)

E(b) = 1/b - (c_0 + c_1 b)    (3.7)

1/a - c_1 a = 1/b - c_1 b    (3.8)

c_1 = -1/(ab)    (3.9)
Now, x_max is determined with respect to a and b by setting the derivative of the error function to zero:

E'(x_max) = -1/x_max^2 - c_1 = 0    (3.10)

x_max = sqrt(ab)    (3.11)
Next, c_0 is determined by setting E(a) equal to -E(x_max) and solving for c_0. Thus:

1/a - (c_0 - a/(ab)) = -(1/sqrt(ab) - (c_0 - sqrt(ab)/(ab)))    (3.12)

1/a - c_0 + 1/b = -1/sqrt(ab) + c_0 - 1/sqrt(ab)    (3.13)

2 c_0 = 1/a + 1/b + 2/sqrt(ab)    (3.14)

c_0 = (b + a + 2 sqrt(ab)) / (2ab)    (3.15)
giving a closed form minimax approximation of f(x) = 1/x over the interval [a, b].
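The closed form just derived can be checked numerically; the sketch below verifies the equal-ripple property E(a) = E(b) = -E(x_max) on a sample interval:

```python
import math

def minimax_recip(a, b):
    # Equations 3.9 and 3.15: first-order minimax for f(x) = 1/x on [a, b]
    c1 = -1.0 / (a * b)
    c0 = (a + b + 2.0 * math.sqrt(a * b)) / (2.0 * a * b)
    return c0, c1

a, b = 1.0, 2.0
c0, c1 = minimax_recip(a, b)
E = lambda x: 1.0 / x - (c0 + c1 * x)   # approximation error
x_max = math.sqrt(a * b)                # Equation 3.11
```

On [1, 2] the error oscillates between about +0.0429 at the endpoints and -0.0429 at x_max, the signature of a minimax fit.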
Application of the minimax approach to the sigmoid and its derivative involves generation of the coefficients c_0 and c_1. As shown in the previous section, a closed form solution to the minimax approximation of a function can sometimes be attained. Determining a closed form solution for the sigmoid minimax approximation begins with the following derivation for c_1. Setting the errors at the interval endpoints equal to
one another:
E(a) = 1/(1 + e^{-a}) - (c_0 + c_1 a)    (3.16)

E(b) = 1/(1 + e^{-b}) - (c_0 + c_1 b)    (3.17)

1/(1 + e^{-a}) - c_1 a = 1/(1 + e^{-b}) - c_1 b    (3.18)

c_1 (b - a) = 1/(1 + e^{-b}) - 1/(1 + e^{-a})    (3.19)

c_1 = (1/(1 + e^{-b}) - 1/(1 + e^{-a})) / (b - a)    (3.20)
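Although the final coefficients in this thesis come from Maple, the endpoint-equalization construction can be sketched directly for the sigmoid. On [0, 1) it lands within about 10^-5 of the tabulated values c_0 = 0.503526, c_1 = 0.231058 of Table 3.9 (a sketch, valid on intervals within [0, 8), where the sigmoid is concave):

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

def minimax_sig(a, b):
    # c1 from Equation 3.20 (equal endpoint errors)
    c1 = (sig(b) - sig(a)) / (b - a)
    # x_max solves sig'(x) = c1; with sig' = s(1-s), take the root s >= 1/2
    s = (1.0 + math.sqrt(1.0 - 4.0 * c1)) / 2.0
    x_max = math.log(s / (1.0 - s))
    # equal-ripple condition E(a) = -E(x_max) gives c0
    c0 = (sig(a) + sig(x_max) - c1 * (a + x_max)) / 2.0
    return c0, c1

c0, c1 = minimax_sig(0.0, 1.0)
```

The tiny residual difference from the tabulated values reflects rounding and the Remez iteration Maple uses internally.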
Thus, a closed form solution for c_1 is attained. A closed form solution for c_0 was also examined. The complexity of this solution is such that its closed form is not presented
in this thesis. Since the purpose of this thesis centers around hardware imple-
mentations of the sigmoid and its derivative, the overall coefficient calculation was performed by the Maple software package. This package provides minimax coefficient solutions with its numapprox library. It should be noted that Maple makes use of the Remez algorithm for determining minimax coefficients, which differs from the approach presented in Section 3.1, though both satisfy the minimax criterion.

Regarding the error of these approximations, since a closed form solution for E(x_max) is not attainable, a bound on the maximum error is taken from a slightly less accurate but similar approximation class:

|E(x)| <= ((b - a)^2 / 8) max |f''(ξ)|    (3.21)

where ξ is the point on [a, b) where |f''| is largest. In Figure 3.2, a plot
of the sigmoid's second derivative is shown. Based on Equation 3.21, the error is dependent upon the size of a given interval as well as the maximum value of f'' within that interval. Due to this dependence, smaller intervals are chosen to reduce error, and special attention is given to points at which the function's second derivative is large. The maximum error values calculated by Maple for first-order minimax approximations over a number of intervals are given in Table 3.1 for the sigmoid and Table 3.2 for its derivative. An explanation of the interval selection procedure follows.
Once the coefficients are determined, the next step in the design approach is
interval selection. In selecting the intervals to be used with the sigmoid and its
[Figure 3.2: The second derivative of the sigmoid on [0, 8].]
Interval Maximum
Error
[0,8) .134251
[0,4) .070646
[0,2) .020442
[2,4) .010768
[1,2) .005824
[2,3) .003740
[4,8) .003656
[0,1) .003534
[3,4) .001706
Table 3.1: Sample Maximum Minimax Error Values for the Sigmoid
Interval Maximum
Error
[0,8) .057549
[0,4) .032273
[0,2) .011247
[2,4) .008136
[0,1) .005895
[4,8) .003561
[2,3) .002491
[3,4) .001489
[1,2) .000977
Table 3.2: Sample Maximum Minimax Error Values for the Sigmoid's Derivative
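The bound of Equation 3.21 can be checked against the tabulated errors; the sketch below samples f'' on a grid (the grid density is an implementation choice):

```python
import math

def sig2(x):
    # second derivative of the sigmoid: s(1-s)(1-2s)
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s) * (1.0 - 2.0 * s)

def error_bound(a, b, n=1000):
    # Equation 3.21: |E(x)| <= (b-a)^2 / 8 * max |f''| on [a, b]
    m = max(abs(sig2(a + (b - a) * i / n)) for i in range(n + 1))
    return (b - a) ** 2 / 8.0 * m

# the minimax errors of Table 3.1 sit below the bound, as expected
for (a, b), err in [((0, 1), .003534), ((1, 2), .005824), ((2, 4), .010768)]:
    assert err <= error_bound(a, b)
```

Because the bound scales with (b - a)^2 and with max |f''|, halving an interval or avoiding the region of large curvature both shrink it, which is exactly the splitting strategy described next.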
derivative, the goal is to lower the error across the entire input interval. For the purposes of
this thesis, the input interval (-8, 8) was selected due to its applicability to a number of target applications. Due to the symmetry of the sigmoid and its derivative, the approximation is made on [0, 8) and symmetry properties are used to handle negative inputs. Equally-sized intervals would allow for reduced coefficient storage through inherent properties such as extensive leading or trailing bits. Also, using equally-sized intervals allows for table lookups for coefficient selection. Using the table lookups, an implicit addressing is gained since each successive address represents an interval.
For the purposes of this thesis, intervals are selected based on their error-reducing ability rather than their uniformity. Starting with the entire interval over which the function is to be approximated, the interval is split in half into two subintervals. Next, the accuracy of the subintervals is determined. These values are compared to 2^{-k}, where k is the desired number of bits of accuracy. Those intervals limiting the overall accuracy of the design are split in half until the desired level of accuracy is attained. These smaller intervals are chosen such that they represent equal halves of the original interval. Once all intervals satisfy the accuracy requirements, the interval set is complete. This method is illustrated in Figure 3.3. While the method does not guarantee a minimal interval set, it is simple and systematic.
As an example of selecting the necessary intervals for a given accuracy, consider
the case of seven bits of accuracy for the sigmoid function. Using the interval [0,8)
as a starting point, the first step is to determine the accuracy of this interval. A
minimax approximation of the function over this interval provides much less than
seven bits of accuracy, so the interval is split into [0,4) and [4,8). Referring to Table 3.1, [0,4) provides less than seven bits of accuracy, and it is divided into [0,2)
and [2,4). The accuracy of [0,2) is checked, again providing less than the desired
accuracy, so it is split into [0,1) and [1,2). [0,1) provides the desired accuracy, so it
is not split further. Similarly, [1,2), [2,4) and [4,8) all provide sufficient accuracy
for this function and are also not divided further. At this point, the interval set
Accurate  #Intervals  Intervals
Bits
7    5   [0,1) [1,2) [2,3) [3,4) [4,8)
8    6   [0,1) [1,3/2) [3/2,2) [2,3) [3,4) [4,8)
9    10  [0,1/2) [1/2,1) [1,5/4) [5/4,3/2) [3/2,2) [2,5/2) [5/2,3) [3,4) [4,6) [6,8)
10   15  [0,1/2) [1/2,3/4) [3/4,1) [1,5/4) [5/4,3/2) [3/2,7/4) [7/4,2) [2,9/4) [9/4,5/2) [5/2,3) [3,7/2) [7/2,4) [4,5) [5,6) [6,8)
11   19  [0,1/4) [1/4,1/2) [1/2,3/4) [3/4,1) [1,5/4) [5/4,3/2) [3/2,7/4) [7/4,2) [2,9/4) [9/4,5/2) [5/2,11/4) [11/4,3) [3,13/4) [13/4,7/2) [7/2,4) [4,9/2) [9/2,5) [5,6) [6,8)
Table 3.3: Sigmoid Interval Configurations
for the desired accuracy has been attained. The results of this example can be seen
in Table 3.3, along with the cases of eight, nine, ten, and eleven bits. Intervals
for the derivative of the sigmoid are shown in Table 3.4. It should be noted that the derivative of the sigmoid requires fewer intervals overall. This is due to the less drastic changes in slope throughout the function. Error values, as calculated by Maple, for these intervals are shown in Tables 3.5 and 3.6 for the sigmoid, and in Tables 3.7 and 3.8 for its derivative.
With the intervals in place, the appropriate c_1 for each interval can be determined using the closed form shown in Equation 3.20. As stated previously, however, Maple was used to calculate both c_0 and c_1 for each interval. The decimal values for these coefficients are shown in Tables 3.9 and 3.10 for the sigmoid and in Tables 3.11 and 3.12 for its derivative. Finally, considerations are made regarding the necessary data widths for the devices.
[Figure 3.3: Interval selection method. Begin with i = 0 and a single interval. While i < number of intervals: if Error[i] > the desired accuracy, increment the number of intervals and split interval i, [a, b), into subintervals [a, (a+b)/2) and [(a+b)/2, b); otherwise advance i. End when every interval has been examined.]
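The procedure of Figure 3.3 can be modeled in Python. The error estimate below uses the c_1 of Equation 3.20 together with the equal-ripple c_0 (a sketch — the thesis used Maple's coefficients — but for seven bits the resulting interval set matches Table 3.3):

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

def minimax_err(a, b):
    # maximum error of the first-order minimax fit on [a, b]
    # (the sigmoid is concave on [0, 8), so equal ripple at a, x_max, b)
    c1 = (sig(b) - sig(a)) / (b - a)
    s = (1.0 + math.sqrt(1.0 - 4.0 * c1)) / 2.0
    x_max = math.log(s / (1.0 - s))
    c0 = (sig(a) + sig(x_max) - c1 * (a + x_max)) / 2.0
    return abs(sig(a) - (c0 + c1 * a))

def select_intervals(lo, hi, k):
    # split halves until every interval supports k accurate bits (Figure 3.3);
    # freshly split halves are re-examined before moving on
    intervals, i = [(lo, hi)], 0
    while i < len(intervals):
        a, b = intervals[i]
        if minimax_err(a, b) > 2.0 ** -k:
            mid = (a + b) / 2.0
            intervals[i:i + 1] = [(a, mid), (mid, b)]
        else:
            i += 1
    return intervals
```

For k = 7 this reproduces the five intervals [0,1), [1,2), [2,3), [3,4), [4,8) of Table 3.3.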
Accurate  #Intervals  Intervals
Bits
7    5   [0,1) [1,2) [2,3) [3,4) [4,8)
8    6   [0,1/2) [1/2,1) [1,2) [2,3) [3,4) [4,8)
9    8   [0,1/2) [1/2,1) [1,2) [2,5/2) [5/2,3) [3,4) [4,6) [6,8)
10   13  [0,1/4) [1/4,1/2) [1/2,3/4) [3/4,1) [1,3/2) [3/2,2) [2,5/2) [5/2,3) [3,7/2) [7/2,4) [4,5) [5,6) [6,8)
11   16  [0,1/4) [1/4,1/2) [1/2,3/4) [3/4,1) [1,3/2) [3/2,2) [2,9/4) [9/4,5/2) [5/2,11/4) [11/4,3) [3,7/2) [7/2,4) [4,9/2) [9/2,5) [5,6) [6,8)
Table 3.4: Sigmoid Derivative Interval Configurations
[3,4) .001706
[4,6) .001815
[6,8) .000253
Table 3.5: Sigmoid Maximum Errors for 7-9 Accurate Bits
Accurate  Interval      Minimax Error
Bits
10        [0, 1/2)      .000489
          [1/2, 3/4)    .000268
          [3/4, 1)      .000333
          [1, 5/4)      .000367
          [5/4, 3/2)    .000374
          [3/2, 7/4)    .000359
          [7/4, 2)      .000330
          [2, 9/4)      .000292
          [9/4, 5/2)    .000252
          [5/2, 3)      .000778
          [3, 7/2)      .000521
          [7/2, 4)      .000336
          [4, 5)        .000680
          [5, 6)        .000258
          [6, 8)        .000253
11        [0, 1/4)      .000062
          [1/4, 1/2)    .000174
          [1/2, 3/4)    .000268
          [3/4, 1)      .000333
          [1, 5/4)      .000367
          [5/4, 3/2)    .000374
          [3/2, 7/4)    .000359
          [7/4, 2)      .000330
          [2, 9/4)      .000292
          [9/4, 5/2)    .000252
          [5/2, 11/4)   .000212
          [11/4, 3)     .000176
          [3, 13/4)     .000144
          [13/4, 7/2)   .000116
          [7/2, 4)      .000336
          [4, 9/2)      .000211
          [9/2, 5)      .000131
          [5, 6)        .000258
          [6, 8)        .000253
Table 3.6: Sigmoid Maximum Errors for 10-11 Accurate Bits
Accurate Interval Minimax Error
Bits
7 [0,1) .005895
[1,2) .000977
[2,3) .002491
[3,4) .001489
[4,8) .003561
8 [0, ~) .001816
[~, 1) .001050
[1,2) .000977
[2,3) .002491
[3,4) .001489
[4,8) .003561
9 [0, ~) .001816
[~, 1) .001050
[1,2) .000977
[2, ~) .000645
[~, 3) .000582
[3,4) .001489
[4,6) .001750
[6,8) .000252
Table 3.7: Sigmoid Derivative Maximum Errors for 7-9 Accurate Bits
Accurate  Interval      Minimax Error
Bits
10        [0, 1/4)      .000479
          [1/4, 1/2)    .000422
          [1/2, 3/4)    .000321
          [3/4, 1)      .000200
          [1, 3/2)      .000141
          [3/2, 2)      .000472
          [2, 5/2)      .000645
          [5/2, 3)      .000582
          [3, 7/2)      .000440
          [7/2, 4)      .000304
          [4, 5)        .000648
          [5, 6)        .000253
          [6, 8)        .000252
11        [0, 1/4)      .000479
          [1/4, 1/2)    .000422
          [1/2, 3/4)    .000321
          [3/4, 1)      .000200
          [1, 3/2)      .000141
          [3/2, 2)      .000472
          [2, 9/4)      .000159
          [9/4, 5/2)    .000161
          [5/2, 11/4)   .000152
          [11/4, 3)     .000137
          [3, 7/2)      .000440
          [7/2, 4)      .000304
          [4, 9/2)      .000199
          [9/2, 5)      .000126
          [5, 6)        .000253
          [6, 8)        .000252
Table 3.8: Sigmoid Derivative Maximum Errors for 10-11 Accurate Bits
38
Accurate   Interval     c0          c1
Bits
7          [0,1)        0.503526    0.231058
           [1,2)        0.587144    0.149738
           [2,3)        0.740982    0.071777
           [3,4)        0.865960    0.029439
           [4,8)        0.968019    0.004412
8          [0,1)        0.503526    0.231058
           [1,3/2)      0.559518    0.173031
           [3/2,2)      0.629290    0.126445
           [2,3)        0.740982    0.071777
           [3,4)        0.865960    0.029439
           [4,8)        0.968019    0.004412
9          [0,1/2)      0.500484    0.244918
           [1/2,1)      0.515071    0.217198
           [1,5/4)      0.546461    0.184965
           [5/4,3/2)    0.576301    0.161098
           [3/2,2)      0.629290    0.126445
           [2,5/2)      0.708509    0.086689
           [5/2,3)      0.782758    0.056864
           [3,4)        0.865960    0.029439
           [4,6)        0.952796    0.007756
           [6,8)        0.991368    0.001068
Accurate   Interval     c0          c1
Bits
10         [0,1/2)      0.500484    0.244918
           [1/2,3/4)    0.509288    0.226877
           [3/4,1)      0.523872    0.207519
           [1,5/4)      0.546461    0.184965
           [5/4,3/2)    0.576301    0.161098
           [3/2,7/4)    0.611664    0.137513
           [7/4,2)      0.650373    0.115377
           [2,9/4)      0.690262    0.095413
           [9/4,5/2)    0.729481    0.077965
           [5/2,3)      0.782758    0.056864
           [3,7/2)      0.844413    0.036227
           [7/2,4)      0.891742    0.022652
           [4,5)        0.937520    0.011293
           [5,6)        0.972463    0.004220
           [6,8)        0.991368    0.001068
11         [0,1/4)      0.500062    0.248706
           [1/4,1/2)    0.502068    0.241131
           [1/2,3/4)    0.509288    0.226877
           [3/4,1)      0.523872    0.207519
           [1,5/4)      0.546461    0.184965
           [5/4,3/2)    0.576301    0.161098
           [3/2,7/4)    0.611664    0.137513
           [7/4,2)      0.650373    0.115377
           [2,9/4)      0.690262    0.095413
           [9/4,5/2)    0.729481    0.077965
           [5/2,11/4)   0.766639    0.063086
           [11/4,3)     0.800821    0.050643
           [3,13/4)     0.831530    0.040395
           [13/4,7/2)   0.858599    0.032058
           [7/2,4)      0.891742    0.022652
           [4,9/2)      0.926231    0.013998
           [9/2,5)      0.950497    0.008588
           [5,6)        0.972463    0.004220
           [6,8)        0.991368    0.001068
Accurate   Interval     c0          c1
Bits
7          [0,1)        0.255890    -0.053388
           [1,2)        0.287292    -0.091626
           [2,3)        0.222136    -0.059816
           [3,4)        0.126229    -0.027513
           [4,8)        0.031428    -0.004331
8          [0,1/2)      0.251816    -0.029992
           [1/2,1)      0.274444    -0.076783
           [1,2)        0.287292    -0.091626
           [2,3)        0.222136    -0.059816
           [3,4)        0.126229    -0.027513
           [4,8)        0.031428    -0.004331
9          [0,1/2)      0.251816    -0.029992
           [1/2,1)      0.274444    -0.076783
           [1,2)        0.287292    -0.091626
           [2,5/2)      0.243907    -0.069779
           [5/2,3)      0.194156    -0.049854
           [3,4)        0.126229    -0.027513
           [4,6)        0.046308    -0.007598
           [6,8)        0.008608    -0.001065
Accurate   Interval     c0          c1
Bits
10         [0,1/4)      0.250479    -0.015463
           [1/4,1/2)    0.257686    -0.044521
           [1/2,3/4)    0.269542    -0.068434
           [3/4,1)      0.281944    -0.085132
           [1,3/2)      0.291686    -0.094933
           [3/2,2)      0.281132    -0.088305
           [2,5/2)      0.243907    -0.069779
           [5/2,3)      0.194156    -0.049854
           [3,7/2)      0.145077    -0.033447
           [7/2,4)      0.103680    -0.021580
           [4,5)        0.061073    -0.011014
           [5,6)        0.020320    -0.002868
           [6,8)        0.008608    -0.001065
11         [0,1/4)      0.250479    -0.015463
           [1/4,1/2)    0.257686    -0.044521
           [1/2,3/4)    0.269542    -0.068434
           [3/4,1)      0.281944    -0.085132
           [1,3/2)      0.291686    -0.094933
           [3/2,2)      0.281132    -0.088305
           [2,9/4)      0.254719    -0.074942
           [9/4,5/2)    0.231484    -0.064616
           [5/2,11/4)   0.206225    -0.054509
           [11/4,3)     0.180634    -0.045198
           [3,7/2)      0.145077    -0.033447
           [7/2,4)      0.103680    -0.021580
           [4,9/2)      0.071834    -0.013592
           [9/2,5)      0.048702    -0.008436
           [5,6)        0.020320    -0.002868
           [6,8)        0.008608    -0.001065
As the multiplication unit will be the limiting factor in accuracy, focus is placed on
the widths necessary for x and c1. For these inputs, for k bits of accuracy desired,
there must be k + 3 bits of the inputs provided. Consider the case of x. The desired
accuracy can be expressed as 2^-k. Thus, the fractional bits of x must yield a result
accurate to within this bound,

    2^-f <= 2^-k                                                    (3.22)

which is satisfied with

    f = k                                                           (3.23)

fractional bits. Adding the three integer bits to x gives a total of k + 3 bits. Similarly, c1 requires k + 2 bits.
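The width argument for x can be checked numerically. The snippet below is purely illustrative: truncating x to k fractional bits perturbs the product c1·x by at most |c1|·2^-k, which stays below the 2^-k target whenever |c1| < 1 (the value 0.244918 is the largest sigmoid c1 from the coefficient tables).

```python
# Illustrative check: truncating x to k fractional bits changes c1*x by
# at most |c1| * 2**-k, below the accuracy target 2**-k since |c1| < 1.
k = 10
c1 = 0.244918                       # largest sigmoid c1 from the tables
x = 3.1415926
x_trunc = int(x * 2 ** k) / 2 ** k  # keep only k fractional bits of x
error = abs(c1 * x - c1 * x_trunc)
assert error <= c1 * 2 ** -k < 2 ** -k
print(error)
```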
With the intervals and associated coefficients determined, the final step in the process is the hardware implementation of the design. Based on the methodology discussed in Section 3.2, a hardware design is presented in Figure 3.4. As can be seen in the figure, the approach consists of a number of multiplexors,
signal logic for these multiplexors, XOR gates, and a multiply-accumulate unit. The
multiplexors serve as selection logic to determine the appropriate coefficients for the
input argument. This is dependent upon the interval in which the input argument
lies. The coefficients, discussed in Section 3.2, are hard-wired to the inputs of the multiplexors. When an input x2 ··· x-2 is received, the multiplexor signal logic determines the interval to which the
input belongs, allowing for the appropriate coefficient selection. The signal logic
then outputs a number of selection signals equal to the number of intervals in the
design, shown as m in Figure 3.4. These selection signals are determined by the
interval over which the approximations are made. Input sequences that identify
these intervals are used as the multiplexor signal inputs. These sequences are shown
in Tables 3.13 and 3.14 for the sigmoid and Tables 3.15 and 3.16 for its derivative.
Consider the interval [0, 1). This interval is used for any input having all zero integer
bits. This is determined by the input sequence x̄2 x̄1 x̄0, as any inputs matching this
pattern are within the interval. Using the interval differentiating bit patterns for
the multiplexor signals requires use of one-hot multiplexor devices, which typically
have less area and delay than a traditional multiplexor. Use of these devices also
reduces the necessity for multiplexor signal encoding prior to the multiplexor.
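The one-hot selection can be sketched as pattern matching on the input's bits. This sketch uses the 7-accurate-bit sigmoid patterns from Table 3.13; the (mask, value) encoding and the `select_interval` helper are illustrative assumptions, not the actual signal logic, with a barred (complemented) bit in the table corresponding to a required 0 bit here.

```python
# One-hot interval selection for the 7-bit sigmoid case (Table 3.13).
# Each interval is identified by a (mask, value) pair over the integer
# bits x2 x1 x0; a barred bit in the table is a required 0 bit.
PATTERNS = [
    ((0b111, 0b000), (0, 1)),   # x2' x1' x0' -> [0,1)
    ((0b111, 0b001), (1, 2)),   # x2' x1' x0  -> [1,2)
    ((0b111, 0b010), (2, 3)),   # x2' x1  x0' -> [2,3)
    ((0b111, 0b011), (3, 4)),   # x2' x1  x0  -> [3,4)
    ((0b100, 0b100), (4, 8)),   # x2          -> [4,8)
]

def select_interval(x):
    bits = int(x) & 0b111                        # integer bits x2 x1 x0
    hot = [(bits & m) == v for (m, v), _ in PATTERNS]
    assert sum(hot) == 1                         # exactly one select line high
    return PATTERNS[hot.index(True)][1]

print(select_interval(2.7))   # -> (2, 3)
```

Because the patterns partition [0, 8), exactly one select line is ever high, which is what lets the design use one-hot multiplexors without an encoding stage.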
The coefficient values selected are sent to a multiply accumulate unit which
calculates c0 + c1x. This unit is implemented with a tree multiplier as its basis. A
row is added to the bottom of the partial product matrix for the addition of Co to
the product. This matrix is then reduced using the reduced area technique [14] to form two resultant values. These values are then added using a carry lookahead adder.
The XOR gates in the device are used to conditionally invert negative inputs.
As discussed, the symmetry properties of the sigmoid and its derivative allow for
calculation of inputs over the interval (-8, 0) based on inputs over the interval [0,8).
All integer and fractional bits of the input are XOR'd with the input's sign bit, thus
providing the one's complement of x for negative inputs. This allows the device to
calculate sig(-x) or sig'(-x) as the case may be. Once this calculation is complete,
the result is again sent through XOR gates with the input sign bit. This provides
-sig(-x) and -sig'(-x). Although by definition 1 - sig(-x) is desired, adding a value of 1 to any of the results within the range of -sig(-x) will give a zero in the integer bit. Because of this, the integer bit can be set to zero and the one's complement result used directly.
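A functional sketch of the datapath for the 7-accurate-bit sigmoid case, using the coefficients from the tables above. The hardware handles negative inputs with the one's-complement XOR trick described here, whereas this sketch simply applies the equivalent arithmetic identity sig(-x) = 1 - sig(x); the interval search loop stands in for the one-hot multiplexors.

```python
import math

# Coefficients for the 7-bit sigmoid case, taken from the tables above.
COEFFS = [((0, 1), 0.503526, 0.231058),
          ((1, 2), 0.587144, 0.149738),
          ((2, 3), 0.740982, 0.071777),
          ((3, 4), 0.865960, 0.029439),
          ((4, 8), 0.968019, 0.004412)]

def sig_approx(x):
    ax = min(abs(x), 8.0 - 1e-12)           # clamp into the covered range
    for (lo, hi), c0, c1 in COEFFS:
        if lo <= ax < hi:                   # stands in for the one-hot MUXes
            y = c0 + c1 * ax                # multiply-accumulate step
            break
    return y if x >= 0 else 1.0 - y         # symmetry: sig(-x) = 1 - sig(x)

for x in (-3.2, -0.5, 0.7, 2.4, 6.0):
    exact = 1.0 / (1.0 + math.exp(-x))
    assert abs(sig_approx(x) - exact) < 2.0 ** -7
```

The assertions confirm that with five intervals the absolute error stays below 2^-7 on both sides of zero, which is the point of exploiting the symmetry: only [0, 8) needs coefficients.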
[Figure 3.4: Proposed hardware design. The input x and its sign bit sx pass through a bank of XOR gates; one-hot multiplexors driven by m interval-select signals choose c0 and c1 (each k+3 bits) for the multiply-accumulate unit, which computes c0 + c1x; a final bank of XOR gates conditioned on sx produces the k-bit result f(x).]
Accurate   Interval     Differentiating Bits
Bits
7          [0,1)        x̄2 x̄1 x̄0
           [1,2)        x̄2 x̄1 x0
           [2,3)        x̄2 x1 x̄0
           [3,4)        x̄2 x1 x0
           [4,8)        x2
8          [0,1)        x̄2 x̄1 x̄0
           [1,3/2)      x̄2 x̄1 x0 x̄-1
           [3/2,2)      x̄2 x̄1 x0 x-1
           [2,3)        x̄2 x1 x̄0
           [3,4)        x̄2 x1 x0
           [4,8)        x2
9          [0,1/2)      x̄2 x̄1 x̄0 x̄-1
           [1/2,1)      x̄2 x̄1 x̄0 x-1
           [1,5/4)      x̄2 x̄1 x0 x̄-1 x̄-2
           [5/4,3/2)    x̄2 x̄1 x0 x̄-1 x-2
           [3/2,2)      x̄2 x̄1 x0 x-1
           [2,5/2)      x̄2 x1 x̄0 x̄-1
           [5/2,3)      x̄2 x1 x̄0 x-1
           [3,4)        x̄2 x1 x0
           [4,6)        x2 x̄1
           [6,8)        x2 x1

Table 3.13: Sigmoid Interval Differentiating Bits for 7-9 Accurate Bits
Accurate   Interval     Differentiating Bits
Bits
10         [0,1/2)      x̄2 x̄1 x̄0 x̄-1
           [1/2,3/4)    x̄2 x̄1 x̄0 x-1 x̄-2
           [3/4,1)      x̄2 x̄1 x̄0 x-1 x-2
           [1,5/4)      x̄2 x̄1 x0 x̄-1 x̄-2
           [5/4,3/2)    x̄2 x̄1 x0 x̄-1 x-2
           [3/2,7/4)    x̄2 x̄1 x0 x-1 x̄-2
           [7/4,2)      x̄2 x̄1 x0 x-1 x-2
           [2,9/4)      x̄2 x1 x̄0 x̄-1 x̄-2
           [9/4,5/2)    x̄2 x1 x̄0 x̄-1 x-2
           [5/2,3)      x̄2 x1 x̄0 x-1
           [3,7/2)      x̄2 x1 x0 x̄-1
           [7/2,4)      x̄2 x1 x0 x-1
           [4,5)        x2 x̄1 x̄0
           [5,6)        x2 x̄1 x0
           [6,8)        x2 x1
11         [0,1/4)      x̄2 x̄1 x̄0 x̄-1 x̄-2
           [1/4,1/2)    x̄2 x̄1 x̄0 x̄-1 x-2
           [1/2,3/4)    x̄2 x̄1 x̄0 x-1 x̄-2
           [3/4,1)      x̄2 x̄1 x̄0 x-1 x-2
           [1,5/4)      x̄2 x̄1 x0 x̄-1 x̄-2
           [5/4,3/2)    x̄2 x̄1 x0 x̄-1 x-2
           [3/2,7/4)    x̄2 x̄1 x0 x-1 x̄-2
           [7/4,2)      x̄2 x̄1 x0 x-1 x-2
           [2,9/4)      x̄2 x1 x̄0 x̄-1 x̄-2
           [9/4,5/2)    x̄2 x1 x̄0 x̄-1 x-2
           [5/2,11/4)   x̄2 x1 x̄0 x-1 x̄-2
           [11/4,3)     x̄2 x1 x̄0 x-1 x-2
           [3,13/4)     x̄2 x1 x0 x̄-1 x̄-2
           [13/4,7/2)   x̄2 x1 x0 x̄-1 x-2
           [7/2,4)      x̄2 x1 x0 x-1
           [4,9/2)      x2 x̄1 x̄0 x̄-1
           [9/2,5)      x2 x̄1 x̄0 x-1
           [5,6)        x2 x̄1 x0
           [6,8)        x2 x1

Table 3.14: Sigmoid Interval Differentiating Bits for 10-11 Accurate Bits
Accurate   Interval     Differentiating Bits
Bits
7          [0,1)        x̄2 x̄1 x̄0
           [1,2)        x̄2 x̄1 x0
           [2,3)        x̄2 x1 x̄0
           [3,4)        x̄2 x1 x0
           [4,8)        x2
8          [0,1/2)      x̄2 x̄1 x̄0 x̄-1
           [1/2,1)      x̄2 x̄1 x̄0 x-1
           [1,2)        x̄2 x̄1 x0
           [2,3)        x̄2 x1 x̄0
           [3,4)        x̄2 x1 x0
           [4,8)        x2
9          [0,1/2)      x̄2 x̄1 x̄0 x̄-1
           [1/2,1)      x̄2 x̄1 x̄0 x-1
           [1,2)        x̄2 x̄1 x0
           [2,5/2)      x̄2 x1 x̄0 x̄-1
           [5/2,3)      x̄2 x1 x̄0 x-1
           [3,4)        x̄2 x1 x0
           [4,6)        x2 x̄1
           [6,8)        x2 x1

Table 3.15: Sigmoid Derivative Interval Differentiating Bits for 7-9 Accurate Bits
Accurate   Interval     Differentiating Bits
Bits
10         [0,1/4)      x̄2 x̄1 x̄0 x̄-1 x̄-2
           [1/4,1/2)    x̄2 x̄1 x̄0 x̄-1 x-2
           [1/2,3/4)    x̄2 x̄1 x̄0 x-1 x̄-2
           [3/4,1)      x̄2 x̄1 x̄0 x-1 x-2
           [1,3/2)      x̄2 x̄1 x0 x̄-1
           [3/2,2)      x̄2 x̄1 x0 x-1
           [2,5/2)      x̄2 x1 x̄0 x̄-1
           [5/2,3)      x̄2 x1 x̄0 x-1
           [3,7/2)      x̄2 x1 x0 x̄-1
           [7/2,4)      x̄2 x1 x0 x-1
           [4,5)        x2 x̄1 x̄0
           [5,6)        x2 x̄1 x0
           [6,8)        x2 x1
11         [0,1/4)      x̄2 x̄1 x̄0 x̄-1 x̄-2
           [1/4,1/2)    x̄2 x̄1 x̄0 x̄-1 x-2
           [1/2,3/4)    x̄2 x̄1 x̄0 x-1 x̄-2
           [3/4,1)      x̄2 x̄1 x̄0 x-1 x-2
           [1,3/2)      x̄2 x̄1 x0 x̄-1
           [3/2,2)      x̄2 x̄1 x0 x-1
           [2,9/4)      x̄2 x1 x̄0 x̄-1 x̄-2
           [9/4,5/2)    x̄2 x1 x̄0 x̄-1 x-2
           [5/2,11/4)   x̄2 x1 x̄0 x-1 x̄-2
           [11/4,3)     x̄2 x1 x̄0 x-1 x-2
           [3,7/2)      x̄2 x1 x0 x̄-1
           [7/2,4)      x̄2 x1 x0 x-1
           [4,9/2)      x2 x̄1 x̄0 x̄-1
           [9/2,5)      x2 x̄1 x̄0 x-1
           [5,6)        x2 x̄1 x0
           [6,8)        x2 x1

Table 3.16: Sigmoid Derivative Interval Differentiating Bits for 10-11 Accurate Bits
Chapter 4
Results
To evaluate the technique discussed in Chapter 3, the designs have been implemented and synthesized for both FPGA and ASIC technologies, and the results are presented in this chapter. The chapter begins with a discussion of the manner in which the designs have been implemented, followed by a presentation of the area and delay results.
VHDL models of the designs discussed in Chapter 3 have been prepared. Modules for the arithmetic components of the
designs were generated automatically using existing VHDL generation Java code
provided through the FGS program.
The designs were simulated for behavior confirmation and then synthesized for
FPGA devices using Altera's Quartus software. This software package uses Altera
FPGA devices as its target technologies, and claims to provide extensive logic-fitting
algorithms for optimal layouts. The software provides synthesis data for device
usage in terms of logic cells, ESB bits, and pins, as these are typically of importance to FPGA designs. The logic cells implement gates while the ESB bits implement memory.
The designs were also synthesized to ASIC technology using the Leonardo Spec-
trum synthesis package. Leonardo's provided SCL05u ASIC library was used as the
target technology for synthesis. The area results provided by this package are in terms of gates.
As stated, the devices were synthesized to Leonardo's SCL05u ASIC library. The
synthesis results for the sigmoid implementations are shown in Table 4.1 and the sigmoid derivative results are shown in Table 4.2. Other than a slight anomaly in delay between
the ten and eleven bit sigmoid cases, the devices scale regularly in terms of both
area and delay as the number of accurate bits increases. One potential reason for
the mentioned anomaly is the irregular size of the multiply-accumulate unit and its
Accurate Area (Gates) Delay (ns)
Bits
7 2268 16.01
8 2560 16.79
9 3323 19.18
10 3756 19.87
11 4392 19.54
associated carry lookahead adder. Eliminating this anomaly was not explored as the
difference in delay between the ten and eleven bit cases was considered negligible.
Regarding the FPGA synthesis, each of the devices was synthesized to an Altera EP20K30ETC144-1 FPGA. The synthesis results for the cases synthesized are
presented in Table 4.3 for the sigmoid and Table 4.4 for the sigmoid's derivative.
Area is given in terms of logic cells as this was the only considerable area result
given. No ESB usage was reported, since neither registers nor lookup tables were in use. As can be seen in the tables, both delay and area increase with increases in desired accuracy. The largest jump occurs between 10 and 11 bits of accuracy
for the sigmoid, as this requires a considerably higher number of intervals than the
other cases.
Accurate Area (Logic Cells) Delay (ns)
Bits
7 351 34.46
8 449 38.54
9 531 40.41
10 603 42.66
11 813 51.86
Chapter 5
Conclusions
This chapter presents conclusions regarding the validity of the proposed approach to sigmoid and sigmoid derivative approximation. First, considerations in using the method are discussed, followed by outlets for future research based on the material presented.
One important consideration when using the approach proposed in this thesis is its
lack of registers and ROM tables. While this potentially offers improvements, it also carries possible drawbacks. One benefit of the technique is the lack of dependency upon register logic within one's target implementation. Whether the target is an FPGA or an ASIC from varying manufacturers, the device should not perform any differently
based on register technology. Also, synthesis to devices not offering extensive mem-
ory for lookup tables is a possibility, which could potentially be of benefit. One
possible disadvantage, however, is the lack of implicit addressing that lookup tables provide, as the proposed technique instead relies on one-hot multiplexors to control coefficient selection. By using ROM lookup tables along with their implicit addressing, this selection logic could be avoided; the multiplexor signal logic adds to the area of the device, which could potentially be problematic as input sizes and desired accuracies increase.
Another point to consider is the amount of area consumption per device module with these designs. For the ten configurations discussed in the previous two chapters, the device module which consumes the largest amount of area is the multiply-accumulate unit. As discussed, this unit is based on a tree multiplier and a carry lookahead adder. In many FPGA devices, multipliers and adders are on-
board and available for use by other design components. With this in mind, a
design could be implemented using the existing multiplication and addition units,
effectively eliminating the largest component of the design, while maintaining the functionality of the approximation.
Using the work presented in this thesis as a starting point, there are a number of
areas in which further work could be pursued. First, since the scope of this thesis was limited with respect to synthesis, one possible extension to this work is furthering the FPGA synthesis results. In addition to the Altera FPGA synthesis performed, devices from other manufacturers could be targeted. Synthesis to other ASIC libraries is also a potential direction. This would potentially provide further proof of the benefits of using this approximation technique for ASIC devices.
With respect to the actual technique discussed, exploration could be made regarding its application to other functions, such as logarithmic and trigonometric functions. If this research proved effective with these functions as well, a general form for implementing function approximations using the proposed technique might be possible for any given function meeting appropriate criteria.
Higher-order minimax approximations could be considered for the sigmoid func-
tion and its derivative. This thesis focused on first-order approximations as they
were simplest to implement. It should be noted, however, that the amount of accuracy available for designs may be limited depending on the function in question.
Finally, software to generate various aspects of this work could be created. One starting point might be the design of the algorithm for interval determination discussed in Chapter 3. Automating design generation and synthesis would also be a possible area of future work, potentially working with the existing programs used to generate the carry lookahead adders and multiply-accumulate units.
As the results and these directions show, this implementation technique of partitioned minimax approximation of the sigmoid and its derivative offers potential benefits for FPGA and ASIC implementations, due to its lack of lookup table and register usage.
Bibliography
[1] P. Murtagh and A. Tsoi, "Implementation issues of sigmoid function and its derivative for VLSI neural networks," IEE Proceedings E, vol. 139, no. 3, May 1992.
[3] J. S. Walther, "A unified algorithm for elementary functions," in Proceedings of the Spring Joint Computer Conference, 1971.
[4] K. Atkinson, Elementary Numerical Analysis, Second Edition. New York, NY: John Wiley & Sons.
[6] B. Gisuthan and K. V. Asari, "A high speed flat CORDIC based neuron with multi-level activation function for robust pattern recognition," IEEE, vol. 11, pp. 1-12, 2000.
tivation function for digital VLSI neural networks," Electronics Letters, vol. 25,
[8] S. Vassiliadis, J. Delgado-Frias, and M. Zhang, "High performance with low im-
[9] M. Zhang, S. Vassiliadis, and J. Delgado-Frias, "Sigmoid generators for neural computing
[11] J. E. Stine and M. J. Schulte, "The symmetric table addition method for accurate function approximation."
[12] N. Koc-Sahan, J. Schlessman, and M. Schulte, "Symmetric table addition methods for
[13] J. H. Mathews, Numerical Methods for Computer Science, Engineering and Mathematics. Englewood Cliffs, NJ: Prentice-Hall.
[14] K. Bickerstaff, M. J. Schulte, and E. E. Swartzlander, "Parallel reduced area multipliers," Journal of VLSI Signal Processing, vol. 9, pp. 181-192, April 1995.
Vita
Jason Schlessman was born in Reading, Pennsylvania on June 11, 1975 to Warren and Sharon Schlessman. Jason attended the Pennsylvania State University as an undergraduate with a major in computer engineering from August 1993 to May 1995.
During this time Jason was named to the dean's list each semester and received the
Vollner- Kleckner award for academic excellence. At this point he took time off to
consider his academic career. Upon consideration, Jason attended Lehigh University
beginning in 1999 and culminating in his earning a Bachelor of Science degree in computer engi-
neering June 2001. During this time he was named to the dean's list each semester,
graduated summa cum laude, and served as a grader and teaching assistant for two
senior level computer engineering courses. Since June 2001, Jason has pursued a Master of Science degree in computer engineering at Lehigh University, earning this degree August 2002. During this time, Jason served as a teaching assistant for a senior level computer engineering course.
Jason published four papers in international journals under the auspices of the
Pennsylvania State University as well as two conference papers under the auspices
of Lehigh University. Jason served as speaker for one of the latter papers at the corresponding conference.