My Project
September 2022
Department of Electronics and Communication Engineering
CERTIFICATE
This is to certify that the project report entitled "Design of High Performance Posit
Multiplication for DSP Applications", being submitted by V. MANASA (20711D57082), in
partial fulfillment for the award of the Degree of Master of Technology in VLSI to Narayana
Engineering College, Nellore, is a record of bonafide work carried out by her under my
guidance and supervision.
The results embodied in this project report have not been submitted to any other University or
Institute for the award of any Degree or Diploma.
DECLARATION
I hereby declare that the project entitled "Design of High Performance Posit Multiplication
for DSP Applications", completed and written by me, has not previously formed the basis
for the award of any degree, diploma, or certificate.
Place: Nellore
Date: V. MANASA
ACKNOWLEDGEMENT
I am extremely thankful to Dr. P. NARAYANA, Ph.D., Founder, Narayana Educational
Institutions, Andhra Pradesh, for his kind blessings. I am extremely thankful to Mr. R.
Sambasiva Rao, B.Tech., Registrar, Narayana Engineering College, Nellore.
I am much obliged to Dr. A.V.S. Prasad, Ph.D., Director, Narayana Engineering &
Pharmacy Colleges, for the continuous encouragement and support. I owe indebtedness to
our Principal, Dr. G. Srinivasulu Reddy, M.Tech., Ph.D., Narayana Engineering College, Nellore, for
providing us the required facilities.
I would like to express my deep sense of gratitude and sincere thanks to Dr. K.
Murali, M.Tech., Ph.D., Professor & HOD, Department of Electronics and Communication
Engineering, Narayana Engineering College, Nellore, for providing the necessary facilities and
encouragement towards the project work.
I am thankful to our Faculty In-charge (Project), Dr. K. SELVAKUMARASAMY,
M.Tech., Ph.D., Professor, Department of Electronics and Communication Engineering, Narayana
Engineering College, Nellore, for his guidance and support in the completion of the project.
I would like to thank our project guide, Mr. A. Siva Sai Kumar, Assistant
Professor, Department of Electronics and Communication Engineering, Narayana Engineering
College, Nellore, for his guidance, valuable suggestions, and support in the completion of the
project.
I gratefully acknowledge and express my thanks to the teaching and non-teaching
staff of the ECE Department, and especially to my parents, who helped me throughout in
shaping things well in order.
Project Associate
V.MANASA
(20711D5702)
ABSTRACT
The IEEE 754 standard for floating-point arithmetic has been in use for many years and
is incorporated in most current computer systems, making it the most widely used. In
recent years, John L. Gustafson has proposed a new numerical representation format
dubbed posit (Type III unum), which he believes can achieve superior precision using
equal or fewer bits and simpler hardware than the existing IEEE 754 arithmetic.
The new posit numeric format, its features and qualities, and the standard floating-point
numbers (floats) are examined and contrasted in this work.
Posits are tapered-precision numbers intended to replace IEEE floating point; they provide more
precision and lower-complexity implementations than IEEE floating point. In this project, an
area-efficient posit multiplier architecture is proposed. The mantissa multiplier is designed for
the maximum possible bit-width, but the whole multiplier is divided into multiple smaller
multipliers, and only the required small multipliers are enabled at run time. Those smaller
multipliers are controlled by the regime bit-width, which can be used to determine the mantissa
bit-width. By using this method, less delay can be achieved. There are several ways to reduce
the area and latency of the system; here we introduce a new approach dubbed the RoBA
(rounding-based approximate) multiplier.
We extend the posit multiplier by replacing its logic with reversible gates, providing a better
multiplication operation compared with the basic posit multiplier. Reversible logic is very
important in low-area circuit design. The important reversible gates used for reversible logic
synthesis are the Feynman gate, the Fredkin gate, the HNG gate, etc. A reversible logic gate is
an n-input, n-output logic device with a one-to-one mapping, which makes it possible to
determine the outputs from the inputs and also to uniquely recover the inputs from the outputs.
Approximate multiplication can decrease complexity while increasing performance and power
efficiency in error-resilient applications. The proposed multiplication technique is utilized in
two variants of 8-bit multipliers. The effectiveness of the proposed design is synthesized and
simulated using Xilinx software.
CONTENTS
TITLE PAGE NO
CHAPTER-1 INTRODUCTION 1-2
1.1 Motivation 1
1.2 Objectives 2
CHAPTER-2 LITERATURE SURVEY 3-5
CHAPTER-3 ADDERS AND MULTIPLIERS 6-11
3.1 Adders 6
3.2 Types of Adders 6
3.2.1 Half adder 6
3.2.2 Full adder 8
3.2.3 Ripple carry adder 9
3.3 Multipliers 10
3.3.1 Dadda multiplier 11
CHAPTER-4 EXISTING SYSTEM 12-22
4.1 The Unum Number Format 12
4.1.1 Type-1 and Type-2 Unums 12
4.1.2 Type-3 Unums 14
4.2 Posit Format 15
4.3 Setting Posit Environment 16
4.3.1 Extracting the sign bit 18
4.3.2 Extracting the regime bits 19
4.3.3 Extracting the exponent bits 20
4.3.4 Extracting the fraction bits 20
4.4 Posits as Projective Reals 21
4.5 Multiplication Algorithm of RoBA Multiplier 22
CHAPTER-5 PROPOSED SYSTEM 25
5.1 Reversible Logic Gates 25
CHAPTER-6 SOFTWARE IMPLEMENTATION USING XILINX 28-29
6.1 Xilinx Introduction 28
6.1.1 Programmable logic device 29
6.2 Creating a New Project 29
CHAPTER-7 SIMULATION AND SYNTHESIS RESULTS 36-41
7.1 Unsigned 8 Bit Posit Multiplier 36
7.2 Signed 8 Bit Posit Multiplier 36
7.3 RTL and Technology Schematics for Existing Method 37
7.3.1 RTL schematic for posit multiplier 37
7.3.2 RTL schematic for dsr-right-n-s 37
7.3.3 RTL schematic for dsr-left-n-s 38
7.3.4 RTL schematic of transistor level 38
7.4 RTL and Technology Schematics for Proposed Method 39
7.4.1 Reversible 8 bit posit multiplier 39
7.4.2 Transistor level of reversible 8 bit posit multiplier 39
7.5 Synthesis Result of Existing Method 40
7.5.1 Area 40
7.5.2 Delay 40
7.6 Synthesis Result of Proposed Method 41
7.6.1 Area 41
7.6.2 Delay 41
7.7 Comparison Table 42
CHAPTER-8 CONCLUSION AND FUTURE SCOPE 43
CHAPTER-9 REFERENCES 44-45
LIST OF FIGURES
LIST OF TABLES
TABLE NO NAME OF TABLE PAGE NO
3.1 Half adder truth table 7
3.2 Full Adder truth table 8
5.1 Truth table of HNG reversible logic 25
7.1 Comparison table 41
CHAPTER -1
INTRODUCTION
1.1 Motivation
We live in a time when computer scientists must balance the competing demands of low cost,
high performance, and low energy use. Large volumes of data, high-performance computing (HPC), and
the restricted processing resources available on increasingly common embedded devices are all
driving the development of new computing paradigms right now. Famous catastrophes caused by
floating-point numerical errors have occurred over time, such as the missile-system
failure (February 25, 1991), the change in the German parliament's composition (April 5, 1992), and the
failure of the Ariane 5 rocket launch (June 4, 1996). Problems inherent in the floating-point standard's
design, including overflow and rounding errors, led to these catastrophic failures.
The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is the most widely used implementation
in current computing systems, although several other floating-point formats have been used in
computers over the years. The standard was introduced in 1985 and last updated in 2008
(IEEE 754-2008).
The same IEEE floating-point format need not be used on every machine. Rounding is
applied when a computed value does not fit within the selected number format. Even though the
round-to-nearest algorithm, tie-breaking rules, and guidelines for reproducible calculations were
all included in the most recent revision of the standard, hardware manufacturers are still free to
ignore these recommendations. This means that the same calculations on different computer
systems might produce different results.
Special bit patterns are used to handle exceptions such as division by zero, which results in a NaN;
the Not-a-Number (NaN) value signals that a result is not representable or is
undefined. Reserving many bit patterns to represent NaNs complicates hardware design and
reduces the number of exactly representable values that can be used.
In IEEE 754, overflow may replace large-magnitude finite numbers with +∞ or −∞, whereas
underflow may replace small-magnitude nonzero numbers with 0. As a result, serious issues, such
as the ones described above, might arise.
The associativity and distributivity properties are not always preserved in floating-point
arithmetic, since rounding is applied to each operation. The most recent revision of the standard
attempts to address this problem, along with adding the fused multiply-add (FMA) operation;
however, not all computer systems are able to take advantage of it. Due to the aforementioned
issues, a new number system was created to replace the widely used IEEE 754 arithmetic. John L.
Gustafson introduced the posit number representation at the end of 2017, a format that
has no underflow, no overflow, and no wasted NaN bit patterns. When it comes to replacing the
IEEE standard, Gustafson asserts that posits not only replace it well but also deliver more
accurate results with an equivalent or smaller number of bits and less complicated hardware.
1.2 Objectives
The fundamental goal of this work is to find out whether the posit numerical format can replace
the present IEEE 754 floating-point format. Because this is a broad question, we concentrate our
efforts on the following objective:
Create a posit counterpart of an IEEE 754 operation using programmable logic, in order to
see how well the hardware architectures compare.
CHAPTER-2
LITERATURE SURVEY
1.Beating Floating Point at its Own Game: Posit Arithmetic
A new data type called a posit is designed as a direct drop-in replacement for IEEE Standard
754 floating-point numbers (floats). [5] Unlike earlier forms of universal number (unum)
arithmetic, posits do not require interval arithmetic or variable size operands; like floats, they
round if an answer is inexact. However, they provide compelling advantages over floats,
including larger dynamic range, higher accuracy, better closure, bitwise identical results across
systems, simpler hardware, and simpler exception handling. Posits never overflow to infinity or
underflow to zero, and "Not-a-Number" (NaN) indicates an action instead of a bit pattern. A
posit processing unit takes less circuitry than an IEEE float FPU. With lower power use and
smaller silicon footprint, the posit operations per second (POPS) supported by a chip can be
significantly higher than the FLOPS using similar hardware resources. GPU accelerators and
Deep Learning processors, in particular, can do more per watt and per dollar with posits, yet
deliver superior answer quality.
The recently proposed posit number system is more accurate and can provide a wider dynamic
range than the conventional IEEE754-2008 floating-point numbers. Its nonuniform data
representation makes it suitable in deep learning applications. Posit adder and posit multiplier
have been well developed recently in the literature. However, the use of posit in fused
arithmetic unit has not been investigated yet. [15] In order to facilitate the use of posit number
format in deep learning applications, in this paper, an efficient architecture of posit multiply-
accumulate (MAC) unit is proposed. Unlike IEEE754-2008 where four standard binary number
formats are presented, the posit format is more flexible where the total bit width and exponent
bit width can be any number. Therefore, in this proposed design, bit widths of all data path are
parameterized and a posit MAC unit generator written in C language is proposed. The proposed
generator can generate Verilog HDL code of posit MAC unit for any given total bit width and
exponent bit width. The code generated by the generator is a combinational design, however a
5-stage pipeline strategy is also presented and analyzed in this paper. The worst-case delay,
area, and power consumption of the generated MAC unit under STM-28nm library with
different bit width choices are provided and analyzed.
CHAPTER-3
ADDERS AND MULTIPLIERS
3.1 Adders
Because addition is the most frequently used operation, the adder of two binary digits A and B is
crucial in digital computers, microprocessors, and digital signal processors, among other things.
It serves as a building block for the synthesis of all other arithmetic operations. Digital circuits
known as adders, or summers, are used in electronics to add numbers.
When it comes to computing, adders aren't only used in the arithmetic logic unit(s) found in
many computers and processors; they're also used in other parts of processors for various
tasks such as computing addresses and table indices.
Although adders can be built for many other numeric representations, such as BCD or excess-3,
the most common adders operate on binary numbers. When two's complement or
one's complement is used to represent signed numbers, turning an adder into an adder-subtractor
is trivial; other representations of signed numbers require a more complicated adder.
The binary adder, however, turns out to be a particularly critical hardware unit for implementing
an ALU efficiently. As any computer arithmetic textbook shows, a wide range of circuit
topologies with varying performance characteristics is in widespread use.
Although many studies of binary adder designs exist, there are few
studies based on an analysis of their relative overall performance. In this work the well-known
binary adder structures were evaluated, and in order to illustrate the comparable
performance of the RCA and carry-select adders, we developed hardware-description-language
code for several of the larger members of the adder family.
A half adder is a 1-bit adder that adds two 1-bit inputs and produces a sum and a carry.
Addition of several bits is achieved by carry propagation, and the final result is 2c + s. A half
adder may be created by combining an AND gate with an XOR gate.
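A minimal Verilog sketch of such a half adder follows (the module and port names are illustrative and not taken from the project's code):

// Half adder: the XOR gate gives the sum bit, the AND gate gives the carry bit.
module half_adder (
    input  wire a,
    input  wire b,
    output wire sum,
    output wire carry
);
    assign sum   = a ^ b;
    assign carry = a & b;
endmodule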
A full adder may be created by combining two half adders with an additional OR gate, giving a
1-bit addition unit. As Figure 3.2 shows, the 1-bit adder takes two inputs (X0 and X1) and
produces the outputs, called Sum and Cout, by adding them.
In a NAND-gate realization, two NAND gates form intermediate terms whose outputs are
supplied to a further NAND gate to obtain the sum, and so on; another NAND gate, taking a and
b as inputs, contributes to the carry output Cout.
A 1-bit full adder (FA) adds three 1-bit values: the two operands X0 and X1, and Cin, where
Cin is the carry from the preceding, less significant bit position. A full adder is often used as a
module in a cascade of adders that add 8-, 16-, 32-, or 64-bit binary numbers. The outputs of a
full adder are denoted Cout and S, the carry and sum respectively, so that the result is
Sum = 2 × Cout + S; the figure below shows the full adder diagram.
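As a hedged illustration of the construction just described (two half adders plus an OR gate), a Verilog sketch might look as follows; the port names x0, x1, cin, s, and cout follow the text, while the module names are assumptions:

// Full adder built from two half adders and an OR gate:
// s = x0 ^ x1 ^ cin, and cout is raised whenever either half-adder stage carries.
module full_adder (
    input  wire x0, x1, cin,
    output wire s, cout
);
    wire s1, c1, c2;
    half_adder ha1 (.a(x0), .b(x1),  .sum(s1), .carry(c1));
    half_adder ha2 (.a(s1), .b(cin), .sum(s),  .carry(c2));
    assign cout = c1 | c2;
endmodule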
Table 3.2: Full adder truth table
Several full adder circuits can be cascaded in parallel to add an N-bit number. An N-bit parallel
adder requires N full adder circuits to function properly. In a ripple carry adder, the carry out of
each full adder is the carry in of the next full adder, and so on; the name comes from the fact
that each carry bit ripples into the next stage. In such an adder, none of the full adder outputs is
valid until all of the preceding carries have been produced, because the logic circuitry has
propagation delays. Propagation delay measures how long it takes for a change at an input to
propagate to a given output. Consider a NOT gate, whose output might be "1" or "0" depending
on the value of the input: the propagation delay is the time it takes for the output to go from "1"
to "0" once logic "1" is applied to the NOT gate's input. Likewise, the carry propagation delay
is the time elapsed between applying the carry-in signal and obtaining the carry-out signal
(Cout). A ripple carry adder of this kind can be sketched in Verilog as shown below.
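A minimal sketch of such a ripple-carry adder, reusing the full adder above and assuming a 4-bit width purely for illustration:

// N-bit ripple carry adder: the carry out of each stage is the carry in of the next.
module ripple_carry_adder #(parameter N = 4) (
    input  wire [N-1:0] a, b,
    input  wire         cin,
    output wire [N-1:0] sum,
    output wire         cout
);
    wire [N:0] c;
    assign c[0] = cin;
    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : stage
            full_adder fa (.x0(a[i]), .x1(b[i]), .cin(c[i]),
                           .s(sum[i]), .cout(c[i+1]));
        end
    endgenerate
    assign cout = c[N];
endmodule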
Partial products are formed by an array of AND gates in a parallel multiplier. In such a
multiplier, the most time-consuming part is adding up all of the partial products. In the Wallace
method, the intermediate results are reduced as quickly as possible. In the Dadda method, the
intermediate results are reduced in the same number of stages as in the Wallace method, but the
design uses fewer full adders and half adders than the Wallace method, which leads to a
reduction in power usage. Because the Dadda technique requires a wider, faster CPA, it is less
regular in structure than the Wallace tree.
As N, the number of bits in the operands, increases, so does the number of half adders and full
adders needed for a Dadda multiplier.
CHAPTER- 4
EXISTING SYSTEM
4.1 The Unum Number Format
The IEEE 754 arithmetic standard is being challenged by the emerging universal number (unum)
format, which uses an arithmetic format comparable to floating point. This chapter outlines in
detail Gustafson's 2017 proposal to replace floats: the posit number format (Type III unum). An
understanding of Type I and Type II unums is helpful in understanding posits, so these formats
are discussed first, followed by the description of posits. Finally, we go through some of the new
number format's features and benefits.
Type I unums are to floats what floats are to integers: a superset of a superset. In computations
where traditional floating-point arithmetic cannot deliver the numerically exact result and
rounding would have to be performed, unums may indicate either an exact float or the open
interval between neighbouring floats. They do this by appending a "ubit" (uncertainty bit) to the
fraction, indicating whether the value corresponds to an exact number (ubit equal to 0) or to an
open interval.
The sign, exponent, and fraction (or mantissa) bit fields of the IEEE 754 floating-point scheme
are likewise supported by the Type I unum format. The exponent and fraction fields in this
format may be as small as one bit or as long as the user desires. As a result, the unum scheme
also includes exponent-size and fraction-size values that indicate the lengths of the exponent
and fraction fields of the number in question. Figure 4.1 depicts this format definition. Type I
unums extend interval arithmetic naturally, but their variable length necessitates extra hardware
management. More information about the format's proposal and rationale may be found in the
literature.
Certain of the initial unum's flaws, such as the difficulty of hardware implementation and the
fact that some values may be represented in multiple ways, were addressed by the Type II unum.
As a result, this version is no longer compatible with IEEE floats. Type II unums take a different
approach, using an elegant mathematical mapping of bit strings onto the projective real line (the
set R̂ = R ∪ {±∞}). The key concept is that the 2's complement signed integers are mapped onto
the projective reals with the same ordering, so that the point where the integers wrap from
positive to negative corresponds to ±∞. Figure 4.2 depicts the Type II unum's structure.
An ordered set of real numbers xi occupies the circle's upper-right quadrant, while their
negatives −xi occupy the upper-left quadrant, a reflection about the vertical axis. The lower half
of the circle holds the reciprocals of the numbers on the upper half, a reflection about the
horizontal axis. Given a value, vertical and horizontal reflections therefore yield its negation and
its reciprocal. As with Type I unums, Type II unums ending in 1 (the ubit) represent the open
interval between the neighbouring exact reals. Since the projective reals have many elegant
mathematical properties based on their geometry, Type II unums rely on lookup tables for most
operations. This severely restricts the format's practical size, which, with today's memory
technology, drops to about 20 bits or less. Furthermore, in this format, fused operations like the
dot product are very costly. For these reasons, the search began for a new format that would
preserve as many of the Type II unum characteristics as possible while being more
"hardware-friendly".
Posit arithmetic also has an interval variant, called a valid: a pair of equal-size posits, each
terminating in a ubit, that represent the interval bounds. Valids, however, are not the primary
focus of this work, and Gustafson has not yet formally published the format's details.
4.2 Posit Format
A posit has four fields: sign, regime, exponent, and fraction.
The sign bit is 0 for positive numbers and 1 for negative numbers. For a negative number, the
2's complement of the remaining bits must be taken before extracting the regime, exponent, and
fraction fields. The regime field has no counterpart in standard numeric formats. It is used to
determine a scale factor useed^k, where useed = 2^(2^es). The value of k is determined by the
number m of identical bits that precede the first opposite bit: if the regime field has m leading
0s, then k = −m; if it has m leading 1s, then k = m − 1.
The exponent bits (shown in blue) encode the value e, which represents a scale factor of 2^e.
Unlike floats, there is no bias. Because of the varying length of the regime, there may be
anywhere from zero up to es exponent bits, since this field starts immediately after the regime
field.
The bits that remain form the fraction field and encode the fraction value f. There is, however,
an important difference from the significand (mantissa) of an IEEE float: since posits have no
subnormal numbers, the hidden bit is always 1. As a result, the value represented by a posit is

value = useed^k × 2^e × (1 + f) .......................................... (3)

where useed = 2^(2^es), k is the value encoded by the regime field, e is the value of the exponent
field, and f is the value of the fraction field; the sign is applied afterwards according to the sign
bit.
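As an illustrative worked example (not taken from the thesis), consider the 8-bit posit 01011011 with es = 2, so that useed = 2^(2^2) = 16. The sign bit is 0; the regime is "10" (a run of one 1, so k = 1 − 1 = 0); the exponent bits are "11" (e = 3); and the fraction bits are "011" (f = 3/8). Hence value = 16^0 × 2^3 × (1 + 3/8) = 8 × 1.375 = 11.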
As can be seen, the total number of bits (n) and the number of exponent bits (es) completely
determine a posit configuration. The notation posit(n, es) is often used to denote the set of n-bit
posits with es exponent bits.
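To make the field layout concrete, the following behavioural Verilog sketch extracts the four fields of a posit(N, es) word. It is an illustration under assumptions (the signal names, the zero-padding of truncated exponent bits, and the absence of special handling for zero and NaR are all choices made here), not the RTL used in this project:

// Behavioural posit field extraction sketch for posit(N, ES).
module posit_decode #(parameter N = 8, parameter ES = 2) (
    input  wire [N-1:0]      p,
    output reg               sign,
    output reg signed [7:0]  k,     // regime value
    output reg [ES-1:0]      exp,   // exponent bits (missing bits assumed 0)
    output reg [N-4:0]       frac   // fraction bits, left-aligned; hidden 1 implied
);
    reg [N-1:0] v;
    reg         rbit, in_run;
    integer     i, pos;

    always @* begin
        sign = p[N-1];
        v    = sign ? (~p + 1'b1) : p;   // 2's complement of negative posits
        rbit = v[N-2];

        // Regime: count the run of bits equal to rbit up to the terminating bit.
        k      = 0;
        pos    = -1;
        in_run = 1'b1;
        for (i = N-2; i >= 0; i = i - 1) begin
            if (in_run && v[i] == rbit)
                k = k + 1;
            else if (in_run) begin
                in_run = 1'b0;
                pos    = i - 1;          // remaining fields start below this bit
            end
        end
        k = rbit ? (k - 1) : -k;         // m leading 1s -> k = m-1; m leading 0s -> k = -m

        // Exponent: the next ES bits after the terminating bit, or fewer if pushed off.
        exp = 0;
        for (i = 0; i < ES; i = i + 1)
            if (pos - i >= 0) exp[ES-1-i] = v[pos - i];

        // Fraction: whatever bits remain to the right of the exponent.
        frac = 0;
        for (i = 0; i < N-3; i = i + 1)
            if (pos - ES - i >= 0) frac[N-4-i] = v[pos - ES - i];
    end
endmodule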
floats in other applications. Long sequences of arithmetic operations can be performed with only
a single rounding error by using the quire register, and in certain cases 16-bit posits can even
substitute for 64-bit floats.
Note that the ratio of maxpos to minpos is useed^(2·nbits − 4); this determines the posits'
dynamic range. Recall that useed = 2^(2^es). The dynamic range of the posit format is therefore
an exponential of an exponential, because the regime bits can raise useed to the power of any
integer from −nbits + 1 to nbits − 1. Posits can thus devote more bits to the fraction than IEEE
floats do, giving more accurate results, while still offering a broad dynamic range. In other
words, the es parameter is very powerful and must be used wisely: values above 5 may strain a
computer's memory. For example, writing out the exact value of minpos with es = 5 takes many
gigabytes of disk space.
An operation with a NaR (not a real) input or an invalid operation produces NaR as its output;
otherwise a regular number or 0 is returned. Overflow and underflow are handled by saturating
the output: a positive result larger than maxpos is reduced to maxpos, and a positive nonzero
result smaller than minpos is raised to minpos; a negative result is likewise confined to the range
from −maxpos to −minpos. The rounding rules are straightforward as well: rounding to the
nearest value is the only mode, whereas IEEE 754-2008 defines a wide range of rounding modes
that add complexity to hardware implementations. In summary, posit is a new number system
that outperforms the IEEE 754-2008 floating-point standard in several respects. It is easy to use
because of its flexibility, simplicity, and regularity, and it is also easier to implement in hardware
than IEEE 754-2008. These advantages have led to a number of posit hardware implementations.
One of these works introduced a parameterized adder/subtractor, in which a parameterized
adder/subtractor and multiplier were discussed; the leading bits were determined by using both a
leading-ones detector (LOD) and a leading-zeros detector (LZD).
Fig 4.4: POSIT component extraction
These two works converted negative posit numbers to their positive equivalents in order to
handle negative posits. Even though their techniques are straightforward, the underlying
hardware isn't: the LOD and LZD, together with the conversion between the complement and
sign-magnitude forms, create redundant area. The first flaw was fixed in another work, in which
only an LZD is used. Unfortunately, the second flaw remains, preventing that effort from fully
exploiting posit's benefits. Based on this research, a further reference proposed a
Newton–Raphson parameterized divider. Besides studying posit's software applications, another
reference also provided a basic hardware implementation in which the posit is encoded as a sign
and a magnitude. This idea reduces the complexity of the circuits; however, it does not adhere to
the posit standard. Our approach resolves these two issues, lowering the implementation cost
while also providing a comprehensive solution for posit arithmetic.
4.3.1 Extracting the sign bit
When read as 2's complement integers, posit bit strings range over negative as well as positive
values. Because Mathematica does not natively support 2's complement integers, this discussion
works with unsigned integers between 0 and npat − 1: for example, the posit bit string
represented by the unsigned integer npat/2 is the one that, read as a signed integer, would be
−npat/2.
The positQ test checks whether an input number (a bit string) is a valid posit; the function
returns True if so and False otherwise.
The sign bit, the most significant bit, tells whether the posit represents a positive or a negative
value (or zero). Colour-coding of the bits, used for all kinds of unums, makes long binary strings
easier to read; the sign bit is shown in red (RGB components 1, 0.125, 0).
Because valid posit bit strings span only 0 to npat − 1 when viewed as unsigned integers, the
positQ function prevents us from evaluating the sign bit of the out-of-range integer npat. When
an out-of-domain operation is attempted, Mathematica simply returns the expression
unevaluated to the user, as the "sign bit" example above shows.
4.3.2 Extracting the regime bits
To see the regime bits, try a negative posit and note how a run of 0 bits becomes a run of 1 bits.
For example, flipping all the bits of 100001 and adding 1 gives 011110 + 1 = 011111, its 2's
complement.
4.3.3 Extracting the exponent bits
The exponent bits are located next. Even when es is greater than zero, the regime bits may push
some or all of the exponent bits off the right end of the word, leaving no exponent bits at all;
011111 and 011110, for example, have none. Exponent bits begin to appear for regimes with
shorter bit runs, such as 011100 and 011101. The extraction skips over the regime's terminating
bit and then takes the next es bits, or however many are left if that is fewer, so anything from the
empty set to a string of es bits may be obtained. The power of 2 that these exponent bit strings
express is simply an unsigned integer, since there is no "bias" as with floats. Blue is the colour
designation for exponent bits (RGB value 0.25, 0.5, 1). If the exponent bit value is 110 = 6, then
the scale factor contributed by the exponent bits is 2^6; when there are no exponent bits, the
exponent is 0 and the scale factor is 2^0 = 1.
4.3.4 Extracting the fraction bits
Now that we have located all four components of the posit bit string, we can determine the value
each one represents. The fraction bits are all the bits to the right of the exponent bits, unless the
other fields have pushed them off the right end (in which case the fraction value is simply the
hidden bit, 1). The hidden bit is always 1, because, unlike IEEE 754, posits have no subnormal
numbers. The fraction-bits function differs from the exponent-bits function only in that its
starting position lies es bits further to the right and that it takes all the remaining bits instead of
at most es of them. This is how the precision of posits varies with the scale factor: a very precise
fraction can be expressed when the scale factor is modest, while fewer bits are left for precision
the larger or smaller a posit's magnitude.
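A simulation-only companion sketch (again an illustration under assumptions: es = 2, field widths matching the decoder sketch in Section 4.2, and no handling of zero or NaR) that recombines the extracted fields into the posit's value:

// Recombine posit fields: value = useed^k * 2^e * (1 + f), with useed = 16 (es = 2).
module posit_value_check;
    function real posit_value;
        input               sign;
        input signed [7:0]  k;
        input [1:0]         e;       // exponent bits
        input [4:0]         frac;    // left-aligned fraction bits
        real v;
        begin
            v = (1.0 + frac / 32.0) * (2.0 ** e);
            if (k >= 0) v = v * (16.0 ** k);
            else        v = v / (16.0 ** (-k));
            posit_value = sign ? -v : v;
        end
    endfunction

    initial begin
        // 01011011 decodes to sign=0, k=0, e=3, frac=011 (left-aligned 01100): value 11.0
        $display("value = %f", posit_value(1'b0, 8'sd0, 2'd3, 5'b01100));
    end
endmodule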
4.4 Posits as Projective Reals
Posits are best understood geometrically, because Type III posit arithmetic derives from the
mapping of Type II unums onto the projective reals. In Type III, however, the Type II constraint
that every value has an exact reciprocal is relaxed: reciprocals obtained by reflecting about the
horizontal axis remain exact only for 0, ±∞, and powers of 2. Both Type II and Type III unums
start from the same two-bit template.
Fig 4.5: Two-bit numbers mapped over the projective reals.
As the real numbers pass from positive to negative around the ring, so do the bit strings that
encode them (which may be thought of as 2's complement signed integers). Thus, unlike floats,
there is no "negative zero", and −∞ and +∞ are merged into a single point. To produce a
three-bit ring, we may insert a value between 1 and ∞ into the two-bit ring above (along with the
corresponding negation and reciprocal obtained from reflections about the vertical and
horizontal axes, respectively). Note, however, that this choice "seeds" the way the remainder of
the unum ring is filled (all of its positive values are powers of that value), which is why this
quantity is referred to as the useed value.
In order to improve processor efficiency, it is critical to improve the speed and
power/energy-efficiency characteristics of multipliers. For the most part, digital signal processor
(DSP) cores are used to process images and videos for human consumption. Since human visual
and auditory perception is limited when viewing an image or a video, we may employ
approximation in such cases to increase performance while also saving energy.
There are other applications besides video processing where the accuracy of arithmetic
operations is not critical to the system's functioning. Approximate computing allows the
designer to trade accuracy against speed while also reducing power and energy usage. The
approximation may be applied to arithmetic units at the circuit, logic, and architecture
design-abstraction levels, as well as at the algorithm and software layers.
A variety of approaches to approximation may be used, including allowing timing violations
(for example, voltage over-scaling or over-clocking) and function-approximation methods (such
as changing a circuit's
Boolean function), or any combination of these techniques. Many approximate arithmetic
building blocks, such as adders and multipliers, have been proposed at various design levels
within the domain of function-approximation approaches. Our primary goal in this work is to
provide a high-speed, low-power/energy, yet sufficiently accurate multiplier suitable for
error-tolerant DSP applications. Operating on rounded input operands, the proposed approximate
multiplier modifies the usual multiplication algorithm at the algorithm level; it is named the
rounding-based approximate (RoBA) multiplier.
The proposed technique can be used for both signed and unsigned multiplications, and three
optimized architectures are presented. Their delay, power and energy consumption, and
energy-delay products (EDPs) are compared with those of some approximate and exact
multipliers in order to determine their efficiency.
The main contributions of this work are: a novel RoBA multiplication strategy that modifies the
conventional multiplication approach, and hardware designs of the proposed approximate
multiplier for signed and unsigned operations.
4.5 Multiplication Algorithm of RoBA Multiplier
The key idea of the proposed approximate multiplier is to make use of numbers in the form 2^n,
for which multiplication becomes very simple. To see how the approximate multiplier works, let
Ar and Br denote the rounded (nearest power of two) values of the inputs A and B. The exact
product of A and B can then be written as

A × B = (Ar − A) × (Br − B) + Ar × B + Br × A − Ar × Br .................... (4)

The important observation is that the multiplications Ar × Br, Ar × B, and Br × A can all be
implemented with simple shift operations in hardware, whereas (Ar − A) × (Br − B) is
comparatively difficult. Because this term is the product of the differences between the exact and
rounded values, its weight in the final result is small, so we propose omitting it to simplify the
multiplication. As a result, the multiplication is carried out as

A × B ≈ Ar × B + Br × A − Ar × Br .................................................. (5)
This strategy calls for finding the nearest value of the form 2^n for each of A and B. When A
equals 3 × 2^(p−2), where p is a positive integer greater than one, it has two nearest values of the
form 2^n at equal absolute distance: 2^p and 2^(p−1). Both choices give the same accuracy, but
using the larger one (except for p = 2) leads to a smaller hardware implementation of the
rounding stage, so that is the choice examined in this work; rounding the values 3 × 2^(p−2)
upward therefore does not complicate the rounding circuitry. The single exception is the number
three, for which the proposed approximate multiplier uses two as its nearest value.
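A hedged behavioural sketch of this approximation for unsigned 8-bit operands follows. The module name and the rounding helper are illustrative, and the three products are written as multiplications for readability even though Ar and Br are powers of two and would be realized as shift operations in hardware:

// RoBA-style approximation: p ~= Ar*B + Br*A - Ar*Br, with Ar, Br the rounded inputs.
module roba_mult_approx (
    input  wire [7:0]  a, b,
    output wire [15:0] p
);
    // Nearest power of two, ties rounded upward except for the input 3
    // (which rounds to 2, the special case noted in the text); 0 maps to 0.
    function [7:0] round_pow2;
        input [7:0] x;
        integer i;
        begin
            round_pow2 = 8'd1;
            for (i = 1; i <= 7; i = i + 1)
                if ({2'b00, x, 1'b0} >= (11'd3 << (i - 1)))   // 2x >= 3*2^(i-1)
                    round_pow2 = 8'd1 << i;
            if (x == 8'd3) round_pow2 = 8'd2;
            if (x == 8'd0) round_pow2 = 8'd0;
        end
    endfunction

    wire [7:0]  ar = round_pow2(a);
    wire [7:0]  br = round_pow2(b);
    wire [15:0] t1 = ar * b;
    wire [15:0] t2 = br * a;
    wire [15:0] t3 = ar * br;

    assign p = t1 + t2 - t3;
endmodule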
It should be noted that, unlike prior work in which the approximate result was always smaller
than the exact result, the final result of the RoBA multiplier may be either larger or smaller than
the exact product, depending on the magnitudes of Ar and Br in comparison with those of A and B,
Fig 4.7: Block diagram for the hardware implementation of the RoBA multiplier
respectively. The approximate result will be larger than the exact result if one of the operands
(say A) is smaller than its corresponding rounded value while the other operand (say B) is larger
than its rounded value, because in this case (Ar − A) × (Br − B) is negative and the difference
between the approximate and exact results is exactly this product. Conversely, if both operands
are larger (or both smaller) than their rounded values, the approximate result will be smaller than
the exact one. It is also worth noting that the proposed RoBA multiplier operates directly only on
positive inputs, because the rounded values of negative inputs in the two's complement
representation are not of the form 2^n.
For signed operands, therefore, the absolute values of the inputs and the sign of the output are
determined first from the input signs, the multiplication is then carried out on the unsigned
values, and the proper sign is finally applied to the unsigned result.
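A sketch of this sign-handling wrapper around the unsigned core above (the names are assumptions):

// Determine output sign and input magnitudes, multiply unsigned, then apply the sign.
module roba_mult_signed (
    input  wire signed [7:0]  a, b,
    output wire signed [16:0] p
);
    wire [7:0]  abs_a = a[7] ? -a : a;
    wire [7:0]  abs_b = b[7] ? -b : b;
    wire        neg   = a[7] ^ b[7];    // output sign from the input signs
    wire [15:0] mag;

    roba_mult_approx core (.a(abs_a), .b(abs_b), .p(mag));

    assign p = neg ? -$signed({1'b0, mag}) : $signed({1'b0, mag});
endmodule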
CHAPTER- 5
PROPOSED SYSTEM
Table 5.1: Truth table of the HNG reversible logic gate
An array multiplier multiplies two 4-bit binary numbers by using an array of full adders. The
different partial-product terms are added simultaneously in this array. The partial-product terms
are generated by a multitude of AND gates, and following this array of AND gates, the adder
array is used. The hardware structure of a p × q bit multiplier consists of p × q AND gates and
(p − 1) adders. The array multiplier performs multiplication in the traditional way and has a
regular structure, so routing and layout design can be implemented in an uncomplicated manner.
The implementation of this multiplier is straightforward, but it requires a larger area with
considerable delay [20]. The main merits of the array multiplier are its low hardware complexity
and the ease with which it can be scaled, pipelined, placed, and routed.
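A compact behavioural sketch of such a 4 × 4 array multiplier (AND-gate partial products accumulated by an adder array, written here with behavioural additions for brevity; names are illustrative):

// 4x4 array multiplier: AND gates form the partial products,
// and each shifted row is added to the running sum.
module array_mult4 (
    input  wire [3:0] a, b,
    output wire [7:0] p
);
    wire [3:0] pp0 = a & {4{b[0]}};
    wire [3:0] pp1 = a & {4{b[1]}};
    wire [3:0] pp2 = a & {4{b[2]}};
    wire [3:0] pp3 = a & {4{b[3]}};

    wire [7:0] s1 = {4'b0, pp0} + {3'b0, pp1, 1'b0};
    wire [7:0] s2 = s1 + {2'b0, pp2, 2'b0};
    assign     p  = s2 + {1'b0, pp3, 3'b0};
endmodule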
B. Proposed Multiplier Design
Our proposed multiplier architecture has two parts. First, the partial products are produced using
AND gates, which are realized with HNG gates: if the input vector of the HNG gate is
IV = (A, B, 0, 0), the output vector is OV = (P = A, Q = B, R = A ⊕ B, S = AB). The S output
therefore gives the logical AND of A and B, since the C and D inputs are '0'.
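A Verilog sketch of the HNG gate under its standard definition (assumed here: P = A, Q = B, R = A ⊕ B ⊕ C, S = (A ⊕ B)·C ⊕ A·B ⊕ D) and of its use as an AND gate for one partial-product bit:

// HNG reversible gate (4 inputs, 4 outputs, one-to-one mapping).
module hng_gate (
    input  wire a, b, c, d,
    output wire p, q, r, s
);
    assign p = a;
    assign q = b;
    assign r = a ^ b ^ c;
    assign s = ((a ^ b) & c) ^ (a & b) ^ d;
endmodule

// With c = d = 0 the S output realizes a AND b (and R gives a XOR b).
module hng_and (
    input  wire a, b,
    output wire y                 // y = a & b
);
    wire unused_p, unused_q, unused_r;
    hng_gate g (.a(a), .b(b), .c(1'b0), .d(1'b0),
                .p(unused_p), .q(unused_q), .r(unused_r), .s(y));
endmodule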
noteworthy impact on the overall performance of the system with regard to power consumption,
delay, and area occupancy.
CHAPTER- 6
SOFTWARE IMPLEMENTATION USING XILINX
In this section, we go through the project's software requirements. Xilinx ISE is the program
used in this case. The code is written in Verilog HDL, a hardware description language, and the
design's behaviour is synthesized for the specific target device.
6.1 Xilinx Introduction
Xilinx Tools is a suite of software tools for digital circuit design. The designs target
field-programmable gate arrays (FPGAs) and complex programmable logic devices (CPLDs),
such as those from Xilinx. The design process proceeds through stages including the initial
concept phase, the actual design-entry phase, and functional simulation. Digital designs may be
entered using a schematic entry tool, a hardware description language (Verilog or VHDL), or a
mix of the two. Only the design flow using Verilog HDL is used here.
Starting from a Verilog HDL design description, the CAD tools can be used to create
combinational and sequential circuits. The following are the steps in this design process:
(a) Create Verilog design input files using a template-driven editor.
(b) Compile and synthesize the Verilog design files.
(c) Create the test vectors and perform functional simulation, without yet making use of a PLD
(FPGA or CPLD).
(d) Assign the design's input/output pins to a target device to implement it.
(e) Download the bit stream to the FPGA or CPLD.
(f) Test the design on the FPGA/CPLD device.
The following segments make up a Verilog input file in the Xilinx software environment:
First line of code: the module name and a list of input and output ports. Declarations: ports,
registers, and wires. Logic description: equations, state machines, and logic functions. The
module ends at this point. You must use the Verilog input format described above for all of your
designs; for combinational logic designs, the state-machine section is not present.
Fig 6.1: Xilinx project navigator window (snapshot from Xilinx ISE software)
Getting a project off the ground to begin a new project, choose File->New Project from
the menu bar. This will open a new window on your desktop for the current project.
Complete the fields as follows:
Fig 6.2: New project initiation window (snapshot from Xilinx ISE software)
The following window should appear after clicking NEXT:
Fig 6.3: Device and Design Flow of Project (snapshot from Xilinx ISE software)
Click on the "value" area and choose the appropriate value for each of the following properties
from the resulting drop-down menu of options:
Device Family: the FPGA/CPLD family being used. The ZYNQ FPGAs are used here.
Device: the unique identifier for the underlying hardware; the device XC7Z010 is used for this
particular experiment.
Package: the form factor and number of pins of the package. For this experiment, a ZYNQ
FPGA in a CLG400 package is used.
Speed grade: -3.
Synthesis tool: XST.
Simulator: the tool used to simulate and verify the design's functionality. An integrated ISim
(VHDL/Verilog) simulator is included in the Xilinx ISE design environment, and "ISim
(VHDL/Verilog)" is the simulator of choice here.
Click NEXT to save your entries.
A subfolder with the project name will hold all project files, including schematics, netlists, and
Verilog files. Only one top-level HDL source file (or schematic) may be used in a project; a
hierarchical design may be created by adding lower-level modules to the project.
The remaining fields may be left as they are; then click NEXT.
A new window will now open, displaying the project's file name (RoBA) along with its device,
speed grade, and package.
Right-click on the project name and select Add Source from the context menu.
In Xilinx Tools, choose File->Open Project to get a list of all the projects currently available,
select your project, and click OK.
If the source files are not already part of the project, go to the Add Source menu and then browse
for and select all of the required files.
After that, the following window will appear on your screen:
Fig 6.5: Create new source window (A Snapshot from Xilinx ISE Software)
To proceed, verify that all files have green checkmarks and then click OK. After that, a new
Editor window will appear, showing that all of the files have the .v file extension.
Fig 6.6: Editor window
Select uut_roba (roba multiplier.v) and set it as the TOP MODULE as shown below.
Now select the test.v file and check for syntax errors (if any), which are shown under
SYNTHESIZE-XST and IMPLEMENT DESIGN.
The messages "Process 'Check Syntax' completed successfully" and "Process 'Generate
Post-Place & Route Static Timing' completed successfully" will appear in the console window
at the bottom.
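A minimal testbench of the kind kept in test.v might look as follows; the unit-under-test name and ports (roba_mult_approx, a, b, p) are assumptions matching the sketch in Section 4.5, not the project's actual files:

`timescale 1ns / 1ps
// Drive a few operand pairs and print the approximate and exact products.
module test;
    reg  [7:0]  a, b;
    wire [15:0] p;

    roba_mult_approx uut (.a(a), .b(b), .p(p));

    initial begin
        a = 8'd13; b = 8'd200; #10;
        $display("a=%0d b=%0d approx=%0d exact=%0d", a, b, p, {8'd0, a} * b);
        a = 8'd7;  b = 8'd6;   #10;
        $display("a=%0d b=%0d approx=%0d exact=%0d", a, b, p, {8'd0, a} * b);
        $finish;
    end
endmodule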
CHAPTER-7
SIMULATION AND SYNTHESIS RESULTS
7.3 RTL and Technology Schematics for Existing Method
Fig 7.3: RTL schematic for dsr-left-n-s
Fig 7.5: Reversible 8-bit posit multiplier
7.5.2 Delay
7.6 Synthesis Result of Proposed Method
7.6.1 Area
7.6.2 Delay
7.7 Comparison Table
Comparing the 8-bit posit multiplier with the rounding-based approximate (RoBA) multiplier, it
can be seen that the two have comparable power characteristics. However, the 8-bit modified
RoBA multiplier reduces area and latency appreciably in comparison with the 8-bit posit
multiplier. The modified RoBA multiplier, as compared to the posit multiplier, also simplifies
the circuitry of the schematic implementation of the 8-bit posit multiplier.
CHAPTER 8
CONCLUSION & FUTURE SCOPE
We presented the RoBA multiplier, a fast yet energy-efficient approximate multiplier. The inputs
are rounded to the form 2^n to build the multiplier. Because the computationally difficult term is
eliminated, performance and energy consumption are both improved at the expense of a very
small inaccuracy. Both signed and unsigned multiplications can benefit from the new technique.
By using reversible-gate operations, the area as well as the delay can be reduced compared with
the previous models.
This work can further be extended with reversible gates to implement 16- and 32-bit posit
multipliers, as well as pipelined ALUs and MAC units.
REFERENCES
[1] D. Goldberg, “What every computer scientist should know about floating-point
arithmetic”, ACM Computing Surveys (CSUR), vol. 23, no. 1, pp. 5–48, Mar.
1991. doi: 10.1145/103162.103163.
[2] IEEE Computer Society Standards Committee and American National Standards
Institute, “IEEE Standard for Binary Floating-Point Arithmetic”, ANSI/IEEE Std
754-1985, 1985. doi: 10.1109/IEEESTD.1985.82928.
[3] “IEEE Standard for Floating-Point Arithmetic”, IEEE Std 754-2008, pp. 1–70,
2008. doi: 10.1109/IEEESTD.2008.4610935.
[4] J. L. Gustafson, The End of Error: Unum Computing. CRC Press, Feb. 5, 2015, vol.
24, isbn: 9781482239867.
[5] J. L. Gustafson and I. T. Yonemoto, “Beating Floating Point at its Own Game:
Posit Arithmetic”, Supercomputing Frontiers and Innovations, vol. 4, no. 2, pp.
71–86, Jun. 2017. doi: 10.14529/jsfi170206.
[6] L. van Dam, "Enabling High Performance Posit Arithmetic Applications Using
Hardware Acceleration", Master's thesis, Delft University of Technology, the
Netherlands, Sep. 17, 2018, isbn: 9789461869579.
[7] A. A. D. Barrio, N. Bagherzadeh, and R. Hermida, “Ultra-low-power adder stage
design for exascale floating point units", ACM Trans. Embed. Comput. Syst., vol.
13, no. 3s, pp. 150:1–150:24, Mar. 2014. doi: 10.1145/2567932.
[8] J. L. Gustafson. (Oct. 10, 2017). Posit Arithmetic, [Online]. Available:
https://posithub.org/docs/Posits4.pdf
[9] J. L. Gustafson, "A Radical Approach to Computation with Real Numbers",
Supercomputing Frontiers and Innovations, vol. 3, no. 2, pp. 38–53, Sep. 2016. doi:
10.14529/jsfi160203.
[10] Posit Working Group. (Jun. 23, 2018). Posit Standard Documentation, [Online].
Available: https://posithub.org/docs/posit_standard.pdf .
[11] R. Munafo. (2018). Survey of Floating-Point Formats, [Online]. Available:
http://www.mrob.com/pub/math/floatformats.html
[12] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S.
Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving,
M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R.
Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner,
I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O.
Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. (2015).
TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software
available from tensorflow.org, [Online]. Available: https://www.tensorflow.org/.
[13] S. van der Linde, "Posits als vervanging van floating-points: Een vergelijking
van Unum Type III Posits met IEEE 754 Floating Points met Mathematica en
Python", Bachelor's Thesis, Delft University of Technology, Sep. 26, 2018.
[14] M. Klöwer, P. D. Düben, and T. N. Palmer, “Posits as an alternative to floats
for weather and climate models,” in Proc. Conf. Next Gener. Arithmetic, Mar.
2019, pp. 1–8.
[15] H. Zhang, J. He, and S.-B. Ko, “Efficient posit multiply-accumulate unit generator
for deep learning applications,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS),
Sapporo, Japan, May 2019, pp. 1–5.
[16] Z. Carmichael, H. F. Langroudi, C. Khazanov, J. Lillie, J. L. Gustafson, and D.
Kudithipudi, “Performance-efficiency trade-off of low-precision numerical
formats in deep neural networks,” in Proc. Conf. Next Gener. Arithmetic, Mar.
2019, pp. 1–9.
[17] H. Zhang and S.-B. Ko, "Design of Power Efficient Posit Multiplier", IEEE
Transactions on Circuits and Systems II: Express Briefs, vol. 67, no. 5, May 2020.