
A Project Report on

DESIGN OF HIGH PERFORMANCE POSIT
MULTIPLICATION FOR DSP
APPLICATIONS
is submitted in partial fulfillment of the requirement for the Award of the Degree of
MASTER OF TECHNOLOGY
in
VERY LARGE-SCALE INTEGRATION
By
V.MANASA
(20711D5702)

Under the guidance of


Mr. A. SIVA SAI KUMAR, M.Tech
Assistant Professor

Department of Electronics and Communication Engineering

September 2022

i
Department of Electronics and Communication Engineering

CERTIFICATE

This is to certify that the project report entitled “Design of High Performance Posit
Multiplication for DSP Applications” being submitted by V. MANASA (20711D5702), in
partial fulfillment for the award of the Degree of Master of Technology in VLSI to Narayana
Engineering College, Nellore, is a record of bona fide work carried out by her under my
guidance and supervision.
The results embodied in this project report have not been submitted to any other university or
institute for the award of any degree or diploma.

Project Supervisor Head of the Department

Mr. A. SIVA SAI KUMAR, M.Tech Dr. K. MURALI, M.Tech, Ph.D


Assistant Professor HOD
Project Supervisor Department of ECE

Submitted for the viva-voce examination held on:___________________

INTERNAL EXAMINER EXTERNAL EXAMINER

ii
DECLARATION

I hereby declare that the project entitled “Design of High Performance Posit Multiplication
for DSP Applications”, completed and written by me, has not previously formed the basis
for the award of any degree, diploma, or certificate.

Place: Nellore
Date: V. MANASA

iii
ACKNOWLEDGEMENT
I am extremely thankful to Dr. P. NARAYANA, Ph.D, Founder, Narayana Educational
Institutions, Andhra Pradesh, for his kind blessings. I am extremely thankful to Mr. R.
Sambasiva Rao, B.Tech, Registrar, Narayana Engineering College, Nellore.
I am much obliged to Dr. A.V.S. Prasad, Ph.D, Director, Narayana Engineering &
Pharmacy Colleges, for the continuous encouragement and support. I owe a debt of gratitude to
our Principal, Dr. G. Srinivasulu Reddy, M.Tech, Ph.D, Narayana Engineering College, Nellore, for
providing us the required facilities.
I would like to express my deep sense of gratitude and sincere thanks to Dr. K.
Murali, M.Tech, Ph.D, Professor & HOD, Department of Electronics and Communication
Engineering, Narayana Engineering College, Nellore, for providing the necessary facilities and
encouragement towards the project work.
I am thankful to our Faculty In-charge (Projects), Dr. K. SELVAKUMARASAMY,
M.Tech, Ph.D, Professor, Department of Electronics and Communication Engineering, Narayana
Engineering College, Nellore, for his guidance and support in the completion of the project.
I would like to thank our project guide, Mr. A. Siva Sai Kumar, Assistant
Professor, Department of Electronics and Communication Engineering, Narayana Engineering
College, Nellore, for his guidance, valuable suggestions, and support in the completion of the
project.
I gratefully acknowledge and express my thanks to the teaching and non-teaching
staff of the ECE Department, and especially to my parents, who helped me throughout in
shaping things well in order.

Project Associate
V.MANASA
(20711D5702)

iv
ABSTRACT
The IEEE 754 standard for floating-point arithmetic has been in use for many years and
is incorporated in most current computer systems, making it the most widely used. In
recent years, John L. Gustafson has proposed a new numerical representation format
dubbed posit (Type III unum), which he believes may achieve superior precision using
equal or fewer bits and simpler hardware than the existing IEEE 754 arithmetic.

The new posit numeric format, its features and qualities, are examined in this report and
contrasted with the standard for floating-point numbers (floats). Posits are tapered-precision
numbers intended to replace IEEE floating point, providing more precision and
lower-complexity implementations. In this project, an area-efficient posit multiplier
architecture is proposed. The mantissa multiplier is designed for the maximum possible
bit-width; however, the whole multiplier is divided into multiple smaller multipliers, and
only the required small multipliers are enabled at run-time. These smaller multipliers are
controlled by the regime bit-width, which can be used to determine the mantissa bit-width.
This method achieves lower delay. There are several ways to reduce the area and latency
of the system; here we also introduce an approach dubbed the RoBA (Rounding-Based
Approximate) multiplier.

We extend the posit multiplier by replacing conventional gates with reversible gates,
providing a better multiplication operation compared to the baseline posit multiplier system.
Reversible logic is very important in low-area circuit design. The important reversible gates
used for reversible logic synthesis are the Feynman gate, Fredkin gate, HNG gate, etc. A
reversible logic gate is an n-input, n-output logic device with a one-to-one mapping, so the
outputs can be determined from the inputs and the inputs can be uniquely recovered from
the outputs. Approximate multiplication can decrease complexity while increasing
performance and power efficiency for error-resilient applications. The proposed
multiplication technique is utilized in two variants of 8-bit multipliers. The effectiveness of
the proposed design is evaluated by synthesis and simulation using Xilinx software.

v
CONTENTS
TITLE PAGE NO
CHAPTER-1 INTRODUCTION 1-2
1.1 Motivation 1
1.2 Objectives 2
CHAPTER-2 LITERATURE SURVEY 3-5
CHAPTER-3 ADDERS AND MULTIPLIERS 6-11
3.1 Adders 6
3.2 Types of Adders 6
3.2.1 Half adder 6
3.2.2 Full adder 8
3.2.3 Ripple carry adder 9
3.3 Multipliers 10
3.3.1 Dadda multiplier 11
CHAPTER-4 EXISTING SYSTEM 12-22
4.1 The Unum Number Format 12
4.1.1 Type-1 and Type-2 Unums 12
4.1.2 Type-3 Unums 14
4.2 Posit Format 15
4.3 Setting Posit Environment 16
4.3.1 Extracting the sign bit 18
4.3.2 Extracting the regime bits 19
4.3.3 Extracting the exponent bits 20
4.3.4 Extracting the fraction bits 20

vi
4.4 Posits as Projective Reals 21
4.5 Multiplication Algorithm of Roba Multiplier 22
CHAPTER-5 PROPOSED SYSTEM 25
5.1 Reversible Logic Gates 25
CHAPTER-6 SOFTWARE IMPLEMENTATION USING XILINX 28-29
6.1 Xilinx Introduction 28
6.1.1 Programmable logic device 29
6.2 Creating a New Project 29
CHAPTER-7 SIMULATION AND SYNTHESIS RESULTS 36-41
7.1 Unsigned 8 Bit Posit Multiplier 36
7.2 Signed 8 Bit Posit Multiplier 36
7.3 RTL and Technology Schematics for Existing Method 37
7.3.1 RTL schematic for posit multiplier 37
7.3.2 RTL schematic for dsr-right-n-s 37
7.3.3 RTL schematic for dsr-left-n-s 38
7.3.4 RTL schematic of transistor level 38
7.4 RTL and Technology Schematics for Proposed Method 39
7.4.1 Reversible 8 bit posit multiplier 39
7.4.2 Transistor level of reversible 8 bit posit multiplier 39
7.5 Synthesis Result of Existing Method 40
7.5.1 Area 40
7.5.2 Delay 40
7.6 Synthesis Result of Proposed Method 41
vii
7.6.1 Area 41
7.6.2 Delay 41
7.7 Comparison Table 42
CHAPTER-8 CONCLUSION AND FUTURE SCOPE 43
CHAPTER-9 REFERENCES 44-45

viii
LIST OF FIGURES

FIGURE NO NAME OF FIGURES PAGE NO


3.1 Half adder block diagram 7
3.2 Half adder logic diagram 8
3.3 Full adder block diagram 8
3.4 Logic Design Of 1-bit full adder 9
3.5 Ripple carry adder 10
4.1 Type-1 Unum bit fields 13
4.2 Visual representation of the projective real number line of Type-2 Unum 14
4.3 General posit format 15
4.4 Posit component extraction 18
4.5 Two- bit numbers mapped over the projective reals 21
4.6 Data path of the proposed posit multiplier 22
4.7 Block diagram of the hardware implementation of RoBA multiplier 23
5.1 Basic HNG reversible logic 25
5.2 The diagram of 4*4 array multiplier 26
5.3 HNG gate as AND gate 27
5.4 Full adder using HNG gate 27
6.1 Xilinx project navigator window 29
6.2 New project initiation window 30
6.3 Device and design flow of project 31
6.4 Adding source code in to the project 32
6.5 Create new source window 33
6.6 Editor window 34
ix
6.7 Setting code as top module 34
6.8 Checking syntax errors 35
7.1 Simulation result of 8 bit unsigned posit multiplier 36
7.2 Simulation result of 8 bit signed posit multiplier 36
7.3 RTL schematic for posit multiplier 37
7.4 RTL schematic for dsr-right-n-s 37
7.5 RTL schematic for dsr-left-n-s 38
7.6 RTL schematic of transistor level 8 bit posit multiplier 38
7.7 Reversible 8 bit posit multiplier 39
7.8 Transistor level of reversible 8 bit posit multiplier 39
7.9 Synthesis result of area (existing method) 40
7.10 Synthesis result of delay (existing method) 40
7.11 Synthesis result of area (proposed method) 41
7.12 Synthesis result of delay (proposed method) 41

x
LIST OF TABLES
TABLE NO NAME OF TABLE PAGE NO
3.1 Half adder truth table 7
3.2 Full Adder truth table 8
5.1 Truth table of HNG reversible logic 25
7.1 Comparison table 41

xi
xii
CHAPTER -1
INTRODUCTION
1.1 Motivation
We live in a time when computer scientists must balance the competing demands of low cost,
high performance, and low energy use. Large volumes of data, high-performance computing (HPC), and
the restricted processing resources available on increasingly common embedded devices are all
driving the development of new computing paradigms right now. Famous catastrophes caused by
floating-point numerical mistakes have happened over the years, such as the Patriot missile system
failure (February 25, 1991), the change in the German parliament's composition (April 5, 1992), and the
failed launch of the Ariane 5 rocket (June 4, 1996). Problems with the floating-point standard's design,
including overflow and rounding mistakes, led to these catastrophic errors.

The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is the most widely used implementation
in current computing systems; however, several floating-point formats have been utilized in
computers throughout the years. The standard was introduced in 1985 and revised in 2008
(IEEE 754-2008).

In order to stay backwards-compatible with previous implementations, the revision retains many of the
original's key qualities, although it has not yet been adopted by all computer systems. A few
flaws have been found in the IEEE 754 standard; they are detailed below:

The same IEEE floating-point format need not be used on every machine. Rounding is
applied when a calculation does not fit in the selected number format. Even though the
round-to-nearest algorithm, its tie-breaking rules, and guidelines for reproducible calculations were
all included in the most recent iteration of the standard, hardware manufacturers are still free to
ignore these recommendations. This means that the same calculations on different computer
systems might produce different results.

Bit patterns are reserved to handle exceptions such as invalid operations (e.g., 0/0), which
result in the Not-a-Number (NaN) value, signalling that a result is not recognizable or
undefined. Having so many bit patterns reserved to represent NaNs complicates hardware design and
reduces the number of exactly representable values that can be used.
1
In IEEE 754, large-magnitude finite numbers may be replaced by +∞ or −∞ on overflow,
whereas small-magnitude nonzero numbers can be flushed to 0 on underflow. As a
result, serious issues, such as the ones described above, might arise.

The associativity and distributivity properties are not always retained in floating-point
arithmetic, since rounding is applied to each operation. With the fused multiply-add (FMA)
operation, the most recent iteration of the standard attempts to mitigate this problem;
however, not all computer systems can take advantage of it. Due to the aforementioned
issues, a new number system was created to replace the widely used IEEE 754 arithmetic. John L.
Gustafson introduced the posit number representation at the end of 2017, a format that
has no underflow, no overflow, and no wasted NaN bit patterns. When it comes to replacing the
IEEE standard, Gustafson asserts that posits not only replace it well, but also deliver more
accurate results with an equivalent or lower number of bits and less complicated hardware.

1.2 Objectives
The fundamental goal of this project is to find out whether the posit numerical format can
replace the present IEEE 754 floating-point number format. Since this is a large question,
we shall concentrate our efforts on the following objectives:

a) To become familiar with the posit number format.

b) To identify theoretical and practical discrepancies between the IEEE Standard for Floating-Point
Arithmetic and Type III Unums (posits).

c) To examine and compare the usage of posits against the IEEE 754 floating-point standard
in Deep Learning, in terms of precision and speed.

d) To create a posit counterpart of an IEEE 754 operation using programmable logic, to
see how the hardware architectures compare.

2
CHAPTER-2
LITERATURE SURVEY
1. Beating Floating Point at its Own Game: Posit Arithmetic
A new data type called a posit is designed as a direct drop-in replacement for IEEE Standard
754 floating-point numbers (floats) [5]. Unlike earlier forms of universal number (unum)
arithmetic, posits do not require interval arithmetic or variable size operands; like floats, they
round if an answer is inexact. However, they provide compelling advantages over floats,
including larger dynamic range, higher accuracy, better closure, bitwise identical results across
systems, simpler hardware, and simpler exception handling. Posits never overflow to infinity or
underflow to zero, and “Not-a-Number” (NaN) indicates an action instead of a bit pattern. A
posit processing unit takes less circuitry than an IEEE float FPU. With lower power use and
smaller silicon footprint, the posit operations per second (POPS) supported by a chip can be
significantly higher than the FLOPS using similar hardware resources. GPU accelerators and
Deep Learning processors, in particular, can do more per watt and per dollar with posits, yet
deliver superior answer quality.

2. Posits as an alternative to floats for weather and climate models


Posit numbers, a recently proposed alternative to floating-point numbers, claim to have smaller
arithmetic rounding errors in many applications. By studying weather and climate models of
low and medium complexity (the Lorenz system and a shallow water model) we present
benefits of posits compared to floats at 16 bits. As a standardized posit processor does not exist
yet, we emulate posit arithmetic on a conventional CPU. [14] Using a shallow water model,
forecasts based on 16-bit posits with 1 or 2 exponent bits are clearly more accurate than half
precision floats. We therefore propose 16 bits with 2 exponent bits as a standard posit format,
as its wide dynamic range of 32 orders of magnitude provides a great potential for many
weather and climate models. Although the focus is on geophysical fluid simulations, the results
are also meaningful and promising for reduced precision posit arithmetic in the wider field of
computational fluid dynamics.

3. Efficient posit multiply-accumulate unit generator for deep learning applications

3
The recently proposed posit number system is more accurate and can provide a wider dynamic
range than the conventional IEEE754-2008 floating-point numbers. Its nonuniform data
representation makes it suitable in deep learning applications. Posit adder and posit multiplier
have been well developed recently in the literature. However, the use of posit in fused
arithmetic unit has not been investigated yet [15]. In order to facilitate the use of posit number
format in deep learning applications, in this paper, an efficient architecture of posit multiply-
accumulate (MAC) unit is proposed. Unlike IEEE754-2008 where four standard binary number
formats are presented, the posit format is more flexible where the total bit width and exponent
bit width can be any number. Therefore, in this proposed design, bit widths of all data path are
parameterized and a posit MAC unit generator written in C language is proposed. The proposed
generator can generate Verilog HDL code of posit MAC unit for any given total bit width and
exponent bit width. The code generated by the generator is a combinational design, however a
5-stage pipeline strategy is also presented and analyzed in this paper. The worst-case delay,
area, and power consumption of the generated MAC unit under STM-28nm library with
different bit width choices are provided and analyzed.

4. Performance-efficiency trade-off of low-precision numerical formats in deep neural


networks
Deep neural networks (DNNs) have been demonstrated as effective prognostic models across
various domains, e.g., natural language processing, computer vision, and genomics. However,
modern-day DNNs demand high compute and memory storage for executing any reasonably
complex task. To optimize the inference time and alleviate the power consumption of these
networks, DNN accelerators with low-precision representations of data and DNN parameters
are being actively studied. [16] An interesting research question is in how low-precision
networks can be ported to edge-devices with similar performance as high-precision networks.
In this work, we employ the fixed-point, floating point, and posit numerical formats at ≤8-bit
precision within a DNN accelerator, Deep Positron, with exact multiply-and-accumulate
(EMAC) units for inference. A unified analysis quantifies the trade-offs between overall
network efficiency and performance across five classification tasks. Our results indicate that
posits are a natural fit for DNN inference, outperforming at ≤8-bit precision, and can be
realized with competitive resource requirements relative to those of floating point.
4
5. Design of Power Efficient Posit Multiplier
Posit number system has been used as an alternative to IEEE floating-point number system in
many applications, especially the recent popular deep learning. Its non-uniformed number
distribution fits well with the data distribution of deep learning and thus can speed up the
training process of deep learning. [17] Among all the related arithmetic operations,
multiplication is one of the most frequent operations used in applications. However, due to the
bit-width flexibility nature of posit numbers, the hardware multiplier is usually designed with
the maximum possible mantissa bit-width. As the mantissa bit-width is not always the
maximum value, such multiplier design leads to a high-power consumption especially when
the mantissa bit-width is small. In this brief, a power efficient posit multiplier architecture is
proposed. The mantissa multiplier is still designed for the maximum possible bit-width;
however, the whole multiplier is divided into multiple smaller multipliers. Only the required
small multipliers are enabled at run-time. Those smaller multipliers are controlled by the
regime bit-width, which can be used to determine the mantissa bit-width. This design
technique is applied to 8-bit, 16-bit, and 32-bit posit formats in this brief and an average of
16% power reduction can be achieved with negligible area and timing overhead.
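The multiplier-splitting technique summarized above, which this project also adopts, can be sketched behaviorally. The sketch below is in Python rather than HDL; the function name, the 4-bit split point, and the gating condition are illustrative assumptions, not the paper's exact control logic.

```python
def split_mul8(a, b, mant_width):
    """8x8 unsigned multiply built from four 4x4 sub-multipliers.

    a, b: 8-bit operands; mant_width: effective mantissa bit-width,
    which in the posit multiplier is derived from the regime bit-width.
    When the effective width fits in 4 bits, only the low*low
    sub-multiplier is "enabled"; the other three are skipped,
    modelling the power saving described in the abstract.
    """
    a_lo, a_hi = a & 0xF, a >> 4
    b_lo, b_hi = b & 0xF, b >> 4
    result = a_lo * b_lo                  # low x low: always enabled
    if mant_width > 4:                    # gate the remaining sub-multipliers
        result += (a_lo * b_hi) << 4      # low x high
        result += (a_hi * b_lo) << 4      # high x low
        result += (a_hi * b_hi) << 8      # high x high
    return result
```

When all four sub-multipliers are enabled, the sum of the shifted partial products equals the full 8×8 product; in hardware, the disabled sub-multipliers simply do not switch, which is where the power saving comes from.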

5
CHAPTER-3
ADDERS AND MULTIPLIERS
3.1 Adders
Addition of two binary digits A and B is the most often utilised operation in digital
computers, microprocessors, and digital signal processors, among other things. It serves as a
building block for synthesizing all other arithmetic operations. Digital circuits that add
numbers are known in electronics as adders or summers.
Adders aren't only utilised in the arithmetic logic unit(s) found in many computers and
processors; they're also used in other areas of processors for various activities such as
calculating addresses and table indexes.
Although adders may be built for many numerical representations, such as BCD or excess-3,
the most common adders operate on binary values. When two's complement or one's
complement is used to represent signed numbers, turning an adder into an adder-subtractor
requires little extra effort; other representations of signed numbers need a more
complicated adder.
The binary adder turns out to be a particularly critical hardware unit for implementing an
ALU efficiently. In any computer arithmetic textbook, a wide range of adder circuit
topologies with varying performance characteristics can be found in widespread usage.
Although many studies of binary adder designs exist, there are few based on an analysis of
their relative overall performance. In this project, we developed HDL (hardware description
language) code for several of the common adder architectures, for example to illustrate the
comparable performance of the ripple carry adder (RCA) and carry select adder at high speeds.

3.2 Types of Adders


3.2.1 Half adder

6
A half adder is a 1-bit adder that adds two 1-bit inputs and generates a Sum and a Carry.
Carry propagation is used to achieve the addition of several bits, and the final result is
2c + s. A half adder may be created by combining an AND gate with an XOR gate.
A full adder may be created by combining two half adders with an additional OR gate,
forming a 1-bit addition unit. As shown in Fig 3.1, the half adder takes two inputs (X0 and
X1) and produces two outputs by adding them: Sum and Cout.

Fig 3.1: Half adder block diagram


Table 3.1: Half adder truth table.
From the truth table, the Boolean expressions can be derived as follows:
Sum = ab' + a'b
Cout = ab ……………………………… (1)
This figure shows how we implemented NAND gates for both outputs to make the layout more
efficient by converting the circuit to all-NAND form. The NAND gates' inputs are a, b, a',
and b', while the outputs are Sum and Cout. Using the Boolean expressions above, we can
quickly create the schematic for both the Sum and Cout outputs.

7
Two NAND gates are needed to process the inputs, and their outputs are supplied to a third
NAND gate to retrieve the Sum. A separate NAND gate taking a and b as inputs, followed by
an inverter, produces the output Cout.
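As a sketch of the all-NAND construction described above, the following Python model (the function names are illustrative) reproduces the half adder truth table using NAND gates only:

```python
def nand(a, b):
    """2-input NAND on 0/1 values."""
    return 1 - (a & b)

def half_adder_nand(a, b):
    """Half adder realized with NAND gates only.

    Sum = a XOR b, built from four NANDs;
    Cout = a AND b, built from NAND(a, b) followed by an inverter
    (an inverter is a NAND with both inputs tied together).
    """
    n1 = nand(a, b)
    s = nand(nand(a, n1), nand(b, n1))  # classic 4-NAND XOR
    cout = nand(n1, n1)                 # invert NAND(a, b) to get AND
    return s, cout
```

Evaluating the function for all four input combinations reproduces Table 3.1 exactly.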

Fig 3.2: Half adder logic diagram


3.2.2 Full adder

A full adder combines binary digits and produces a sum and a carry. A 1-bit full adder (FA)
adds three 1-bit values: two operands, X0 and X1, and Cin, where Cin is the carry output from
the preceding, less significant bit. A cascade of adders often includes a full adder as a
module, allowing addition of 8-, 16-, 32-, or 64-bit binary values. The outputs of a full adder
are denoted Cout and S (carry and sum, respectively), and the result value equals
2 × Cout + S. The figure below shows the full adder diagram.
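The two-half-adder construction can be modelled directly. This Python sketch (function names are illustrative) confirms that the result value equals 2 × Cout + S for every input combination:

```python
def half_adder(a, b):
    """Sum = a XOR b, Carry = a AND b."""
    return a ^ b, a & b

def full_adder(a, b, cin):
    """Full adder built from two half adders plus an OR gate.

    The first half adder adds a and b; the second adds that partial
    sum to cin; the two intermediate carries are OR-ed to form Cout.
    The numeric result is 2*Cout + S.
    """
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, cin)
    return s, c1 | c2
```

An OR gate suffices for the carries because both half adders can never assert carry at the same time.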

Fig 3.3: Full adder block diagram

8
Table 3.2: Full adder truth table

Fig 3.4: Logic design of one-bit full adder

3.2.3 Ripple Carry Adder

Several full adder circuits can be cascaded in parallel to provide an n-bit adder; an n-bit
parallel adder needs n full adder circuits to function properly. In a ripple carry adder, the
carry-out of each full adder is the carry-in of the next full adder, and so on. The ripple
carry adder's name comes from the fact that each carry bit ripples into the next stage. None
of the full adder outputs is valid until all of the preceding carries have been produced,
because the logic circuitry has propagation delays. Propagation delay measures how long it
takes for a change on an input to spread to a given output. Consider a NOT gate, where the
output might be "1" or "0" depending on the value of the input: the propagation delay is the
time it takes for the output to go from "1" to "0" once logic "1" is applied to the NOT gate's
input. Likewise, the carry propagation delay refers to the time elapsed between applying
the carry-in signal and receiving the carry-out signal (Cout). Fig 3.5 shows the structure
used to add 4-bit binary integers.
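The ripple carry structure can be sketched behaviorally, here in Python rather than HDL (names are illustrative): each loop iteration is one full adder whose carry-out feeds the next stage's carry-in.

```python
def ripple_carry_add(a_bits, b_bits):
    """n-bit ripple carry adder; bit lists are LSB-first.

    Each stage is a full adder whose carry-out "ripples" into the
    carry-in of the next stage, as in Fig 3.5. Returns the sum bits
    (LSB-first) and the final carry-out.
    """
    carry = 0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)           # full adder sum
        carry = (a & b) | (carry & (a ^ b))      # full adder carry-out
    return sum_bits, carry
```

For example, adding 1011 (11) and 0110 (6) yields sum bits 0001 with a final carry of 1, i.e. 17. Because each stage must wait for the previous carry, the worst-case delay grows linearly with the bit width, which is exactly the limitation discussed above.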

Fig 3.5: Ripple carry adder


3.3 Multipliers
Multipliers are among the main functional units of digital signal processors (DSPs). Since
hardware multiplication dominates the execution of most DSP algorithms, a high-speed
multiplier is highly needed in modern DSP systems that require high data throughput.
Currently, a DSP chip's instruction cycle time is mostly determined by the time it takes to
perform a multiplication operation. Most signal processing algorithms use multiplication as
a basic operation. Multipliers take up a lot of area and have significant latency and power
consumption, so low-power multiplier design has been central to the development of
low-power VLSI systems. Digital signal systems would be incomplete without fast multipliers,
and both digital signal processing and general-purpose processors currently place a high
value on multiplier operating speed. Power reduction techniques for digital design, such as
reducing supply voltage and clock rate, multi-threshold logic, the use of sign-magnitude
arithmetic, differential encoding and decoding, parallelization or pipelined operation, and
tuning input bit patterns to reduce switching activity, have all been proposed in recent
years. A simple multiplication has three parts: partial product generation, partial product
accumulation, and a final addition. Binary multiplication may be accomplished using a
variety of methods; as a general rule, design complexity and latency are taken into
consideration when making a decision.
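The three-part view of multiplication above (partial product generation followed by accumulation) can be illustrated with a small Python sketch; the function name and the default 4-bit width are assumptions for illustration only.

```python
def shift_add_multiply(a, b, width=4):
    """Binary multiplication as partial-product generation plus summation.

    Each multiplier bit of b AND-gates a shifted copy of the
    multiplicand a (these are the partial products); summing the
    shifted copies gives the product. Hardware multipliers differ
    mainly in HOW they sum these partial products.
    """
    partials = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    return sum(partials)
```

For 13 × 11 (b = 1011), the partial products are 13, 26, 0, and 104, which sum to 143. Array, Wallace, and Dadda multipliers all generate the same partial products; they differ in the adder structure used to accumulate them.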
10
To sum partial products more efficiently, an array or tree of full adders is used. Hardware
implementations of binary multipliers appropriate for CMOS-level system design include
array multipliers, Booth multipliers, and Wallace tree multipliers. Comparisons exist
between array multipliers using ripple carry adders, array multipliers using ripple carry
and carry save adders, and modified array multipliers using carry save and carry look-ahead
adders.
Binary multiplication may be accomplished using a variety of methods; to some extent the
decision depends on the relative importance of characteristics such as throughput, area,
overall system size, and complexity. Standard multipliers include the array multiplier,
Wallace tree multiplier, and Dadda multiplier. Due to the enormous number of gates
required by an array multiplier, the overall area increases, and more logic gates mean
higher power consumption; as a result, array multipliers are less cost-effective. The
worst-case latency of an array multiplier is proportional to the multiplier's width, so with
a large multiplier, performance will be sluggish.

3.3.1 Dadda multiplier

In a parallel multiplier, partial products are formed by an array of AND gates. The most
time-consuming portion of a multiplier is adding up all of the partial products. In the
Wallace procedure, the intermediate results are reduced as quickly as possible; in the Dadda
method, the intermediate results are reduced in the same number of stages but using fewer
full adders and half adders than the Wallace method. As a result, there is a reduction in
power usage. The Dadda technique requires a wider and faster carry-propagate adder (CPA)
at the end, and its structure is less regular than the Wallace tree. As N, the number of bits
in the operands, increases, so does the number of half adders and full adders needed for a
Dadda multiplier.
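The Dadda reduction follows a simple column-height schedule. This small Python helper (the name is illustrative; the recurrence d1 = 2, d_(j+1) = floor(1.5 · d_j) is the standard textbook one) lists the per-stage height targets:

```python
def dadda_stage_limits(n):
    """Column-height targets for a Dadda reduction, largest first.

    Heights follow d1 = 2, d_(j+1) = floor(1.5 * d_j). Each stage
    reduces every partial-product column to at most the next target,
    using full and half adders only where a column exceeds it, until
    height 2 is reached and a final fast carry-propagate adder can
    finish. n is the maximum column height (n for an n x n multiplier).
    """
    limits = [2]
    while int(1.5 * limits[-1]) < n:
        limits.append(int(1.5 * limits[-1]))
    return limits[::-1]
```

For an 8×8 multiplier (maximum column height 8), the targets are 6, 4, 3, 2: four reduction stages. Because adders are placed only where strictly needed to meet each target, Dadda uses fewer full and half adders than the eager Wallace reduction.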

11
CHAPTER- 4
EXISTING SYSTEM
4.1 The Unum Number Format
The IEEE 754 arithmetic standard is being challenged by the emerging universal number
(unum) format, which uses an arithmetic format comparable to floating point. This chapter
outlines in detail Gustafson's 2017 proposed replacement for floats: the posit number format
(Type III unum). An understanding of Type I and Type II unums is helpful in understanding
posits, so these are described first. Finally, we go through some of the new number format's
features and benefits.

4.1.1 Type-1 and Type-2 Unum’s


As an alternative to the decades-old IEEE Standard for Floating-Point Arithmetic (IEEE
754), John L. Gustafson developed the notion of the unum. The unum number format can
express both real numbers and ranges of real numbers. Within this mathematical framework,
unums have evolved through three types over the past several years.

Type I unums are to floats what floats are to integers: a superset. For computations that
cannot produce a numerically exact result, where traditional floating-point arithmetic must
round, Type I unums may indicate either an exact float or an open interval between
neighboring floats. To do this, the fraction ends with a "ubit" (uncertainty bit) that
indicates whether the value corresponds to an exact number (ubit = 0) or an open interval
(ubit = 1).

The Type I unum format also supports the sign, exponent, and fraction (mantissa) bit fields
of the IEEE 754 floating-point scheme. The exponent and fraction fields in this format may
be as small as one bit or as long as the user desires. As a result, the unum scheme includes
exponent-size and fraction-size fields that indicate the lengths of the exponent and fraction
fields of the number in question. Figure 4.1 depicts this format definition. Type I unums
extend interval arithmetic naturally, but their variable length necessitates extra hardware
control. More information about the format's proposal and rationale may be found in the
literature.

The Type II unum addressed some of the original unum's flaws, such as hardware implementation
difficulty and the fact that some values could be represented in multiple ways. As a result, this
version is no longer compatible with IEEE floats. Type II unums take a different approach, using an
elegant mathematical mapping of bit strings onto the projective real line (the set R̂ = R ∪ {±∞}).
The key idea is that the point where positive reals wrap around to negative reals mirrors the point
where 2's complement signed integers transition from positive to negative, preserving the ordering.
Figure 4.2 depicts the Type II unum's structure.

The upper right quadrant of the circle holds an ordered set of positive real numbers xi, while their
negatives −xi occupy the upper left quadrant, a reflection about the vertical axis.

The lower half of the circle is a reflection about the horizontal axis, holding the reciprocals of
the numbers on the upper half. Given a value, vertical and horizontal reflections therefore yield its
negation and its reciprocal. As with Type I unums, a Type II unum ending in 1 (the ubit) represents
the open interval between adjacent exact values. Because the projective reals have many elegant
mathematical properties based on this geometry, Type II unums rely on lookup tables for most
operations. This severely restricts the format's scalability: with today's memory technology it is
limited to roughly 20 bits or fewer. Furthermore, fused operations such as the dot product are very
costly in this format. For these reasons, the search began for a new format that would preserve as
many of the Type II unum characteristics as possible while being more "hardware-friendly".

4.1.2 Type -3 Unum: Posits


The Type III unum format (also known as the posit) was "designed as a direct drop-in replacement"
for the IEEE 754 floating-point standard. Like Type II, Type III unums are based on the projective
real line, yet the hardware implementation of this format is comparable to the existing circuitry
used for IEEE 754 floating-point arithmetic. The perfect reciprocal reflection of Type II is
relaxed: exact reciprocals now occur only for 0, ±∞, and powers of two. All other values are exact
points on the ring, with no open intervals.

Posit's interval-arithmetic variant, the valid, is used instead for rigorous bounds. A valid
consists of a pair of equally sized posits, each terminating in a ubit, that together indicate the
interval's endpoints. Valids, however, are not the focus of this work, and Gustafson has not yet
formally published the format's full details.

4.2 The Posit Format

A posit has four fields: sign, regime, exponent, and fraction.

The sign bit is 0 if the number is positive and 1 if it is negative. For negative numbers, the 2's
complement of the remaining bits must be taken before extracting the regime, exponent, and fraction
fields. The regime field has no counterpart in the IEEE format. It is used to compute a scale factor
useed^k, where useed = 2^(2^es). The number m of identical bits before the first opposite bit
determines the value of k: if the regime field has m leading 0s, then k = −m; if it has m leading
1s, then k = m − 1.

The exponent bits (shown in blue) encode the value e, contributing a scale factor of 2^e. Unlike
floats, there is no bias. Because the regime length varies, there may be anywhere from zero up to es
exponent bits: the field begins immediately after the regime terminates, and may be pushed partly or
entirely off the right end of the posit.

The bits that remain after the exponent field form the fraction field, which represents the fraction
value f. There is an important difference from the IEEE float significand (mantissa), however:
posits have no subnormal numbers, so the hidden bit is always 1 rather than sometimes 0 as in IEEE
754. As a result, the value of a posit is

x = (−1)^s × useed^k × 2^e × (1 + f) .......................... (3)

where useed = 2^(2^es), k is the value encoded by the regime field, e is the value of the exponent
field, and f is the value of the fraction field.

As can be seen, the total number of bits (n) and the number of exponent bits (es) completely
determine any posit configuration. The notation Posit(n, es) is often used to denote the set of
n-bit posits with es exponent bits.
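As an illustrative software sketch (not part of the report's hardware design), the decoded fields can be combined exactly as Eq. (3) prescribes; the field values below are arbitrary examples chosen for the demonstration:

```python
from fractions import Fraction

def posit_value(s, k, e, f, es):
    """Combine decoded posit fields per Eq. (3):
    x = (-1)^s * useed^k * 2^e * (1 + f), with useed = 2^(2^es)."""
    useed = Fraction(2) ** (2 ** es)
    return (-1) ** s * useed ** k * Fraction(2) ** e * (1 + Fraction(f))

# Example: s = 0, k = 1, e = 1, f = 1/2 with es = 1 (so useed = 4)
# gives 4 * 2 * 1.5 = 12.
print(posit_value(0, 1, 1, Fraction(1, 2), 1))  # -> 12
```

Exact rational arithmetic (`Fraction`) is used so that negative regime powers stay exact rather than rounding to floating point.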

4.3 Setting the Posit Environment


Type I unums specified the computing environment with their esize and fsize parameters. For posits,
the analogous global integer values are nbits (the total number of bits), which may be any integer
greater than 2, and es (the number of exponent bits), which may be any integer zero or greater. The
two values are selectable independently; for example, a 4-bit posit may nominally have es = 5
exponent bits — the equations still work.
The setpositenv procedure, similar to its Type I unum counterpart, computes environment-specific
values from nbits and es. The number of possible bit patterns is npat = 2^nbits. Because it appears
so often, the value 2^(2^es) is referred to as useed. In addition to npat and useed, setpositenv
finds the smallest and largest expressible positive numbers, minpos and maxpos, which define the
dynamic range in the IEEE floating-point sense. The dynamic range is perfectly symmetric about 1.0,
and every power of 2 has an exact reciprocal. The extremes are themselves exact reciprocals:
1/minpos = maxpos. (Note that IEEE floats do not have these properties.)
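A minimal sketch of these environment quantities (the function name follows the text's setpositenv; maxpos = useed^(nbits − 2) is the standard posit definition):

```python
from fractions import Fraction

def set_posit_env(nbits, es):
    """Sketch of the posit environment quantities described above,
    using exact arithmetic via Fraction."""
    npat = 2 ** nbits                       # number of bit patterns
    useed = 2 ** (2 ** es)                  # useed = 2^(2^es)
    maxpos = Fraction(useed) ** (nbits - 2)
    minpos = Fraction(useed) ** (2 - nbits)
    return npat, useed, minpos, maxpos

# For Posit(8, 0): useed = 2, maxpos = 2^6 = 64, minpos = 1/64,
# and 1/minpos == maxpos exactly.
print(set_posit_env(8, 0))
```

The final assertion of the text — that the extremes are exact reciprocals — falls out directly from the symmetric exponents nbits − 2 and 2 − nbits.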
For worst-case summations that generate carry bits, one must also set the size of the quire register
(qsize) and the number of extra bits it contains (qextra); there is more on the quire register in the
Fused Operations section. Most computer systems prefer power-of-two bit lengths for variables, or at
least multiples of 8-bit bytes. Nevertheless, 56-bit posits, for example, may frequently produce
more accurate answers than 64-bit floats if no performance cost is imposed on the computer. Because
bandwidth-limited computers must align fetches and stores on larger power-of-two boundaries, the
savings in storage and bandwidth from smaller data types can still save work (and energy and power).

Many applications do not need the precision or dynamic range that a 64-bit float offers. In these
circumstances, a 32-bit posit may be sufficient and give a clear speed and storage benefit over the
more common 64-bit float. 8-bit posits may replace 16-bit floats in certain applications (such as
neural network training), while 16-bit posits can replace 32-bit floats in others. Using the quire
register, long arithmetic sequences can be computed with only a single rounding error, and in some
cases 16-bit posits can even substitute for 64-bit floats.
Note that the ratio of maxpos to minpos is useed^(2·nbits − 4), which defines the posits' dynamic
range. Recall that useed = 2^(2^es); by raising useed to powers ranging from −nbits + 1 to nbits − 1
via the regime bits, the dynamic range of the posit format becomes an exponential of an exponential
of an exponential. Posits can thus devote more bits to the fraction than IEEE floats do, yielding
more accurate results, while still covering a broad dynamic range. In other words, the es value is
extremely powerful, so use it judiciously: values above about 5 may strain your computer's memory.
For example, writing out the exact value of minpos with es = 5 would take many gigabytes of disk
space.

An input of NaR (Not a Real) or an illegal operation produces NaR as output; otherwise a regular
number or 0 must be returned. Overflow and underflow are handled by saturating the output: a
positive result exceeding maxpos is clamped to maxpos, and a positive result below minpos is raised
to minpos; likewise, a negative result is clamped to the range −maxpos to −minpos. The rounding
rules are equally straightforward: rounding to the nearest value, with ties going to the nearest
even value, is the only mode. IEEE 754-2008, by contrast, specifies a wide range of rounding modes,
which add complexity to hardware implementations. In summary, posit is a number system that
outperforms the IEEE 754-2008 floating-point standard in certain situations. It is attractive for
its ease of use, flexibility, and simplicity, and it is also easier to implement in hardware than
IEEE 754-2008. Posit's advantages have led to a number of posit hardware implementations. One work
introduced a parameterized adder/subtractor, and another discussed a parameterized adder/subtractor
and multiplier in which the leading bits were determined using both a leading-ones detector (LOD)
and a leading-zeros detector (LZD).

Fig 4.4: POSIT component extraction

To handle negative posit numbers, these two works converted them to their positive equivalents.
Although their techniques are straightforward, the underlying hardware is not: the LOD and LZD,
together with the conversion between 2's complement and sign-magnitude forms, create redundant
area. Another work fixed the first flaw by using only an LZD in its algorithm; unfortunately, the
second flaw remains, preventing that effort from fully exploiting posit's benefits. Building on this
research, one reference proposed a Newton–Raphson parameterized divider. Besides studying posit's
software applications, another reference provided a basic hardware implementation in which the posit
is encoded as a sign and a magnitude. This ingenious idea reduces circuit complexity, but it does
not adhere to the posit standard. Our approach resolves both issues, lowering implementation cost
while providing a complete solution for posit arithmetic.
4.3.1 Extracting the sign bit
Interpreted as 2's complement integers, posit bit strings range from −npat/2 to npat/2 − 1. Because
Mathematica does not natively support 2's complement integers, this discussion works with unsigned
integers between 0 and npat − 1: the unsigned integer npat/2 corresponds to the signed value
−npat/2, and so on.

The positQ test checks validity: given an input number (a bit string), the function returns True if
it is a valid posit and False otherwise.

The sign bit is simply the most significant bit; removing it leaves the magnitude portion of the
positive or negative integer (including zero). Color-coding of the bit fields, used for all kinds of
unums, makes long binary strings easier to read; the sign indicator bit is shown in red (RGB
components 1, 0.125, 0).

Because the posit bit strings span 0 to npat − 1, the positQ guard prevents us from evaluating the
sign bit of the out-of-range integer npat (when viewed as an unsigned integer). A hardware or C
implementation would constrain the input type directly; Mathematica simply returns an unevaluated
expression, as the sign-bit example above shows, when asked to perform an out-of-domain operation.
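In software, the sign-bit extraction just described amounts to a one-line shift-and-mask (an illustrative sketch, with the posit held in an unsigned integer as in the text):

```python
def sign_bit(p, nbits):
    """Most significant bit of an nbits-wide posit bit string held
    in an unsigned integer p (0 <= p < 2**nbits)."""
    return (p >> (nbits - 1)) & 1

# 0b11000000 has sign bit 1 (a negative posit); 0b01000000 has sign bit 0.
print(sign_bit(0b11000000, 8), sign_bit(0b01000000, 8))  # -> 1 0
```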

4.3.2 Extracting the regime bits


The number of regime bits is the length of the run of identical bits, all 0s or all 1s, following
the sign bit. Regime bits are color-coded amber (RGB 0.8, 0.6, 0.2). The easiest way to decode a
posit whose sign bit is 1 is first to negate the bit string in 2's complement fashion (flip all the
bits and add 1). Operations such as "Find First 1" or "Count Leading Zeros" are single opcodes in
assembly language, executing in one clock period on most processors, so real hardware (or a routine
written in a low-level language) can perform this bit extraction far more quickly and easily than
the regimebits function shown here. The regime bits are easy to understand if you think of them as
clothing sizes: L means Large, XL means Extra Large, XXL even larger, and so on — just count the
number of X characters. The regimevalue function follows the same logic and is a simple one-liner;
the number denoted by the regime bits could have been returned directly, but it is computed in two
steps for the sake of clarity. Here are a couple of examples. For runs of 1s, the regime value is
one less than the number of bits in the run, because we also need to be able to express the value
zero. Again, hardware could do this almost instantly, since it closely resembles 2's complement
logic. After trying a positive number, try a negative one to see how a 0-bit run changes to a 1-bit
run. Note that flipping all the bits of 100001 and adding 1 gives 011110 + 1 = 011111, the 2's
complement negation of 100001.
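The run-length counting described above can be sketched as follows (a software model only; the sign bit is assumed already handled, so `p` is the positive form):

```python
def regime(p, nbits):
    """Decode the regime of an nbits-wide posit bit string p.
    Returns (k, nregime), where nregime counts the run plus the
    terminating bit (if one is present)."""
    bits = [(p >> i) & 1 for i in range(nbits - 2, -1, -1)]  # below sign bit
    lead = bits[0]
    m = 1
    while m < len(bits) and bits[m] == lead:                 # count the run
        m += 1
    k = (m - 1) if lead == 1 else -m                         # regime value
    nregime = min(m + 1, len(bits))                          # run + terminator
    return k, nregime

# 011110 (n = 6): run of four 1s -> k = 3.
# 011111 (n = 6): run of five 1s fills the word -> k = 4.
print(regime(0b011110, 6), regime(0b011111, 6))  # -> (3, 5) (4, 5)
```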
4.3.3 Extracting the exponent bits
The exponent bits come next. Even when es is greater than zero, the regime may push some or all of
the exponent bits off the right end of the word, leaving no exponent bits at all — 011111 and
011110, for example, have none. Exponent bits begin to appear for shorter regime runs, such as
011100 and 011101. The code skips over the regime's terminating bit and then takes the next es bits,
or however many remain, so the exponent field can be anywhere from empty to es bits long. These
exponent bit strings convey a power of 2 as a plain unsigned integer; unlike floats, there is no
"bias". Exponent bits are color-coded blue (RGB 0.25, 0.5, 1). If the exponent bit value is 110 = 6,
the scale factor contributed by the exponent bits is 2^6. Accordingly, when there are no exponent
bits, the exponent is 0 and the scale factor is 2^0 = 1.
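A sketch of this extraction, reading at most es bits after the regime's terminating bit (the regime scan is inlined so the function stands alone):

```python
def exponent(p, nbits, es):
    """Extract the exponent value e from an nbits-wide posit bit
    string p (positive form), allowing for a truncated field."""
    bits = [(p >> i) & 1 for i in range(nbits - 2, -1, -1)]
    m = 1
    while m < len(bits) and bits[m] == bits[0]:
        m += 1
    rest = bits[m + 1:]            # skip regime run and terminating bit
    ebits = rest[:es]              # up to es exponent bits (may be fewer)
    e = 0
    for b in ebits:                # read as a plain unsigned integer
        e = (e << 1) | b
    return e

# Posit(6, es = 2): 011101 -> regime 111, terminator 0, one exponent
# bit left, so e = 1 and the exponent scale factor is 2^1.
print(exponent(0b011101, 6, 2))  # -> 1
```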
4.3.4 Extracting the fraction bits
With all four components of the posit bit string in hand, we can compute the value each represents
directly. The fraction bits are all the bits to the right of the exponent field, if the other fields
have not pushed them off the right end; there may be none at all, in which case the fraction value
is just the hidden bit, 1. Because posits have no subnormals, the hidden bit is always 1. The
fractionbits function differs from the exponentbits function only in that its starting position is
es bits further to the right and it takes all remaining bits rather than at most es of them. This is
how the accuracy of posits varies with the scale factor: when the scale factor is modest, a very
precise fraction can be expressed, while the larger or smaller a posit's magnitude, the fewer bits
are left for precision.
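Putting the four extraction steps together gives a complete decoder for Eq. (3). This is an illustrative software model, not the report's hardware design:

```python
from fractions import Fraction

def decode_posit(p, nbits, es):
    """Decode an nbits-wide posit with es exponent bits into an exact
    Fraction (the all-zeros pattern is 0; 100...0 is NaR)."""
    if p == 0:
        return Fraction(0)
    if p == 1 << (nbits - 1):
        raise ValueError("NaR (Not a Real)")
    s = (p >> (nbits - 1)) & 1
    if s:                                   # negative: take 2's complement
        p = (1 << nbits) - p
    bits = [(p >> i) & 1 for i in range(nbits - 2, -1, -1)]
    m = 1
    while m < len(bits) and bits[m] == bits[0]:
        m += 1
    k = (m - 1) if bits[0] == 1 else -m     # regime value
    rest = bits[m + 1:]                     # skip terminating bit
    ebits, fbits = rest[:es], rest[es:]
    e = int("".join(map(str, ebits)), 2) if ebits else 0
    f = Fraction(int("".join(map(str, fbits)), 2), 1 << len(fbits)) if fbits else Fraction(0)
    useed = Fraction(2) ** (2 ** es)
    return (-1) ** s * useed ** k * Fraction(2) ** e * (1 + f)

# Posit(8, 0): 01010000 decodes to 1.5, and its 2's complement
# 10110000 decodes to -1.5.
print(decode_posit(0b01010000, 8, 0), decode_posit(0b10110000, 8, 0))
```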
4.4 Posits as Projective Reals
Posits are best understood geometrically, because Type III posit arithmetic derives from the Type II
unum mapping onto the projective reals. Type III keeps this structure but relaxes the Type II
constraint that every value has an exact reciprocal: the reciprocals obtained by reflecting about
the horizontal axis are exact only for 0, ±∞, and powers of 2. Both Type II and Type III unums start
from the same two-bit template.

20
Fig 4.5: Two-bit numbers mapped over the projective reals.
The bit strings around the ring (which may be read as 2's complement signed integers) increase
monotonically together with the real values they represent as those values pass from positive to
negative. Thus, unlike floats, there is no "negative zero," and −∞ and +∞ merge into a single point.
To produce a three-bit ring, we may insert a value between 1 and ±∞ in the two-bit ring above (along
with the corresponding negation and reciprocal obtained from reflections about the vertical and
horizontal axes, respectively). Note, however, that this choice "seeds" how the rest of the unum
ring is filled in — all of its positive values are powers of that value — which is why this quantity
is called the useed value.
To improve processor efficiency, it is critical to improve the speed and power/energy
characteristics of multipliers. Digital signal processing (DSP) cores are largely used to process
images and video for human consumption, and because human visual and auditory perception is
limited, we can employ approximation to increase performance while also saving energy.

There are also applications beyond image and video processing in which the accuracy of arithmetic
operations is not critical to the system's function. Approximate computing lets the designer trade
accuracy for speed while reducing power and energy consumption. The approximation may be applied to
arithmetic units at different design abstraction levels — circuit, logic, and architecture — as well
as at the algorithm and software layers. A variety of approximation approaches exist, including
deliberate timing violations (such as voltage over-scaling or overclocking) and function
approximation methods (such as changing a circuit's Boolean function), or any combination of these
techniques. In the domain of function approximation, many approximate arithmetic building blocks,
such as adders and multipliers, have been proposed at various design levels. Our primary goal in
this work is to provide a high-speed, low-power/energy, yet accurate-enough multiplier for
error-tolerant DSP applications. The proposed approximate multiplier modifies the conventional
multiplication algorithm at the algorithm level by operating on rounded input values, and is
therefore called the rounding-based approximate (RoBA) multiplier.
The proposed technique applies to both signed and unsigned multiplication, and three optimized
designs are presented. Their delay, power and energy consumption, and energy-delay products (EDPs)
are compared with those of certain approximate and exact multipliers to assess their efficiency.

Fig 4.6: Data path of the proposed posit multiplier

The main contributions of this work are: a novel RoBA multiplication strategy that modifies the
conventional multiplication approach, and hardware designs of the proposed approximate multiplier
for both signed and unsigned operation.
4.5. Multiplication Algorithm of ROBA Multiplier
The key idea is to simplify the operation by rounding the operands to the nearest powers of two
(2^n). To describe the approximate multiplier's operation, let Ar and Br denote the rounded values
of the inputs A and B. The multiplication of A by B can then be rewritten as follows:

A × B = Ar × B + Br × A − Ar × Br + (A − Ar) × (B − Br) ................... (4)

The important observation is that the multiplications Ar × B, Br × A, and Ar × Br can all be
implemented in hardware with simple shift operations, since Ar and Br are powers of two.

Computing (A − Ar) × (B − Br) in hardware, however, is rather costly. This term has only a modest
impact on the final result, because the differences between the exact and rounded values are small.
We therefore propose omitting it to simplify the multiplication, which is then carried out as

A × B ≈ Ar × B + Br × A − Ar × Br ................................ (5)

This strategy requires finding the nearest power of two for each of A and B. When A has the form
3 × 2^(p−2), with p a positive integer greater than one, it lies exactly between two powers of two,
2^p and 2^(p−1), with equal absolute differences. Either choice gives the same accuracy, but
selecting the larger one (except for p = 2) leads to simpler hardware for determining the nearest
rounded value, so that is the convention adopted in this work.

Numbers not of the form 3 × 2^(p−2) round unambiguously to their single nearest power of two, so no
special handling is needed. The one exception to the round-up convention is the number 3, for which
the proposed approximate multiplier uses 2 as its nearest value.
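A behavioral sketch of this rounding rule and Eq. (5) (algorithm level only; the report's hardware realizes the three products as shifts, since Ar and Br are powers of two):

```python
def round_pow2(x):
    """Round positive x to the nearest power of two; on ties
    (x = 3 * 2^(p-2)) round up, except 3 itself rounds down to 2."""
    if x == 3:
        return 2
    lo = 1 << (x.bit_length() - 1)          # largest power of two <= x
    return lo * 2 if x >= 3 * lo // 2 and x > lo else lo

def roba_unsigned(a, b):
    """Approximate product per Eq. (5): Ar*B + Br*A - Ar*Br."""
    ar, br = round_pow2(a), round_pow2(b)
    return ar * b + br * a - ar * br

# Example: A = 10 rounds to 8, B = 12 rounds up to 16, so the
# approximation is 8*12 + 16*10 - 8*16 = 128 versus the exact 120;
# the omitted term (A - Ar)*(B - Br) = 2 * (-4) = -8 is the error.
print(roba_unsigned(10, 12))  # -> 128
```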
Unlike prior work in which the approximate result was always smaller than the exact result, it
should be noted that the RoBA multiplier's final result may be either larger or smaller than the
exact result, depending on the magnitudes of Ar and Br relative to those of A and B.

Fig 4.7: Block diagram for the hardware implementation of the RoBA multiplier

If one operand (say A) is smaller than its rounded value while the other operand (say B) is larger
than its rounded value, the approximate result will be larger than the exact result, because
(A − Ar) × (B − Br) is negative in this case.
The difference between the exact and approximate results is exactly this product. Conversely, if
both operands are larger (or both smaller) than their rounded values, the approximate result will be
smaller than the exact result. It is also worth noting that the proposed RoBA multiplier operates
directly only on positive inputs, because the rounded values of negative inputs in 2's complement
representation are not of the form 2^n. Therefore, before the multiplication begins, the absolute
values of the inputs and the sign of the output are determined from the input signs; the
multiplication is performed on unsigned numbers, and the proper sign is then applied to the unsigned
result.
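The sign-handling step just described can be sketched as a wrapper around the unsigned core (an illustrative model, with the rounding helper repeated so the block stands alone):

```python
def roba_signed(a, b):
    """Sign-handling wrapper: determine the output sign from the input
    signs, multiply the absolute values with the unsigned RoBA core,
    then restore the sign on the result."""
    def round_pow2(x):
        if x == 3:
            return 2
        lo = 1 << (x.bit_length() - 1)
        return lo * 2 if x >= 3 * lo // 2 and x > lo else lo

    sign = -1 if (a < 0) != (b < 0) else 1
    a, b = abs(a), abs(b)
    ar, br = round_pow2(a), round_pow2(b)
    return sign * (ar * b + br * a - ar * br)

print(roba_signed(-10, 12))  # -> -128
```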

CHAPTER- 5
PROPOSED SYSTEM

5.1 Reversible Logic Gates:


In reversible logic it is possible to recover the input data from the output data, which is an
advantage over conventional logic: no information is lost. A conventional logic gate can be
converted into a reversible one by appending additional input and output lines where required [19].
Reversible logic has two major restrictions:
• There is no feedback.
• There is no fan-out.
Figure 5.1 shows the HNG gate, a reversible logic gate. An HNG gate is referred to as a k×k gate
because it has k inputs and k outputs. Many combinational circuits have been proposed over the last
decades using combinations of different reversible gates, which increases the circuit's quantum
cost. To avoid this, the proposed multiplier uses only the HNG gate to build the complete design.

Fig 5.1: Basic HNG reversible logic

Table 5.1: Truth table of HNG reversible logic

Proposed multiplier design using HNG gates


The high-speed, efficient array multiplier is designed using the HNG gate alone; the HNG gate is
used both to generate the partial products and to perform the additions. The proposed array
multiplier using HNG gates is compared with an array multiplier built from conventional gates in
terms of speed, area consumption, and power consumption.
A. Conventional Array Multiplier. In the multiplication process, the final result is obtained by
multiplying the two input operands and generating partial products, and the final product is formed
by the conventional shift-and-add method. Different adder circuits can be used to improve the
multiplier's performance, including the ripple carry adder, carry look-ahead adder, carry skip
adder, carry bypass adder, and carry select adder. In the proposed design, a reversible-gate-based
ripple carry adder is built first, and the multiplier is then constructed from it; the addition is
accomplished with a reversible ripple carry adder made of HNG gates. A conventional array multiplier
requires P − 1 RCA adders, where P is the multiplier length. Figure 5.2 shows the 4-bit binary array
multiplier.

Fig 5.2: The diagram of 4 X 4 array multiplier

This array multiplier multiplies two 4-bit binary numbers using an array of full adders, in which
the different partial-product terms are added simultaneously. The partial products are generated by
a multitude of AND gates, and the adder array follows this array of AND gates. The hardware for a
p × q bit multiplier thus comprises p × q AND gates and (p − 1) adders. The array multiplier
performs multiplication in the traditional way and has a regular structure, so routing and layout
can be implemented in an uncomplicated manner. Its implementation is straightforward, but it
requires a larger area with considerable delay [20]. The chief merits of the array multiplier are
its low hardware complexity and the ease with which it can be scaled, pipelined, and placed and
routed.
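The shift-and-add behavior that the array realizes in hardware can be modeled as follows (a behavioral sketch only, not the gate-level structure):

```python
def array_multiply(a, b, p=4):
    """Behavioral model of a p x p array multiplier: AND gates form
    the partial products, which are summed with shifts."""
    product = 0
    for i in range(p):                     # one partial-product row per bit of b
        if (b >> i) & 1:                   # AND of row bit with each bit of a
            product += a << i              # shifted partial product
    return product & ((1 << (2 * p)) - 1)  # result fits in 2p bits

print(array_multiply(13, 11))  # -> 143
```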
B. Proposed Multiplier Design. The proposed multiplier architecture has two parts. First, the
partial products are generated using AND gates. The HNG gate can realize an AND gate: if the input
vector of the HNG gate is IV = (A, B, 0, 0), the output vector
is OV = (P = A, Q = B, R = A⊕B, S = A·B). The S output therefore gives the logical AND of A and B
when the C and D inputs are '0'.

Fig 5.3: HNG gate as an AND gate


An important property of the HNG gate is that a complete reversible full adder can be built from a
single HNG gate. If the input vector of the HNG is IV = (A, B, Cin, 0), the output vector is
OV = (P = A, Q = Cin, R = Sum, S = Cout), so both the sum and carry outputs are obtained at once.
The realization of a reversible full adder using one HNG gate is shown in Figure 5.4. Of its four
outputs, two produce the sum and carry while the remaining two are garbage outputs [21]. The adder
is the fundamental unit of the multiplier and thus has a noteworthy impact on the overall
performance of the system in terms of power consumption, delay, and area occupancy.

Figure 5.4: Full adder using HNG gate
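As a sanity check, a commonly cited Boolean definition of the HNG gate — P = A, Q = B, R = A⊕B⊕C, S = (A⊕B)·C ⊕ A·B ⊕ D; this exact form is an assumption here, since the report does not spell out the gate equations — reproduces both the AND and the full-adder realizations described above:

```python
def hng(a, b, c, d):
    """One commonly cited definition of the 4x4 HNG reversible gate."""
    p = a
    q = b
    r = a ^ b ^ c                      # with d = 0, c = Cin: the Sum output
    s = ((a ^ b) & c) ^ (a & b) ^ d    # with d = 0, c = Cin: the Cout output
    return p, q, r, s

# AND realization: inputs (A, B, 0, 0) give S = A.B and R = A xor B.
# Full-adder realization: inputs (A, B, Cin, 0) give R = Sum, S = Cout.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            _, _, r, s = hng(a, b, cin, 0)
            assert r == (a + b + cin) % 2 and s == (a + b + cin) // 2
print("full-adder check passed")
```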

CHAPTER- 6
SOFTWARE IMPLEMENTATION USING XILINX
This chapter covers the project's software requirements. The tool used here is Xilinx ISE, and the
program code is written in the Verilog HDL programming language. The behavior of the system is then
synthesized for the specific target device.
6.1 Xilinx Introduction
Xilinx Tools is a suite of software tools for digital circuit design targeting field-programmable
gate arrays (FPGAs) and complex programmable logic devices (CPLDs). The design process covers design
entry, synthesis, implementation, and functional simulation. Digital designs may be entered with a
schematic entry tool, a hardware description language (Verilog or VHDL), or a combination of the
two. Only the Verilog HDL design flow is used in this lab.

Starting from a Verilog HDL design specification, the CAD tools can be used to create combinational
and sequential circuits. The stages in the design process are:

(a) Create Verilog design input files using a template-driven editor.
(b) Compile and synthesize the Verilog design files.
(c) Simulate the design with test vectors (functional simulation) without using a PLD (FPGA or
CPLD).
(d) Assign the design's input/output pins to a target device and implement the design.
(e) Download the bit stream to an FPGA or CPLD.
(f) Test the design on the FPGA/CPLD device.

A Verilog input file in the Xilinx software environment consists of the following segments. First
line: the module name and a list of input and output ports. Declarations: input/output ports,
registers, and wires. Logic descriptions: equations, state machines, and logic functions. Finally,
the endmodule statement closes the module. All lab designs must use this Verilog input format; for
purely combinational logic designs, the state-machine section is absent.

6.1.1 Programmable logic device


The Basys2 board, which features a Spartan-3E XC3S500E FPGA in an FT256 package, will be used in
this lab to develop digital designs. This FPGA belongs to the Xilinx Spartan family. These devices
come in a number of configurations; devices manufactured in 132-pin packages with component number
XC3S250E-CP132 will be used. This FPGA contains around 50K gates. Comprehensive information about
the chip is available on the Xilinx website.

6.2 Creating a New Project


To begin using Xilinx Tools, double-click the Project Navigator icon on the Windows desktop. This
should bring up the Project Navigator window, which displays the most recent project that was
visited.

Fig 6.1: Xilinx project navigator window (snapshot from Xilinx ISE software)

To begin a new project, choose File -> New Project from the menu bar. This will open a New Project
window on your desktop. Complete its fields as follows:

Fig 6.2: New project initiation window (snapshot from Xilinx ISE software)

Project Name: give your new project a name.

Project Location: do not store the project in a folder on the desktop or inside the Xilinx bin
directory; a working data drive (for example, E:) is the better place. The project location path
must contain no spaces. Leave HDL as the top-level module type.

Example: assuming the project is titled "RoBA", enter that in the project name field, then click
"Next".

The following window should appear after clicking NEXT:

Fig 6.3: Device and Design Flow of Project (snapshot from Xilinx ISE software)

Click on the "Value" section and choose the appropriate value for each of the following properties
from the resulting drop-down menu of options.

Device Family: the FPGA/CPLD family being used. The ZYNQ FPGAs are used in this lab.
Device: the identifier of the specific part. The code XC7Z010 is used for this experiment.
Package: the form factor and number of pins of the package. This experiment uses a ZYNQ FPGA in a
CLG400 package.
Speed Grade: "-3".
Synthesis Tool: XST.
Simulator: the tool used to simulate and verify the design's functionality. The Xilinx ISE design
environment includes an integrated simulator, ISim; "ISim (VHDL/Verilog)" should be selected here.

Click NEXT to save your entries.
A subfolder with the project name will hold all project files, including schematics, netlists, and
HDL files. Only one top-level HDL source file (or schematic) may be used per project; a
hierarchical design may be created by adding lower-level modules to the project. The remaining
fields may be left as-is; then click NEXT. A new window will open, displaying the project's file
name (RoBA) along with its device, speed grade, and package. Right-click on the project name and
select Add Source from the context menu.

Fig 6.4: Adding source code into the project

In Xilinx Tools, choose File -> Open Project to see a list of existing projects; select your
project and click OK.
If the sources are not already part of the project, open the Add Source menu and browse for
and select all of the source files.
After that, the following window will appear on your screen:

Fig 6.5: Create new source window (A Snapshot from Xilinx ISE Software)

To proceed, verify that every file has a green checkmark, then click OK. A new editor
window will then appear, showing that all of the files have the .v extension.

Fig 6.6: Editor window
Select uut-roba (roba multiplier.v) and set it as the TOP MODULE, as shown below.

Fig 6.7: Setting code as a top module

Now select the test.v file and check for syntax errors (if any), which are reported under
SYNTHESIZE - XST and IMPLEMENT DESIGN.

Fig 6.8: Checking syntax errors

The messages Process "Check Syntax" completed successfully and Process "Generate
Post-Place & Route Static Timing" completed successfully will appear in the console
window at the bottom.

CHAPTER- 7

SIMULATION AND SYNTHESIS RESULTS

7.1 Unsigned 8-Bit Posit Multiplier

Fig 7.1: Simulation result of the 8-bit unsigned posit multiplier

7.2 Signed 8-Bit Posit Multiplier

Fig 7.2: Simulation result of the 8-bit signed posit multiplier
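For reference, the value being verified in these simulations can be described by a small software model. The sketch below is a hypothetical Python reference decoder for an n-bit posit (the choice of es = 1 is an assumption, since the exponent width is not stated in this chapter); multiplying two posits then amounts to decoding both operands, multiplying the decoded values, and re-encoding the product.

```python
def decode_posit(p, n=8, es=1):
    """Decode an n-bit posit (given as an unsigned integer) to a float.
    Returns None for NaR. Illustrative reference model, not the RTL."""
    mask = (1 << n) - 1
    p &= mask
    if p == 0:
        return 0.0
    if p == 1 << (n - 1):               # 100...0 encodes NaR (not a real)
        return None
    sign = (p >> (n - 1)) & 1
    if sign:                            # negative posits are two's-complemented
        p = (-p) & mask
    bits = [(p >> i) & 1 for i in range(n - 2, -1, -1)]   # drop the sign bit
    # Regime: a run of identical bits terminated by the opposite bit.
    first = bits[0]
    run = 1
    while run < len(bits) and bits[run] == first:
        run += 1
    k = (run - 1) if first == 1 else -run
    rest = bits[run + 1:]               # skip the regime terminator
    # Exponent: the next es bits, zero-padded if truncated.
    e = 0
    for i in range(es):
        e = (e << 1) | (rest[i] if i < len(rest) else 0)
    # Fraction: remaining bits, with an implicit leading 1.
    frac_bits = rest[es:]
    f = 0
    for b in frac_bits:
        f = (f << 1) | b
    frac = 1 + f / (1 << len(frac_bits)) if frac_bits else 1.0
    useed = 1 << (1 << es)              # useed = 2^(2^es)
    return (-1) ** sign * (useed ** k) * (2 ** e) * frac
```

For example, with es = 1 the bit pattern 01000000 decodes to 1.0 and 01100000 (regime run of length two) decodes to useed^1 = 4.0.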

7.3 RTL and Technology Schematics for the Existing Method

7.3.1 RTL schematic of the posit multiplier

Fig 7.3: RTL schematic of the posit multiplier

7.3.2 RTL schematic of dsr-right-n-s

Fig 7.4: RTL schematic of dsr-right-n-s

7.3.3 RTL schematic of dsr-left-n-s

Fig 7.5: RTL schematic of dsr-left-n-s
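The dsr-right-n-s and dsr-left-n-s blocks are, presumably, dynamic shift registers (barrel shifters) that move the fraction by a runtime-selected amount derived from the regime length. A minimal behavioral sketch follows; the staged-multiplexer structure mirrors how such a shifter is typically built in hardware, and the function names are illustrative rather than taken from the report's source files.

```python
def dsr_right_n_s(a, shamt, n=8):
    """Dynamic shift right: shift the n-bit value a right by shamt places,
    zero-filling from the left. Built from log2(n) conditional stages,
    each shifting by a power of two, like a hardware barrel shifter."""
    a &= (1 << n) - 1
    stage = 1
    while stage < n:
        if shamt & stage:   # this stage's mux selects the shifted path
            a >>= stage
        stage <<= 1
    return a

def dsr_left_n_s(a, shamt, n=8):
    """Dynamic shift left, discarding bits shifted past the MSB."""
    stage = 1
    while stage < n:
        if shamt & stage:
            a <<= stage
        stage <<= 1
    return a & ((1 << n) - 1)
```

Because each stage is a simple 2:1 multiplexer layer, an n-bit shifter needs only log2(n) stages regardless of the shift amount.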

7.3.4 RTL schematic of the transistor-level posit multiplier

Fig 7.6: RTL schematic of the transistor-level 8-bit posit multiplier

7.4 RTL and Technology Schematics for the Proposed Method

7.4.1 Reversible 8-bit posit multiplier

Fig 7.7: Reversible 8-bit posit multiplier

7.4.2 Transistor level of the reversible 8-bit posit multiplier

Fig 7.8: Transistor level of the reversible 8-bit posit multiplier


7.5 Synthesis Results: Existing Method
7.5.1 Area

Fig 7.9: Synthesis result for area (existing method)

7.5.2 Delay

Fig 7.10: Synthesis result for delay (existing method)

7.6 Synthesis Results: Proposed Method
7.6.1 Area

Fig 7.11: Synthesis result for area (proposed method)

7.6.2 Delay

Fig 7.12: Synthesis result for delay (proposed method)

7.7 Comparison Table

PARAMETER    EXISTING    PROPOSED
LUTs         140         130
Delay        8.806 ns    5.232 ns

Table 7.1: Comparison of the existing and proposed designs

Comparing the 8-bit posit multiplier with the rounding-based approximate (RoBA) multiplier
shows that the two have comparable power characteristics. The modified 8-bit RoBA
multiplier, however, reduces both area and delay relative to the 8-bit posit multiplier, as
Table 7.1 shows: LUT usage falls from 140 to 130 and delay from 8.806 ns to 5.232 ns. The
schematic implementation of the 8-bit posit multiplier with reversible gates also simplifies
the circuitry compared with the existing design.
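From the figures in Table 7.1, the relative savings of the proposed design can be computed directly:

```python
luts_existing, luts_proposed = 140, 130
delay_existing, delay_proposed = 8.806, 5.232   # ns

# Percentage reduction relative to the existing design
lut_saving = 100 * (luts_existing - luts_proposed) / luts_existing
delay_saving = 100 * (delay_existing - delay_proposed) / delay_existing

print(f"LUT reduction:   {lut_saving:.1f}%")    # about 7.1%
print(f"Delay reduction: {delay_saving:.1f}%")  # about 40.6%
```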

CHAPTER 8
CONCLUSION & FUTURE SCOPE

We presented the RoBA multiplier, a fast yet energy-efficient approximate multiplier. The
inputs are rounded to the nearest power of two (2^n), which removes the computationally
expensive part of the multiplication, so performance and energy consumption both improve at
the expense of a very small inaccuracy. Both signed and unsigned multiplications can benefit
from the new technique.
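The rounding idea above can be sketched in a few lines. This is an illustrative model of the approximation a*b ≈ ar*b + a*br - ar*br (where ar and br are the operands rounded to the nearest power of two), based on the published RoBA formulation; it is not this project's RTL, and the rounding rule here is a simplification.

```python
def round_pow2(x):
    """Round a positive integer to a nearby power of two
    (simplified rounding rule, for illustration only)."""
    if x == 0:
        return 0
    lo = 1 << (x.bit_length() - 1)      # largest power of two <= x
    return lo if (x - lo) <= (2 * lo - x) else 2 * lo

def roba_mult(a, b):
    """Approximate a*b as ar*b + a*br - ar*br. Because ar and br are
    powers of two, every term reduces to shifts and adds in hardware."""
    ar, br = round_pow2(a), round_pow2(b)
    return ar * b + a * br - ar * br
```

When either operand is already a power of two the result is exact; otherwise a small relative error is introduced, e.g. roba_mult(10, 10) gives 96 instead of 100.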
By using reversible logic gates we can reduce the area as well as the delay compared with the
previous models.
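As a concrete illustration of reversibility, consider the Toffoli (CCNOT) gate, one of the standard universal reversible gates: it is its own inverse, so the inputs can always be recovered from the outputs and no information is lost. (This is a generic sketch of the concept; the specific reversible gates used in the multiplier are not restated in this chapter.)

```python
from itertools import product

def toffoli(a, b, c):
    """Toffoli (CCNOT) gate: passes a and b through unchanged and
    inverts c exactly when both controls a and b are 1."""
    return a, b, c ^ (a & b)

# Reversibility check: applying the gate twice recovers the inputs,
# and the input->output map is a bijection on all 8 possible triples.
outputs = set()
for bits in product((0, 1), repeat=3):
    assert toffoli(*toffoli(*bits)) == bits
    outputs.add(toffoli(*bits))
assert len(outputs) == 8
```

Because every output pattern corresponds to exactly one input pattern, no input information is erased, which is the property that reduces the theoretical energy cost of reversible circuits.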
This project can also be extended with reversible gates to implement 16- and 32-bit posit
multipliers, as well as pipelined ALU systems and MAC units.

REFERENCES

[1] D. Goldberg, “What every computer scientist should know about floating-point
arithmetic”, ACM Computing Surveys (CSUR), vol. 23, no. 1, pp. 5–48, Mar.
1991. doi: 10.1145/103162.103163.
[2] IEEE Computer Society Standards Committee and American National Standards
Institute, “IEEE Standard for Binary Floating-Point Arithmetic”, ANSI/IEEE Std
754-1985, 1985. doi: 10.1109/IEEESTD.1985.82928.
[3] “IEEE Standard for Floating-Point Arithmetic”, IEEE Std 754-2008, pp. 1–70,
2008. doi: 10.1109/IEEESTD.2008.4610935.
[4] J. L. Gustafson, The End of Error: Unum Computing. CRC Press, Feb. 5, 2015, vol.
24, ISBN: 9781482239867.
[5] J. L. Gustafson and I. T. Yonemoto, “Beating Floating Point at its Own Game:
Posit Arithmetic”, Supercomputing Frontiers and Innovations, vol. 4, no. 2, pp.
71–86, Jun. 2017. doi: 10.14529/jsfi170206.
[6] L. van Dam, “Enabling High Performance Posit Arithmetic Applications Using
Hardware Acceleration”, Master’s thesis, Delft University of Technology, the
Netherlands, Sep. 17, 2018, ISBN: 9789461869579.
[7] A. A. D. Barrio, N. Bagherzadeh, and R. Hermida, “Ultra-low-power adder stage
design for exascale floating point units”, ACM Trans. Embed. Comput. Syst., vol.
13, no. 3s, 150:1–150:24, Mar. 2014. doi: 10.1145/2567932.
[8] J. L. Gustafson. (Oct. 10, 2017). Posit Arithmetic, [Online]. Available:
https://posithub.org/docs/Posits4.pdf
[9] J. L. Gustafson, “A Radical Approach to Computation with Real Numbers”,
Supercomputing Frontiers and Innovations, vol. 3, no. 2, pp. 38–53, Sep. 2016. doi:
10.14529/jsfi160203.
[10] Posit Working Group. (Jun. 23, 2018). Posit Standard Documentation, [Online].
Available: https://posithub.org/docs/posit_standard.pdf
[11] R. Munafo. (2018). Survey of Floating-Point Formats, [Online]. Available:
http://www.mrob.com/pub/math/floatformats.html
[12] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S.
Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving,
M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R.
Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner,
I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O.
Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. (2015).
TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software
available from tensorflow.org, [Online]. Available: https://www.tensorflow.org/
[13] S. van der Linde, “Posits als vervanging van floating-points: Een vergelijking
van Unum Type III Posits met IEEE 754 Floating Points met Mathematica en
Python” (Posits as a replacement for floating points), Bachelor’s thesis, Delft
University of Technology, Sep. 26, 2018.
[14] M. Klöwer, P. D. Düben, and T. N. Palmer, “Posits as an alternative to floats
for weather and climate models,” in Proc. Conf. Next Gener. Arithmetic, Mar.
2019, pp. 1–8.
[15] H. Zhang, J. He, and S.-B. Ko, “Efficient posit multiply-accumulate unit generator
for deep learning applications,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS),
Sapporo, Japan, May 2019, pp. 1–5.
[16] Z. Carmichael, H. F. Langroudi, C. Khazanov, J. Lillie, J. L. Gustafson, and D.
Kudithipudi, “Performance-efficiency trade-off of low-precision numerical
formats in deep neural networks,” in Proc. Conf. Next Gener. Arithmetic, Mar.
2019, pp. 1–9.
[17] H. Zhang and S.-B. Ko, “Design of power efficient posit multiplier,” IEEE
Transactions on Circuits and Systems II: Express Briefs, vol. 67, no. 5, May 2020.

