AN INDUSTRIAL ORIENTED MINI PROJECT REPORT ON

HIGH-PERFORMANCE VLSI IMPLEMENTATION OF 3-PARALLEL FIR FILTER WITH VEDIC MULTIPLIER

Submitted in partial fulfilment of the requirement for the award of degree of

BACHELOR OF TECHNOLOGY
IN

ELECTRONICS AND COMMUNICATION ENGINEERING


Submitted By

O.NITHIN 218R1A04N9
P.PRAVEEN 218R1A04O0
P.KISHORE 218R1A04O1
P.PURNACHANDU 218R1A04O2

Under the Esteemed Guidance of

Ms.L. LAVANYA
Assistant Professor
ECE DEPARTMENT

DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING

CMR ENGINEERING COLLEGE

(Approved by AICTE, UGC AUTONOMOUS, Accredited by NBA, NAAC)


Kandlakoya(V), Medchal(M), Telangana.

(2024-2025)
CMR ENGINEERING COLLEGE
(Approved by AICTE, UGC AUTONOMOUS, Accredited by NBA, NAAC)
Kandlakoya (V), Medchal , Telangana.

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

CERTIFICATE
This is to certify that the industry oriented mini-project work entitled “HIGH-PERFORMANCE
VLSI IMPLEMENTATION OF 3-PARALLEL FIR FILTER WITH VEDIC MULTIPLIER” is being
submitted by O.NITHIN bearing Roll No: 218R1A04N9, P.PRAVEEN bearing Roll No: 218R1A04O0,
P.KISHORE bearing Roll No: 218R1A04O1, and P.PURNA CHANDU bearing Roll No: 218R1A04O2
in B.Tech IV-I semester, Electronics and Communication Engineering, and is a record of
bonafide work carried out by them during the academic year 2024-25. The results embodied
in this report have not been submitted to any other University for the award of any degree.

INTERNAL GUIDE HEAD OF THE DEPARTMENT


Ms.L. LAVANYA Dr. SUMAN MISHRA

EXTERNAL EXAMINER
ACKNOWLEDGEMENTS

We sincerely thank the management of our college CMR Engineering College for providing
required facilities during our project work.

We derive great pleasure in expressing our sincere gratitude to our Principal
Dr. A. S. Reddy for his timely suggestions, which helped us to complete the project work
successfully.

We would like to take this auspicious moment to express our gratitude to
Dr. SUMAN MISHRA, Head of the Department, ECE, for his consistent encouragement
during the progress of this project.

We take it as a privilege to thank our project coordinator Mr. K. SUBRAMANYA CHARI,
Associate Professor, Department of ECE, for the ideas that led to the completion of the
project work, and we also thank him for his continuous guidance, support and unfailing
patience throughout the course of this work.

We sincerely thank our project internal guide Ms.L.LAVANYA, Assistant Professor of ECE
for guidance and encouragement in carrying out this project work.
DECLARATION
We hereby declare that the mini-project entitled “HIGH-PERFORMANCE VLSI
IMPLEMENTATION OF 3-PARALLEL FIR FILTER WITH VEDIC MULTIPLIER”
is the work done by us in campus at CMR ENGINEERING COLLEGE, Kandlakoya
during the academic year 2024-2025 and is submitted as mini project in partial fulfilment of
the requirements for the award of degree of BACHELOR OF TECHNOLOGY in
ELECTRONICS AND COMMUNICATION ENGINEERING FROM JAWAHARLAL
NEHRU TECHNOLOGICAL UNIVERSITY, HYDERABAD.

O.NITHIN (218R1A04N9)

P.PRAVEEN (218R1A04O0)

P.KISHORE (218R1A04O1)

P.PURNA CHANDU (218R1A04O2)


CONTENTS
CHAPTERS
CHAPTER-1 INTRODUCTION
1.1 INTRODUCTION
CHAPTER-2 LITERATURE SURVEY
2.1 DESIGN AND IMPLEMENTATION OF AREA EFFICIENT 2-PARALLEL FILTERS ON FPGA USING IMAGE SYSTEM
2.2 SHORT-LENGTH FIR FILTERS AND THEIR USE IN FAST NON-RECURSIVE FILTERING
2.3 LOW-AREA/POWER PARALLEL FIR DIGITAL FILTER IMPLEMENTATION
2.4 EFFICIENT COMPLEXITY REDUCTION TECHNIQUE FOR PARALLEL FIR DIGITAL FILTER BASED ON FAST FIR ALGORITHM
2.5 HARDWARE-EFFICIENT VLSI IMPLEMENTATION FOR 3-PARALLEL LINEAR-PHASE FIR DIGITAL FILTER OF ODD LENGTH
2.6 EXPLOITING COEFFICIENT SYMMETRY IN CONVENTIONAL POLYPHASE FIR FILTERS
2.7 EFFICIENT FIR FILTER DESIGN USING BOOTH MULTIPLIER FOR VLSI APPLICATIONS
CHAPTER-3 INTRODUCTION TO FILTERS & MULTIPLIERS
3.1 PARALLEL FIR FILTER
3.2 VEDIC MATHEMATICS
3.3 DESIGN IMPLEMENTATION
3.3.1 DESIGN OF 2X2 VEDIC MULTIPLIER
3.3.2 DESIGN OF 4X4 VEDIC MULTIPLIER
3.3.3 DESIGN OF 8X8 VEDIC MULTIPLIER
3.3.4 DESIGN OF 16X16 VEDIC MULTIPLIER
3.3.5 DESIGN OF 32X32 VEDIC MULTIPLIER
CHAPTER-4 INTRODUCTION TO VLSI
4.1 INTRODUCTION TO VLSI
4.2 VLSI DESIGN FLOW
4.3 EMERGENCE OF HARDWARE DESCRIPTION LANGUAGE
4.4 HISTORY OF VERILOG
4.5 BASIC CONCEPTS
4.5.1 HARDWARE DESCRIPTION LANGUAGE
4.5.2 VERILOG INTRODUCTION
4.5.3 VERILOG FEATURES
4.6 DESIGN FLOW
4.6.1 DESIGN SPECIFICATION
4.6.2 RTL DESCRIPTION
4.6.3 FUNCTIONAL VERIFICATION & TESTING
4.6.4 LOGIC SYNTHESIS
4.6.5 LOGICAL VERIFICATION AND TESTING
4.6.6 FLOOR PLANNING AUTOMATIC PLACE AND ROUTE
4.6.7 PHYSICAL LAYOUT
4.6.8 LAYOUT VERIFICATION
4.6.9 IMPLEMENTATION
4.7 MODULES
4.8 PORTS
4.8.1 PORT DECLARATION
4.8.2 VERILOG KEYWORD TYPE OF PORT
4.8.3 PORT CONNECTION RULES
CHAPTER-5 SOFTWARE TOOLS
5.1 INTRODUCTION TO XILINX ISE
CHAPTER-6 PROGRAMMING
6.1 CODE
CHAPTER-7 IMPLEMENTATION
7.1 PROPOSED IMPLEMENTATION
CHAPTER-8 RESULTS
8.1 SIMULATION RESULT OF 3-PARALLEL FIR-FILTER USING VEDIC MULTIPLIER
8.2 PARAMETERS
8.2.1 AREA
8.2.2 I/O BONDED
8.2.3 DELAY
8.2.4 POWER CONSUMPTION
8.3 APPLICATIONS
8.4 ADVANTAGES
CHAPTER-9 CONCLUSION & FUTURE SCOPE
9.1 CONCLUSION
9.2 FUTURE SCOPE
9.3 REFERENCE
LIST OF FIGURES
Fig No NAME OF THE FIGURE
3.1 3-PARALLEL ODD-LENGTH FIR FILTER
3.2 2X2 VEDIC MULTIPLIER
3.3 LOGIC DESIGN OF 2X2 MULTIPLIER
3.4 4X4 MULTIPLIER
3.5 8X8 MULTIPLIER
3.6 16X16 MULTIPLIER
3.7 32X32 MULTIPLIER
4.1 LOGICAL VERIFICATION AND TESTING
4.2 MODULE DESIGN
4.3 PORT CONNECTION
7.1 SCHEMATIC DIAGRAMS 64-BIT VEDIC MULTIPLIER
7.2 RTL INTERNAL DIAGRAM OF 3-PARALLEL FIR-FILTER USING VEDIC MULTIPLIER
7.3 RTL INTERNAL DIAGRAM OF VEDIC 32X32 MULTIPLIER
7.4 RTL INTERNAL DIAGRAM OF VEDIC 16X16 MULTIPLIER
7.5 RTL INTERNAL DIAGRAM OF VEDIC 8X8 MULTIPLIER
7.6 RTL INTERNAL DIAGRAM OF VEDIC 2X2 MULTIPLIER
8.1 SIMULATION RESULT OF 3-PARALLEL FIR-FILTER USING VEDIC MULTIPLIER
LIST OF TABLES
TABLE NO TABLE NAME
8.2.1 AREA
8.2.2 I/O BONDED
8.2.3 DELAY
8.2.4 POWER CONSUMPTION
ABSTRACT

The most important criteria for the design and implementation of a DSP processor are area
optimization and reduction in power consumption. The fundamental block of a DSP processor
is the Finite Impulse Response (FIR) filter, which consists of three basic modules: adder
blocks, flip-flops and multiplier blocks. The performance of the FIR filter is largely
influenced by the multiplier, which is the slowest of these blocks. In this paper, a
3-parallel FIR filter is proposed using a Vedic multiplier, and the proposed 3-parallel FIR
filters are compared for various parameters. An improvement has been obtained in terms of
both area and delay. Also, the low power consumption and the improvements in delay and
operating frequency of the Booth multiplier make it highly suitable for designing FIR
filters for low-voltage and low-power VLSI applications. The adder and multiplier are two
of the most important components in the filter architecture. Recent research reports ways
of reducing the hardware complexity of the parallel polyphase FIR filter structure. The
performance of the multiplier and adder blocks dictates the computational speed and power
dissipation of the entire filter. Accordingly, different types of adders and multipliers
are available in digital circuits.

CHAPTER 1

INTRODUCTION

1.1 INTRODUCTION
Digital filters play a vital role in digital signal processing. Finite impulse response
(FIR) filters are the most popular type of digital filter implemented in software.
Depending on the requirements of the application, parallel FIR filters are implemented to
modify the sample rate or the power consumption. Digital parallel FIR filters can lower
power consumption and boost throughput. In recent decades, the FIR filter has been a focus
of researchers. While parallel FIR filters have received a lot of attention in the
literature, most of it focuses on minimizing the number of multipliers by employing the
fast FIR algorithm (FFA). To reduce complexity, the conventional method implements FFAs by
iterating small-sized filtering structures, and it reduces the number of computational
units (adders and multipliers) by reducing the number of parallel sub-filter units
involved.
Further, updated FFAs were developed for the implementation of linear-phase parallel FIR
filters. In particular, these algorithms, recommended for odd-length FIR filters, use the
symmetry of the coefficients to halve the number of multipliers in the sub-filter units
while increasing the number of adders in the pre/post-processing blocks. Other work uses
the polyphase coefficient symmetry property in the parallel FIR filter structure. The
adder and multiplier are two of the most important components in the filter architecture.
Recent research reports ways of reducing the hardware complexity of the parallel polyphase
FIR filter structure. The performance of the multiplier and adder blocks dictates the
computational speed and power dissipation of the entire filter. Accordingly, different
types of adders and multipliers are available in digital circuits.
Earlier work has investigated an alternative, technology-independent approach to designing
area-efficient parallel polyphase FIR filters for DSP applications, since the FIR filter's
performance relies critically on the multipliers and adders. Motivated by the above
discussion, in this paper we have applied ripple-carry, carry-lookahead and Brent-Kung
adders to that structure, followed by different multipliers such as the Vedic multiplier.

CHAPTER 2
LITERATURE SURVEY

2.1 Design and implementation of area efficient 2-parallel filters on FPGA using image system.

Parallel FIR filters are among the most widely used filters in Digital Signal Processing
(DSP). This paper shows the design of an area-efficient 2-parallel FIR filter using VHDL
and its implementation on FPGA using image system. It details the basic blocks of the
area-efficient 2-parallel FIR digital filter and explains both the proposed 2-parallel
digital FIR filter and the area-efficient 2-parallel FIR filter. Their simulation using
Xilinx 14.2 is also discussed. The paper also presents the FPGA implementation of the
primary 2-parallel filter and the area-efficient 2-parallel filter on a Xilinx Spartan 3E
Starter Board (XC3S500E chip) using Xilinx 14.2, together with the results. Since adders
are lightweight in silicon area compared with multipliers, multipliers are replaced by
adders to reduce the area and delay of the parallel FIR filter. Xilinx ISE is used for
simulating the design of the filter.

2.2 Short-length FIR filters and their use in fast non-recursive filtering.

This paper provides the basic tools required for an efficient use of the recently proposed
fast FIR algorithms. These algorithms not only reduce arithmetic complexity but also
partially maintain the multiply-accumulate structure, thus resulting in efficient
implementations. A set of basic algorithms is derived, together with some rules for
combining them. Their efficiency is compared with that of classical schemes in the case of
three different criteria, corresponding to various types of implementation. It is shown that
this class of algorithms (which includes classical ones as special cases) makes it possible to
find the best tradeoff corresponding to any criterion.

2.3 Low-area/power parallel FIR digital filter implementation.

This paper presents a novel approach for implementing area-efficient parallel (block) finite
impulse response (FIR) filters that require less hardware than traditional block FIR filter
implementations. Parallel processing is a powerful technique because it can be used to
increase the throughput of a FIR filter or reduce the power consumption of a FIR filter.
However, a traditional block filter implementation causes a linear increase in the hardware
cost (area) by a factor of L, the block size. In many design situations, this large hardware
penalty cannot be tolerated. Therefore, it is important to design parallel FIR filter structures
that require less area than traditional block FIR filtering structures. In this paper, we
propose a method to design parallel FIR filter structures that require a less-than-linear
increase in the hardware cost. A novel adjacent coefficient sharing based sub-structure
sharing technique is introduced and used to reduce the hardware cost of parallel FIR filters.
A novel coefficient quantization technique, referred to as a scalable maximum absolute
difference (MAD) quantization process, is introduced and used to produce quantized filters
with good spectrum characteristics. By using a combination of fast FIR filtering
algorithms, a novel coefficient quantization process and area reduction techniques, we
show that parallel FIR filters can be implemented with up to a 45% reduction in hardware
compared to traditional parallel FIR filters.

2.4 Efficient complexity reduction technique for parallel FIR digital filter based on Fast FIR algorithm.

The objective of the paper is to reduce the hardware complexity of higher-order FIR
filters with symmetric coefficients. The aim is to design efficient Fast Finite-Impulse
Response (FIR) Algorithms (FFAs) for parallel FIR filter structures with the
constraint that the number of filter taps must be a multiple of 2. In our work we have
briefly discussed the L = 4 parallel implementation. The parallel FIR filter structure
based on the proposed FFA technique has been implemented using carry-save and
ripple-carry adders for further optimization.

The reduction in silicon area is achieved by replacing the bulky multiplier with adders,
namely ripple-carry and carry-save adders. For example, for a 6-parallel 1024-tap filter,
the proposed structure saves 14 multipliers at the expense of 10 adders, whereas for a
6-parallel 512-tap filter, the proposed structure saves 108 multipliers at the expense of
10 adders. Overall, the proposed parallel FIR structures can lead to significant hardware
savings for symmetric coefficients over the existing FFA parallel FIR filter, especially
when the length of the filter is very large.

2.5 Hardware-efficient VLSI implementation for 3-parallel linear-phase FIR digital filter of odd length.

Based on fast FIR algorithms (FFA), this paper proposes new 3-parallel finite-impulse
response (FIR) filter structures, which are beneficial to symmetric convolutions of odd
length in terms of hardware cost. The proposed 3-parallel FIR structures exploit the
inherent nature of symmetric coefficients of odd length, according to the length of the
filter (N mod 3), halving the number of multipliers in the sub-filter section at the
expense of additional adders in the preprocessing and postprocessing blocks. The overhead
from the additional adders in the preprocessing and postprocessing blocks stays fixed and
does not increase with the length of the FIR filter, whereas the number of multipliers
saved increases with the length of the FIR filter. For example, for an 81-tap filter, the
proposed structure saves 26 multipliers at the expense of 5 adders, whereas for a 591-tap
filter, the proposed structure saves 196 multipliers, still at the expense of 5 adders.
Overall, the proposed 3-parallel FIR structures can lead to significant hardware savings
for symmetric coefficients of odd length over the existing FFA parallel FIR filter,
especially when the length of the filter is large.
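
The multiplier saving that coefficient symmetry makes possible can be illustrated on an ordinary (single-rate) filter. The Python sketch below is only an illustration under our own assumptions: it folds an odd-length symmetric FIR filter so that each pair of equal coefficients shares one multiplication. It is not the 3-parallel structure of the surveyed paper, and the function names are hypothetical.

```python
def fir_direct(h, x):
    """Direct-form FIR: y[n] = sum_i h[i] * x[n - i], one multiply per tap."""
    return [
        sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
        for n in range(len(x))
    ]


def fir_symmetric_odd(h, x):
    """Folded FIR for an odd-length symmetric h: (N + 1) / 2 multiplies per output."""
    n_taps = len(h)
    if n_taps % 2 == 0 or list(h) != list(h)[::-1]:
        raise ValueError("expects an odd-length symmetric impulse response")
    mid = n_taps // 2

    def sample(n):
        # Samples before the start of the input are treated as zero.
        return x[n] if 0 <= n < len(x) else 0

    y = []
    for n in range(len(x)):
        acc = h[mid] * sample(n - mid)  # the centre tap has no partner
        for i in range(mid):
            # h[i] == h[N-1-i], so add the two samples first, then multiply once.
            acc += h[i] * (sample(n - i) + sample(n - (n_taps - 1 - i)))
        y.append(acc)
    return y


def multiplier_counts(n_taps):
    """(direct-form multipliers, folded multipliers) for an odd-length filter."""
    return n_taps, (n_taps + 1) // 2
```

The pre-addition of the two samples is the extra adder cost mentioned above; the multiplier count drops from N to (N + 1)/2.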

2.6 Exploiting coefficient symmetry in conventional polyphase FIR
filters.

The conventional polyphase architecture for linear-phase finite impulse response (FIR)
filter loses its coefficient symmetry property due to the inefficient arrangement of the
filter coefficients among its sub filters. Although, existing polyphase structures can
avail the benefits of coefficient symmetry property, at the cost of versatility and
complex sub filters arrangement of the conventional polyphase structure. To address
these issues, in this paper, we first present the mathematical expressions for inherent
characteristics of the conventional polyphase structure. Thereafter, we use these
expressions to develop a generalized mathematical framework which exploits
coefficient symmetry by retaining the direct use of conventional FIR filter coefficients.
Further, the transfer function expressions for the proposed Type-1/ transposed Type-1
polyphase structures using coefficient symmetry are derived. The proposed structures
can reduce the requirement of multiplier units in polyphase FIR filters by half. We also
demonstrate the decimator design using the proposed Type-1 polyphase structure and
the interpolator design using the proposed transposed Type-1 polyphase structure.
Moreover, the phase and magnitude characteristics of the proposed Type-1/transposed
Type-1 polyphase structures are presented. It is revealed via numerical examples that
all sub filters of the proposed symmetric polyphase structure possess linear-phase
characteristics.

2.7 Efficient FIR filter design using Booth multiplier for VLSI
applications.

The most important criteria for the design and implementation of a DSP processor are
area optimization and reduction in power consumption. The fundamental block for the
design and implementation of the DSP processor is the Finite Impulse Response Filter.
The Finite Impulse Response (FIR) Filter consists of three basic modules, which are
adder blocks, flip-flops and multiplier blocks. The performance of the FIR Filter is
largely influenced by the multiplier, which is the slowest block of all. In this paper,
the Finite Impulse Response Filter has been proposed using two different multipliers,
namely the Array multiplier and the Booth multiplier, and both proposed FIR filters have
been compared for various parameters.

The proposed filters are designed using Verilog HDL and are implemented using Xilinx
14.7 ISE tools. An improvement has been obtained in terms of both area and delay.
Also, the low power consumption and the improvements in delay and operating frequency
of the Booth multiplier make it highly suitable for designing FIR filters for
low-voltage and low-power VLSI applications.

CHAPTER 3

INTRODUCTION TO FILTERS & MULTIPLIERS


3.1 PARALLEL FIR FILTER

In general form, an N-tap FIR filter can be expressed as

$$y(n) = \sum_{i=0}^{N-1} h(i)\, x(n-i), \qquad n = 0, 1, 2, \ldots$$

Polyphase decomposition is a typical approach for realizing FIR digital filter structures,
in which small parallel FIR filter blocks are created first, and larger block sizes are
then built by cascading or iterating the small parallel FIR filter blocks. Polyphase
decomposition can be used to derive the traditional M-parallel FIR filter as

$$\sum_{p=0}^{M-1} Y_p(z^M)\, z^{-p} = \sum_{q=0}^{M-1} X_q(z^M)\, z^{-q} \sum_{r=0}^{M-1} E_r(z^M)\, z^{-r}$$

where X_q(z) = polyphase component of the input,
E_r(z) = polyphase component of the filter transfer function,
Y_p(z) = polyphase component of the output, for p, q, r = 0, 1, 2, ..., M-1.
The 3-parallel polyphase FFA-based odd-length FIR filter derived from this decomposition
is shown in Fig. 3.1.

FIG 3.1 3-Parallel Odd-Length FIR Filter

The polyphase FIR filter contains three basic building blocks: multipliers, adders, and a
few delay elements. The multiplier block contributes the lion's share of the maximum delay
in the design, which motivates optimizing both the adder and the multiplier.
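
To make the polyphase idea concrete, the following Python sketch models the widely used 3-parallel fast FIR algorithm (FFA), which computes three output samples per block with six sub-filters instead of the nine needed by the traditional 3-parallel form. It is a behavioural sketch under our own assumptions, not the odd-length symmetric structure of Fig. 3.1 and not the Verilog of this project; all function names are hypothetical. One delay at the block rate corresponds to z^{-3} at the input rate.

```python
def fir_direct(h, x):
    """Reference direct-form FIR: y[n] = sum_i h[i] * x[n - i]."""
    return [
        sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
        for n in range(len(x))
    ]


def conv(a, b):
    """Full linear convolution; models one sub-filter acting on one polyphase stream."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def combine(a, b, sign=1):
    """Element-wise a + sign * b, zero-padding the shorter sequence."""
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return [p + sign * q for p, q in zip(a, b)]


def delay(seq):
    """One delay at the block rate (equivalent to z^-3 at the input rate)."""
    return [0] + list(seq)


def fir_3parallel_ffa(h, x):
    """3-parallel FFA: six sub-filter products a..f replace nine H_r * X_q products."""
    hp = list(h) + [0] * (-len(h) % 3)   # pad to a multiple of 3 taps
    xp = list(x) + [0] * (-len(x) % 3)   # pad to whole blocks of 3 samples
    H = [hp[r::3] for r in range(3)]     # polyphase components of the filter
    X = [xp[q::3] for q in range(3)]     # polyphase components of the input

    a = conv(H[0], X[0])
    b = conv(H[1], X[1])
    c = conv(H[2], X[2])
    d = conv(combine(H[0], H[1]), combine(X[0], X[1]))
    e = conv(combine(H[1], H[2]), combine(X[1], X[2]))
    f = conv(combine(combine(H[0], H[1]), H[2]), combine(combine(X[0], X[1]), X[2]))

    # Post-processing additions recover the three output phases.
    y0 = combine(a, delay(combine(combine(e, b, -1), c, -1)))
    y1 = combine(combine(combine(d, a, -1), b, -1), delay(c))
    y2 = combine(combine(combine(f, d, -1), e, -1), combine(b, b))

    y = []
    for k in range(len(xp) // 3):
        y.extend([y0[k], y1[k], y2[k]])  # interleave the phases back to full rate
    return y[:len(x)]
```

Each of the six products is a sub-filter of length N/3, so the multiplier count falls from 3N (traditional) to 2N, at the cost of the pre- and post-processing adders.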

3.2 VEDIC MATHEMATICS

Vedic mathematics is part of the four Vedas (books of wisdom). It is part of the
Sthapatya-Veda (book on civil engineering and architecture), which is an upa-veda
(supplement) of the Atharva Veda. It explains several mathematical topics
including arithmetic, geometry (plane, co-ordinate), trigonometry, quadratic equations,
factorization and even calculus.
His Holiness Jagadguru Shankaracharya Bharati Krishna Teerthaji Maharaja (1884-
1960) compiled all this work and gave its mathematical explanation while
discussing it for various applications. Swamiji constructed 16 sutras (formulae) and
16 Upa sutras (sub formulae) after extensive research in the Atharva Veda. These
formulae are not found in the present text of the Atharva Veda because they
were constructed by Swamiji himself. Vedic mathematics is not only a mathematical
wonder but is also logical, which is why it has gained a degree of eminence that
cannot be disputed. Due to these phenomenal characteristics, Vedic maths has already
crossed the boundaries of India and has become an interesting topic of research abroad.
Vedic maths deals with several basic as well as complex mathematical operations.
In particular, its methods of basic arithmetic are extremely simple and powerful.

The word “Vedic” is derived from the word “Veda” which means the store-house
of all knowledge. Vedic mathematics is mainly based on 16 Sutras (or aphorisms)
dealing with various branches of mathematics like arithmetic, algebra, geometry etc.

These Sutras, along with their brief meanings, are listed below alphabetically.

1. (Anurupye) Shunyamanyat – If one is in ratio, the other is zero.

2. Chalana-Kalanabyham – Differences and Similarities.

3. Ekadhikina Purvena – By one more than the previous One.

4. Ekanyunena Purvena – By one less than the previous one.

5. Gunakasamuchyah – The factors of the sum are equal to the sum of the factors.

6. Gunitasamuchyah – The product of the sum is equal to the sum of the product.

7. Nikhilam Navatashcaramam Dashatah – All from 9 and last from 10.

8. Paraavartya Yojayet – Transpose and adjust.

9. Puranapuranabyham – By the completion or noncompletion.

10. Sankalana- vyavakalanabhyam – By addition and by subtraction.

11. Shesanyankena Charamena – The remainders by the last digit.

12. Shunyam Saamyasamuccaye – When the sum is the same that sum is zero.

13. Sopaantyadvayamantyam – The ultimate and twice the penultimate.

14. Urdhva-tiryagbhyam – Vertically and crosswise.


15. Vyashtisamanstih – Part and Whole.

16. Yaavadunam – Whatever the extent of its deficiency.

These methods and ideas can be directly applied to trigonometry, plane and
spherical geometry, conics, calculus (both differential and integral), and applied
mathematics of various kinds. As mentioned earlier, all these Sutras were
reconstructed from ancient Vedic texts early in the last century. Many Sub-sutras were also
discovered at the same time, which are not discussed here. The beauty of Vedic
mathematics lies in the fact that it reduces the otherwise cumbersome-looking
calculations of conventional mathematics to very simple ones. This is so because the
Vedic formulae are claimed to be based on the natural principles on which the human mind
works. This is a very interesting field and presents some effective algorithms which
can be applied to various branches of engineering such as computing and digital signal
processing [1, 4].

Multiplier architectures can generally be classified into three categories. The first is
the serial multiplier, which emphasizes minimum hardware and chip area. The second is the
parallel multiplier (array and tree), which carries out high-speed mathematical
operations, but at the cost of relatively larger chip area. The third is the
serial-parallel multiplier, which serves as a good trade-off between the time-consuming
serial multiplier and the area-consuming parallel multipliers.

3.3 DESIGN IMPLEMENTATION

The arithmetic module is split into smaller modules, namely the multiplier and arithmetic
modules, which are implemented using Verilog HDL. The 2x2-bit multiplier is obtained by
the "Vertical-crosswise Algorithm" based on the Urdhva Tiryakbhyam Sutra. The basic
2x2-bit multiplier is designed first in Verilog code; 4x4 blocks are then designed using
2x2 blocks, the 8x8-bit multiplier is built from 4-bit multiplier blocks, and finally the
16x16-bit multiplier is obtained from 8-bit multiplier blocks.

3.3.1 Design of 2x2 vedic multiplier:

FIG 3.2 2x2 vedic multiplier

Figure 3.2 illustrates the steps to multiply two 2-bit numbers. Converting the figure to
its hardware equivalent, we have four AND gates, which act as 1-bit multipliers, and two
half adders, which add the partial products to give the final product. The hardware detail
of the multiplier is shown in Fig. 3.3.

FIG 3.3 Logic design of 2x2 multiplier
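
The gate-level behaviour described above can be checked with a small bit-level model. The Python sketch below is an illustration written for this section, assuming the usual vertical-and-crosswise arrangement of AND gates and half adders; it is not the project's Verilog source, and the names are hypothetical.

```python
def half_adder(a, b):
    """Return (sum, carry) of two 1-bit inputs."""
    return a ^ b, a & b


def vedic_2x2(a, b):
    """Multiply two 2-bit numbers using AND gates and two half adders."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1

    p0 = a0 & b0                           # vertical: least significant bits
    p1, c1 = half_adder(a1 & b0, a0 & b1)  # crosswise products
    p2, p3 = half_adder(a1 & b1, c1)       # vertical: most significant bits + carry

    return p0 | (p1 << 1) | (p2 << 2) | (p3 << 3)
```

Because the inputs are only 2 bits wide, the model can be verified exhaustively over all 16 input pairs.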

3.3.2 Design of 4X4 multiplier:


Using four such 2x2 multipliers and three adders, we can build a 4x4-bit multiplier as
shown in the design, by properly instantiating the 2x2 multipliers and adders. Code for
4-bit and 6-bit adders must be written first. The choice of adder is open: for better
performance, the normal adders can be replaced with carry-save adders (CSA) or
compressors. For a simpler design we have used the "+" operator, which is supported by
the XST synthesis tool and by default selects a low-hardware adder. The arrangement of the
adders and the addition is explained in the figure shown below:

FIG 3.4 4X4 multiplier
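
One possible arrangement of the four 2x2 blocks and the three adders (one 4-bit and two 6-bit) is modelled below in Python. The exact wiring is an assumption made for illustration, since the adder tree of Fig. 3.4 is not reproduced in the text; the function names are hypothetical, and the adder widths are enforced by masking so that any overflow would show up as a wrong product.

```python
def vedic_2x2(a, b):
    """2x2 Vedic block: AND gates plus two half adders."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    p0 = a0 & b0
    x, y = a1 & b0, a0 & b1
    p1, c1 = x ^ y, x & y
    z = a1 & b1
    p2, p3 = z ^ c1, z & c1
    return p0 | (p1 << 1) | (p2 << 2) | (p3 << 3)


def add_n(a, b, width):
    """Model of a fixed-width adder: the result is truncated to `width` bits."""
    return (a + b) & ((1 << width) - 1)


def vedic_4x4(a, b):
    """4x4 multiplier from four 2x2 blocks, one 4-bit adder and two 6-bit adders."""
    a_lo, a_hi = a & 0b11, (a >> 2) & 0b11
    b_lo, b_hi = b & 0b11, (b >> 2) & 0b11

    q0 = vedic_2x2(a_lo, b_lo)   # vertical, low halves
    q1 = vedic_2x2(a_hi, b_lo)   # crosswise
    q2 = vedic_2x2(a_lo, b_hi)   # crosswise
    q3 = vedic_2x2(a_hi, b_hi)   # vertical, high halves

    s1 = add_n(q1, q0 >> 2, 4)   # 4-bit adder: cross term + upper bits of q0
    s2 = add_n(q2, q3 << 2, 6)   # 6-bit adder: cross term + shifted high product
    s3 = add_n(s1, s2, 6)        # 6-bit adder: upper six product bits

    return (s3 << 2) | (q0 & 0b11)
```

The two low bits of the product come straight from q0, so only the upper six bits pass through the adders.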

3.3.3 Design of 8X8 multiplier:

Similar to the previous design of the 4x4 multiplier, four such 4x4 multipliers are needed
to develop an 8x8 multiplier. Here 8-bit and 12-bit adders are designed first, and by
properly instantiating the modules and making the connections shown in the figure, we have
designed an 8x8-bit multiplier. At this point it is necessary to verify the RTL code and
check that the hardware matches the design. The PlanAhead tool by Xilinx gives a better
view of the hardware design through its design elaboration option. Refer to the addition
tree diagram for the process for the 8x8 multiplier:

FIG 3.5 8x8 multiplier

3.3.4 Design of 16x16 multiplier:

The design of the 16×16 block is a similar arrangement of 8×8 blocks in an optimized
manner. The first step in the design of the 16×16 block is grouping each 16-bit input into
8-bit halves (bytes). The lower and upper byte pairs of the two inputs form the vertical
and crosswise product terms. Each pair of input bytes is handled by a separate 8×8 Vedic
multiplier to produce 16-bit partial product rows. These partial product rows are then
added in a 16-bit carry-lookahead adder to generate the final product bits. Fig. 3.6 shows
the schematic of a 16×16 block designed using 8×8 blocks. The partial products represent
the Urdhva vertical and cross product terms.

FIG 3.6 16x16 multiplier

3.3.5 Design of 32X32 Multiplier:

The design of the 32×32 block is a similar arrangement of 16×16 blocks in an optimized
manner. The first step in the design of the 32×32 block is grouping each 32-bit input into
16-bit halves. The lower and upper half pairs of the two inputs form the vertical and
crosswise product terms. Each pair of input halves is handled by a separate 16×16 Vedic
multiplier to produce 32-bit partial product rows. These partial product rows are then
added in a 32-bit carry-lookahead adder to generate the final product bits. Fig. 3.7 shows
the schematic of a 32×32 block designed using 16×16 blocks. The partial products represent
the Urdhva vertical and cross product terms.

FIG 3.7 32x32 multiplier
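
The 4x4, 8x8, 16x16 and 32x32 designs all follow the same pattern: split each operand into halves, form two vertical and two crosswise products with half-width blocks, and add the shifted partial products. The Python sketch below models that recursion under our own assumptions; it uses ordinary integer addition in place of the carry-lookahead adders and is not the project's Verilog. The function names are hypothetical.

```python
def vedic_2x2(a, b):
    """Base block: 2x2 Vedic multiplier from AND gates and two half adders."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    p0 = a0 & b0
    x, y = a1 & b0, a0 & b1
    p1, c1 = x ^ y, x & y
    z = a1 & b1
    p2, p3 = z ^ c1, z & c1
    return p0 | (p1 << 1) | (p2 << 2) | (p3 << 3)


def vedic_multiply(a, b, width):
    """Hierarchical width x width multiplier; width must be 2, 4, 8, 16, 32, ..."""
    if width < 2 or width & (width - 1):
        raise ValueError("width must be a power of two, at least 2")
    if width == 2:
        return vedic_2x2(a, b)

    half = width // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, (a >> half) & mask
    b_lo, b_hi = b & mask, (b >> half) & mask

    vertical_lo = vedic_multiply(a_lo, b_lo, half)   # low halves
    cross_1 = vedic_multiply(a_hi, b_lo, half)       # crosswise
    cross_2 = vedic_multiply(a_lo, b_hi, half)       # crosswise
    vertical_hi = vedic_multiply(a_hi, b_hi, half)   # high halves

    # Adder tree: align each partial product by its weight and sum.
    return vertical_lo + ((cross_1 + cross_2) << half) + (vertical_hi << width)


def count_2x2_blocks(width):
    """Number of 2x2 base blocks instantiated by the hierarchy."""
    return 1 if width == 2 else 4 * count_2x2_blocks(width // 2)
```

Since every level instantiates four half-width blocks, the number of base blocks quadruples each time the operand width doubles, which is why the choice of adder at each level matters for area and delay.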

CHAPTER 4

INTRODUCTION TO VLSI

4.1 INTRODUCTION TO VLSI

Digital systems are highly complex at their most detailed level. They may consist of
millions of elements, i.e., transistors or logic gates. For many decades, logic schematics
served as the lingua franca of logic design, but not anymore. Today, hardware complexity
has grown to such a degree that a schematic with logic gates is almost useless, as it shows
only a web of connectivity and not the functionality of the design. Since the 1970s,
computer engineers, electrical engineers and electronics engineers have moved toward
hardware description languages (HDLs).

Digital circuit design has evolved rapidly over the last twenty-five years. The earliest
digital circuits were designed with vacuum tubes and transistors. Integrated circuits were
then invented, in which logic gates were placed on a single chip. The first IC chips were
small-scale integration (SSI) chips, in which the gate count was small. As technology
became more sophisticated, designers were able to place circuits with hundreds of gates on
a chip. These chips were called MSI (medium-scale integration) chips. With the advent of
LSI (large-scale integration), designers could put thousands of gates on a single chip. At
this point, design processes started getting complicated, and designers felt the need to
automate them.

With the advent of VLSI technology, designers could design single chips with more than a
hundred thousand gates. Because of the complexity of these circuits, computer-aided
techniques became critical for the verification and design of these digital circuits.

One way to deal with the increasing complexity of electronic systems and the increasing
time-to-market pressure is to design at high levels of abstraction. Traditional
paper-and-pencil and capture-and-simulate methods have largely given way to the
describe-and-synthesize approach.

For these reasons, hardware description languages have played an important role in the
describe-and-synthesize design methodology. They are used for the specification,
simulation and synthesis of an electronic system. This helps to reduce design complexity
and brings products to market quickly.

The components of a digital system can be classified as being specific to an application or
as being standard circuits. Standard components are taken from a set that has been used in
other systems. MSI components are standard circuits and their use results in a significant
reduction in the total cost as compared to the cost of using SSI circuits. In contrast,
specific components are particular to the system being implemented and are not commonly
found among the standard components.

The implementation of specific circuits with LSI chips can be done by means of ICs that
can be programmed to provide the required logic.

4.2 VLSI DESIGN FLOW

The typical design flow for VLSI circuits is shown in the tool flow diagram. This design
flow is typically used by designers who use HDLs. In any design, the specification comes
first. The specification describes the functionality, interface, and overall architecture
of the digital circuit to be designed. At this point, architects need not think about how
they will implement the circuit. A behavioral description is then created to analyze the
design in terms of functionality, performance, and other high-level issues. The behavioral
description is manually converted to an RTL (Register Transfer Level) description in an
HDL. The designer has to describe the data flow that will implement the desired digital
circuit. From this point onward, the design process is carried out with the assistance of
CAD tools.

Logic synthesis tools convert the RTL description to a gate-level netlist. A gate-level
netlist is a description of the circuit in terms of gates and the connections between them.
The gate-level netlist is input to an automatic place and route tool, which creates a
layout. The layout is verified and then fabricated on a chip. Thus most digital design
activity is concentrated on manually optimizing the RTL description of the circuit. After
the RTL description is frozen, CAD tools are available to assist the designer in the rest
of the process. Designing at the RTL level has shrunk design cycle times from years to a
few months.
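
To make the step from RTL to a gate-level netlist concrete, the following is a minimal sketch rather than the output of any particular tool. The module names half_adder_rtl and half_adder_gates are hypothetical, and real synthesis tools map to technology library cells instead of Verilog gate primitives.

```verilog
// RTL (dataflow) description of a half adder: what the designer writes.
module half_adder_rtl (
    input  wire a,
    input  wire b,
    output wire sum,
    output wire carry
);
    // The 2-bit left-hand side keeps the carry of the 1-bit addition.
    assign {carry, sum} = a + b;
endmodule

// One possible gate-level netlist for the same function: a description
// purely in terms of gates and the connections between them.
module half_adder_gates (
    input  wire a,
    input  wire b,
    output wire sum,
    output wire carry
);
    xor g1 (sum, a, b);
    and g2 (carry, a, b);
endmodule
```

Both modules describe the same function; the second form is the kind of description that is handed to the place and route tool.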

4.3 EMERGENCE OF HARDWARE DESCRIPTION LANGUAGE

As designs got larger and more complex, logic simulation assumed an important role in the
design process. For a long time, programming languages such as Fortran, Pascal, and C were
used to describe computer programs that were sequential in nature. Similarly, in the
digital design field, designers felt the need for a standard language to describe digital
circuits. Thus HDLs came into existence. HDLs allowed designers to model the concurrency of
processes found in hardware elements. HDLs such as Verilog HDL and VHDL (Very High Speed
Integrated Circuit Hardware Description Language) became popular.

4.4 HISTORY OF VERILOG

Verilog was started in 1984 by Gateway Design Automation Inc. as a proprietary hardware
modeling language. It is rumored that the original language was designed by taking features
from the most popular HDL of the time, called HiLo, as well as from traditional computer
languages such as C. At that time, Verilog was not standardized, and the language changed
in almost every revision released between 1984 and 1990.

The Verilog simulator was first used in 1985 and was extended substantially through 1987.
The implementation was the Verilog simulator sold by Gateway. The first major extension was
Verilog-XL, which added a few features and implemented the infamous "XL algorithm", a very
efficient method for gate-level simulation. Later, in 1990, Cadence Design Systems, whose
primary products at that time included a thin-film process simulator, decided to acquire
Gateway Design Automation. Along with Gateway's other products, Cadence became the owner of
the Verilog language and continued to market Verilog as both a language and a simulator. At
the same time, Synopsys was marketing the top-down design methodology using Verilog. This
was a powerful combination.

In 1990, Cadence organized the Open Verilog International (OVI), and in 1991 gave it the
documentation for the Verilog Hardware Description Language. This was the event which
"opened" the language.

4.5 BASIC CONCEPTS

4.5.1 Hardware Description Language

Two things distinguish an HDL from a linear language like "C":

Concurrency:

• The ability to do several things simultaneously, i.e., different code blocks can run
concurrently.

Timing:

• The ability to represent the passing of time and to sequence events accordingly.
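
The two properties above can be illustrated with a minimal simulation-only sketch. The module and signal names (concurrency_demo, y_and, y_or) are hypothetical and are not part of the project code.

```verilog
`timescale 1ns / 1ps

module concurrency_demo;
    reg  a, b;
    wire y_and, y_or;

    // Concurrency: both continuous assignments are always active and
    // re-evaluate independently whenever a or b changes.
    assign y_and = a & b;
    assign y_or  = a | b;

    // Timing: each # delay advances simulation time before the next event.
    initial begin
        a = 0; b = 0;
        #10 a = 1;      // at t = 10 ns
        #10 b = 1;      // at t = 20 ns
        #10 $finish;    // at t = 30 ns
    end

    // A second procedural block that runs concurrently with the one above
    // and reports every change of the monitored signals.
    initial
        $monitor("t=%0d a=%b b=%b and=%b or=%b", $time, a, b, y_and, y_or);
endmodule
```

In a sequential language, the two assignments would execute once, in program order; here they behave like two pieces of hardware that are permanently connected.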

4.5.2 VERILOG Introduction

• Verilog HDL is a Hardware Description Language (HDL).

• A Hardware Description Language is a language used to describe a digital system; one may
describe a digital system at several levels.

• An HDL might describe the layout of the wires, resistors and transistors on an Integrated
Circuit (IC) chip, i.e., the switch level.

• It might describe the logical gates and flip flops in a digital system, i.e., the gate level.

• An even higher level describes the registers and the transfers of vectors of information
between registers. This is called the Register Transfer Level (RTL).

• Verilog supports all of these levels.

• A powerful feature of the Verilog HDL is that you can use the same language for
describing, testing and debugging your system.

4.5.3 VERILOG Features

• Strong background: Supported by Open Verilog International (OVI) and standardized by the
Institute of Electrical and Electronics Engineers (IEEE).

• Industrial support: Simulation is very fast and synthesis is very efficient.

• Universal: The entire process is allowed in one design environment.

• Extensibility: The Verilog PLI allows the capabilities of Verilog to be extended.

4.6 DESIGN FLOW

The typical design flow is shown in the figure.

4.6.1 Design Specification

• The project specifications and requirements are written first.

• The functionality of the digital circuit is explained for the architecture to be designed.

• Specification tools: WaveFormer, TestBencher, or Word can be used for drawing waveforms.

4.6.2 RTL Description

• CAD tools are used to convert the specification into a coded (RTL) description.

Coding Styles:

• Gate Level Modeling

• Data Flow Modeling

• Behavioral Modeling
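
The three coding styles can be compared on one small circuit. The sketch below describes a 2-to-1 multiplexer in each style; the module names (mux2_gate, mux2_dataflow, mux2_behavioral) are hypothetical and chosen only for illustration.

```verilog
// Gate-level modeling: the circuit is built from primitive gates.
module mux2_gate (input wire a, input wire b, input wire sel, output wire y);
    wire sel_n, t0, t1;
    not g0 (sel_n, sel);
    and g1 (t0, a, sel_n);
    and g2 (t1, b, sel);
    or  g3 (y, t0, t1);
endmodule

// Dataflow modeling: the function is written as a continuous assignment.
module mux2_dataflow (input wire a, input wire b, input wire sel, output wire y);
    assign y = sel ? b : a;
endmodule

// Behavioral modeling: the function is written as a procedural block.
module mux2_behavioral (input wire a, input wire b, input wire sel, output reg y);
    always @(*) begin
        if (sel) y = b;
        else     y = a;
    end
endmodule
```

All three modules select b when sel is 1 and a when sel is 0; they differ only in the level of abstraction at which the designer writes them.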

4.6.3 Functional Verification & Testing

• The coded design is tested with respect to its inputs and outputs.

• If testing fails, the RTL description is checked once again.

• Simulation tools: Xilinx, Verilog-XL, ModelSim.

4.6.4 Logic Synthesis

• The RTL description is converted into a gate-level netlist.

• The circuit is described in terms of gates and the connections between them.

• Synthesis tools: Altera and Xilinx tools, Synplify Pro, LeonardoSpectrum, Design
Compiler, FPGA Compiler.

4.6.5 Logical Verification and Testing

• Simulation and synthesis are used for functional checking of the HDL code. If the check
fails, the RTL description is revisited.

FIG 4.1 Logical Verification and Testing

4.6.6 Floor Planning Automatic Place and Route

• The layout is created from the gate-level netlist.

• The blocks of the netlist are arranged on the chip.

• Place and route: For FPGAs, the FPGA vendor's P&R tool is used. For ASICs, very costly P&R
tools such as Apollo are required.

4.6.7 Physical Layout

• The process of converting a circuit description into a physical layout is called physical
design. It defines the interconnections between the cells and the positions of the routes.

4.6.8 Layout Verification

• In layout verification, the physical layout structure is verified first.

• If modifications are needed, floor planning, automatic place and route, and the RTL
description can be revisited.

4.6.9 Implementation

• Implementation is the final stage of the design process.

• The coded RTL design can be implemented using integrated circuits.

4.7 MODULES

The distinct parts of a Verilog module are shown in the figure below. A module definition
always begins with the keyword module. The module name, port list, port declarations, and
optional parameters must come first in a module definition. The port list and port
declarations are present only if the module has ports to interact with the external
environment. There are five components within a module:

• variable declarations,

• dataflow statements

• instantiation of lower modules

• behavioral blocks

• tasks or functions.

These components may appear in any order and at any place in the module definition.

The endmodule statement must always come last in a module definition. All components
except module, the module name, and endmodule are optional and can be mixed and matched as
per design needs. Verilog allows multiple modules to be defined in a single file, and the
modules can be defined in any order in the file.

FIG 4.2 Module Design

Example Module Structure:

module <module_name>(<module_terminals_list>);

... <module internals> ...

endmodule
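
As a concrete instance of this structure, the sketch below shows a small module that contains each of the five components listed earlier. All names (and2, module_parts_demo, add4, and the signals) are hypothetical and serve only as an illustration.

```verilog
// Lower-level module that is instantiated below.
module and2 (input wire a, input wire b, output wire y);
    assign y = a & b;
endmodule

module module_parts_demo (
    input  wire       clk,
    input  wire [3:0] a,
    input  wire [3:0] b,
    output wire       all_match,   // 1 when a equals b
    output reg  [3:0] sum_q        // registered 4-bit sum of a and b
);
    // Variable declarations
    wire [3:0] eq;
    wire       lo_match, hi_match;

    // Dataflow statements
    assign eq        = ~(a ^ b);   // bitwise equality of a and b
    assign all_match = lo_match & hi_match;

    // Instantiation of lower-level modules
    and2 u_lo (.a(eq[0]), .b(eq[1]), .y(lo_match));
    and2 u_hi (.a(eq[2]), .b(eq[3]), .y(hi_match));

    // Function
    function [3:0] add4;
        input [3:0] x, y;
        add4 = x + y;
    endfunction

    // Behavioral block
    always @(posedge clk)
        sum_q <= add4(a, b);
endmodule
```

The order of the components inside module_parts_demo could be changed without altering the hardware that is described.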

4.8 PORTS

Ports provide the interface between a module and its environment. For example, the
input/output pins of an Integrated Circuit chip are its ports. The environment cannot see
the internals of the module, which is a great advantage for the designer: as long as the
interface is not modified, the internals of the module may be modified without affecting
the environment. Ports are also referred to as terminals.

4.8.1 Port Declaration

All ports in the port list must be declared in the module. Port declarations are explained
in detail below.

4.8.2 Verilog Keyword Type of Port

input: Input port

output: Output port

inout: Bidirectional port

Depending on the direction of the port signal, each port in the port list is declared as
input, output, or inout.

4.8.3 Port Connection Rules

A port can be viewed as consisting of two units: one internal to the module and one
external to it. The internal and external units are connected. When modules are
instantiated within other modules, there are rules governing the port connections. If any
port connection rule is violated, the Verilog simulator complains. Figure 4.3 shows the
port connection rules.

FIG 4.3 Port Connection Rules

Inputs:

• Internally must be of net data type (e.g. wire)

• Externally the inputs may be connected to a reg or net data type

Outputs:

• Internally may be of net or reg data type

• Externally must be connected to a net data type

Inouts:

• Internally must be of net data type (tri recommended)

• Externally must be connected to a net data type (tri recommended)
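
The rules for inputs and outputs can be seen in a minimal sketch. The modules child and parent and their signals are hypothetical; inout ports are omitted to keep the example short.

```verilog
module child (
    input  wire clk,
    input  wire d,      // input: internally a net
    output reg  q       // output: internally may be a reg
);
    always @(posedge clk)
        q <= d;
endmodule

module parent;
    reg  clk;           // inputs may be driven externally by a reg ...
    reg  d_drv;
    wire q_net;         // ... but an output must connect externally to a net

    child u_child (.clk(clk), .d(d_drv), .q(q_net));

    // Declaring q_net as reg instead of wire would break the rule for
    // outputs, and a Verilog compiler is expected to report an error.
endmodule
```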

Ports Connection to External Signals

Signals can be connected to the ports of a module in two ways. The two methods cannot be
mixed in the same module instantiation.

• Port by order list

• Port by name

Port by order list

Connecting ports by ordered list is the most intuitive method for beginners. The signals to
be connected must appear in the module instantiation in the same order as the ports in the
port list of the module definition.

Syntax for instantiation with port order list:

module_name instance_name (signal, signal, ...);

The external signals a, b, out appear in exactly the same order as the ports a, b, out in
the definition of the module adder in the example below.

Example
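
The original example figure is not reproduced here, so the following is only a sketch of what such an example could look like. The port widths and the wrapper module top_ordered are assumptions; only the names adder, a, b, and out come from the text.

```verilog
module adder (a, b, out);
    input  [3:0] a, b;
    output [4:0] out;          // one extra bit holds the carry
    assign out = a + b;
endmodule

module top_ordered;
    reg  [3:0] a, b;
    wire [4:0] out;

    // The signals a, b, out are listed in the same order as the ports
    // a, b, out in the definition of adder.
    adder u_adder (a, b, out);
endmodule
```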

Port by name

For larger designs where a module has, say, 30 ports, remembering the order of the ports in
the module definition is impractical and error-prone. Verilog provides the capability to
connect external signals to ports by port name rather than by position. Syntax for
instantiation with port names:

module_name instance_name (.port_name(signal), .port_name(signal), ...);

The port connections can be specified in any order, as long as each port name in the module
definition is correctly matched with its external signal.
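
A minimal sketch of connection by name follows. The module alu4 and all signal names are hypothetical; the ports are deliberately connected in a different order from the module definition.

```verilog
module alu4 (a, b, op, result, zero);
    input  [3:0] a, b;
    input        op;           // 0: add, 1: subtract
    output [3:0] result;
    output       zero;         // 1 when result is 0

    assign result = op ? (a - b) : (a + b);
    assign zero   = (result == 4'd0);
endmodule

module top_named;
    reg  [3:0] x, y;
    reg        sub;
    wire [3:0] r;
    wire       z;

    // Each connection is matched by port name, so the order is free.
    alu4 u_alu (.zero(z), .result(r), .op(sub), .b(y), .a(x));
endmodule
```

With named connections, adding or reordering ports in alu4 does not silently misconnect existing instantiations, which is why this style is usually preferred for modules with many ports.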

CHAPTER 5

SOFTWARE TOOLS

5.1 Introduction to XILINX ISE

Create a New Project

Create a new ISE project which will target the FPGA device on the Spartan-3 Startup Kit
demo board. To create a new project:

1. Select File > New Project... The New Project Wizard appears.

2. Type tutorial in the Project Name field.

3. Enter or browse to a location (directory path) for the new project. A tutorial
subdirectory is created automatically.

4. Verify that HDL is selected from the Top-Level Source Type list.

5. Click Next to move to the device properties page.

6. Fill in the properties in the table as shown below:

♦ Product Category: All

♦ Family: Spartan3

♦ Device: XC3S200

♦ Package: FT256

♦ Speed Grade: -4

♦ Top-Level Source Type: HDL

♦ Synthesis Tool: XST (VHDL/Verilog)

♦ Simulator: ISE Simulator (VHDL/Verilog)

♦ Preferred Language: Verilog (or VHDL)

♦ Verify that Enable Enhanced Design Summary is selected.

Leave the default values in the remaining fields. When the table is complete, your project
properties will look like the following:

Creating a new Project and Source

Start the Xilinx ISE 8.1i project navigator by double clicking the Xilinx ISE 8.1i icon on
your desktop.


Click on File and select New Project

Select a project location and type the name of your project (here, counter):

Click next

Click New Source

Click next

Select Verilog Module in the New Source Wizard window:

Creating a Verilog Source:

Create the top-level Verilog source file for the project as follows:

1. Click New Source in the New Project dialog box.

2. Select Verilog Module as the source type in the New Source dialog box.

3. Type in the file name counter.

4. Verify that the Add to Project checkbox is selected.

5. Click Next.

6. Declare the ports for the counter design by filling in the port information as shown
below:

The source file containing the counter module displays in the Workspace, and the counter
displays in the Sources tab, as shown below:

Using Language Templates (Verilog):

The next step in creating the new source is to add the behavioral description for counter.
Use a simple counter code example from the ISE Language Templates and customize it for
the counter design.

1. Place the cursor on the line below the output [3:0] COUNT_OUT; statement.

2. Open the Language Templates by selecting Edit → Language Templates…
Note: You can tile the Language Templates and the counter file by selecting Window →
Tile Vertically to make them both visible.

3. Using the “+” symbol, browse to the following code example: Verilog → Synthesis
Constructs → Coding Examples → Counters → Binary → Up/Down Counters → Simple Counter.

4. With Simple Counter selected, select Edit → Use in File, or select the Use Template in
File toolbar button. This step copies the template into the counter source file.

5. Close the Language Templates

Design Simulation:

Verifying Functionality using Behavioral Simulation

Create a test bench waveform containing input stimulus you can use to verify the
functionality of the counter module. The test bench waveform is a graphical view of a test
bench. Create the test bench waveform as follows:

1. Select the counter HDL file in the Sources window.

2. Create a new test bench source by selecting Project → New Source.

3. In the New Source Wizard, select Test Bench WaveForm as the source type, and type
counter_tbw in the File Name field.

4. Click Next.

5. The Associated Source page shows that you are associating the test bench waveform
with the source file counter. Click Next.

6. The Summary page shows that the source will be added to the project, and it displays the
source directory, type, and name. Click Finish.

7. You need to set the clock frequency, setup time and output delay times in the Initialize
Timing dialog box before the test bench waveform editing window opens. The
requirements for this design are the following:

♦ The counter must operate correctly with an input clock frequency = 25 MHz.

♦ The DIRECTION input will be valid 10 ns before the rising edge of CLOCK

♦ The output (COUNT_OUT) must be valid 10 ns after the rising edge of CLOCK.

The design requirements correspond with the values below. Fill in the fields in the
Initialize Timing dialog box with the following information:

♦ Clock High Time: 20 ns.

♦ Clock Low Time: 20 ns.

♦ Input Setup Time: 10 ns.

♦ Output Valid Delay: 10 ns.

♦ Offset: 0 ns.

♦ Global Signals: GSR (FPGA)

Note: When GSR (FPGA) is enabled, 100 ns is added to the Offset value automatically.

♦ Initial Length of Test Bench: 1500 ns.

8. Click Finish to complete the timing initialization.

9. The blue shaded areas that precede the rising edge of the CLOCK correspond to the Input
Setup Time in the Initialize Timing dialog box. Toggle the DIRECTION port to define the
input stimulus for the counter design as follows:

♦ Click on the blue cell at approximately 300 ns to assert DIRECTION high so that the
counter will count up.

♦ Click on the blue cell at approximately 900 ns to assert DIRECTION low so that the
counter will count down.
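
The same stimulus can also be written as a text test bench instead of a waveform. The sketch below is an approximation under stated assumptions: the counter body is written in the spirit of the Simple Counter template rather than copied from it, the module name counter_tb is hypothetical, and the 100 ns GSR offset and the setup and output delay checks are not modeled.

```verilog
`timescale 1ns / 1ps

// Assumed up/down counter with the ports used in this tutorial.
module counter (
    input  wire       CLOCK,
    input  wire       DIRECTION,
    output reg  [3:0] COUNT_OUT
);
    initial COUNT_OUT = 4'd0;   // assumed power-up value of the register

    always @(posedge CLOCK)
        if (DIRECTION) COUNT_OUT <= COUNT_OUT + 1'b1;   // count up
        else           COUNT_OUT <= COUNT_OUT - 1'b1;   // count down
endmodule

module counter_tb;
    reg        CLOCK     = 1'b0;
    reg        DIRECTION = 1'b0;
    wire [3:0] COUNT_OUT;

    counter uut (.CLOCK(CLOCK), .DIRECTION(DIRECTION), .COUNT_OUT(COUNT_OUT));

    // 25 MHz clock: 20 ns high + 20 ns low = 40 ns period.
    always #20 CLOCK = ~CLOCK;

    initial begin
        #300 DIRECTION = 1'b1;   // at 300 ns: start counting up
        #600 DIRECTION = 1'b0;   // at 900 ns: start counting down
        #600 $finish;            // at 1500 ns: end of the test bench
    end
endmodule
```

The three delays add up to the 1500 ns initial test bench length entered in the Initialize Timing dialog box.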

CHAPTER 6

PROGRAMMING

6.1 Code
`timescale 1ns / 1ps

//////////////////////////////////////////////////////////////////////////////////

// Company:

// Engineer:

//

// Create Date: 17:56:02 08/21/2023

// Design Name:

// Module Name: fir_filter

// Project Name:

// Target Devices:

// Tool versions:

// Description:

//

// Dependencies:

//

// Revision:

// Revision 0.01 - File Created

// Additional Comments:

//

//////////////////////////////////////////////////////////////////////////////////

module FIR_FILTER(input clk , input rst , input [31:0]X , output [63:0]Y1,Y2,Y3

);

wire [63:0]m1,m2,m3,m4,m5,m6,m7,m8,m9,m10,y1,y2,y3,L0,L1,L2,L3,temp,temp1;

wire [31:0]x1,x2,x3,x4;

wire [31:0] b0,b1,b2,b3,b4,i1,i2,i3,i4;

assign b0=32'd1;

assign b1=32'd2;

assign b2=32'd3;

assign i1=3*X;

assign i2=3*X+1;

assign i3=3*X+2;

assign Y1=3*y1;

assign Y2=3*y2+1;

assign Y3=3*y3+2;

dff_pipeline dff_0 (.clk(clk) ,.rst(rst),.din(i1),.dout(x1[31:0]));

dff_pipeline dff_1 (.clk(clk) ,.rst(rst),.din(i2),.dout(x2[31:0]));

dff_pipeline dff_2 (.clk(clk) ,.rst(rst),.din(i3),.dout(x3[31:0]));

//1-parallel

vedic_32x32 p11(i1,b0,m1);

vedic_32x32 p12(x2,b2,m2);

vedic_32x32 p13(x3,b1,m3);

add_64bit p1(L0,m1,m2);

add_64bit p2(y1,L0,m3);

//2-parallel

//assign i4=i1+i2;

add_32_bit z12(i4,i1,i2);

vedic_32x32 p21(i4,b0,m4); //e0

vedic_32x32 p22(i4,b1,m5); //e1

vedic_32x32 p23(x3,b1,m6); //e2

vedic_32x32 p24(x3,b2,m7); //e1

vedic_32x32 p31(i1,b2,m8);

vedic_32x32 p32(i2,b1,m9);

vedic_32x32 p33(i3,b0,m10);

add_64bit p3(L1,m4,m5); // e0+e1

assign temp=L1-m9-m1; //

add_64bit p4(L2,m6,m7); //e2+e1

assign temp1=L2-m3;

add_64bit p5(y2,temp,temp1);

add_64bit p6(L3,m8,m9);

add_64bit p7(y3,L3,m10);

endmodule

module vedic_32x32(a,b,c);

input [31:0]a;

input [31:0]b;

output [63:0]c;

wire [31:0]q0;

wire [31:0]q1;

wire [31:0]q2;

wire [31:0]q3;

wire [31:0]temp1;

wire [47:0]temp2;

wire [47:0]temp3;

wire [47:0]temp4;

wire [31:0]q4;

wire [47:0]q5;

wire [47:0]q6;

// using 4 16x16 multipliers

vedic_16x16 z1(a[15:0],b[15:0],q0[31:0]);

vedic_16x16 z2(a[31:16],b[15:0],q1[31:0]);

vedic_16x16 z3(a[15:0],b[31:16],q2[31:0]);

vedic_16x16 z4(a[31:16],b[31:16],q3[31:0]);

// stage 1 adders

assign temp1 ={16'b0,q0[31:16]};

add_32_bit z5(q4,q1[31:0],temp1);

assign temp2 ={16'b0,q2[31:0]};

assign temp3 ={q3[31:0],16'b0};

add_48_bit z6(q5,temp2,temp3);

assign temp4={16'b0,q4[31:0]};

//stage 2 adder

add_48_bit z7(q6,temp4,q5);

// final output assignment

assign c[15:0]=q0[15:0];

assign c[63:16]=q6[47:0];

endmodule

module vedic_16x16(a,b,c);

input [15:0]a;

input [15:0]b;

output [31:0]c;

wire [15:0]q0;

wire [15:0]q1;

wire [15:0]q2;

wire [15:0]q3;

wire [15:0]temp1;

wire [23:0]temp2;

wire [23:0]temp3;

wire [23:0]temp4;

wire [15:0]q4;

wire [23:0]q5;

wire [23:0]q6;

// using 4 8x8 multipliers

vedic_8x8 z1(a[7:0],b[7:0],q0[15:0]);

vedic_8x8 z2(a[15:8],b[7:0],q1[15:0]);

vedic_8x8 z3(a[7:0],b[15:8],q2[15:0]);

vedic_8x8 z4(a[15:8],b[15:8],q3[15:0]);

// stage 1 adders

assign temp1 ={8'b0,q0[15:8]};

add_16_bit z5(q4,q1[15:0],temp1);

assign temp2 ={8'b0,q2[15:0]};

assign temp3 ={q3[15:0],8'b0};

add_24_bit z6(q5,temp2,temp3);

assign temp4={8'b0,q4[15:0]};

//stage 2 adder

add_24_bit z7(q6,temp4,q5);

// final output assignment

assign c[7:0]=q0[7:0];

assign c[31:8]=q6[23:0];

endmodule

module vedic_8x8(a,b,c);

input [7:0]a;

input [7:0]b;

output [15:0]c;

wire [7:0]q0;

wire [7:0]q1;

wire [7:0]q2;

wire [7:0]q3;

wire [7:0]temp1;

wire [11:0]temp2;

wire [11:0]temp3;

wire [11:0]temp4;

wire [7:0]q4;

wire [11:0]q5;

wire [11:0]q6;

// using 4 4x4 multipliers

vedic_4x4 z1(a[3:0],b[3:0],q0[7:0]);

vedic_4x4 z2(a[7:4],b[3:0],q1[7:0]);

vedic_4x4 z3(a[3:0],b[7:4],q2[7:0]);

vedic_4x4 z4(a[7:4],b[7:4],q3[7:0]);

// stage 1 adders

assign temp1 ={4'b0,q0[7:4]};

add_8_bit z5(q4,q1[7:0],temp1);

assign temp2 ={4'b0,q2[7:0]};

assign temp3 ={q3[7:0],4'b0};

add_12_bit z6(q5,temp2,temp3);

assign temp4={4'b0,q4[7:0]};

// stage 2 adder

add_12_bit z7(q6,temp4,q5);

// final output assignment

assign c[3:0]=q0[3:0];

assign c[15:4]=q6[11:0];

endmodule

module vedic_4x4(a,b,c);

input [3:0]a;

input [3:0]b;

output [7:0]c;

wire [3:0]q0;

wire [3:0]q1;

wire [3:0]q2;

wire [3:0]q3;

wire [3:0]temp1;

wire [5:0]temp2;

wire [5:0]temp3;

wire [5:0]temp4;

wire [3:0]q4;

wire [5:0]q5;

wire [5:0]q6;

// using 4 2x2 multipliers

vedic_2x2 z1(a[1:0],b[1:0],q0[3:0]);

vedic_2x2 z2(a[3:2],b[1:0],q1[3:0]);

vedic_2x2 z3(a[1:0],b[3:2],q2[3:0]);

vedic_2x2 z4(a[3:2],b[3:2],q3[3:0]);

// stage 1 adders

assign temp1 ={2'b0,q0[3:2]};

add_4_bit z5(q4,q1[3:0],temp1);

assign temp2 ={2'b0,q2[3:0]};

assign temp3 ={q3[3:0],2'b0};

add_6_bit z6(q5,temp2,temp3);

assign temp4={2'b0,q4[3:0]};

// stage 2 adder

add_6_bit z7(q6,temp4,q5);

// final output assignment

assign c[1:0]=q0[1:0];

assign c[7:2]=q6[5:0];

endmodule

module vedic_2x2(a,b,c);

input [1:0]a;

input [1:0]b;

output [3:0]c;

wire [3:0]temp;

//stage 1

// four bit-level multiplications according to Vedic logic, done using AND gates

assign c[0]=a[0]&b[0];

assign temp[0]=a[1]&b[0];

assign temp[1]=a[0]&b[1];

assign temp[2]=a[1]&b[1];

//stage two

// using two half adders

half_adder z1(temp[0],temp[1],c[1],temp[3]);

half_adder z2(temp[2],temp[3],c[2],c[3]);

endmodule

module dff_pipeline( input clk , input rst , input [31:0]din , output reg [31:0]dout

);

// register din on the rising clock edge, with asynchronous active-high reset

always @(posedge clk or posedge rst) begin

if (rst) begin

dout <= 32'd0;

end else begin

dout <= din;

end

end

endmodule

module full_adder(x,y,c_in,s,c_out);

input x,y,c_in;

output s,c_out;

wire w;

assign s = x^y^c_in;

assign c_out = (y&c_in)| (x&y) | (x&c_in);

endmodule

module half_adder(x,y,s,c);

input x,y;

output s,c;

assign s=x^y;

assign c=x&y;

endmodule

module add_4_bit(answer,input1,input2);

parameter N=4;

input [N-1:0] input1,input2;

output [N-1:0] answer;

wire carry_out;

wire [N-1:0] carry;

genvar i;

generate

for(i=0;i<N;i=i+1)

begin: generate_N_bit_Adder

if(i==0)

half_adder f(input1[0],input2[0],answer[0],carry[0]);

else

full_adder f(input1[i],input2[i],carry[i-1],answer[i],carry[i]);

end

assign carry_out = carry[N-1];

endgenerate

endmodule

module add_6_bit(answer,input1,input2);

parameter N=6;

input [N-1:0] input1,input2;

output [N-1:0] answer;

wire carry_out;

wire [N-1:0] carry;

genvar i;

generate

for(i=0;i<N;i=i+1)

begin: generate_N_bit_Adder

if(i==0)

half_adder f(input1[0],input2[0],answer[0],carry[0]);

else

full_adder f(input1[i],input2[i],carry[i-1],answer[i],carry[i]);

end

assign carry_out = carry[N-1];

endgenerate

endmodule

module add_8_bit(answer,input1,input2);

parameter N=8;

input [N-1:0] input1,input2;

output [N-1:0] answer;

wire carry_out;

wire [N-1:0] carry;

genvar i;

generate

for(i=0;i<N;i=i+1)

begin: generate_N_bit_Adder

if(i==0)

half_adder f(input1[0],input2[0],answer[0],carry[0]);

else

full_adder f(input1[i],input2[i],carry[i-1],answer[i],carry[i]);

end

assign carry_out = carry[N-1];

endgenerate

endmodule

module add_12_bit(answer,input1,input2);

parameter N=12;

input [N-1:0] input1,input2;

output [N-1:0] answer;

wire carry_out;

wire [N-1:0] carry;

genvar i;

generate

for(i=0;i<N;i=i+1)

begin: generate_N_bit_Adder

if(i==0)

half_adder f(input1[0],input2[0],answer[0],carry[0]);

else

full_adder f(input1[i],input2[i],carry[i-1],answer[i],carry[i]);

end

assign carry_out = carry[N-1];

endgenerate

endmodule

module add_16_bit(answer,input1,input2);

parameter N=16;

input [N-1:0] input1,input2;

output [N-1:0] answer;

wire carry_out;

wire [N-1:0] carry;

genvar i;

generate

for(i=0;i<N;i=i+1)

begin: generate_N_bit_Adder

if(i==0)

half_adder f(input1[0],input2[0],answer[0],carry[0]);

else

full_adder f(input1[i],input2[i],carry[i-1],answer[i],carry[i]);

end

assign carry_out = carry[N-1];

endgenerate

endmodule

module add_24_bit(answer,input1,input2);

parameter N=24;

input [N-1:0] input1,input2;

output [N-1:0] answer;

wire carry_out;

wire [N-1:0] carry;

genvar i;

generate

for(i=0;i<N;i=i+1)

begin: generate_N_bit_Adder

if(i==0)

half_adder f(input1[0],input2[0],answer[0],carry[0]);

else

full_adder f(input1[i],input2[i],carry[i-1],answer[i],carry[i]);

end

assign carry_out = carry[N-1];

endgenerate

endmodule

module add_32_bit(answer,input1,input2);

parameter N=32;

input [N-1:0]input1,input2;

output [N-1:0] answer;

wire carry_out;

wire [N-1:0] carry;

genvar i;

generate

for(i=0;i<N;i=i+1)

begin: generate_N_bit_Adder

if(i==0)

half_adder f(input1[0],input2[0],answer[0],carry[0]);

else

full_adder f(input1[i],input2[i],carry[i-1],answer[i],carry[i]);

end

assign carry_out = carry[N-1];

endgenerate

endmodule

module add_48_bit(answer,input1,input2);

parameter N=48;

input [N-1:0] input1,input2;

output [N-1:0] answer;

wire carry_out;

wire [N-1:0] carry;

genvar i;

generate

for(i=0;i<N;i=i+1)

begin: generate_N_bit_Adder

if(i==0)

half_adder f(input1[0],input2[0],answer[0],carry[0]);

else

full_adder f(input1[i],input2[i],carry[i-1],answer[i],carry[i]);

end

assign carry_out = carry[N-1];

endgenerate

endmodule

module add_64bit(answer,input1,input2);

parameter N=64;

input [N-1:0] input1,input2;

output [N-1:0] answer;

wire carry_out;

wire [N-1:0] carry;

genvar i;

generate

for(i=0;i<N;i=i+1)

begin: generate_N_bit_Adder

if(i==0)

half_adder f(input1[0],input2[0],answer[0],carry[0]);

else

full_adder f(input1[i],input2[i],carry[i-1],answer[i],carry[i]);

end

assign carry_out = carry[N-1];

endgenerate

endmodule

`timescale 1ns / 1ps

////////////////////////////////////////////////////////////////////////////////

// Company:

// Engineer:

//

// Create Date: 14:41:16 08/11/2023

// Design Name: FIR_FILTER

// Module Name: C:/teja12/Projects/fir_filter_xor_mux/Fir_filter/tb.v

// Project Name: Fir_filter

// Target Device:

// Tool versions:

// Description:

//

// Verilog Test Fixture created by ISE for module: FIR_FILTER

//

// Dependencies:

//

// Revision:

// Revision 0.01 - File Created

// Additional Comments:

//

////////////////////////////////////////////////////////////////////////////////

module tb;

// Inputs

reg clk;

reg rst;

reg [31:0] X;

// Outputs

wire [63:0] Y1,Y2,Y3;

// Instantiate the Unit Under Test (UUT)

FIR_FILTER uut (

.clk(clk),

.rst(rst),

.X(X),

.Y1(Y1),

.Y2(Y2),

.Y3(Y3)

);

initial begin

// Initialize Inputs

clk = 0;

forever #5 clk=~clk;

end

initial begin

rst = 1;

#10;

rst=0;

end

initial begin

#5

X = 50;

#100;

X = 60;

#100;

X = 70;

#100;

X = 80;

#100;

X = 90;

#1000;

$finish;

// Wait 100 ns for global reset to finish

// Add stimulus here

end

endmodule

CHAPTER 7

IMPLEMENTATION

7.1 PROPOSED IMPLEMENTATION

RTL Schematic Diagrams of 3-Parallel FIR-Filter using Vedic Multiplier.

The figure below is the RTL schematic diagram of the 3-parallel FIR filter using the Vedic
multiplier. Here clk, rst, and X are the inputs of the FIR filter, and Y1, Y2, and Y3 are
the outputs.

FIG: 7.1 Schematic Diagrams 64-Bit Vedic Multiplier.

FIG: 7.2 RTL Internal diagram of 3-Parallel FIR-Filter using Vedic Multiplier.

FIG: 7.3 RTL Internal diagram of Vedic 32x32 Multiplier.

FIG: 7.4 RTL Internal diagram of Vedic 16x16 Multiplier.

FIG: 7.5 RTL Internal diagram of Vedic 8x8 Multiplier.

FIG: 7.6 RTL Internal diagram of Vedic 2x2 Multiplier

CHAPTER 8

RESULTS

8.1 SIMULATION RESULT OF 3-Parallel FIR-Filter using Vedic Multiplier

The design procedure for the multipliers consists of obtaining the input coefficients. For
the Booth multiplier the coefficients are in signed representation, whereas for the Vedic
multiplier they are in unsigned form. The program for the implementation is written in
Verilog HDL and simulated using the Xilinx ISE 14.7 simulator.

FIG: 8.1 Simulation result of 3-Parallel FIR-Filter using Vedic Multiplier.

The figure above shows the simulation result of the final output. For the input X=345, the
outputs are Y1=3261, Y2=3262, and Y3=3254. The output is verified against the inputs; in
this way, the FIR filter using the Vedic multiplier multiplies 32-bit numbers. The inputs
can also be given in binary form by selecting the binary format, in which case the output
must also be displayed in binary form.
8.2 PARAMETERS

8.2.1 AREA

Area consumed for the proposed 3-Parallel FIR Filter with Vedic Multiplier for 32 bits.

8.2.2 I/O BONDED

8.2.3 DELAY

8.2.4 POWER CONSUMPTION

Power consumption for proposed 3-Parallel FIR Filter with Vedic Multiplier for 32 bits.

8.3 APPLICATIONS

• Biomedical Engineering

• Speech processing and recognition

• Digital communications

• Signal Processing

• Audio and Video Processing

• Telecommunications

• Radar Systems

• Medical Imaging

• Image Filtering and Enhancement

• Digital Communication Systems & Control Systems

• Embedded Systems

• Data Compression

8.4 ADVANTAGES

• Low power

• High speed

• Digital Implementation

• Ease of Implementation

CHAPTER 9

CONCLUSION & FUTURE SCOPE

9.1 CONCLUSION

Conventionally, FIR filters, which have wide application in Digital Signal Processing, were
developed using traditional DSP algorithms. With advances in technology, FIR filters are
now developed using VLSI technology, which leads to a large decrease in the chip area
occupied and the power consumed by the filter. The FIR filter consists of three blocks: the
multiplier, the adder, and the delay block. Of the three, the multiplier is the slowest.
The research work presented in this paper has achieved adequate results and has
demonstrated the efficiency of high-level optimization techniques. In this work, the FIR
filter has been designed using a Vedic multiplier. From this work, it is concluded that the
chip area of an FIR filter designed using a Vedic multiplier is significantly reduced
without any increase in power dissipation, while making the system faster.

9.2 FUTURE SCOPE

The future scope of high-performance VLSI implementation of 3-parallel FIR filters with
Vedic multipliers is promising, driven by increasing demands in digital signal processing
across telecommunications, multimedia, and real-time applications. Continued
advancements in Vedic multiplication techniques will enhance speed and efficiency, while
the integration of machine learning can lead to adaptive filtering solutions. Emphasizing
low-power designs will be crucial for portable devices, alongside scalable and flexible
architectures that allow for dynamic adjustments based on application needs. Hardware
acceleration using GPUs and DSPs will further improve processing capabilities, enabling
higher data throughput for technologies like 5G.

9.3 REFERENCES

[1] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation
(Wiley, New York, 1999).
[2] L. K. Phimu and M. Kumar, “Design and implementation of area efficient 2-parallel
filters on FPGA using image system,” in Proc. ICECDS, 2017.
[3] Z.-J. Mou and P. Duhamel, “Short-length FIR filters and their use in fast nonrecursive
filtering,” IEEE Trans. Signal Process., vol. 39, no. 6, 1991.
[4] D. A. Parker and K. K. Parhi, “Low-area/power parallel FIR digital filter
implementation,” J. VLSI Signal Process. Syst., vol. 17, 1997.
[5] J. Selvakumar and Vidhyacharan Bhaskar, “Efficient complexity reduction technique
for parallel FIR digital filter based on fast FIR algorithm,” International Journal of
Computer Applications, vol. 55, 2012.
[6] Y. C. Tsao and K. Choi, “Hardware-efficient VLSI implementation for 3-parallel
linear-phase FIR digital filter of odd length,” in Proc. IEEE ISCAS, 2012.
[7] Y. C. Tsao and K. Choi, “Hardware-efficient VLSI implementation for 3-parallel
linear-phase FIR digital filter of odd length,” in Proc. IEEE ISCAS, 2012.
[8] Q. Tian, Y. Wang, G. Liu, X. Liu, J. Diao, and Hui Xu, “Hardware-efficient parallel FIR
filter structure based on modified Cook-Toom algorithm,” in Proc. IEEE APCCAS, 2018.
[9] K. Anjali Rao, Abhishek Kumar, and Neetesh Purohit, “Efficient implementation for
3-parallel linear-phase FIR digital odd length filters,” in Proc. IEEE CICT, 2020.
[10] A. Kumar, S. Yadav, and N. Purohit, “Exploiting coefficient symmetry in conventional
polyphase FIR filters,” IEEE Access, vol. 7, 2019.
[11] S. Y. Park and Pramod K. Meher, “Efficient FPGA and ASIC realizations of DA-based
reconfigurable FIR digital filter,” IEEE Trans. Circuits Syst. II, vol. 61, no. 7, 2014.
