Page No: List of Figures List of Tables

Uploaded by

Yatheesh Kaggere

TABLE OF CONTENTS

CONTENTS PAGE NO

LIST OF FIGURES i-ii

LIST OF TABLES iii

ABSTRACT iv

CHAPTER 1

1.INTRODUCTION 1-3

CHAPTER 2

2.LITERATURE SURVEY 4-7

CHAPTER 3

3.PROJECT DESCRIPTION 8-12

3.1 PROPOSED SYSTEM 8-12

CHAPTER 4

4.TOOLS REQUIRED 13

4.1 INTRODUCTION TO VLSI 13-29

4.1.1 HISTORICAL PERSPECTIVE 13-16

4.1.2 VLSI DESIGN FLOW 16-18

4.1.3 DESIGN HIERARCHY 18-19

4.1.4 VLSI DESIGN STYLES 19-29

4.2 INTRODUCTION TO XILINX 29-34

4.2.1 MIGRATING PROJECTS FROM PREVIOUS ISE SOFTWARE 29

4.2.2 TO MIGRATE A PROJECT 29-30

4.2.3 IP MODULES 30

4.2.4 OBSOLETE SOURCE FILE TYPES 30


4.2.5 USING ISE EXAMPLE PROJECTS 30-31

4.2.6 CREATING A PROJECT 31-32

4.2.7 DESIGN PANEL 32

4.2.8 CREATING A COPY OF A PROJECT 32

4.2.9 USING THE PROJECT BROWSER 33

4.2.10 EXCLUDE GENERATED FILES FROM THE COPY 34

4.2.11 CREATING A PROJECT ARCHIVE 34

4.2.12 ARCHIVE A PROJECT 34

4.3 INTRODUCTION TO VERILOG 35

4.3.1 OVERVIEW 35-36

4.3.2 HISTORY 36

4.3.2 (A) BEGINNING 36

4.3.2 (B) VERILOG-95 36

4.3.2 (C) VERILOG 2001 36-37

4.3.2 (D) VERILOG 2005 37

4.3.2 (E) SYSTEMVERILOG 37-38

4.3.2 (F) EXAMPLES 38-41

4.3.3 CONSTANTS 41

4.3.4 SYNTHESIZABLE CONSTRUCTS 41-43

4.3.5 INITIAL VS ALWAYS 43-44

4.3.6 RACE CONDITION 45

4.3.7 OPERATORS 45-47

4.3.8 SYSTEM TASKS 47


CHAPTER 5

SIMULATION RESULTS 48-49

CHAPTER 6

CONCLUSION 50

REFERENCES 51-52

SAMPLE CODE 53-56


LIST OF FIGURES

FIG 3.1 SHIFT REGISTER 9

FIG 3.2 SERIAL-IN TO PARALLEL-OUT SHIFT REGISTER 9

FIG 3.3 SERIAL-IN TO SERIAL-OUT SHIFT REGISTER 10

FIG 3.4 4-BIT PARALLEL-IN TO SERIAL-OUT SHIFT REGISTER 11

FIG 3.5 4-BIT PARALLEL-IN TO PARALLEL-OUT SHIFT REGISTER 12

FIGURE 4.1: OVERVIEW OF THE PROMINENT TRENDS IN INFORMATION TECHNOLOGIES 13

FIGURE-4.2: EVOLUTION OF INTEGRATION DENSITY AND MIN FEATURE SIZE, AS SEEN IN THE EARLY 1980S 15

FIGURE-4.3: LEVEL OF INTEGRATION OVER TIME, FOR MEMORY CHIPS AND LOGIC CHIPS 15

FIGURE-4.4: TYPICAL VLSI DESIGN FLOW IN THREE DOMAINS (Y-CHART REPRESENTATION) 16

FIGURE-4.5: A MORE SIMPLIFIED VIEW OF VLSI DESIGN FLOW 17

FIGURE-4.6: STRUCTURAL DECOMPOSITION OF A FOUR-BIT ADDER CIRCUIT, SHOWING THE HIERARCHY DOWN TO GATE LEVEL 18

FIGURE-4.7: REGULAR DESIGN OF A 2-1 MUX, A DFF AND AN ADDER, USING INVERTERS AND TRI-STATE BUFFERS 19

FIGURE-4.8: GENERAL ARCHITECTURE OF XILINX FPGAS 20

FIGURE-4.9: DETAILED VIEW OF SWITCH MATRICES AND INTERCONNECTION ROUTING BETWEEN CLBS 20

FIGURE-4.10: XC2000 CLB OF THE XILINX FPGA 21

FIGURE-4.11: BASIC PROCESSING STEPS REQUIRED FOR GATE ARRAY IMPLEMENTATION 22

FIGURE-4.12: A CORNER OF A TYPICAL GATE ARRAY CHIP 22

FIGURE-4.13: METAL MASK DESIGN TO REALIZE A COMPLEX LOGIC FUNCTION ON A CHANNELED GA PLATFORM 23

FIGURE-4.14: LAYOUT VIEWS OF A CONVENTIONAL GA CHIP AND A GATE ARRAY WITH TWO MEMORY BANKS 24

FIGURE-4.15: THE PLATFORM OF A SEA-OF-GATES (SOG) CHIP 24

FIGURE-4.16: COMPARISON BETWEEN THE CHANNELED (GA) VS. THE CHANNEL LESS (SOG) APPROACHES 24

FIGURE-4.17: A STANDARD CELL LAYOUT EXAMPLE 25

FIGURE-4.18: A SIMPLIFIED FLOOR PLAN OF STANDARD-CELLS-BASED DESIGN 26

FIGURE-4.19: SIMPLIFIED FLOOR PLAN CONSISTING OF TWO SEPARATE BLOCKS AND A COMMON SIGNAL BUS 27

FIGURE-4.20: MASK LAYOUT OF A STANDARD-CELL-BASED CHIP WITH A SINGLE BLOCK OF CELLS AND THREE MEMORY BANKS 28

FIGURE-4.21: OVERVIEW OF VLSI DESIGN STYLES 29

LIST OF TABLES

TABLE-1.1: EVOLUTION OF LOGIC COMPLEXITY IN INTEGRATED CIRCUITS 14

TABLE 5: LIST OF OPERATORS 45-47

ABSTRACT
In this paper, a four-bit unsigned up counter with an asynchronous clear and a clock enable is
designed in Xilinx ISE 14.2 and implemented on a high-performance Virtex-6 FPGA
(XC6VLX240T device, -1 speed grade, FFG1156 package) on the ML605 board. The user
constraints file (ucf) and the native circuit description (ncd) file are used with XPower 14.2
for power consumption analysis. Two HDL codes are compared. The first code maps the
clock enable signal to LUTs, giving a power consumption of 3.423 Watt. The second code
maps the clock enable signal to control ports, giving a power consumption of 3.625 Watt.
Changing the mapping style yields about a 6% power reduction and also reduces the number
of LUTs and D flip-flops used in the implementation, which leads to an area-efficient design.
Efficient mapping therefore reduces power consumption through changes to single HDL
statements. The experimental results show the power analysis of both HDL mapping codes.
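
The behaviour of the counter described above can be sketched in software. The following is a minimal behavioural model in Python, not the HDL used in the project; the class name `UpCounter4` and its method names are assumptions chosen for illustration.

```python
class UpCounter4:
    """Behavioural model of a 4-bit unsigned up counter (illustrative only)."""

    def __init__(self):
        self.q = 0  # current count, 0..15

    def clear(self):
        """Asynchronous clear: takes effect at once, without a clock edge."""
        self.q = 0

    def clock_edge(self, ce):
        """Rising clock edge: count only while the clock enable `ce` is high."""
        if ce:
            self.q = (self.q + 1) & 0xF  # wrap around after 15
        return self.q


counter = UpCounter4()
trace = [counter.clock_edge(ce=1) for _ in range(17)]  # 17 enabled edges
held = counter.clock_edge(ce=0)                        # enable low: count holds
```

In this model the count wraps from 15 back to 0 on the sixteenth enabled edge, and an edge with the enable low leaves the count unchanged.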

Low Power VLSI Circuit Design with Efficient HDL Coding

INTRODUCTION
This paper proposes a low-power and area-efficient shift register using pulsed
latches. The area and power consumption are reduced by replacing flip-flops with
pulsed latches. This method solves the timing problem between pulsed latches
through the use of multiple non-overlapping delayed pulsed clock signals instead of
the conventional single pulsed clock signal. The shift register uses a small number of
pulsed clock signals by grouping the latches into several sub shift registers and using
additional temporary storage latches. A 256-bit shift register using pulsed latches was
fabricated using a 0.18 µm CMOS process with VDD = 1.8 V. The core area is 6600 µm².
The power consumption is 1.2 mW at a 100 MHz clock frequency. The proposed shift
register saves 37% area and 44% power compared to the conventional shift register
with flip-flops.

In VLSI design, power consumption has become a very important issue.
Sequential logic circuits, such as registers, memory elements and counters, are
heavily used in the implementation of Very Large Scale Integrated (VLSI) circuits.
Power dissipation is critical for battery-operated systems, such as laptops, calculators,
cell phones and MP3 players, since it determines the battery life. Therefore, designs
are needed that consume less power while maintaining comparable performance.
A flip-flop is a data storage element whose operation is governed by its clock.
When a multistage flip-flop circuit is clocked, it operates with high clock switching
activity, which increases latency. Timing elements and clock interconnection
networks, such as flip-flops and latches, are among the most power-consuming
components in a modern VLSI system. In this work, the area, power and transistor
count are compared for designs using several latch and flip-flop stages, and pulsed
latches are explored for timing optimization purposes. Flip-flops, circuits that store
state information, are the basic storage elements used extensively in all kinds of
digital designs. Reducing power consumption is one of the main objectives in
designing a flip-flop.

Department of ECE, MRCE Page 1



A shift register is a basic building block in VLSI circuits. Shift registers are
commonly used in many applications, such as digital filters [1], communication
receivers [2], and image processing ICs [3]–[5]. Recently, as the size of the image
data continues to increase due to the high demand for high-quality image data, the
word length of the shift register increases to process large image data in image
processing ICs. An image-extraction and vector generation VLSI chip uses a 4K-bit
shift register [3]. A 10-bit 208-channel output LCD column driver IC uses a 2K-bit
shift register [4]. A 16-megapixel CMOS image sensor uses a 45K-bit shift register
[5]. As the word length of the shift register increases, the area and power
consumption of the shift register become important design considerations.

The architecture of a shift register is quite simple. An N-bit shift register is
composed of N series-connected data flip-flops. The speed of the flip-flop is less
important than the area and power consumption because there is no circuit between
flip-flops in the shift register. The smallest flip-flop is suitable for the shift register to
reduce the area and power consumption. Recently, pulsed latches have replaced
flip-flops in many applications, because a pulsed latch is much smaller than a flip-flop
[6]–[9]. But the pulsed latch cannot simply replace the flip-flops in a shift register,
due to the timing problem between pulsed latches. This paper proposes a low-power
and area-efficient shift register using pulsed latches. The shift register solves the
timing problem using multiple non-overlapping delayed pulsed clock signals instead
of the conventional single pulsed clock signal. The shift register uses a small number
of pulsed clock signals by grouping the latches into several sub shift registers and
using additional temporary storage latches. The rest of the paper is organized as
follows: Section II describes the architecture of the proposed shift register. Section III
presents the measurement results of the fabricated chip.

Flip-flops are the basic storage elements used extensively in all kinds of digital
designs. Current trends will eventually mandate low-power design automation on
a very large scale to keep pace with the power consumption of today's and future
integrated chips. The dynamic power consumption of a Very Large Scale Integrated
(VLSI) design is given by the generalized relation P = CV²f, where C is the switched
capacitance, V is the supply voltage and f is the clock frequency. Since power is
proportional to the square of the voltage, voltage scaling is the most prominent way
to reduce power dissipation. The pulsed latch consumes less power than the flip-flop.
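
As a worked illustration of the relation P = CV²f, the short Python sketch below compares two supply voltages at the same capacitance and frequency. The operating point (1 nF of switched capacitance, a 100 MHz clock, 1.8 V versus 1.2 V) is hypothetical and chosen only to make the quadratic dependence on voltage visible.

```python
def dynamic_power(c_farads, v_volts, f_hertz):
    """Switching power P = C * V^2 * f, in watts."""
    return c_farads * v_volts ** 2 * f_hertz


# Hypothetical operating point, not taken from the report.
C_SWITCHED = 1e-9   # 1 nF of switched capacitance (assumed)
F_CLOCK = 100e6     # 100 MHz clock (assumed)

p_high = dynamic_power(C_SWITCHED, 1.8, F_CLOCK)  # supply at 1.8 V
p_low = dynamic_power(C_SWITCHED, 1.2, F_CLOCK)   # supply scaled to 1.2 V
ratio = p_low / p_high                            # equals (1.2 / 1.8) ** 2
```

Scaling the supply to two thirds of its value leaves four ninths of the switching power in this model, which is why voltage scaling is so effective.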


A master-slave flip-flop using two latches can be replaced by a pulsed latch
consisting of a latch and a pulsed clock signal. All pulsed latches share the pulse
generation circuit for the pulsed clock signal. As a result, the area and power
consumption of the pulsed latch become almost half of those of the master-slave
flip-flop. The pulsed latch is therefore an attractive solution for small area and low
power consumption. However, the pulsed latch cannot be used directly in shift
registers due to the timing problem. Consider a shift register consisting of several
latches and a single pulsed clock signal (CLK_pulse). The operation waveforms show
the timing problem in the shift register. The output signal of the first latch (Q1)
changes correctly because the input signal of the first latch (IN) is constant during the
clock pulse width. But the second latch has an uncertain output signal (Q2) because
its input signal (Q1) changes during the clock pulse width. One solution for the timing
problem is to add delay circuits between latches, as shown in Fig. 3(a). The output
signal of the latch is delayed and reaches the next latch after the clock pulse. As
shown in Fig. 3(b), the output signals of the first and second latches (Q1 and Q2)
change during the clock pulse width, but the input signals of the second and third
latches (D2 and D3) become the same as the output signals of the first and second
latches (Q1 and Q2) only after the clock pulse. As a result, all latches have constant
input signals during the clock pulse.
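
The timing problem and the delayed-pulse remedy mentioned earlier can be illustrated with a very coarse discrete-time model in Python. This is a sketch under stated assumptions, not a circuit simulation: each latch is assumed to have one unit of propagation delay, the shared pulse stays high for `pulse_width` units, and the non-overlapping pulses are assumed to be applied from the last latch to the first. The function names are hypothetical.

```python
def shift_shared_pulse(latches, data_in, pulse_width):
    """One shared pulse: all latches are transparent for `pulse_width` unit delays.

    During every unit delay each latch copies whatever is at its input,
    so data can race through more than one stage per pulse.
    """
    q = list(latches)
    for _ in range(pulse_width):
        q = [data_in] + q[:-1]
    return q


def shift_delayed_pulses(latches, data_in):
    """Non-overlapping pulses, one latch at a time, last latch first (assumed order).

    Each latch captures its input before the preceding latch is updated,
    so the data moves exactly one stage.
    """
    q = list(latches)
    for i in reversed(range(len(q))):
        q[i] = q[i - 1] if i > 0 else data_in
    return q


raced = shift_shared_pulse([0, 0, 0, 0], data_in=1, pulse_width=3)
clean = shift_delayed_pulses([0, 0, 0, 0], data_in=1)
```

With a pulse three unit delays wide, the shared-pulse model lets the input bit run through three latches in one pulse, whereas the delayed-pulse model advances it by a single stage.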


LITERATURE SURVEY

Single event effects (SEEs) caused by radiation are a major concern when
working with circuits that need to operate in certain environments, such as space
applications. In this paper, new techniques for the implementation of moving
average filters that provide protection against SEEs are presented, which have a lower
circuit complexity and cost than traditional techniques like triple modular redundancy
(TMR). The effectiveness of these techniques has been evaluated using a software
fault injection platform and the circuits have been synthesized for a commercial
library in order to assess their complexity. The main idea behind the presented
approach is to exploit the structure of moving average filter implementations to deal
with SEEs at a higher level of abstraction.

Gigabit Ethernet on Category-5 cable is the next-generation high-speed
Ethernet LAN for twisted-pair copper medium with a minimum required reach of 100
meters. This paper presents a brief overview of the transmission scheme agreed upon
by the IEEE 802.3ab task force for 1 Gb/s full-duplex operation over 4 pairs of
Category-5 cable. Some system-level simulation results are presented, followed by a
discussion of the type of digital and analog circuits required for a single-chip
mixed-signal CMOS implementation of the transceiver. For reliable operation under
worst-case cabling conditions, the DSP portion of the transceiver has to perform over
150 giga-operations per second.

A feature-extraction and vector-generation VLSI has been developed for
real-time image recognition. An arrayed-shift-register architecture has been employed
in conjunction with pipelined directional-edge-filtering circuitry. As a result, it has
become possible to scan an image, pixel by pixel, with a 64 × 64-pixel recognition
window and generate a 64-dimensional feature vector every 64 clock cycles. In order
to determine the threshold for the edge-filtering operation adaptive to local luminance
variation, a high-speed median circuit has been developed. A binary median search
algorithm has been implemented using high-precision majority voting circuits
working on the mixed-signal principle. A prototype chip was designed and fabricated
in a 0.18-µm 5-metal CMOS technology. High-speed feature vector generation in less
than 9.7 ns/vector element has been experimentally demonstrated. It is possible to
scan a VGA-size image at a rate of 6.1 frames/s, thus generating as many as
1.5 × 10^6 feature vectors per second for recognition. This is more than 10^3 times
faster than software processing running on a 3-GHz general-purpose processor.

This paper presents a 10-bit column driver IC for active-matrix LCDs, with a
proposed iterative charge-sharing based (ICSB) capacitor-string that interpolates two
output voltages from a resistor-string DAC. Iterative mode change between a
capacitive voltage division mode and a charge sharing mode in the ICSB capacitor-
string interpolation suppresses the effect of mismatches between capacitors and that
of parasitic capacitances; thus, a highly linear capacitor sub-DAC is realized. In
addition, the area-sharing layout technique, which stacks the interpolation capacitor-
string on top of the R-DAC area, reduces the driver channel size and extends the bit
resolution of the gamma-corrected nonlinear main R-DAC. Consequently, the
proposed ICSB capacitor-string interpolation scheme provides highly uniform channel
performance by passively dividing the coarse voltages from the global resistor-string
DAC with high area efficiency, and more effective bit resolution for nonlinear gamma
correction.

The prototype column driver IC was implemented using a 0.11-µm CMOS
process. The area occupation of the DAC and buffer amplifier per channel is only 188
× 15 µm², and the static power consumption is 0.9 µA/channel with no additional static
power dissipation for the interpolation. The measured maximum DNL and INL are
0.25 LSB and 0.43 LSB, respectively. The measured maximum inter-channel DVO is
5.6 mV. The proposed chip achieves state-of-the-art performance in terms of chip size
and channel-to-channel uniformity.

The design and scaling of a 21 mm × 21 mm CMOS image sensor for
charged-particle imaging, EM7, is presented and compared to its smaller prototype,
EM5. The sensor contains ~50 million transistors spanning its 16 million pixels, and
includes over 4,100 parallel analog processing and A/D conversion circuits, utilizing
12 parallel 10-bit readout busses for high data throughput. The clock distribution
design in EM7 minimizes the clock delay by dividing the chip into multiple parallel
sections, each driven locally by a tree-like clock structure. By this technique,
simulations showed that the readout shift-register clock delay is reduced from 4.7 ns
to 0.14 ns, and the row shift-register clock delay is reduced from 1.7 ns to 0.12 ns.
With similar local buffering, the ADC Gray code counter delay is reduced from 35 ns
to 0.9 ns. These improvements allow EM7 to sustain image acquisition at 75 frames/s,
for a continuous data throughput of over 10 Gb/s. The large chip dimensions and the
increased power consumption in EM7 also require more robust power distribution. A
matrix-math simulation shows the worst-case pixel IR voltage drop was improved
from 20 mV to 8 mV. Similarly, the pixel's worst-case analog output's IR drop is
reduced from 80.7 mV to 2.58 mV, and its bandwidth is thus increased from
6.92 MHz to 14.4 MHz. The power supply IR drop in the output processing stage's
op-amps is reduced from 327 mV to 35 mV, their open-loop gain variation is reduced
from 525% to 28%, and their worst-case bandwidth is increased from 0.87 MHz to
764 MHz.

This paper presents new techniques to evaluate the energy and delay of flip-
flop and latch designs and shows that no single existing design performs well across
the wide range of operating regimes present in complex systems. We propose the use
of a selection of flip-flop and latch designs, each tuned for different activation
patterns and speed requirements. We illustrate our technique on a pipelined MIPS
processor datapath running SPECint95 benchmarks, where we reduce total flip-flop
and latch energy by over 60% without increasing cycle time.

Flip-flops (FFs) are key building blocks in the design of high-speed
energy-efficient microprocessors, as their data-to-output delay (D-Q) and power
dissipation strongly affect the processor's clock period and overall power. From
previous analyses, the Transmission-Gate Pulsed Latch (TGPL) proved to be the most
energy-efficient FF in a large portion of the design space, ranging from high speed
(minimizing ED^j products with j > 1) to minimum ED product designs, while simple
Master-Slave FFs (TGFF and ACFF) are the most energy-efficient in the low-power
E-D space region.

TGPL also has the lowest D-Q delay along with STFF. However, the latter has
considerably worse energy efficiency; hence, the TGPL is the best reference for a
comparison. In this work, two new FFs are introduced, the Conditional Push-Pull
Pulsed Latch (CP3L), and a version with a Shareable (CSP3L) Pulse Generator (PG).
The adoption of a fast push-pull second stage, which requires a conditional
PG, enables 50-to-100% delay improvements compared to TGPL, and absolute D-Q
up to 0.7 FO4. CP3L and CSP3L also exhibit superior energy efficiency to TGPL in
terms of minimum ED³ and ED products. A test chip is fabricated in 65 nm CMOS
technology (VDD = 1 V) to measure delay and energy consumption of CP3L, CSP3L
and TGPL in minimum ED and ED³ sizing. Different loadings are used in the
minimum ED (16×) and the minimum ED³ (64×) cases.

In this paper, we propose a set of rules for consistent estimation of the real
performance and power features of the flip-flop and master-slave latch structures.
A new simulation and optimization approach is presented, targeting both
high-performance and power budget issues. The analysis approach reveals the sources
of performance and power-consumption bottlenecks in different design styles. Certain
misleading parameters have been properly modified and weighted to reflect the real
properties of the compared structures. Furthermore, the results of the comparison of
representative master-slave latches and flip-flops illustrate the advantages of our
approach and the suitability of different design styles for high-performance and
low-power applications.


PROJECT DESCRIPTION

3.1 PROPOSED SYSTEM:

The shift register is another type of sequential logic circuit that is used for
the storage or transfer of data in the form of binary numbers. It "shifts" the data along
by one position on every clock cycle, hence the name shift register. It basically
consists of several single-bit "D-type data latches", one for each bit (0 or 1),
connected together in a serial or daisy-chain arrangement so that the output from one
data latch becomes the input of the next latch, and so on. The data bits may be fed in
or out of the register serially, i.e. one after the other from either the left or the right
direction, or in parallel, i.e. all together. The number of individual data latches
required to make up a single shift register is determined by the number of bits to be
stored, with the most common being 8 bits wide, i.e. eight individual data latches.
Shift registers are used for data storage or data movement and are used in calculators
or computers to store data such as two binary numbers before they are added together,
or to convert the data from either a serial to parallel or parallel to serial format. The
individual data latches that make up a single shift register are all driven by a common
clock (Clk) signal, making them synchronous devices. Shift register ICs are generally
provided with a clear or reset connection so that they can be "SET" or "RESET" as
required. Generally, shift registers operate in one of four different modes, with the
basic movement of data through a shift register being:

• Serial-in to Parallel-out (SIPO) - The register is loaded with serial data, one bit at a
time, with the stored data being available in parallel form.

• Serial-in to Serial-out (SISO) - The data is shifted serially "IN" and "OUT" of the
register, one bit at a time in either a left or right direction under clock control.

• Parallel-in to Serial-out (PISO) - The parallel data is loaded into the register
simultaneously and is shifted out of the register serially one bit at a time under clock
control.

• Parallel-in to Parallel-out (PIPO) - The parallel data is loaded simultaneously into
the register, and transferred together to their respective outputs by the same clock
pulse.

The effect of data movement from left to right through a shift register can be
presented graphically as:


Fig 3.1 shift register

Also, the directional movement of the data through a shift register can be
either to the left (left shifting), to the right (right shifting), left-in but right-out
(rotation), or both left and right within the same register, thereby making it
bidirectional.
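
The shift directions listed above can be expressed as bit operations on a 4-bit word. The Python sketch below is illustrative only; the 4-bit width, the function names and the treatment of rotation as "the bit leaving on the right re-enters on the left" are assumptions.

```python
WIDTH = 4                   # register width in bits (assumed)
MASK = (1 << WIDTH) - 1     # 0b1111


def shift_left(value, serial_in=0):
    """Move every bit one place to the left; `serial_in` fills the rightmost bit."""
    return ((value << 1) | serial_in) & MASK


def shift_right(value, serial_in=0):
    """Move every bit one place to the right; `serial_in` fills the leftmost bit."""
    return (value >> 1) | (serial_in << (WIDTH - 1))


def rotate_right(value):
    """Shift right and feed the bit that falls off back in on the left."""
    return shift_right(value, serial_in=value & 1)


left = shift_left(0b1001)
right = shift_right(0b1001)
rotated = rotate_right(0b1001)
```

A bidirectional register corresponds to choosing between `shift_left` and `shift_right` on each clock pulse.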

Serial-in to Parallel-out (SIPO)

Fig 3.2 Serial-in to Parallel-out shift register

The operation is as follows. Let us assume that all the flip-flops (FFA to FFD)
have just been RESET (CLEAR input) and that all the outputs QA to QD are at logic
level "0", i.e. no parallel data output. If a logic "1" is connected to the DATA input
pin of FFA, then on the first clock pulse the output of FFA, and therefore the resulting
QA, will be set HIGH to logic "1" with all the other outputs remaining LOW at
logic "0". Assume now that the DATA input pin of FFA has returned LOW again to
logic "0", giving us one data pulse or 0-1-0. The second clock pulse will change the
output of FFA to logic "0" and the output of FFB (QB) HIGH to logic "1", as its
input D has the logic "1" level on it from QA. The logic "1" has now moved or been
"shifted" one place along the register to the right, as it is now at QB. When the third
clock pulse arrives this logic "1" value moves to the output of FFC (QC), and so on
until the arrival of the fifth clock pulse, which sets all the outputs QA to QD back
again to logic level "0" because the input to FFA has remained constant at logic level
"0". The effect of each clock pulse is to shift the data contents of each stage one place
to the right, and this is shown in the following table until the complete data value of
0-0-0-1 is stored in the register. This data value can now be read directly from the
outputs QA to QD. The data has thus been converted from a serial data input signal
to a parallel data output. The truth table and following waveforms show the
propagation of the logic "1" through the register from left to right.

Basic Movement of Data through a Shift Register

Clock Pulse No   QA   QB   QC   QD
0                0    0    0    0
1                1    0    0    0
2                0    1    0    0
3                0    0    1    0
4                0    0    0    1
5                0    0    0    0

Note that after the fourth clock pulse has ended, the 4 bits of data (0-0-0-1) are
stored in the register and will remain there provided clocking of the register has
stopped. In practice the input data to the register may consist of various combinations
of logic "1" and "0". Commonly available SIPO ICs include the standard 8-bit
74LS164 or the 74LS594.
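
The movement of the single logic "1" described above can be reproduced with a few lines of Python. This is a behavioural sketch, not the project's HDL; the function name `sipo_trace` is hypothetical, and the input sequence is the one data pulse followed by zeros used in the walkthrough.

```python
def sipo_trace(serial_bits, stages=4):
    """Return the contents (QA, QB, QC, QD) after each clock pulse, starting cleared."""
    q = [0] * stages        # all flip-flops RESET
    trace = [tuple(q)]      # row for clock pulse 0
    for bit in serial_bits:
        # On each clock pulse every stage takes the previous stage's output,
        # and the first stage takes the serial DATA input.
        q = [bit] + q[:-1]
        trace.append(tuple(q))
    return trace


rows = sipo_trace([1, 0, 0, 0, 0])   # one data pulse, then the input stays at 0
for pulse, (qa, qb, qc, qd) in enumerate(rows):
    print(pulse, qa, qb, qc, qd)
```

Each printed row corresponds to one clock pulse, so the output can be compared directly against the table of data movement.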

Serial-in to Serial-out (SISO)

Fig 3.3 Serial-in to Serial-out shift register

This shift register is very similar to the SIPO above, except that where before
the data was read directly in parallel form from the outputs QA to QD, this time the
data is allowed to flow straight through the register and out of the other end. Since
there is only one output, the DATA leaves the shift register one bit at a time in a
serial pattern, hence the name Serial-in to Serial-out Shift Register or SISO.

The SISO shift register is one of the simplest of the four configurations as it
has only three connections: the serial input (SI), which determines what enters the
left-hand flip-flop, the serial output (SO), which is taken from the output of the
right-hand flip-flop, and the sequencing clock signal (Clk). Fig 3.3 shows a
generalized 4-bit serial-in serial-out shift register. One may ask what the point of a
SISO shift register is if the output data is exactly the same as the input data. This type
of shift register also acts as a temporary storage device or as a time delay device for
the data, with the amount of time delay being controlled by the number of stages in
the register (4, 8, 16, etc.) or by varying the application of the clock pulses.

Commonly available ICs include the 74HC595 8-bit Serial-in/Serial-out Shift
Register with 3-state outputs.
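
The time-delay behaviour of a SISO register can be seen in a small Python model. This is a sketch under the assumption of a 4-stage register that starts cleared; the function name `siso_stream` is hypothetical.

```python
from collections import deque


def siso_stream(serial_bits, stages=4):
    """Return the serial output value seen after each clock pulse."""
    # A fixed-length deque drops the rightmost bit whenever a new bit
    # is pushed in on the left, which mimics a right-shifting register.
    reg = deque([0] * stages, maxlen=stages)
    out = []
    for bit in serial_bits:
        reg.appendleft(bit)   # clock pulse: shift right, new bit enters on the left
        out.append(reg[-1])   # serial output is the rightmost stage
    return out


bits_in = [1, 0, 1, 1, 0, 0, 0, 0]
bits_out = siso_stream(bits_in)
```

In this model each input bit reaches the serial output on the fourth clock pulse, counting the pulse that loaded it, so a register with more stages gives a proportionally longer delay.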

Parallel-in to Serial-out (PISO)

The Parallel-in to Serial-out shift register acts in the opposite way to the
serial-in to parallel-out one above. The data is loaded into the register in a parallel
format, i.e. all the data bits enter their inputs simultaneously, at the parallel input pins
PA to PD of the register. The data is then read out sequentially in the normal
shift-right mode from the register at Q, representing the data present at PA to PD.
This data is output one bit at a time on each clock cycle in a serial format. It is
important to note that with this system a clock pulse is not required to parallel load the
register, as the data is already present at the inputs, but four clock pulses are required
to unload it.

Fig 3.4 4-bit Parallel-in to Serial-out Shift Register


As this type of shift register converts parallel data, such as an 8-bit data word
into serial format, it can be used to multiplex many different input lines into a single
serial DATA stream which can be sent directly to a computer or transmitted over a
communications line. Commonly available IC's include the 74HC166 8-bit Parallel-
in/Serial-out Shift Registers.
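
The parallel-to-serial conversion can be sketched in Python as a load followed by four shift steps. The model is illustrative: the function name `piso_unload` is hypothetical, and it assumes that PD is the stage nearest the serial output Q, so PD leaves the register first.

```python
def piso_unload(parallel_word):
    """Load [PA, PB, PC, PD] in parallel, then shift right and collect the bits at Q."""
    reg = list(parallel_word)   # parallel load: all bits enter at once
    out = []
    for _ in range(len(reg)):
        out.append(reg[-1])     # serial output Q is the last stage (assumed PD side)
        reg = [0] + reg[:-1]    # shift right, filling the first stage with 0
    return out


serial = piso_unload([1, 0, 1, 1])   # PA=1, PB=0, PC=1, PD=1
```

Under this assumption the bits appear on the serial line in the order PD, PC, PB, PA, and the register is empty after the four shift steps.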

Parallel-in to Parallel-out (PIPO)

The final mode of operation is the Parallel-in to Parallel-out Shift Register.
This type of register also acts as a temporary storage device or as a time delay device,
similar to the SISO configuration above. The data is presented in a parallel format to
the parallel input pins PA to PD and then transferred together directly to their
respective output pins QA to QD by the same clock pulse. One clock pulse thus loads
and unloads the register. This arrangement for parallel loading and unloading is
shown below.

Fig 3.5 4-bit Parallel-in to Parallel-out Shift Register

The PIPO shift register is the simplest of the four configurations as it has only
three connections: the parallel input (PI), which determines what enters the flip-flop,
the parallel output (PO) and the sequencing clock signal (Clk). Similar to the Serial-in
to Serial-out shift register, this type of register also acts as a temporary storage device
or as a time delay device, with the amount of time delay being varied by the
frequency of the clock pulses. Also, in this type of register there are no
interconnections between the individual flip-flops, since no serial shifting of the data
is required.
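
A PIPO register reduces to a bank of independent flip-flops that all capture on the same clock pulse. The Python sketch below models that behaviour; the class name `PIPORegister` and its interface are assumptions made for illustration.

```python
class PIPORegister:
    """Behavioural model of a parallel-in to parallel-out register (illustrative)."""

    def __init__(self, width=4):
        self.q = [0] * width    # outputs QA..QD, cleared at start

    def clock_edge(self, parallel_in):
        """One clock pulse: every stage captures its own input, with no shifting."""
        if len(parallel_in) != len(self.q):
            raise ValueError("input width must match register width")
        self.q = list(parallel_in)
        return self.q


reg = PIPORegister()
before = list(reg.q)                     # outputs before the clock pulse
after = reg.clock_edge([1, 0, 1, 1])     # PA..PD appear on QA..QD after one pulse
```

Because no stage depends on its neighbour, the outputs follow the inputs after exactly one clock pulse, which is the one-pulse load-and-unload behaviour described above.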

TOOLS REQUIRED

4.1 Introduction to VLSI:

4.1.1 Historical Perspective:

The electronics industry has achieved a phenomenal growth over the last two
decades, mainly due to the rapid advances in integration technologies, large-scale
systems design - in short, due to the advent of VLSI. The number of applications of
integrated circuits in high-performance computing, telecommunications, and
consumer electronics has been rising steadily, and at a very fast pace. Typically, the
required computational power (or, in other words, the intelligence) of these
applications is the driving force for the fast development of this field. Figure 4.1 gives
an overview of the prominent trends in information technologies over the next few
decades. The current leading-edge technologies (such as low bit-rate video and
cellular communications) already provide the end-users a certain amount of
processing power and portability.

Figure 4.1: Overview of the prominent trends in information technologies.

This trend is expected to continue, with very important implications for VLSI
and systems design. One of the most important characteristics of information services
is their increasing need for very high processing power and bandwidth (in order to
handle real-time video, for example). The other important characteristic is that the
information services tend to become more and more personalized (as opposed to
collective services such as broadcasting), which means that the devices must be more
intelligent to answer individual demands, and at the same time they must be portable
to allow more flexibility/mobility.

Table 1.1 shows the evolution of logic complexity in integrated circuits over
the last three decades, and marks the milestones of each era. Here, the numbers for
circuit complexity should be interpreted only as representative examples to show the
order of magnitude. A logic block can contain anywhere from 10 to 100 transistors,
depending on the function. State-of-the-art examples of ULSI chips, such as the DEC
Alpha or the Intel Pentium, contain 3 to 6 million transistors.

ERA                                    YEAR   COMPLEXITY (no. of logic blocks per chip)
Single transistor                      1959   less than 1
Unit logic (one gate)                  1960   1
Multi-function                         1962   2 - 4
Complex function                       1964   5 - 20
Medium Scale Integration (MSI)         1967   20 - 200
Large Scale Integration (LSI)          1972   200 - 2000
Very Large Scale Integration (VLSI)    1978   2000 - 20000
Ultra Large Scale Integration (ULSI)   1989   20000 - ?

Table-1.1: Evolution of logic complexity in integrated circuits.


The most important message here is that the logic complexity per chip has
been (and still is) increasing exponentially. The monolithic integration of a large
number of functions on a single chip usually provides:

• Less area/volume and therefore, compactness
• Less power consumption
• Fewer testing requirements at system level
• Higher reliability, mainly due to improved on-chip interconnects
• Higher speed, due to significantly reduced interconnection length
• Significant cost savings

Figure-4.2: Evolution of integration density and min feature size, as seen in the early 1980s.

Therefore, the current trend of integration will also continue in the foreseeable
future. Advances in device manufacturing technology, and especially the steady
reduction of minimum feature size (minimum length of a transistor or an interconnect
realizable on chip) support this trend. Figure 4.2 shows the history and forecast of
chip complexity - and minimum feature size - over time, as seen in the early 1980s. At
that time, a minimum feature size of 0.3 microns was expected around the year 2000.
Actual progress was faster: a minimum size of 0.25 microns was readily achievable by the year 1995. As a direct
result of this, the integration density has also exceeded previous expectations - the
first 64 Mbit DRAM, and the INTEL Pentium microprocessor chip containing more
than 3 million transistors were already available by 1994, pushing the envelope of
integration density.

Figure-4.3: Level of integration over time, for memory chips and logic chips.

Generally speaking, logic chips such as microprocessor chips and digital
signal processing (DSP) chips contain not only large arrays of memory (SRAM) cells,
but also many different functional units. As a result, their design complexity is
considered much higher than that of memory chips, although advanced memory chips
contain some sophisticated logic functions. This translates into an increase in the
design cycle time, which is the time period from the start of the chip development
until the mask-tape delivery time. However, in order to make the best use of the
current technology, the chip development time has to be short enough to allow the
maturing of chip manufacturing and timely delivery to customers. As a result, the
level of actual logic integration tends to fall short of the integration level achievable
with the current processing technology. Sophisticated computer-aided design (CAD)
tools and methodologies are developed and applied in order to manage the rapidly
increasing design complexity.

4.1.2 VLSI Design Flow:


The design process, at various levels, is usually evolutionary in nature.
It starts with a given set of requirements. An initial design is developed and tested against
the requirements. When requirements are not met, the design has to be improved. If
such improvement is either not possible or too costly, then the revision of
requirements and its impact analysis must be considered. The Y-chart (first introduced
by D. Gajski) shown in Fig. 4.4 illustrates a design flow for most logic chips, using
design activities on three different axes (domains) which resemble the letter Y.

Figure-4.4: Typical VLSI design flow in three domains (Y-chart representation).

The Y-chart consists of three major domains, namely:

• Behavioral domain
• Structural domain
• Geometrical layout domain

The design flow starts from the algorithm that describes the behavior of the
target chip. The corresponding architecture of the processor is first defined. It is
mapped onto the chip surface by floor planning. The next design evolution in the
behavioral domain defines finite state machines (FSMs) which are structurally
implemented with functional modules such as registers and arithmetic logic units
(ALUs). These modules are then geometrically placed onto the chip surface using
CAD tools for automatic module placement followed by routing, with a goal of
minimizing the interconnect area and signal delays. The third evolution starts with a
behavioral module description. Individual modules are then implemented with leaf
cells. In standard-cell based design, leaf cells are already pre-designed and stored in a
library for logic design use.

Figure-4.5: A more simplified view of VLSI design flow.

Figure 4.5 provides a more simplified view of the VLSI design flow, taking
into account the various representations, or abstractions of design - behavioral, logic,
circuit and mask layout. Note that the verification of design plays a very important
role in every step during this process. The failure to properly verify a design in its
early phases typically causes significant and expensive re-design at a later stage.

4.1.3 Design Hierarchy:

The use of hierarchy, or the "divide and conquer" technique, involves dividing a
module into sub-modules and then repeating this operation on the sub-modules until
the complexity of the smaller parts becomes manageable. This approach is very
similar to the software case where large programs are split into smaller and smaller
sections until simple subroutines, with well-defined functions and interfaces, can be
written. In Section 4.1.2, we have seen that the design of a VLSI chip can be
represented in three domains. Correspondingly, a hierarchy structure can be described
in each domain separately. However, it is important for the simplicity of design that
the hierarchies in different domains can be mapped into each other easily. This
physical view describes the external geometry of the adder and how pin locations
allow some signals (in this case the carry signals) to be transferred from one sub-
block to the other without external routing. At lower levels of the physical hierarchy,
the internal mask.

Figure-4.6: Structural decomposition of a four-bit adder circuit, showing the
hierarchy down to gate level.
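
The structural hierarchy described above can be sketched in Verilog. This is only a minimal illustration, assuming a ripple-carry organization; the module names full_adder and adder4 and all port names are chosen here for illustration and are not taken from the figure.

```verilog
// Leaf level: one-bit full adder built from gate primitives.
module full_adder (a, b, cin, sum, cout);
  input  a, b, cin;
  output sum, cout;
  wire   p, g, t;

  xor g1 (p, a, b);       // p = a ^ b
  xor g2 (sum, p, cin);   // sum = a ^ b ^ cin
  and g3 (g, a, b);       // g = a & b
  and g4 (t, p, cin);     // t = (a ^ b) & cin
  or  g5 (cout, g, t);    // cout = g | t
endmodule

// Top level: four-bit adder composed of four full-adder instances.
module adder4 (a, b, cin, sum, cout);
  input  [3:0] a, b;
  input        cin;
  output [3:0] sum;
  output       cout;
  wire   [3:1] c;         // internal carry chain between sub-blocks

  full_adder fa0 (a[0], b[0], cin,  sum[0], c[1]);
  full_adder fa1 (a[1], b[1], c[1], sum[1], c[2]);
  full_adder fa2 (a[2], b[2], c[2], sum[2], c[3]);
  full_adder fa3 (a[3], b[3], c[3], sum[3], cout);
endmodule
```

Each level only refers to the level directly below it, so the top module never needs to know how a full adder is built from gates.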

Figure-4.7: Regular design of a 2-1 MUX, a DFF and an adder, using inverters and
tri-state buffers.

4.1.4 VLSI Design Styles:

Several design styles can be considered for chip implementation of specified
algorithms or logic functions. Each design style has its own merits and shortcomings,
and thus a proper choice has to be made by designers in order to provide the
functionality at low cost.
(i) Field Programmable Gate Array (FPGA)
Fully fabricated FPGA chips containing thousands of logic gates or
even more, with programmable interconnects, are available to users for their custom
hardware programming to realize desired functionality. A typical field programmable
gate array (FPGA) chip consists of I/O buffers, an array of configurable logic blocks
(CLBs), and programmable interconnect structures. The programming of the
interconnects is implemented by programming of RAM cells whose output terminals
are connected to the gates of MOS pass transistors. A general architecture of FPGA
from XILINX is shown in Fig. 4.8. A more detailed view showing the locations of
switch matrices used for interconnect routing is given in Fig. 4.9. A simple CLB
(model XC2000 from XILINX) is shown in Fig. 4.10. It consists of four signal input
terminals (A, B, C, D), a clock signal terminal, user-programmable multiplexers, an
SR-latch, and a look-up table (LUT). The LUT is a digital memory that stores the
truth table of the Boolean function.
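
The idea of a LUT as a small memory holding a truth table can be illustrated with a short behavioral sketch. It is not a model of the actual XC2000 CLB; the module name lut4 and the parameter name INIT are assumptions made for this example.

```verilog
// Four-input look-up table: the 16-bit INIT value is the stored truth table.
module lut4 (a, b, c, d, f);
  parameter [15:0] INIT = 16'h8000;  // default contents: f = a & b & c & d
  input  a, b, c, d;
  output f;

  wire [15:0] truth = INIT;          // the "memory" contents

  // The four inputs form the address of the truth-table entry to read.
  assign f = truth[{d, c, b, a}];
endmodule
```

Changing only the INIT value changes the realized Boolean function, which is the sense in which the logic block is programmable.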

The CLB is configured such that many different logic functions can be
realized by programming its array. More sophisticated CLBs have also been
introduced to map complex functions. At this stage, the chip design is completely
described in terms of available logic cells. Next, the placement and routing step
assigns individual logic cells to FPGA sites (CLBs) and determines the routing
patterns among the cells in accordance with the net list. After routing is completed,
the on-chip performance of the design can be simulated and verified before downloading
the design for programming of the FPGA chip. The programming of the chip remains
valid as long as the chip is powered-on or until new programming is done. In most
cases, full utilization of the FPGA chip area is not possible - many cell sites may
remain unused.

Figure-4.8: General architecture of Xilinx FPGAs.

Figure-4.9: Detailed view of switch matrices and interconnection routing between
CLBs.

Figure-4.10: XC2000 CLB of the Xilinx FPGA.

The largest advantage of FPGA-based design is the very short turn-around
time, i.e., the time required from the start of the design process until a functional chip
is available. The typical price of FPGA chips is usually higher than that of other realization
alternatives (such as gate array or standard cells) of the same design, but for
small-volume production of ASIC chips and for fast prototyping, FPGA offers a very
valuable option.
(ii) Gate Array Design
In terms of fast prototyping capability, the gate array (GA) ranks second after the
FPGA. While the design implementation of the FPGA chip is done with user
programming, that of the gate array is done with metal mask design and processing.
Gate array implementation requires a two-step manufacturing process. The first
phase, which is based on generic (standard) masks, results in an array of uncommitted
transistors on each GA chip. These uncommitted chips can be stored for later
customization, which is completed by defining the metal interconnects between the
transistors of the array (Fig. 4.11). Since the patterning of metallic interconnects is
done at the end of the chip fabrication, the turn-around time can still be short, a few
days to a few weeks. Figure 4.12 shows a corner of a gate array chip which contains
bonding pads on its left and bottom edges, diodes for I/O protection, nMOS
transistors and pMOS transistors for chip output driver circuits in the neighboring
areas of bonding pads, arrays of nMOS transistors and pMOS transistors, underpass
wire segments, and power and ground buses along with contact windows.

Figure-4.11: Basic processing steps required for gate array implementation.

Figure-4.12: A corner of a typical gate array chip.

Figure 4.13 shows a magnified portion of the internal array with metal mask
design (metal lines highlighted in dark) to realize a complex logic function. Typical
gate array platforms allow dedicated areas, called channels, for intercell routing as
shown in Figs. 4.12 and 4.13 between rows or columns of MOS transistors. The
interconnection patterns to realize basic logic gates can be stored in a library. Some
other platforms also offer dedicated memory (RAM) arrays to allow a higher density
where memory functions are required. Figure 4.14 shows the layout views of a
conventional gate array and a gate array platform with two dedicated memory banks.
With the use of multiple interconnect layers, the routing can be achieved over the
active cell areas; thus, the routing channels can be removed as in Sea-of-Gates (SOG)
chips. Here, the entire chip surface is covered with uncommitted nMOS and pMOS
transistors. As in the gate array case, neighboring transistors can be customized using
a metal mask to form basic logic gates. For intercell routing, however, some of the
uncommitted transistors must be sacrificed. This approach results in more flexibility
for interconnections, and usually in a higher density. The basic platform of a SOG
chip is shown in Fig. 4.15. Figure 4.16 offers a brief comparison between the
channeled (GA) vs. the channelless (SOG) approaches.

Figure-4.13: Metal mask design to realize a complex logic function on a channeled GA
platform.

Figure-4.14: Layout views of a conventional GA chip and a gate array with two memory
banks.

Figure-4.15: The platform of a Sea-of-Gates (SOG) chip.

In general, the GA chip utilization factor, as measured by the used chip area
divided by the total chip area, is higher than that of the FPGA and so is the chip
speed, since more customized design can be achieved with metal mask designs. The
current gate array chips can implement as many as hundreds of thousands of logic
gates.

Figure-4.16: Comparison between the channeled (GA) vs. the channelless (SOG)
approaches.

(iii) Standard-Cells Based Design


The standard-cells based design is one of the most prevalent full custom
design styles which require development of a full custom mask set. The standard cell
is also called the polycell. In this design style, all of the commonly used logic cells are
developed, characterized, and stored in a standard cell library. A typical library may
contain a few hundred cells including inverters, NAND gates, NOR gates, complex
AOI, OAI gates, D-latches, and flip-flops. The characterization of each cell is done
for several different categories. It consists of:

• delay time vs. load capacitance
• circuit simulation model
• timing simulation model
• fault simulation model
• cell data for place-and-route
• mask data

To enable automated placement of the cells and routing of inter-cell
connections, each cell layout is designed with a fixed height. The power and ground
rails typically run parallel to the upper and lower boundaries of the cell; thus,
neighboring cells share a common power and ground bus. The input and output pins
are located on the upper and lower boundaries of the cell. Figure 4.17 shows the
layout of a typical standard cell. Notice that the nMOS transistors are located closer to
the ground rail while the pMOS transistors are placed closer to the power rail.

Figure-4.17: A standard cell layout example.

Figure 4.18 shows a floor plan for standard-cell based design. Inside the I/O
frame which is reserved for I/O cells, the chip area contains rows or columns of
standard cells. Between cell rows are channels for dedicated inter-cell routing. As in
the case of Sea-of-Gates, with over-the-cell routing, the channel areas can be reduced
or even removed provided that the cell rows offer sufficient routing space. The
physical design and layout of logic cells ensure that when cells are placed into rows,
their heights are matched and neighboring cells can be abutted side-by-side, which
provides natural connections for power and ground lines in each row. The signal
delay, noise margins, and power consumption of each cell should also be optimized
with proper sizing of transistors using circuit simulation.

Figure-4.18: A simplified floor plan of standard-cells-based design.

If a number of cells must share the same input and/or output signals, a
common signal bus structure can also be incorporated into the standard-cell-based
chip layout. Figure 4.19 shows the simplified symbolic view of a case where a signal
bus has been inserted between the rows of standard cells. Note that in this case the
chip consists of two blocks, and power/ground routing must be provided from both
sides of the layout area. Standard-cell based designs may consist of several such
macro-blocks, each corresponding to a specific unit of the system architecture such as
ALU, control logic, etc.

Figure-4.19: Simplified floor plan consisting of two separate blocks and a common signal
bus.

After chip logic design is done using standard cells in the library, the most
challenging task is to place individual cells into rows and interconnect them in a way
that meets stringent design goals in circuit speed, chip area, and power consumption.
Many advanced CAD tools for place-and-route have been developed and used to
achieve such goals. Also, from the chip layout, circuit models which include
interconnect parasitics can be extracted and used for timing simulation and analysis to
identify timing critical paths. For timing critical paths, proper gate sizing is often
practiced to meet the timing requirements. In many VLSI chips, such as
microprocessors and digital signal processing chips, standard-cells based design is
used to implement complex control logic modules. Some full custom chips can
also be implemented exclusively with standard cells.

Finally, Fig. 4.20 shows the detailed mask layout of a standard-cell-based chip
with an uninterrupted single block of cell rows, and three memory banks placed on
one side of the chip. Notice that within the cell block, the separations between
neighboring rows depend on the number of wires in the routing channel between the
cell rows. If a high interconnect density can be achieved in the routing channel, the
standard cell rows can be placed closer to each other, resulting in a smaller chip area.
The availability of dedicated memory blocks also reduces the area, since the
realization of memory elements using standard cells would occupy a larger area.

Figure-4.20: Mask layout of a standard-cell-based chip with a single block of cells
and three memory banks.

(iv) Full Custom Design


Although the standard-cells based design is often called full custom design, in
a strict sense, it is somewhat less than fully custom since the cells are pre-designed for
general use and the same cells are utilized in many different chip designs. In a fuller
custom design, the entire mask design is done anew without use of any library.
However, the development cost of such a design style is becoming prohibitively high.
Thus, the concept of design reuse is becoming popular in order to reduce design cycle
time and development cost. For logic chip design, different design styles such as
standard cells, data-path cells and PLAs can be combined on the same chip. In real
full-custom layout in which the geometry, orientation and placement of every
transistor is done individually by the designer, design productivity is usually very low
- typically 10 to 20 transistors per day, per designer. In digital CMOS VLSI, full-
custom design is rarely used due to the high labor cost. Exceptions to this include the
design of high-volume products such as memory chips, high-performance
microprocessors and FPGA masters. Figure 4.21 shows the full layout of the Intel 486
microprocessor chip, which is a good example of a hybrid full-custom design. Here,
one can identify four different design styles on one chip: Memory banks (RAM
cache), data-path units consisting of bit-slice cells, control circuitry mainly consisting
of standard cells and PLA blocks.

Figure-4.21: Overview of VLSI design styles.

4.2 Introduction to Xilinx:


4.2.1 Migrating Projects from Previous ISE Software Releases:
When you open a project file from a previous release, the ISE® software
prompts you to migrate your project. If you click Backup and Migrate or Migrate
only, the software automatically converts your project file to the current release. If
you click Cancel, the software does not convert your project and, instead, opens
Project Navigator with no project loaded.

Note: After you convert your project, you cannot open it in previous versions of the
ISE software, such as the ISE 11 software. However, you can optionally create a
backup of the original project as part of project migration, as described below.

4.2.2 To Migrate a Project

• In the ISE 12 Project Navigator, select File > Open Project.
• In the Open Project dialog box, select the .xise file to migrate.
  Note: You may need to change the extension in the Files of type field
  to display .npl (ISE 5 and ISE 6 software) or .ise (ISE 7 through ISE
  10 software) project files.
• In the dialog box that appears, select Backup and Migrate or Migrate Only.
• The ISE software automatically converts your project to an ISE 12 project.
  Note: If you chose to Backup and Migrate, a backup of the original
  project is created at project_name_ise12migration.zip.
• Implement the design using the new version of the software.

Note: Implementation status is not maintained after migration.

4.2.3 IP Modules:

If your design includes IP modules that were created using CORE
Generator™ software or Xilinx® Platform Studio (XPS) and you need to modify
these modules, you may be required to update the core. However, if the core netlist
is present and you do not need to modify the core, updates are not required and the
existing netlist is used during implementation.

4.2.4 Obsolete Source File Types:


The ISE 12 software supports all of the source types that were supported in
the ISE 11 software. If you are working with projects from previous releases, state
diagram source files (.dia), ABEL source files (.abl), and test bench waveform source
files (.tbw) are no longer supported. For state diagram and ABEL source files, the
software finds an associated HDL file and adds it to the project, if possible. To
convert a TBW file after project migration, see Converting a TBW File to an HDL
Test Bench.

4.2.5 Using ISE Example Projects:

To help familiarize you with the ISE® software and with FPGA and CPLD
designs, a set of example designs is provided with Project Navigator. The examples
show different design techniques and source types, such as VHDL, Verilog,
schematic, or EDIF, and include different constraints and IP.

To Open an Example

• Select File > Open Example.
• In the Open Example dialog box, select the Sample Project Name.
  Note: To help you choose an example project, the Project Description
  field describes each project. In addition, you can scroll to the right to
  see additional fields, which provide details about the project.
• In the Destination Directory field, enter a directory name or browse to the
  directory.
• Click OK.
  The example project is extracted to the directory you specified in the
  Destination Directory field and is automatically opened in Project
  Navigator. You can then run processes on the example project and
  save any changes.

Note: If you modified an example project and want to overwrite it with the
original example project, select File > Open Example, select the Sample Project
Name, and specify the same Destination Directory you originally used. In the dialog
box that appears, select Overwrite the existing project and click OK.

4.2.6 Creating a Project:


Project Navigator allows you to manage your FPGA and CPLD designs using
an ISE® project, which contains all the source files and settings specific to your
design. First, you must create a project, then add source files and set process
properties. After you create a project, you can run processes to implement, constrain,
and analyze your design. Project Navigator provides a wizard to help you create a
project as follows.

Note: If you prefer, you can create a project using the New Project dialog
box instead of the New Project Wizard. To use the New Project dialog box, deselect
the Use New Project wizard option in the ISE General page of the Preferences
dialog box.

To Create a Project

• Select File > New Project to launch the New Project Wizard.
• In the Create New Project page, set the name, location, and project type, and
  click Next.
• For EDIF or NGC/NGO projects only: In the Import EDIF/NGC Project
  page, select the input and constraint file for the project, and click Next.
• In the Project Settings page, set the device and project properties, and click
  Next.
• In the Project Summary page, review the information, and click Finish to
  create the project.

4.2.7 Design Panel:


Project Navigator manages your project based on the design properties (top-
level module type, device type, synthesis tool, and language) you selected when you
created the project. It organizes all the parts of your design and keeps track of the
processes necessary to move the design from design entry through implementation to
programming the targeted Xilinx® device.

Note: For information on changing design properties, see Changing Design
Properties.

You can now perform any of the following:

• Create new source files for your project.
• Add existing source files to your project.
• Run processes on your source files.

4.2.8 Creating a Copy of a Project:


You can create a copy of a project to experiment with different source options
and implementations. Depending on your needs, the design source files for the copied
project and their location can vary as follows:

• Design source files are left in their existing location, and the copied project
  points to these files.
• Design source files, including generated files, are copied and placed in a
  specified directory.

4.2.9 Using the Project Browser:


Alternatively, you can create an archive of your project, which puts all of
the project contents into a ZIP file. Archived projects must be unzipped before
being opened in Project Navigator. For information on archiving, see Creating a
Project Archive.

To Create a Copy of a Project

1. Select File > Copy Project.
2. In the Copy Project dialog box, enter the Name for the copy.
   Note: The name for the copy can be the same as the name for the project, as
   long as you specify a different location.
3. Enter a directory Location to store the copied project.
4. Optionally, enter a Working directory.
   By default, this is blank, and the working directory is the same as the project
   directory. However, you can specify a working directory if you want to keep
   your ISE® project file (.xise extension) separate from your working area.
5. Optionally, enter a Description for the copy.
   The description can be useful in identifying key traits of the project for
   reference later.
6. In the Source options area, select one of the following options:
   • Keep sources in their current locations - to leave the design source files in
     their existing location.

4.2.10. Exclude generated files from the copy:

When you select this option, the copied project opens in a state in which
processes have not yet been run. To automatically open the copy after creating it,
select Open the copied project.

Note: By default, this option is disabled. If you leave this option disabled, the original
project remains open after the copy is made.

Click OK.

4.2.11 Creating a Project Archive:

A project archive is a single, compressed ZIP file with a .zip extension. By default, it
contains all project files, source files, and generated files, including the following:

• User-added sources and associated files
• Remote sources
• Verilog `include files
• Files in the macro search path
• Generated files
• Non-project files

4.2.12 Archive a Project:

• Select Project > Archive.
• In the Project Archive dialog box, specify a file name and directory
  for the ZIP file.
• Optionally, select Exclude generated files from the archive to
  exclude generated files and non-project files from the archive.
• Click OK.

A ZIP file is created in the specified directory. To open the archived project,
you must first unzip the ZIP file, and then, you can open the project.

Note: Sources that reside outside of the project directory are copied into a
remote_sources subdirectory in the project archive.

4.3 Introduction to Verilog:


In the semiconductor and electronic-design industry, Verilog is a hardware
description language (HDL) used to model electronic systems. Verilog HDL, not to be
confused with VHDL (a competing language), is most commonly used in the design,
verification, and implementation of digital logic chips at the register-transfer
level of abstraction. It is also used in the verification of analog and mixed-signal
circuits.

4.3.1 Overview:

Hardware description languages such as Verilog differ from
software programming languages because they include ways of describing the
propagation of time and signal dependencies (sensitivity). There are two assignment
operators, a blocking assignment (=), and a non-blocking (<=) assignment. The non-
blocking assignment allows designers to describe a state-machine update without
needing to declare and use temporary storage variables (in any general programming
language we need to define some temporary storage spaces for the operands to be
operated on subsequently; those are temporary storage variables). Since these
concepts are part of Verilog's language semantics, designers could quickly write
descriptions of large circuits in a relatively compact and concise form. At the time of
Verilog's introduction (1984), Verilog represented a tremendous productivity
improvement for circuit designers who were already using graphical schematic
capture software and specially-written software programs to document and simulate
electronic circuits.
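
The difference between the two assignment operators is easiest to see in a register swap. The following is a minimal sketch; the module name swap_regs and its ports are hypothetical and chosen only for this illustration.

```verilog
// Exchange two registers on every clock edge without a temporary variable.
module swap_regs (clk, load, a_in, b_in, a, b);
  input        clk, load;
  input  [7:0] a_in, b_in;
  output [7:0] a, b;
  reg    [7:0] a, b;

  always @(posedge clk) begin
    if (load) begin
      a <= a_in;
      b <= b_in;
    end else begin
      // Both right-hand sides are sampled before either register is
      // updated, so the old values of a and b are exchanged.
      a <= b;
      b <= a;
    end
  end
endmodule
```

If the two statements in the else branch were written with blocking assignments (a = b; b = a;), the second statement would read the already updated a, and both registers would end up holding the old value of b.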

Verilog's concept of 'wire' consists of both signal values (4-state: "1, 0, floating,
undefined") and strengths (strong, weak, etc.). This system allows abstract modeling
of shared signal lines, where multiple sources drive a common net. When a wire has
multiple drivers, the wire's (readable) value is resolved by a function of the source
drivers and their strengths. A subset of statements in the Verilog language
is synthesizable. Verilog modules that conform to a synthesizable coding style, known
as RTL (register-transfer level), can be physically realized by synthesis software.
Synthesis software algorithmically transforms the (abstract) Verilog source into a net
list, a logically equivalent description consisting only of elementary logic primitives
(AND, OR, NOT, flip-flops, etc.) that are available in a
specific FPGA or VLSI technology. Further manipulations to the net list ultimately
lead to a circuit fabrication blueprint (such as a photo mask set for an ASIC or a bit
stream file for an FPGA).
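
The resolution of a wire with multiple drivers, mentioned above, can be illustrated with two tri-state drivers sharing one net. This is a simplified sketch that uses only the default (strong) drive strength; the module name shared_bus and its ports are assumptions made for the example.

```verilog
// Two tri-state drivers sharing a single net.
module shared_bus (en0, d0, en1, d1, bus);
  input  en0, d0, en1, d1;
  output bus;
  wire   bus;

  // A disabled driver outputs z (floating) and does not affect the net.
  assign bus = en0 ? d0 : 1'bz;
  assign bus = en1 ? d1 : 1'bz;
endmodule
```

With one driver enabled, the net takes that driver's value; with both disabled it floats (z); with both enabled and driving opposite values at the same strength, the resolved value is undefined (x).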

4.3.2 History:
4.3.2 (a) Beginning

Verilog was the first modern hardware description language to be invented. It
was created by Phil Moorby and Prabhu Goel during the winter of 1983/1984 at
Automated Integrated Design Systems (renamed Gateway Design Automation in 1985)
as a hardware modeling language. Gateway
Design Automation was purchased by Cadence Design Systems in 1990. Cadence
now has full proprietary rights to Gateway's Verilog and the Verilog-XL, the HDL-
simulator that would become the de-facto standard (of Verilog logic simulators) for
the next decade. Originally, Verilog was intended to describe and allow simulation;
only afterwards was support for synthesis added.

4.3.2 (b) Verilog-95

With the increasing success of VHDL at the time, Cadence decided to make
the language available for open standardization. Cadence transferred Verilog into the
public domain under the Open Verilog International (OVI) (now known as Accellera)
organization. Verilog was later submitted to IEEE and became IEEE Standard 1364-
1995, commonly referred to as Verilog-95. In the same time frame Cadence initiated
the creation of Verilog-A to put standards support behind its analog simulator Spectre.
Verilog-A was never intended to be a standalone language and is a subset of Verilog-
AMS which encompassed Verilog-95.

4.3.2 (c) Verilog 2001

Extensions to Verilog-95 were submitted back to IEEE to cover the deficiencies
that users had found in the original Verilog standard. These extensions became
IEEE Standard 1364-2001, known as Verilog-2001. Verilog-2001 is a significant
upgrade from Verilog-95. First, it adds explicit support for (2's complement)
signed nets and variables. Previously, code authors had to perform signed
operations using awkward bit-level manipulations. The same function under
Verilog-2001 can be more succinctly described by one of the built-in operators:
+, -, /, *, >>>. A generate/endgenerate construct (similar to VHDL's
generate/end generate)
allows Verilog-2001 to control instance and statement instantiation through
normal decision operators (case/if/else). Using generate/endgenerate,
Verilog-2001 can instantiate an array of instances, with control over the
connectivity of the individual instances. File I/O has been improved by several
new system tasks. Finally, a few syntax additions were introduced to improve code
readability (e.g. always @*, named parameter override, C-style
function/task/module header declaration).
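
Several of the Verilog-2001 additions listed above can be seen together in a
small sketch. The module below is illustrative only: the names v2001_demo,
v2001_demo_tb, WIDTH, a, b, sum and a_inv are assumptions made for this example
and are not part of the project design.

```verilog
// Illustrative sketch of Verilog-2001 features (hypothetical names).
// C-style (ANSI) module header with a parameter list.
module v2001_demo #(parameter WIDTH = 4) (
  input  wire signed [WIDTH-1:0] a,
  input  wire signed [WIDTH-1:0] b,
  output reg  signed [WIDTH:0]   sum,
  output wire        [WIDTH-1:0] a_inv
);
  // always @* infers the sensitivity list; the signed operands are
  // sign-extended to the width of sum before the addition.
  always @* sum = a + b;

  // generate loop: one inverter assignment per bit.
  genvar i;
  generate
    for (i = 0; i < WIDTH; i = i + 1) begin : g_inv
      assign a_inv[i] = ~a[i];
    end
  endgenerate
endmodule

// Small testbench using named parameter override and named port connections.
module v2001_demo_tb;
  reg  signed [3:0] a, b;
  wire signed [4:0] sum;
  wire        [3:0] a_inv;

  v2001_demo #(.WIDTH(4)) dut (.a(a), .b(b), .sum(sum), .a_inv(a_inv));

  initial begin
    a = -4'sd3;
    b = -4'sd2;
    #1 $display("sum=%0d a_inv=%b", sum, a_inv); // sum=-5 a_inv=0010
    $finish;
  end
endmodule
```

Without the signed declarations, the same addition would need manual sign
extension of both operands.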

4.3.2 (d) Verilog 2005

Not to be confused with SystemVerilog, Verilog 2005 (IEEE Standard 1364-2005)
consists of minor corrections, spec clarifications, and a few new language
features (such as the uwire keyword). A separate part of the Verilog
standard, Verilog-AMS, attempts to integrate analog and mixed signal modeling with
traditional Verilog.

4.3.2 (e) SystemVerilog

SystemVerilog is a superset of Verilog-2005, with many new features and
capabilities to aid design verification and design modeling. As of 2009, the
SystemVerilog and Verilog language standards were merged into SystemVerilog
2009 (IEEE Standard 1800-2009). In the late 1990s, the Verilog Hardware
Description Language (HDL) became the most widely used language for describing
hardware for simulation and synthesis. However, the first two versions
standardized by the IEEE (1364-1995 and 1364-2001) had only simple constructs for
creating tests. As design sizes outgrew the verification capabilities of the
language, commercial Hardware Verification Languages (HVLs) such as OpenVera and
e were created. Companies that did not want to pay for these tools instead spent
hundreds of man-years creating their own custom tools. This productivity crisis
(along with a similar one on the design side) led to the creation of Accellera, a
consortium of EDA companies and users who wanted to create the next generation of
Verilog. The donation of the OpenVera language formed the basis for the HVL
features of SystemVerilog. Accellera's goal was met in November 2005 with the
adoption of IEEE Standard 1800-2005 for SystemVerilog. The most valuable benefit
of SystemVerilog is that it allows the user to construct reliable, repeatable
verification environments, in a consistent syntax, that can be used across
multiple projects.

Some of the typical features of an HVL that distinguish it from a Hardware
Description Language such as Verilog or VHDL are:
• Constrained-random stimulus generation
• Functional coverage
• Higher-level structures, especially Object Oriented Programming
• Multi-threading and interprocess communication
• Support for HDL types such as Verilog's 4-state values
• Tight integration with event-simulator for control of the design
There are many other useful features, but these allow you to create
testbenches at a higher level of abstraction than you are able to achieve with an
HDL or a programming language such as C. SystemVerilog provides a well-suited
framework for coverage-driven verification (CDV). CDV combines automatic test
generation, self-checking testbenches, and coverage metrics to significantly
reduce the time spent verifying a design. The purpose of CDV is to:
• Eliminate the effort and time spent creating hundreds of tests.
• Ensure thorough verification using up-front goal setting.

4.3.2 (f) Examples

Ex1: A hello world program looks like this:

module main;
  initial
    begin
      $display("Hello world!");
      $finish;
    end
endmodule

Ex2: A simple example of two flip-flops follows:

module toplevel(clock, reset);
  input clock;
  input reset;
  reg flop1;
  reg flop2;

  always @ (posedge reset or posedge clock)
    if (reset)
      begin
        flop1 <= 0;
        flop2 <= 1;
      end
    else
      begin
        flop1 <= flop2;
        flop2 <= flop1;
      end
endmodule

The "<=" operator in Verilog is another aspect of its being a hardware
description language as opposed to a normal procedural language. This is known as
a "non-blocking" assignment. Its update is deferred until every right-hand side
in the current time step has been evaluated, so each assignment sees the values
the registers held before the clock edge. This means that the order of the
assignments is irrelevant and will produce the same result: flop1 and flop2 will
swap values every clock. The other assignment operator, "=", is referred to as a
blocking assignment. When "=" assignment is used, for the purposes of logic, the
target variable is updated immediately. In the above example, had the statements
used the "=" blocking operator instead of "<=", flop1 and flop2 would not have
been swapped.
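
The difference between the two assignment styles can be checked with a small
simulation. The following is a sketch only; the module name swap_demo and the
register names nb1, nb2, bl1 and bl2 are hypothetical and chosen for this
illustration.

```verilog
// Simulation sketch comparing non-blocking and blocking swaps.
// All names are illustrative assumptions.
module swap_demo;
  reg clk;
  reg nb1, nb2;   // updated with non-blocking assignments
  reg bl1, bl2;   // updated with blocking assignments

  // Non-blocking: both right-hand sides are sampled before either update.
  always @(posedge clk) begin
    nb1 <= nb2;
    nb2 <= nb1;
  end

  // Blocking: bl1 is overwritten first, so bl2 receives the new bl1.
  always @(posedge clk) begin
    bl1 = bl2;
    bl2 = bl1;
  end

  initial begin
    clk = 0;
    nb1 = 0; nb2 = 1;
    bl1 = 0; bl2 = 1;
    #1 clk = 1;   // a single rising edge
    #1 $display("non-blocking: nb1=%b nb2=%b", nb1, nb2); // 1 0 (swapped)
       $display("blocking:     bl1=%b bl2=%b", bl1, bl2); // 1 1 (not swapped)
    $finish;
  end
endmodule
```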

Ex3: An example counter circuit follows:

module Div20x (rst, clk, cet, cep, count, tc);
  // TITLE 'Divide-by-20 Counter with enables'
  // enable CEP is a clock enable only
  // enable CET is a clock enable and
  // enables the TC output
  // a counter using the Verilog language
  parameter size = 5;
  parameter length = 20;

  input rst;                 // These inputs/outputs represent
  input clk;                 // connections to the module.
  input cet;
  input cep;

  output [size-1:0] count;
  output tc;

  reg [size-1:0] count;      // Signals assigned
                             // within an always
                             // (or initial) block
                             // must be of type reg
  wire tc;                   // Other signals are of type wire

  // The always statement below is a parallel
  // execution statement that
  // executes any time the signals
  // rst or clk transition from low to high
  always @ (posedge clk or posedge rst)
    if (rst)                 // This causes reset of the cntr
      count <= {size{1'b0}};
    else
      if (cet && cep)        // Enables both true
        begin
          if (count == length-1)
            count <= {size{1'b0}};
          else
            count <= count + 1'b1;
        end

  // the value of tc is continuously assigned
  // the value of the expression
  assign tc = (cet && (count == length-1));
endmodule

Ex4: An example of delays:

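reg a, b, c, d;
wire e;
always @(b or e)        // runs whenever b or e changes
  begin
    a = b & e;          // a is updated immediately
    b = a | b;          // blocking: uses the new value of a
    #5 c = b;           // wait 5 time units, then assign c
    d = #6 c ^ e;       // evaluate c ^ e now, assign it to d 6 time units later
  end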

4.3.3 Constants:

The definition of constants in Verilog supports the addition of a width
parameter. The basic syntax is:

<Width in bits>'<base letter><number>

Examples:

• 12'h123 - Hexadecimal 123 (using 12 bits)
• 20'd44 - Decimal 44 (using 20 bits - 0 extension is automatic)
• 4'b1010 - Binary 1010 (using 4 bits)
• 6'o77 - Octal 77 (using 6 bits)
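
The sized constants listed above can be printed from a short simulation to
confirm their values. This is a minimal sketch; the module name const_demo is a
hypothetical name used only for this illustration.

```verilog
// Prints the example constants in decimal (module name is illustrative).
module const_demo;
  initial begin
    $display("12'h123 = %0d", 12'h123);  // 291
    $display("20'd44  = %0d", 20'd44);   // 44
    $display("4'b1010 = %0d", 4'b1010);  // 10
    $display("6'o77   = %0d", 6'o77);    // 63
    $finish;
  end
endmodule
```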

4.3.4 Synthesizable Constructs:

There are several statements in Verilog that have no analog in real hardware,
e.g. $display. Consequently, much of the language cannot be used to describe
hardware. The examples presented here are the classic subset of the language that has
a direct mapping to real gates.

// Mux examples - Three ways to do the same thing.
// The first example uses continuous assignment
wire out;
assign out = sel ? a : b;

// the second example uses a procedure
// to accomplish the same thing.
reg out;
always @(a or b or sel)
  begin
    case(sel)
      1'b0: out = b;
      1'b1: out = a;
    endcase
  end

// Finally - you can use if/else in a
// procedural structure.
reg out;
always @(a or b or sel)
  if (sel)
    out = a;
  else
    out = b;

The next interesting structure is a transparent latch; it passes the input to
the output when the gate signal is set for "pass-through", and captures the input
and stores it upon transition of the gate signal to "hold". In the example below
the "pass-through" level of the gate is when the value of the if clause is true,
i.e. gate = 1. This is read "if gate is true, din is fed to out continuously."
Once the if clause is false, the last value at out remains and is independent of
the value of din.

Ex6: A transparent latch example:

reg out;
always @(gate or din)
  if(gate)
    out = din; // Pass through state
    // Note that the else isn't required here. The variable
    // out will follow the value of din while gate is high.
    // When gate goes low, out will remain constant.

The flip-flop is the next significant template; in Verilog, the D-flop is the
simplest, and it can be modeled as:

reg q;
always @(posedge clk)
  q <= d;

The significant thing to notice in the example is the use of the non-blocking
assignment. A basic rule of thumb is to use <= when there is a
posedge or negedge statement within the always clause. A variant of the D-flop is
one with an asynchronous reset.

reg q;
always @(posedge clk or posedge reset)
  if(reset)
    q <= 0;
  else
    q <= d;

The next variant is including both an asynchronous reset and asynchronous set
condition; again the convention comes into play, i.e. the reset term is followed by the
set term.

reg q;
always @(posedge clk or posedge reset or posedge set)
  if(reset)
    q <= 0;
  else
    if(set)
      q <= 1;
    else
      q <= d;

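Note: If this model is used to model a Set/Reset flip-flop, then simulation
mismatches can result. Consider the following test sequence of events: 1) reset
goes high, 2) clk goes high, 3) set goes high, 4) clk goes high again, 5) reset
goes low, followed by 6) set going low. When reset goes low at step 5, set is
still high, so a real flip-flop would drive q to 1. The model leaves q at 0,
because the always block is triggered only by rising edges of set and reset, not
by their levels.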
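
The event sequence in the note above can be replayed in simulation. The sketch
below wraps the set/reset model in a testbench; the module name
sr_mismatch_demo and the delay values are assumptions made for this
illustration.

```verilog
// Replays the six-step sequence against the set/reset model.
// Module name and timing are illustrative assumptions.
module sr_mismatch_demo;
  reg clk, reset, set, d;
  reg q;

  always @(posedge clk or posedge reset or posedge set)
    if (reset)    q <= 0;
    else if (set) q <= 1;
    else          q <= d;

  initial begin
    clk = 0; reset = 0; set = 0; d = 0;
    #1 reset = 1;   // 1) reset goes high, q becomes 0
    #1 clk   = 1;   // 2) clk goes high
    #1 set   = 1;   // 3) set goes high (reset still has priority)
    #1 clk   = 0;
    #1 clk   = 1;   // 4) clk goes high again
    #1 reset = 0;   // 5) reset goes low while set is still high
    #1 $display("after reset falls: set=%b q=%b", set, q); // set=1 q=0
    #1 set   = 0;   // 6) set goes low
    #1 $finish;
  end
endmodule
```

The falling edge of reset does not trigger the always block, so q stays at 0
even though set is still asserted.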

4.3.5 Initial Vs Always:

There are two separate ways of declaring a Verilog process. These are
the always and the initial keywords. The always keyword indicates a free-running
process. The initial keyword indicates a process that executes exactly once. Both
constructs begin execution at simulator time 0, and both execute until the end of
the block. Once an always block has reached its end, it is rescheduled to run
again.

//Examples:
initial
  begin
    a = 1; // Assign a value to reg a at time 0
    #1;    // Wait 1 time unit
    b = a; // Assign the value of reg a to reg b
  end

always @(a or b) // Any time a or b CHANGE, run the process
  begin
    if (a)
      c = b;
    else
      d = ~b;
  end // Done with this block, now return to the top (i.e. the @ event-control)

always @(posedge a) // Run whenever reg a has a low to high change
  a <= b;

These are the classic uses for these two keywords, but there are two significant
additional uses. The most common of these is an always keyword without
the @(...) sensitivity list. It is possible to use always as shown below:

always
  begin      // Always begins executing at time 0 and NEVER stops
    clk = 0; // Set clk to 0
    #1;      // Wait for 1 time unit
    clk = 1; // Set clk to 1
    #1;      // Wait 1 time unit
  end        // Keeps executing - so continue back at the top of the begin

The always keyword acts like the C construct while(1) {..} in the sense
that it will execute forever. The other interesting exception is the use of
the initial keyword with the addition of the forever keyword.
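
The initial/forever combination mentioned above can be sketched as a clock
generator. This is an illustrative example only; the module name clkgen_demo and
the one-time-unit half period are assumptions.

```verilog
// Clock generator built from initial + forever (illustrative names).
module clkgen_demo;
  reg clk;

  initial begin
    clk = 0;                  // runs once at time 0
    forever #1 clk = ~clk;    // then toggles every time unit, never returning
  end

  // Report each rising edge so the free-running behavior is visible.
  always @(posedge clk)
    $display("rising edge at t=%0t", $time);

  initial #6 $finish;         // stop the otherwise endless simulation
endmodule
```

Unlike a bare always block, this form allows one-time setup (here clk = 0)
before the repeating part begins.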

4.3.6 Race Condition:

The order of execution isn't always guaranteed within Verilog. This can best
be illustrated by a classic example. Consider the code snippet below:

initial
  a = 0;
initial
  b = a;
initial
  begin
    #1;
    $display ("Value a=%b Value of b=%b",a,b);
  end

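What will be printed out for the values of a and b? Depending on the order of
execution of the initial blocks, it could be zero and zero, or zero and an
uninitialized value (x) if b = a happens to run before a = 0. The $display
statement always executes after both assignments because of the #1 delay.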
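
One way to remove this particular race is to place the dependent assignments in
a single sequential block, where statement order is defined. The sketch below
assumes that restructuring is acceptable; the module name no_race_demo is
hypothetical.

```verilog
// Race-free variant: both assignments share one begin/end block,
// so they execute in source order. Module name is illustrative.
module no_race_demo;
  reg a, b;

  initial begin
    a = 0;
    b = a;   // always sees a == 0
    #1 $display("Value a=%b Value of b=%b", a, b); // Value a=0 Value of b=0
    $finish;
  end
endmodule
```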

4.3.7 Operators:
Note: These operators are not shown in order of precedence.

Operator type    Operator symbols    Operation performed

Bitwise          ~                   Bitwise NOT (1's complement)
                 &                   Bitwise AND
                 |                   Bitwise OR
                 ^                   Bitwise XOR
                 ~^ or ^~            Bitwise XNOR

Logical          !                   NOT
                 &&                  AND
                 ||                  OR

Reduction        &                   Reduction AND
                 ~&                  Reduction NAND
                 |                   Reduction OR
                 ~|                  Reduction NOR
                 ^                   Reduction XOR
                 ~^ or ^~            Reduction XNOR

Arithmetic       +                   Addition
                 -                   Subtraction
                 -                   2's complement
                 *                   Multiplication
                 /                   Division
                 **                  Exponentiation (*Verilog-2001)

Relational       >                   Greater than
                 <                   Less than
                 >=                  Greater than or equal to
                 <=                  Less than or equal to
                 ==                  Logical equality (bit-value 1'bX is removed
                                     from comparison)
                 !=                  Logical inequality (bit-value 1'bX is
                                     removed from comparison)
                 ===                 4-state logical equality (bit-value 1'bX is
                                     taken as literal)
                 !==                 4-state logical inequality (bit-value 1'bX
                                     is taken as literal)

Shift            >>                  Logical right shift
                 <<                  Logical left shift
                 >>>                 Arithmetic right shift (*Verilog-2001)
                 <<<                 Arithmetic left shift (*Verilog-2001)

Concatenation    {,}                 Concatenation

Replication      {n{m}}              Replicate value m for n times

Conditional      ?:                  Conditional

Table 5: List of Operators.
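
A few of the operators in Table 5 behave differently from their C counterparts,
which a short simulation can make concrete. This is a sketch only; the module
name operator_demo and the variables u and s are hypothetical.

```verilog
// Exercises reduction, shift, equality and replication operators.
// All names are illustrative assumptions.
module operator_demo;
  reg        [3:0] u;
  reg signed [3:0] s;

  initial begin
    u = 4'b1010;
    s = 4'sb1010;                             // -6 as a signed 4-bit value
    $display("&u      = %b", &u);             // 0: not all bits are 1
    $display("^u      = %b", ^u);             // 0: even number of 1 bits
    $display("u >> 1  = %b", u >> 1);         // 0101: zero fill
    $display("s >>> 1 = %b", s >>> 1);        // 1101: sign bit replicated
    $display("x ==  x = %b", 1'bx ==  1'bx);  // x: result is unknown
    $display("x === x = %b", 1'bx === 1'bx);  // 1: X compared literally
    $display("{2{u}}  = %b", {2{u}});         // 10101010
    $finish;
  end
endmodule
```

The === and !== operators are intended for testbenches; synthesis tools
generally do not treat X as a matchable value.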

4.3.8 System Tasks:

System tasks are available to handle simple I/O and various design measurement
functions. All system tasks are prefixed with $ to distinguish them from user
tasks and functions. This section presents a short list of the most often used
tasks. It is by no means a comprehensive list.

• $display - Print to screen a line followed by an automatic newline.
• $write - Write to screen a line without the newline.
• $swrite - Print to variable a line without the newline.
• $sscanf - Read from variable a format-specified string. (*Verilog-2001)
• $fopen - Open a handle to a file (read or write)
• $fdisplay - Write to file a line followed by an automatic newline.
• $fwrite - Write to file a line without the newline.
• $fscanf - Read from file a format-specified string. (*Verilog-2001)
• $fclose - Close and release an open file handle.
• $readmemh - Read hex file content into a memory array.
• $readmemb - Read binary file content into a memory array.
• $monitor - Print out all the listed variables when any change value.
• $time - Value of current simulation time.
• $dumpfile - Declare the VCD (Value Change Dump) format output file name.
• $dumpvars - Turn on and dump the variables.
• $dumpports - Turn on and dump the variables in Extended-VCD format.
• $random - Return a random value.
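
Several of these tasks are typically used together in a testbench. The
following is a minimal sketch; the module name systask_demo and the file name
systask_demo.vcd are hypothetical and chosen only for this example.

```verilog
// Combines $dumpfile, $dumpvars, $monitor, $time and $display.
// Module and file names are illustrative assumptions.
module systask_demo;
  reg [3:0] count;
  integer   i;

  initial begin
    $dumpfile("systask_demo.vcd");   // waveform output file (assumed name)
    $dumpvars(0, systask_demo);      // dump every signal in this module
    $monitor("t=%0t count=%0d", $time, count); // prints when count changes
    count = 0;
    for (i = 0; i < 4; i = i + 1)
      #1 count = count + 1;
    #1 $display("done at t=%0t", $time);
    $finish;
  end
endmodule
```

The resulting VCD file can be opened in a waveform viewer to inspect count over
time.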

SIMULATION RESULTS

CONCLUSION
We have presented high-performance, automated FPGA designs of integer arithmetic
cores with scan FFs, following the principle of primitive instantiation and
constrained placement. The methodology is ideally suited to circuits where the
configured logic elements are underutilized, or where the nature of the circuit
itself permits design-specific changes, such as reshuffling of inputs or priority
encoding, so that scan FFs can be inserted with no hardware overhead. No
combination of synthesis or optimization goal and effort settings for the
behavioral designs matched the proposed architecture in either area or speed.

REFERENCES
[1] P. Reyes, P. Reviriego, J. A. Maestro, and O. Ruano, “New protection techniques
against SEUs for moving average filters in a radiation environment,” IEEE Trans.
Nucl. Sci., vol. 54, no. 4, pp. 957–964, Aug. 2007.

[2] M. Hatamian et al., “Design considerations for gigabit Ethernet 1000 base-T
twisted pair transceivers,” Proc. IEEE Custom Integr. Circuits Conf., pp. 335–342,
1998.

[3] H. Yamasaki and T. Shibata, “A real-time image-feature-extraction and
vector-generation VLSI employing arrayed-shift-register architecture,” IEEE J.
Solid-State Circuits, vol. 42, no. 9, pp. 2046–2053, Sep. 2007.

[4] H.-S. Kim, J.-H. Yang, S.-H. Park, S.-T. Ryu, and G.-H. Cho, “A 10-bit column-
driver IC with parasitic-insensitive iterative charge-sharing based capacitor-string
interpolation for mobile active-matrix LCDs,” IEEE J. Solid-State Circuits, vol. 49,
no. 3, pp. 766–782, Mar. 2014.

[5] S.-H. W. Chiang and S. Kleinfelder, “Scaling and design of a 16-megapixel
CMOS image sensor for electron microscopy,” in Proc. IEEE Nucl. Sci. Symp. Conf.
Record (NSS/MIC), 2009, pp. 1249–1256.

[6] S. Heo, R. Krashinsky, and K. Asanovic, “Activity-sensitive flip-flop and
latch selection for reduced energy,” IEEE Trans. Very Large Scale Integr. (VLSI)
Syst., vol. 15, no. 9, pp. 1060–1064, Sep. 2007.

[7] S. Naffziger and G. Hammond, “The implementation of the next-generation 64 b
Itanium microprocessor,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig.
Tech. Papers, Feb. 2002, pp. 276–504.

[8] H. Partovi et al., “Flow-through latch and edge-triggered flip-flop hybrid
elements,” IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers,
pp. 138–139, Feb. 1996.

[9] E. Consoli, M. Alioto, G. Palumbo, and J. Rabaey, “Conditional push-pull pulsed
latch with 726 fJ·ps energy delay product in 65 nm CMOS,” in IEEE Int. Solid-State
Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2012, pp. 482–483.

[10] V. Stojanovic and V. Oklobdzija, “Comparative analysis of master-slave latches
and flip-flops for high-performance and low-power systems,” IEEE J. Solid-State
Circuits, vol. 34, no. 4, pp. 536–548, Apr. 1999.

[11] J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC microprocessor,”
IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703–1714, Nov. 1996.

[12] S. Nomura et al., “A 9.7 mW AAC-decoding, 620 mW H.264 720p 60fps
decoding, 8-core media processor with embedded forward-body-biasing and power-
gating circuit in 65 nm CMOS technology,” in IEEE Int. Solid-State Circuits Conf.
(ISSCC) Dig. Tech. Papers, Feb. 2008, pp. 262–264.

[13] Y. Ueda et al., “6.33 mW MPEG audio decoding on a multimedia processor,” in
IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2006,
pp. 1636–1637.

[14] B.-S. Kong, S.-S. Kim, and Y.-H. Jun, “Conditional-capture flip-flop for
statistical power reduction,” IEEE J. Solid-State Circuits, vol. 36, pp. 1263–1271,
Aug. 2001.

[15] C. K. Teh, T. Fujita, H. Hara, and M. Hamada, “A 77% energy-saving 22-
transistor single-phase-clocking D-flip-flop with adaptive-coupling configuration in
40 nm CMOS,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers,
Feb. 2011, pp. 338–339.

Sample code:

`timescale 1ns / 1ps
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 14:10:37 02/03/2018
// Design Name:
// Module Name: SCAN_COUNTER
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
// Revision 0.01 - File Created
// Additional Comments:
//
//////////////////////////////////////////////////////////////////////////////////

module mux2x1(din_0, din_1, sel, mux_out);
  //-----------Input Ports---------------
  input din_0, din_1, sel;
  //-----------Output Ports---------------
  output mux_out;
  //------------Internal Variables--------
  wire mux_out;
  //-------------Code Start-----------------
  assign mux_out = (sel) ? din_1 : din_0;
endmodule //End Of Module mux

//////////////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////////////

module lut6_2(td, ld, q, ext, ud, o1, o2);
  input td, ld, q, ext, ud;
  output wire o1, o2;
  wire x1, mux_out1;

  mux2x1 M1(q, ext, ld, mux_out1);
  mux2x1 M2(ud, q, td, o2);

  assign x1 = mux_out1 ^ ud;
  assign o1 = x1 & (~td);
endmodule

//////////////////////////////////////////////////////////////////////////////////

//////////////////////////////////////////////////////////////////////////////////

module carry(o1, o2, o3, c1, c2);
  input o1, o2, o3;
  output wire c1, c2;

  mux2x1 M3(o2, o3, o1, c1);

  assign c2 = o1 ^ o3;
endmodule

//////////////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////////////

module dff (data, clk, q);
  input data;
  input clk;
  output reg q;

  always @ (posedge clk)
    begin
      q <= data;
    end
endmodule

//////////////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////////////

module scan_counter (td,ld,q,ext,ud,sd,clk,cout,dout);
  input td, ld, ud, sd, clk;
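  input [3:0] q, ext;
  output wire cout;
  output wire [3:0] dout;

  // Internal nets, declared explicitly. The original listing relied on
  // implicit 1-bit nets and declared an unused vector named tempp.
  wire temp;
  wire o1, o2, o3, o4, o5, o6, o7, o8;
  wire c11, c12, c13;
  wire c21, c22, c23, c24;

  // One LUT stage per counter bit (bit 3 down to bit 0)
  lut6_2 L1(td, ld, q[3], ext[3], ud, o1, o2);
  lut6_2 L2(td, ld, q[2], ext[2], ud, o3, o4);
  lut6_2 L3(td, ld, q[1], ext[1], ud, o5, o6);
  lut6_2 L4(td, ld, q[0], ext[0], ud, o7, o8);

  // Carry-in selection, then the carry chain from bit 0 up to bit 3
  mux2x1 M5(~ud, sd, td, temp);
  carry C1(o7, o8, temp, c11, c21);
  carry C2(o5, o6, c11, c12, c22);
  carry C3(o3, o4, c12, c13, c23);
  carry C4(o1, o2, c13, cout, c24);

  //assign temp={c24,c23,c22,c21};

  // Output registers
  dff D1 (c21, clk, dout[0]);
  dff D2 (c22, clk, dout[1]);
  dff D3 (c23, clk, dout[2]);
  dff D4 (c24, clk, dout[3]);
endmodule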
