CONTENTS
ABSTRACT
CHAPTER 1
1. INTRODUCTION
CHAPTER 2
CHAPTER 3
CHAPTER 4
4. TOOLS REQUIRED
4.2.3 IP MODULES
4.3.2 HISTORY
4.3.3 CONSTANTS
CHAPTER 6
CONCLUSION
REFERENCES
FIGURE-4.10: XC2000 CLB OF THE XILINX FPGA
ABSTRACT
In this paper, a four-bit unsigned up counter with an asynchronous clear and a clock enable is
designed in Xilinx ISE 14.2 and implemented on a high-performance Virtex-6 FPGA
(XC6VLX240T device, -1 speed grade, FFG1156 package) on the ML605 board. The user
constraints file (UCF) and the native circuit description (NCD) file are used with XPower 14.2
for power consumption analysis. We compare two HDL codes. The first code maps the clock
enable signal to LUTs, and the power consumption is 3.423 Watt. The second code maps the
clock enable signal to control ports, and the power consumption is 3.625 Watt. By changing
the mapping style, we obtain about a 6% power reduction and also reduce the number of LUTs
and D flip-flops used in the implementation, which leads to an area-efficient design. With
efficient mapping, the power saving obtained from a single statement is multiplied across the
design. The experimental results show the power analysis of both HDL mapping codes.
Low Power VLSI Circuit Design with Efficient HDL Coding
INTRODUCTION
This paper proposes a low-power and area-efficient shift register using pulsed
latches. The area and power consumption are reduced by replacing flip-flops with
pulsed latches. This method solves the timing problem between pulsed latches
through the use of multiple non-overlap delayed pulsed clock signals instead of the
conventional single pulsed clock signal. The shift register uses a small number of the
pulsed clock signals by grouping the latches into several sub shift registers and using
additional temporary storage latches. A 256-bit shift register using pulsed latches was
fabricated using a 0.18 µm CMOS process with VDD = 1.8 V. The core area is 6600 µm².
The power consumption is 1.2 mW at a 100 MHz clock frequency. The proposed shift
register saves 37% area and 44% power compared to the conventional shift register
with flip-flops.
A shift register is the basic building block in a VLSI circuit. Shift registers are
commonly used in many applications, such as digital filters [1], communication
receivers [2], and image processing ICs [3]–[5]. Recently, as the size of the image
data continues to increase due to the high demand for high quality image data, the
word length of the shift register increases to process large image data in image
processing ICs. An image-extraction and vector generation VLSI chip uses a 4K-bit
shift register [3]. A 10-bit 208 channel output LCD column driver IC uses a 2K-bit
shift register [4]. A 16-megapixel CMOS image sensor uses a 45K-bit shift register
[5]. As the word length of the shift register increases, the area and power
consumption of the shift register become important design considerations.
Flip-flops are the basic storage elements used extensively in all kinds of digital
designs. The current trends will eventually mandate low power design automation on
a very large scale to match the trends of power consumption of today's and future
integrated chips. Power consumption of Very Large Scale Integrated (VLSI) design is
given by the generalized relation $P = C V^2 f$, where C is the switched capacitance,
V the supply voltage, and f the clock frequency. Since power is proportional to the
square of the voltage as per this relation, voltage scaling is the most prominent way to
reduce power dissipation. The pulsed latch consumes less power than the flip-flop.
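As a rough worked example of the quadratic dependence on voltage, consider scaling the supply while holding C and f fixed. The 1.8 V starting point is the supply quoted above for the fabricated shift register; the 1.2 V target is only an assumed figure chosen for illustration.

```latex
\[
P = C V^{2} f
\quad\Longrightarrow\quad
\frac{P_{\text{new}}}{P_{\text{old}}}
  = \left(\frac{V_{\text{new}}}{V_{\text{old}}}\right)^{2}
  = \left(\frac{1.2\,\text{V}}{1.8\,\text{V}}\right)^{2}
  \approx 0.44
\]
```

Under these assumptions a one-third reduction in supply voltage would cut dynamic power by a little more than half, which is why voltage scaling is usually considered before other techniques.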
LITERATURE SURVEY
Single event effects (SEEs) caused by radiation are a major concern when
working with circuits that need to operate in certain environments, for example in
space applications. In this paper, new techniques for the implementation of moving
average filters that provide protection against SEEs are presented, which have a lower
circuit complexity and cost than traditional techniques like triple modular redundancy
(TMR). The effectiveness of these techniques has been evaluated using a software
fault injection platform and the circuits have been synthesized for a commercial
library in order to assess their complexity. The main idea behind the presented
approach is to exploit the structure of moving average filter implementations to deal
with SEEs at a higher level of abstraction.
Gigabit Ethernet on Category-5 cable is the next generation high-speed Ethernet LAN
for twisted pair copper medium with a minimum required reach of 100 meters. This
paper presents a brief overview of the transmission scheme agreed upon by the IEEE
802.3ab task force for 1 Gb/s full-duplex operation over 4 pairs of category-5 cable.
Some system level simulation results are presented followed by a discussion of the
type of digital and analog circuits required for a single chip mixed-signal CMOS
implementation of the transceiver. For reliable operation under worst case cabling
conditions, the DSP portion of the transceiver has to perform over 150 Giga
operations per second.
A feature-extraction and vector-generation VLSI has been developed for real-time
image recognition. High-speed feature vector generation in less than 9.7 ns/vector
element has been experimentally demonstrated. It is possible to scan a VGA-size
image at a rate of 6.1 frames/s, thus generating as many as 1.5 × 10^6 feature vectors
per second for recognition. This is more than 10^3 times faster than software
processing running on a 3-GHz general-purpose processor.
This paper presents a 10-bit column driver IC for active-matrix LCDs, with a
proposed iterative charge-sharing based (ICSB) capacitor-string that interpolates two
output voltages from a resistor-string DAC. Iterative mode change between a
capacitive voltage division mode and a charge sharing mode in the ICSB capacitor-
string interpolation suppresses the effect of mismatches between capacitors and that
of parasitic capacitances; thus, a highly linear capacitor sub-DAC is realized. In
addition, the area-sharing layout technique, which stacks the interpolation capacitor-
string on top of the R-DAC area, reduces the driver channel size and extends the bit
resolution of the gamma-corrected nonlinear main R-DAC. Consequently, the
proposed ICSB capacitor-string interpolation scheme provides highly uniform channel
performance by passively dividing the coarse voltages from the global resistor-string
DAC with high area efficiency, and more effective bit resolution for nonlinear gamma
correction.
10Gb/s. The large chip dimensions and the increased power consumption in EM7 also
require more robust power distribution. A matrix-math simulation shows the worst-
case pixel IR voltage drop was improved from 20 mV to 8 mV. Similarly, the pixel's
worst-case analog output's IR drop is reduced from 80.7 mV to 2.58 mV, and its
bandwidth is thus increased from 6.92 MHz to 14.4 MHz. The power supply IR drop in
the output processing stage's op-amps is reduced from 327 mV to 35 mV, their open-
loop gain variation is reduced from 525% to 28%, and their worst-case bandwidth is
increased from 0.87 MHz to 764 MHz.
This paper presents new techniques to evaluate the energy and delay of flip-
flop and latch designs and shows that no single existing design performs well across
the wide range of operating regimes present in complex systems. We propose the use
of a selection of flip-flop and latch designs, each tuned for different activation
patterns and speed requirements. We illustrate our technique on a pipelined MIPS
processor datapath running SPECint95 benchmarks, where we reduce total flip-flop
and latch energy by over 60% without increasing cycle time.
Flip-flops (FFs) are key building blocks in the design of high-speed energy-
efficient microprocessors, as their data-to-output delay (D-Q) and power dissipation
strongly affect the processor's clock period and overall power. From previous
analyses, the Transmission-Gate Pulsed Latch (TGPL) proved to be the most energy-
efficient FF in a large portion of the design space, ranging from high speed
(minimizing ED^j products with j > 1) to minimum ED product designs, while simple
Master-Slave FFs (TGFF and ACFF ) are the most energy-efficient in the low-power
E-D space region.
TGPL also has the lowest D-Q delay along with STFF. However, the latter has
considerably worse energy efficiency, hence, the TGPL is the best reference for a
comparison. In this work, two new FFs are introduced, the Conditional Push-Pull
Pulsed Latch (CP3L), and a version with a Shareable (CSP3L) Pulse Generator (PG).
In this paper, we propose a set of rules for consistent estimation of the real
performance and power features of the flip-flop and master-slave latch structures.
PROJECT DESCRIPTION
Shift Registers
The Shift Register is another type of sequential logic circuit
that is used for the storage or transfer of data in the form of binary numbers and then
"shifts" the data out once every clock cycle, hence the name shift register. It basically
consists of several single bit "D-Type Data Latches", one for each bit (0 or 1)
connected together in a serial or daisy-chain arrangement so that the output from one
data latch becomes the input of the next latch and so on. The data bits may be fed in
or out of the register serially, i.e. one after the other from either the left or the right
direction, or in parallel, i.e. all together. The number of individual data latches
required to make up a single Shift Register is determined by the number of bits to be
stored with the most common being 8-bits wide, i.e. eight individual data latches.
Shift Registers are used for data storage or data movement and are used in calculators
or computers to store data such as two binary numbers before they are added together,
or to convert the data from either a serial to parallel or parallel to serial format. The
individual data latches that make up a single shift register are all driven by a common
clock (Clk) signal making them synchronous devices. Shift register IC's are generally
provided with a clear or reset connection so that they can be "SET" or "RESET" as
required. Generally, shift registers operate in one of four different modes with the
basic movement of data through a shift register being:
• Serial-in to Parallel-out (SIPO) - The register is loaded with serial data, one bit at a
time, with the stored data being available in parallel form.
• Serial-in to Serial-out (SISO) - The data is shifted serially "IN" and "OUT" of the
register, one bit at a time in either a left or right direction under clock control.
• Parallel-in to Serial-out (PISO) - The parallel data is loaded into the register
simultaneously and is shifted out of the register serially one bit at a time under clock
control.
• Parallel-in to Parallel-out (PIPO) - The parallel data is loaded simultaneously into
the register and transferred together to the respective outputs by the same clock pulse.
Also, the directional movement of the data through a shift register can be
to the left (left shifting), to the right (right shifting), left-in but right-out
(rotation), or both left and right within the same register, thereby making it
bidirectional.
The operation is as follows. Let us assume that all the flip-flops (FFA to FFD)
have just been RESET (CLEAR input) and that all the outputs QA to QD are at logic
level "0" i.e., no parallel data output. If a logic "1" is connected to the DATA input
pin of FFA then on the first clock pulse the output of FFA and therefore the resulting
QA will be set HIGH to logic "1" with all the other outputs still remaining LOW at
logic "0". Assume now that the DATA input pin of FFA has returned LOW again to
logic "0" giving us one data pulse or 0-1-0. The second clock pulse will change the
output of FFA to logic "0" and the output of FFB and QB HIGH to logic "1" as its
input D has the logic "1" level on it from QA. The logic "1" has now moved or been
"shifted" one place along the register to the right as it is now at QB. When the third
clock pulse arrives this logic "1" value moves to the output of FFC (QC) and so on
until the arrival of the fifth clock pulse which sets all the outputs QA to QD back
again to logic level "0" because the input to FFA has remained constant at logic level
"0". The effect of each clock pulse is to shift the data contents of each stage one place
to the right, and this is shown in the following table until the complete data value of 0-
0-0-1 is stored in the register. This data value can now be read directly from the
outputs of QA to QD. Then the data has been converted from a serial data input signal
to a parallel data output. The truth table and following waveforms show the
propagation of the logic "1" through the register from left to right as follows.
Basic Movement of Data through a Shift Register
Clock Pulse No    QA    QB    QC    QD
0                 0     0     0     0
1                 1     0     0     0
2                 0     1     0     0
3                 0     0     1     0
4                 0     0     0     1
5                 0     0     0     0
Note that after the fourth clock pulse has ended the 4-bits of data (0-0-0-1) are
stored in the register and will remain there provided clocking of the register has
stopped. In practice the input data to the register may consist of various
combinations of logic "1" and "0". Commonly available SIPO IC's include the
standard 8-bit 74LS164 or the 74LS594.
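The serial-in to parallel-out behaviour tabulated above can be sketched in Verilog roughly as follows. This is only an illustrative model: the module name, port names, and the active-high polarity of the clear input are assumptions, not taken from any particular device.

```verilog
// Hypothetical 4-bit serial-in, parallel-out shift register (illustrative names).
module sipo4 (
    input  wire       clk,   // common shift clock
    input  wire       clr,   // asynchronous clear, assumed active high
    input  wire       din,   // serial DATA input feeding the first stage (FFA)
    output reg  [3:0] q      // q[3]=QA, q[2]=QB, q[1]=QC, q[0]=QD
);
    always @(posedge clk or posedge clr)
        if (clr)
            q <= 4'b0000;            // all outputs to logic "0"
        else
            q <= {din, q[3:1]};      // QA takes din, every other stage takes its left neighbour
endmodule
```

With din held at 1 for the first clock pulse and 0 afterwards, q steps through 1000, 0100, 0010, 0001 and 0000, matching the rows of the table.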
This shift register is very similar to the SIPO above, except where before the
data was read directly in a parallel form from the outputs QA to QD, this time the data
is allowed to flow straight through the register and out of the other end. Since there is
only one output, the DATA leaves the shift register one bit at a time in a serial
pattern, hence the name Serial-in to Serial-Out Shift Register or SISO.
The SISO shift register is one of the simplest of the four configurations as it
has only three connections, the serial input (SI) which determines what enters the left
hand flip-flop, the serial output (SO) which is taken from the output of the right hand
flip-flop and the sequencing clock signal (Clk). The logic circuit diagram below
shows a generalized serial-in serial-out shift register.
4-bit Serial-in to Serial-out Shift Register
You may wonder what the point of a SISO shift register is if the output data is
exactly the same as the input data. This type of shift register also acts as a
temporary storage device or as a time delay device for the data, with the amount of
time delay being controlled by the number of stages in the register (4, 8, 16, etc.) or by
varying the application of the clock pulses.
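The use of a SISO register as a delay line can be sketched as below. The module and signal names and the STAGES parameter are hypothetical and introduced only for this illustration; the sketch assumes at least two stages.

```verilog
// Hypothetical serial-in, serial-out shift register used as a delay line.
// STAGES is an illustrative parameter and must be 2 or more in this sketch.
module siso #(parameter STAGES = 4) (
    input  wire clk,   // sequencing clock (Clk)
    input  wire si,    // serial input (SI), enters the left-hand stage
    output wire so     // serial output (SO), taken from the right-hand stage
);
    reg [STAGES-1:0] stage;

    always @(posedge clk)
        stage <= {si, stage[STAGES-1:1]};   // shift one place to the right

    assign so = stage[0];
endmodule
```

A bit sampled at si reaches so after STAGES rising clock edges, so the delay is set by the number of stages and by the clock rate.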
The Parallel-in to Serial-out shift register acts in the opposite way to the serial-
in to parallel-out one above. The data is loaded into the register in a parallel format
i.e. all the data bits enter their inputs simultaneously, to the parallel input pins PA to
PD of the register. The data is then read out sequentially in the normal shift-right
mode from the register at Q representing the data present at PA to PD. This data is
outputted one bit at a time on each clock cycle in a serial format. It is important to
note that with this system a clock pulse is not required to parallel load the register as it
is already present, but four clock pulses are required to unload the data.
As this type of shift register converts parallel data, such as an 8-bit data word
into serial format, it can be used to multiplex many different input lines into a single
serial DATA stream which can be sent directly to a computer or transmitted over a
communications line. Commonly available IC's include the 74HC166 8-bit Parallel-
in/Serial-out Shift Registers.
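A parallel-in to serial-out register can be sketched along the following lines. Note the assumption: this sketch uses a synchronous load control, a common HDL simplification, which differs from the unclocked parallel load described above. All names are illustrative.

```verilog
// Hypothetical 4-bit parallel-in, serial-out shift register.
// Assumption: the parallel load is synchronous (taken on a clock edge).
module piso4 (
    input  wire       clk,
    input  wire       load,  // hypothetical control: 1 = parallel load, 0 = shift
    input  wire [3:0] p,     // p[3]=PA, p[2]=PB, p[1]=PC, p[0]=PD
    output wire       q      // serial output
);
    reg [3:0] sr;

    always @(posedge clk)
        if (load)
            sr <= p;                   // capture all four bits at once
        else
            sr <= {1'b0, sr[3:1]};     // shift right, filling with 0

    assign q = sr[0];
endmodule
```

In this sketch PD appears at q after the load edge, and each following clock edge with load low brings out PC, PB and PA in turn.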
The PIPO shift register is the simplest of the four configurations as it has only
three connections, the parallel input (PI) which determines what enters the flip-flop,
the parallel output (PO) and the sequencing clock signal (Clk). Similar to the Serial-in
to Serial-out shift register, this type of register also acts as a temporary storage device
or as a time delay device, with the amount of time delay being varied by the
frequency of the clock pulses. Also, in this type of register there are no
interconnections between the individual flip-flops since no serial shifting of the data is
required.
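For comparison, a parallel-in to parallel-out register needs no shifting path at all. A minimal sketch, with illustrative names, might look like this:

```verilog
// Hypothetical 4-bit parallel-in, parallel-out register.
module pipo4 (
    input  wire       clk,     // sequencing clock (Clk)
    input  wire [3:0] p_in,    // parallel input (PI)
    output reg  [3:0] p_out    // parallel output (PO)
);
    // Each bit is stored independently; there is no connection between stages.
    always @(posedge clk)
        p_out <= p_in;
endmodule
```

Every output bit depends only on its own input bit, which reflects the absence of interconnections between the individual flip-flops.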
TOOLS REQUIRED
The electronics industry has achieved a phenomenal growth over the last two
decades, mainly due to the rapid advances in integration technologies, large-scale
systems design - in short, due to the advent of VLSI. The number of applications of
integrated circuits in high-performance computing, telecommunications, and
consumer electronics has been rising steadily, and at a very fast pace. Typically, the
required computational power (or, in other words, the intelligence) of these
applications is the driving force for the fast development of this field. Figure 4.1 gives
an overview of the prominent trends in information technologies over the next few
decades. The current leading-edge technologies (such as low bit-rate video and
cellular communications) already provide the end-users a certain amount of
processing power and portability.
Information services tend to become more and more personalized (as opposed to
collective services such as broadcasting), which means that the devices must be more
intelligent to answer individual demands, and at the same time they must be portable
to allow more flexibility/mobility. Table 1.1 shows the evolution of logic complexity
in integrated circuits over the last three decades, and marks the milestones of each era.
Here, the numbers for circuit complexity should be interpreted only as representative
examples to show the order-of-magnitude. A logic block can contain anywhere from
10 to 100 transistors, depending on the function. State-of-the-art examples of ULSI
chips, such as the DEC Alpha or the INTEL Pentium contain 3 to 6 million
transistors.
Figure-4.2: Evolution of integration density and min feature size, as seen in the early 1980s.
Therefore, the current trend of integration will also continue in the foreseeable
future. Advances in device manufacturing technology, and especially the steady
reduction of minimum feature size (minimum length of a transistor or an interconnect
realizable on chip) support this trend. Figure 4.2 shows the history and forecast of
chip complexity - and minimum feature size - over time, as seen in the early 1980s. At
that time, a minimum feature size of 0.3 microns was expected around the year 2000.
A minimum size of 0.25 microns was readily achievable by the year 1995. As a direct
result of this, the integration density has also exceeded previous expectations - the
first 64 Mbit DRAM, and the INTEL Pentium microprocessor chip containing more
than 3 million transistors were already available by 1994, pushing the envelope of
integration density.
Figure-4.3: Level of integration over time, for memory chips and logic chips.
behavioral domain,
structural domain,
Geometrical layout domain.
The design flow starts from the algorithm that describes the behavior of the
target chip. The corresponding architecture of the processor is first defined. It is
mapped onto the chip surface by floor planning. The next design evolution in the
behavioral domain defines finite state machines (FSMs) which are structurally
implemented with functional modules such as registers and arithmetic logic units
(ALUs). These modules are then geometrically placed onto the chip surface using
CAD tools for automatic module placement followed by routing, with a goal of
minimizing the interconnects area and signal delays. The third evolution starts with a
behavioral module description. Individual modules are then implemented with leaf
cells. In standard-cell based design, leaf cells are already pre-designed and stored in a
library for logic design use.
Figure 4.5 provides a more simplified view of the VLSI design flow, taking
into account the various representations, or abstractions of design - behavioral, logic,
circuit and mask layout. Note that the verification of design plays a very important
role in every step during this process. The failure to properly verify a design in its
early phases typically causes significant and expensive re-design at a later stage.
module into sub-modules and then repeating this operation on the sub-modules until
the complexity of the smaller parts becomes manageable. This approach is very
similar to the software case where large programs are split into smaller and smaller
sections until simple subroutines, with well-defined functions and interfaces, can be
written. In Section 1.2, we have seen that the design of a VLSI chip can be
represented in three domains. Correspondingly, a hierarchy structure can be described
in each domain separately. However, it is important for the simplicity of design that
the hierarchies in different domains can be mapped into each other easily. This
physical view describes the external geometry of the adder and how pin locations
allow some signals (in this case the carry signals) to be transferred from one sub-
block to the other without external routing. At lower levels of the physical hierarchy,
the internal mask.
Figure-4.7: Regular design of a 2-1 MUX, a DFF and an adder, using inverters and
tri-state buffers.
The CLB is configured such that many different logic functions can be
realized by programming its array. More sophisticated CLBs have also been
introduced to map complex functions. At this stage, the chip design is completely
described in terms of available logic cells. Next, the placement and routing step
assigns individual logic cells to FPGA sites (CLBs) and determines the routing
patterns among the cells in accordance with the net list. After routing is completed,
the on-chip
bonding pads on its left and bottom edges, diodes for I/O protection, nMOS
transistors and pMOS transistors for chip output driver circuits in the neighboring
areas of bonding pads, arrays of nMOS transistors and pMOS transistors, underpass
wire segments, and power and ground buses along with contact windows.
Figure 4.13 shows a magnified portion of the internal array with metal mask
design (metal lines highlighted in dark) to realize a complex logic function. Typical
gate array platforms allow dedicated areas, called channels, for intercell routing as
shown in Figs. 4.12 and 4.13 between rows or columns of MOS transistors. The
interconnection patterns to realize basic logic gates can be stored in a library. Some
other platforms also offer dedicated memory (RAM) arrays to allow a higher density
where memory functions are required. Figure 4.14 shows the layout views of a
conventional gate array and a gate array platform with two dedicated memory banks.
With the use of multiple interconnect layers, the routing can be achieved over the
active cell areas; thus, the routing channels can be removed as in Sea-of-Gates (SOG)
chips. Here, the entire chip surface is covered with uncommitted nMOS and pMOS
transistors. As in the gate array case, neighboring transistors can be customized using
a metal mask to form basic logic gates. For intercell routing, however, some of the
uncommitted transistors must be sacrificed. This approach results in more flexibility
for interconnections, and usually in a higher density. The basic platform of a SOG
chip is shown in Fig. 4.15. Figure 4.16 offers a brief comparison between the
channeled (GA) vs. the channelless (SOG) approaches.
Figure-4.14: Layout views of a conventional GA chip and a gate array with two memory
banks.
In general, the GA chip utilization factor, as measured by the used chip area
divided by the total chip area, is higher than that of the FPGA and so is the chip
speed, since more customized design can be achieved with metal mask designs. The
current gate array chips can implement as many as hundreds of thousands of logic
gates.
Figure-4.16: Comparison between the channeled (GA) vs. the channelless (SOG)
approaches.
Figure 4.18 shows a floor plan for standard-cell based design. Inside the I/O
frame which is reserved for I/O cells, the chip area contains rows or columns of
standard cells. Between cell rows are channels for dedicated inter-cell routing. As in
the case of Sea-of-Gates, with over-the-cell routing, the channel areas can be reduced
or even removed provided that the cell rows offer sufficient routing space. The
physical design and layout of logic cells ensure that when cells are placed into rows,
their heights are matched and neighboring cells can be abutted side-by-side, which
provides natural connections for power and ground lines in each row. The signal
delay, noise margins, and power consumption of each cell should be also optimized
with proper sizing of transistors using circuit simulation.
If a number of cells must share the same input and/or output signals, a
common signal bus structure can also be incorporated into the standard-cell-based
chip layout. Figure 4.19 shows the simplified symbolic view of a case where a signal
bus has been inserted between the rows of standard cells. Note that in this case the
chip consists of two blocks, and power/ground routing must be provided from both
sides of the layout area. Standard-cell based designs may consist of several such
macro-blocks, each corresponding to a specific unit of the system architecture such as
ALU, control logic, etc.
Figure-4.19: Simplified floor plan consisting of two separate blocks and a common signal
bus.
After chip logic design is done using standard cells in the library, the most
challenging task is to place individual cells into rows and interconnect them in a way
that meets stringent design goals in circuit speed, chip area, and power consumption.
Many advanced CAD tools for place-and-route have been developed and used to
achieve such goals. Also from the chip layout, circuit models which include
interconnect parasitics can be extracted and used for timing simulation and analysis to
identify timing critical paths. For timing critical paths, proper gate sizing is often
practiced to meet the timing requirements. In many VLSI chips, such as
microprocessors and digital signal processing chips, standard-cell based design is
used to implement complex control logic modules. Some full custom chips can also be
implemented exclusively with standard cells.
Finally, Fig. 4.20 shows the detailed mask layout of a standard-cell-based chip
with an uninterrupted single block of cell rows, and three memory banks placed on
one side of the chip. Notice that within the cell block, the separations between
neighboring rows depend on the number of wires in the routing channel between the
cell rows. If a high interconnect density can be achieved in the routing channel, the
standard cell rows can be placed closer to each other, resulting in a smaller chip area.
The availability of dedicated memory blocks also reduces the area, since the
realization of memory elements using standard cells would occupy a larger area.
Note: After you convert your project, you cannot open it in previous versions of the
ISE software, such as the ISE 11 software. However, you can optionally create a
backup of the original project as part of project migration, as described below.
4.2.3 IP Modules:
To help familiarize you with the ISE® software and with FPGA and CPLD
designs, a set of example designs is provided with Project Navigator. The examples
show different design techniques and source types, such as VHDL, Verilog,
schematic, or EDIF, and include different constraints and IP.
To Open an Example
Note: If you modified an example project and want to overwrite it with the
original example project, select File > Open Example, select the Sample Project
Name, and specify the same Destination Directory you originally used. In the dialog
box that appears, select Overwrite the existing project and click OK.
Note: If you prefer, you can create a project using the New Project dialog
box instead of the New Project Wizard. To use the New Project dialog box, deselect
the Use New Project wizard option in the ISE General page of the Preferences
dialog box.
To Create a Project
1. Select File > New Project to launch the New Project Wizard.
2. In the Create New Project page, set the name, location, and project type, and
click Next.
3. For EDIF or NGC/NGO projects only: In the Import EDIF/NGC Project
page, select the input and constraint file for the project, and click Next.
4. In the Project Settings page, set the device and project properties, and click
Next.
5. In the Project Summary page, review the information, and click Finish to
create the project.
Design source files are left in their existing location, and the copied project
specified directory.
Keep sources in their current locations - to leave the design source files in
their existing location.
When you select this option, the copied project opens in a state in which
processes have not yet been run. To automatically open the copy after creating it,
select
Note: By default, this option is disabled. If you leave this option disabled, the original
project remains open after the copy is made. Click OK.
A project archive is a single, compressed ZIP file with a .zip extension. By default, it
contains all project files, source files, and generated files, including the following:
A ZIP file is created in the specified directory. To open the archived project,
you must first unzip the ZIP file, and then, you can open the project.
Note: Sources that reside outside of the project directory are copied into a
remote_sources subdirectory in the project archive.
4.3.1 Overview:
Verilog's concept of 'wire' consists of both signal values (4-state: "1, 0, floating,
undefined") and strengths (strong, weak, etc.). This system allows abstract modeling
of shared signal lines, where multiple sources drive a common net. When a wire has
multiple drivers, the wire's (readable) value is resolved by a function of the source
drivers and their strengths.
A subset of statements in the Verilog language is synthesizable. Verilog modules that
conform to a synthesizable coding style, known as RTL (register-transfer level), can
be physically realized by synthesis software. Synthesis software algorithmically
transforms the (abstract) Verilog source into a netlist, a logically equivalent
description consisting only of elementary logic primitives (AND, OR, NOT, flip-flops,
etc.) that are available in a specific FPGA or VLSI technology. Further manipulations
to the netlist ultimately lead to a circuit fabrication blueprint (such as a photomask
set for an ASIC or a bitstream file for an FPGA).
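The idea of several sources driving one net can be illustrated with a small sketch. The module and port names here are hypothetical; the point is only that two continuous assignments drive the same wire and the simulator resolves the result.

```verilog
// Hypothetical shared line with two tri-state drivers (illustrative names).
module shared_bus_sketch (
    input  wire en_a, data_a,   // driver A: enable and data
    input  wire en_b, data_b,   // driver B: enable and data
    output wire bus             // common net driven by both sources
);
    // Each driver releases the line (high impedance, z) when not enabled.
    assign bus = en_a ? data_a : 1'bz;
    assign bus = en_b ? data_b : 1'bz;
endmodule
```

When only one enable is high the net takes that driver's value, when neither is high it floats (z), and when both drive opposite values at equal strength the resolved value is undefined (x).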
4.3.2 History:
4.3.2 (a) Beginning
4.3.2(b) Verilog-95
With the increasing success of VHDL at the time, Cadence decided to make
the language available for open standardization. Cadence transferred Verilog into the
public domain under the Open Verilog International (OVI) (now known as Accellera)
organization. Verilog was later submitted to IEEE and became IEEE Standard 1364-
1995, commonly referred to as Verilog-95. In the same time frame Cadence initiated
the creation of Verilog-A to put standards support behind its analog simulator Spectre.
Verilog-A was never intended to be a standalone language and is a subset of Verilog-
AMS which encompassed Verilog-95.
4.3.2(e) SystemVerilog
4.3.2(f) Examples
module main;
  initial
    begin
      $display("Hello world!");
      $finish;
    end
endmodule
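module toplevel(clock, reset);   // module and port names are illustrative
  input clock;
  input reset;
  reg flop1;
  reg flop2;
  // Two flip-flops that exchange values on every clock edge
  always @(posedge reset or posedge clock)
    if (reset)
      begin
        flop1 <= 0;
        flop2 <= 1;
      end
    else
      begin
        flop1 <= flop2;
        flop2 <= flop1;
      end
endmodule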
reg a, b, c, d;
wire e;
always @(b or e)
begin
a = b & e;
b = a | b;
#5 c = b;
d = #6 c ^ e;
end
4.3.3 Constants:
Examples:
There are several statements in Verilog that have no analog in real hardware,
e.g. $display. Consequently, much of the language cannot be used to describe
hardware. The examples presented here are the classic subset of the language that has
a direct mapping to real gates.
reg out;
always @(a or b or sel)
begin
case(sel)
1'b0: out = b;
1'b1: out = a;
endcase
end
// Finally - you can use if/else in a
// procedural structure.
reg out;
always @(a or b or sel)
if (sel)
out = a;
else
out = b;
The next interesting structure is a transparent latch: it passes the input to the
output while the gate signal is set for "pass-through", and captures the input and holds
it when the gate signal transitions to "hold". In the example below, the "pass-through"
level of the gate is the one for which the if clause is true, i.e. gate = 1. This is
read "if gate is true, din is fed to latch_out continuously." Once the if clause is
false, the last value at latch_out is retained and is independent of the value
of din.
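A minimal sketch of such a latch, using the signal names from the description above (gate, din, latch_out), could look like this. The module wrapper and its name are assumptions added so that the fragment compiles on its own.

```verilog
// Transparent latch sketch; the module name latch_example is hypothetical.
module latch_example(gate, din, latch_out);
  input gate, din;
  output latch_out;
  reg latch_out;

  always @(gate or din)
    if (gate)
      latch_out = din;   // pass-through while gate is 1
    // No else branch: when gate is 0, latch_out keeps its last value,
    // which is what makes this a latch rather than combinational logic.
endmodule
```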
The flip-flop is the next significant template; in Verilog, the D-flop is the
simplest, and it can be modeled as:
reg q;
always @(posedge clk) q <= d;
The significant thing to notice in the example is the use of the non-blocking
assignment. A basic rule of thumb is to use <= when there is a
posedge or negedge statement within the always clause. A variant of the D-flop is
one with an asynchronous reset; by convention, the reset condition is tested in the
first if clause of the block.
reg q;
always @(posedge clk or posedge reset)
if(reset)
q <= 0;
else
q <= d;
The next variant includes both an asynchronous reset and an asynchronous set
condition; again the convention comes into play, i.e. the reset term is tested before
the set term.
reg q;
always @(posedge clk or posedge reset or posedge set)
if(reset)
q <= 0;
else
if(set)
q <= 1;
else
q <= d;
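Note: If this model is used to model a Set/Reset flip-flop, simulation errors
can result. Consider the following test sequence of events: 1) reset goes high, 2) clk
goes high, 3) set goes high, 4) clk goes high again, 5) reset goes low, followed by 6) set
going low. When reset goes low at step 5, set is still high, so a real flip-flop would
drive its output to 1. In this model q stays at 0, because the always block is triggered
only by rising edges of set and reset, not by their levels.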
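The test sequence in the note can be tried in simulation. The sketch below wraps the set/reset model in a small testbench; the module name sr_model_tb, the initial signal values and the 10-unit spacing between events are assumptions. In this model q should still read 0 after step 5, whereas a real flip-flop with set held high would be expected to drive its output to 1.

```verilog
// Testbench sketch; module name, initial values and timing are assumptions.
module sr_model_tb;
  reg clk, reset, set, d;
  reg q;

  // Model under discussion: edge-triggered asynchronous reset and set.
  always @(posedge clk or posedge reset or posedge set)
    if (reset)
      q <= 0;
    else if (set)
      q <= 1;
    else
      q <= d;

  initial
    begin
      clk = 0; reset = 0; set = 0; d = 0;
      #10 reset = 1;   // 1) reset goes high: q becomes 0
      #10 clk   = 1;   // 2) clk goes high: reset still active, q stays 0
      #10 set   = 1;   // 3) set goes high: reset has priority, q stays 0
      #10 clk   = 0;
      #10 clk   = 1;   // 4) clk goes high again
      #10 reset = 0;   // 5) reset goes low while set is still high
      #1  $display("after reset falls: q=%b", q);   // falling edge does not trigger the block
      #9  set   = 0;   // 6) set goes low
      #1  $display("after set falls:   q=%b", q);
      $finish;
    end
endmodule
```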
There are two separate ways of declaring a Verilog process: the always and
the initial keywords. The always keyword indicates a free-running
process. The initial keyword indicates a process that executes exactly once. Both
constructs begin execution at simulator time 0, and both execute until the end of the
block. Once an always block has reached its end, it is rescheduled.
//Examples:
initial
begin
a = 1; // Assign a value to reg a at time 0
#1; // Wait 1 time unit
b = a; // Assign the value of reg a to reg b
end
always @(a or b) // Any time a or b CHANGE, run the process
begin
if (a)
c = b;
else
d = ~b;
end // Done with this block, now return to the top (i.e. the @ event-control)
always @(posedge a)// Run whenever reg a has a low to high change
a <= b;
These are the classic uses for these two keywords, but there are two significant
additional uses. The most common of these is an always keyword without
the @(...) sensitivity list. It is possible to use always as shown below:
always
begin // Always begins executing at time 0 and NEVER stops
clk = 0; // Set clk to 0
#1; // Wait for 1 time unit
clk = 1; // Set clk to 1
#1; // Wait 1 time unit
end // Keeps executing - so continue back at the top of the begin
The always keyword acts similar to the "C" construct while(1) {..} in the sense
that it will execute forever. The other interesting exception is the use of
the initial keyword with the addition of the forever keyword.
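A minimal sketch of the initial/forever form, written as a clock generator equivalent to the always version above, might look like this. The module wrapper and its name are assumptions.

```verilog
// Clock generator sketch using initial + forever; module name is hypothetical.
module clk_gen_forever;
  reg clk;

  initial forever   // start at time 0 and repeat the begin/end block forever
    begin
      clk = 0;  // set clk to 0
      #1;       // wait 1 time unit
      clk = 1;  // set clk to 1
      #1;       // wait 1 time unit
    end
endmodule
```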
The order of execution isn't always guaranteed within Verilog. This can best
be illustrated by a classic example. Consider the code snippet below:
initial
a = 0;
initial
b = a;
initial
begin
#1;
$display ("Value a=%b Value of b=%b",a,b);
end
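What will be printed out for the values of a and b? Depending on the order of
execution of the initial blocks, it could be zero and zero, or zero and x (the
uninitialized value, if b = a runs before a = 0). The $display statement always
executes after both assignments, because of the #1 delay.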
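One way to make the outcome deterministic, sketched below under the assumption that b is meant to receive the value just written to a, is to place both assignments in a single initial block so that they execute in source order. The module name is hypothetical.

```verilog
// Deterministic variant: blocking assignments in one block run in order.
module ordered_init;
  reg a, b;

  initial
    begin
      a = 0;   // executes first
      b = a;   // executes second, so b always receives 0
      #1;
      $display("Value a=%b Value of b=%b", a, b);   // prints a=0 and b=0
    end
endmodule
```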
4.3.7 Operators:
Note: These operators are not shown in order of precedence.
Operator type   Operator symbols   Operation performed
Bitwise         ^                  Bitwise XOR
                ~^ or ^~           Bitwise XNOR
Logical         !                  NOT
                &&                 AND
                ||                 OR
Reduction       |                  Reduction OR
                ~|                 Reduction NOR
                ^                  Reduction XOR
                ~^ or ^~           Reduction XNOR
Arithmetic      +                  Addition
                -                  Subtraction
                -                  2's complement
                *                  Multiplication
                /                  Division
                **                 Exponentiation (*Verilog-2001)
Conditional     ?:                 Conditional
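The difference between the bitwise, reduction and logical forms is easiest to see on concrete values. The sketch below applies a few of the listed operators to two 4-bit operands; the module name and the operand values are arbitrary choices made for illustration.

```verilog
// Operator sketch; module name and operand values are hypothetical.
module operator_demo;
  reg [3:0] x, y;

  initial
    begin
      x = 4'b1100;
      y = 4'b1010;
      $display("x ^ y  = %b", x ^ y);     // bitwise XOR, 4-bit result: 0110
      $display("x ~^ y = %b", x ~^ y);    // bitwise XNOR, 4-bit result: 1001
      $display("|x     = %b", |x);        // reduction OR, 1-bit result: 1
      $display("~|x    = %b", ~|x);       // reduction NOR, 1-bit result: 0
      $display("^x     = %b", ^x);        // reduction XOR (parity), 1-bit result: 0
      $display("!x     = %b", !x);        // logical NOT, 1-bit result: 0
      $display("x && y = %b", x && y);    // logical AND, 1-bit result: 1
      $display("-x     = %b", -x);        // 2's complement in 4 bits: 0100
      $display("x[3] ? x : y = %b", x[3] ? x : y);   // conditional selects x: 1100
    end
endmodule
```

The same symbol (for example ^ or |) is a bitwise operator when it has two operands and a reduction operator when it has one.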
System tasks are available to handle simple I/O and various design measurement
functions. All system tasks are prefixed with $ to distinguish them from user tasks and
functions. This section presents a short list of the most often used tasks. It is by no
means a comprehensive list.
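As a small illustration, the sketch below uses three widely supported system tasks, $display, $monitor and $finish, together with the $time system function. The module name and the signal count are hypothetical.

```verilog
// System task sketch; module name and signal name are hypothetical.
module systask_demo;
  reg [3:0] count;

  initial
    begin
      count = 0;
      // $monitor prints a line whenever one of its signal arguments changes.
      $monitor("time=%0t count=%d", $time, count);
      #5 count = 3;
      #5 count = 7;
      // $display prints once, at the moment it executes.
      #1 $display("final count at time %0t is %d", $time, count);
      $finish;   // end the simulation
    end
endmodule
```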
SIMULATION RESULTS
CONCLUSION
We have presented high-performance, automated FPGA designs of integer arithmetic
cores with scan FFs, following the principle of primitive instantiation and constrained
placement. The methodology is ideally suited to circuits in which the configured logic
elements are underutilized, or in which the nature of the circuit itself permits
design-specific changes, such as reshuffling of inputs or priority encoding, so that
scan FFs can be inserted with no hardware overhead. No change to the synthesis or
optimization goal and effort settings for the behavioral designs matches the proposed
architecture in either area or speed.
REFERENCES
[1] P. Reyes, P. Reviriego, J. A. Maestro, and O. Ruano, “New protection techniques
against SEUs for moving average filters in a radiation environment,” IEEE Trans.
Nucl. Sci., vol. 54, no. 4, pp. 957–964, Aug. 2007.
[2] M. Hatamian et al., “Design considerations for gigabit ethernet 1000 base-T
twisted pair transceivers,” Proc. IEEE Custom Integr. Circuits Conf., pp. 335–342,
1998.
[4] H.-S. Kim, J.-H. Yang, S.-H. Park, S.-T. Ryu, and G.-H. Cho, “A 10-bit column-
driver IC with parasitic-insensitive iterative charge-sharing based capacitor-string
interpolation for mobile active-matrix LCDs,” IEEE J. Solid-State Circuits, vol. 49,
no. 3, pp. 766–782, Mar. 2014.
[14] B.-S. Kong, S.-S. Kim, and Y.-H. Jun, “Conditional-capture flip-flop for
statistical power reduction,” IEEE J. Solid-State Circuits, vol. 36, pp. 1263–1271,
Aug. 2001.
Sample code:
//-----------Input Ports---------------
//-----------Output Ports---------------
output mux_out;
//------------Internal Variables--------
wire mux_out;
//-------------Code Start-----------------
//////////////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////////////
module lut6_2(td,ld,q,ext,ud,o1,o2);
input td,ld,q,ext,ud;
output o1,o2;
wire x1,mux_out1;
assign x1 = mux_out1^ud;
assign o1 = x1&(~td);
endmodule
//////////////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////////////
module carry(o1,o2,o3,c1,c2);
input o1,o2,o3;
output c1,c2;
assign c2=o1^o3;
endmodule
//////////////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////////////
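module dff(data,clk,q); // port order taken from the dff instances below
input data;
input clk ;
output reg q;
always @(posedge clk) // rising-edge clocking assumed
begin
q <= data;
end
endmodule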
//////////////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////////////
input td,ld,ud,sd,clk;
input [3:0]q,ext;
output wire[3:0]dout;
lut6_2 L1(td,ld,q[3],ext[3],ud,o1,o2);
lut6_2 L2(td,ld,q[2],ext[2],ud,o3,o4);
lut6_2 L3(td,ld,q[1],ext[1],ud,o5,o6);
lut6_2 L4(td,ld,q[0],ext[0],ud,o7,o8);
carry C1(o7,o8,temp,c11,c21);
carry C2(o5,o6,c11,c12,c22);
carry C3(o3,o4,c12,c13,c23);
carry C4(o1,o2,c13,cout,c24);
//assign temp={c24,c23,c22,c21};
dff D1 (c21,clk,dout[0]);
dff D2 (c22,clk,dout[1]);
dff D3 (c23,clk,dout[2]);
dff D4 (c24,clk,dout[3]);
endmodule