Probability-Driven Multi Bit Flip-Flop Design Optimization With Clock Gating
Data-Driven Clock-Gating (DDCG) and Multi Bit Flip-Flops (MBFFs), in which several
FFs are grouped and share a common clock driver, are two effective low-power design techniques.
Though commonly used by VLSI designers, they are usually treated separately. Past works
focused on MBFF usage at the RTL, gate, and layout levels. Though collectively covering the
common design stages, the study of each aspect individually led to conflicts and contradictions
with the others. The MBFF internal circuit design, its multiplicity, and its synergy with the FFs'
data toggling probabilities have not been studied so far. This work attempts to maximize the
energy savings by proposing a combined DDCG and MBFF algorithm, based on the data-to-clock
toggling ratio of the Flip-Flops (FFs). It is shown that to maximize the power savings, the FFs
should be grouped in MBFFs in increasing order of their activities. A power savings model
utilizing MBFF multiplicities and FF toggling probabilities is developed and then used by the
algorithm in a practical design flow. Using the Xilinx ISE tool, we achieved 17% to 23% power
savings compared to designs with ordinary FFs; compared with a conventional flip-flop, the
savings were around 39%.
CHAPTER 1
INTRODUCTION
A recently published paper has emphasized the usage of Multi-Bit Flip-Flops (MBFFs) as
a design technique delivering considerable power reduction in digital systems. The data of digital
systems is usually stored in Flip-Flops (FFs), each having its own internal clock driver. As shown
in Fig. 1.1, an edge-triggered 1-bit FF contains two cascaded latches, master and slave, driven by
the opposite clocks CLK and its complement. It has been shown that most of the FF's energy is
consumed by its internal clock drivers, which are significant contributors to the total power
consumption.
In an attempt to reduce the clock power, several FFs can be grouped
into a module called a multibit FF (MBFF) that houses the clock drivers of all the underlying
FFs. We denote the grouping of k FFs into an MBFF by a k-MBFF. Kapoor et al.
Traditionally, digital control of SMPS was accomplished by applying a general-purpose
Digital Signal Processor (DSP). Attempts were made to use DSPs to carry out the digital control
algorithm, housekeeping, supervisory tasks, and communication. Apart from some limited
applications, this approach is unsuitable in most industrial instances due to its many drawbacks
and limitations. These include the single arithmetic unit, which limits the speed of computation
and results in a limited control bandwidth; excessive delays in the multi-converter case; limited
capabilities to generate non-sequential pulses, as might be needed in nonlinear control; and
limited capabilities to achieve high resolution of the output driving signal, which degrades as the
number of control channels increases; as well as other shortcomings.
The state transitions of FFs in digital systems like microprocessors and controllers
depend on the data they process. Assessing the effectiveness of clock gating requires therefore
extensive simulations and statistical analysis of FFs activity, as presented in this paper. Disabling
the clock input to a group of FFs (e.g., a register) in data-path circuits is very effective since
many bits behave similarly. Registers enabled by the same clock signal yield a high ratio of the
saved power to circuit overhead. Furthermore, the design effort to create the disabling signal is
low. Unlike data-path, control logic requires far greater design effort for successful clock gating.
This stems from the “random” nature of the control logic. The effectiveness of the proposed
gating methodology is demonstrated in this paper through the examples of a 3-D graphics
accelerator and a 16-bit microcontroller. These units were designed with full awareness of the
internal data dependencies and appropriate clock enabling signals were defined within the RTL
code. When the RTL code was then compiled and simulated at gate level, considerable “hidden”
disabling opportunities were discovered.
The clock power savings always outweigh the short-circuit power penalty of the data
toggling. An MBFF grouping should be driven by logical, structural, and FF activity
considerations. While FF grouping at the layout level has been studied thoroughly, the front-end
implications of MBFF group size and how it affects clock gating (CG) have attracted little
attention. This brief responds to two questions. The first is what the optimal bit multiplicity k of
data-driven clock-gated (DDCG) MBFFs should be. The second is how to maximize the power
savings based on the data-to-clock toggling ratio (also termed activity and data toggling
probability).
Optimization for power is always one of the most important design objectives in modern
nanometer IC design. Recent studies have shown the effectiveness of applying multi-bit flip-flops
to save the power consumption of the clock network. However, all the previous works applied
multi-bit flip-flops at earlier design stages, where it can be very difficult to carry out the trade-off
among power, timing, and other design objectives. One work presents a power optimization
method that incrementally applies more multi-bit flip-flops at the post-placement stage to gain
more clock power saving while considering the placement density and timing slack constraints,
and simultaneously minimizing interconnecting wire length. Experimental results based on
industry benchmark circuits show that the approach is very effective and efficient, and that it can
be seamlessly integrated in a modern design flow.
In an attempt to reduce the clock power, several FFs can be grouped in a module such
that common clock drivers are shared by all the FFs. Two 1-bit FFs grouped into a 2-bit MBFF,
also called a dual-bit FF, are shown in Fig. 1.1. In a similar manner, grouping of FFs in 4-bit and
8-bit MBFFs is possible too. We subsequently denote a k-bit MBFF by k-MBFF. An MBFF does
more than reduce the gate capacitance driven by the clock tree. The wiring capacitive load is also
reduced, because only a single clock wire is required for multiple FFs. It also reduces the depth
and the buffer sizes of the clock tree, as well as the number of sub-trees. Beyond clock power
savings, those features also reduce the silicon area.
Fig. 1.1. 1-bit FF and 2-MBFF.
An MBFF usage at the RTL logic synthesis design stage can be found in a 55-nm 230-MHz
design of a system on a chip. Santos restricted the MBFF grouping to FFs belonging to the
same bus. Both 2-MBFFs and 4-MBFFs were used, with a 20% increase in t_pCQ. A dynamic
power reduction of 13% was achieved with some degradation in timing convergence. This was
remedied by applying low-voltage-threshold cells on critical paths, which somewhat increased
the leakage power. The total area was increased by 2.3% because of the timing fixes.
The benefits of MBFFs do not come for free. By sharing common drivers, the slopes of the
clock signals become slower, causing larger short-circuit current and degradation of the
clock-to-Q propagation delay (t_pCQ). For a design implemented in a 90-nanometer, low-power,
high-voltage-threshold (HVT) CMOS technology, the 4-MBFFs exhibit a per-bit 30% reduction of
dynamic clock power and a per-bit 10% area reduction. That came at the expense of a per-bit 20%
data power increase and a 20% degradation of t_pCQ. However, because the average
data-to-clock toggling ratio of a FF is very small, varying from 0.01 to 0.1 in most designs, the
clock power savings always outweigh the short-circuit power penalty of the data toggling.
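The trade-off quoted above can be checked with a short calculation. The following Python sketch
is only an illustration under stated assumptions: the function name and the parameters
clock_energy and data_energy (per-bit clock-driver and data-toggling energies of a 1-bit FF, in
arbitrary units) are hypothetical, while the 30% and 20% defaults are the per-bit 4-MBFF figures
quoted in the text.

```python
def per_bit_energy_saving(p, clock_energy, data_energy,
                          clock_reduction=0.30, data_increase=0.20):
    """Per-bit energy saved by a 4-MBFF relative to a 1-bit FF.

    p            -- data-to-clock toggling ratio of the FF
    clock_energy -- assumed per-bit clock-driver energy of a 1-bit FF
    data_energy  -- assumed per-bit data-toggling energy of a 1-bit FF
    A positive result means the MBFF consumes less energy per bit.
    """
    baseline = clock_energy + p * data_energy
    mbff = (1 - clock_reduction) * clock_energy \
        + (1 + data_increase) * p * data_energy
    return baseline - mbff


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.1):
        print(p, per_bit_energy_saving(p, clock_energy=1.0, data_energy=1.0))
```

Under these assumptions the saving stays positive as long as p is below
1.5 * clock_energy / data_energy, which is comfortably above the 0.01 to 0.1 range quoted for
most designs whenever the clock energy is comparable to or larger than the data energy.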
This work answers two questions: what the optimal bit multiplicity of MBFFs should be,
and how to leverage the knowledge of the average data-to-clock toggling ratio (also called
activity and data toggling probability) of the FFs in the underlying design. To remedy the
short-circuit power penalty and t_pCQ degradation due to the increase of the loads, the MBFF
internal drivers can be somewhat strengthened. This is shown pictorially in Fig. 1.1 by the larger
2-MBFF drivers compared to the 1-bit FF. The MBFF multiplicity k depends on the data toggling
probability p. Section 2 studies that dependency in an attempt to optimize the MBFF design flow
and maximize the power savings. To the best of our knowledge, that has not been studied so far.
Electronic Design Automation (EDA) tools, such as Cadence Liberate, support MBFF
characterization. MBFF gate-level design is possible with the latest Cadence and Synopsys HDL
compilers. Their logic-level internal considerations and algorithms for FF grouping into MBFFs
have not been published. In spite of its importance, very little attention has been paid in the
literature to MBFF multiplicity and grouping at the front-end design stage. MBFF grouping should
be driven by logical, structural, and FF activity considerations.
Fig. 1.3: The dependency of the MBFF energy savings on the toggling probability.
Let p be the data-to-clock toggling probability. Denote by $E_1$ the expected energy
consumed by a 1-bit FF. We conclude from Fig. 1.3(a) that
$E_1 = \alpha_1 + \beta_1 p$ ……..(1)
where $\alpha_1$ is the energy of the FF's internal clock driver, and $\beta_1$ is the energy of
data toggling. For a 2-MBFF there are three possible scenarios: none of the FFs toggle, a single
FF toggles, and both FFs toggle. Assuming data toggling independence, the expected energy
consumption $E_2$ is
$E_2 = \alpha_2 (1-p)^2 + 2(\alpha_2 + \beta_2)\, p(1-p) + (\alpha_2 + 2\beta_2)\, p^2 = \alpha_2 + 2\beta_2 p$ ……..(2)
where $\alpha_2$ is the energy of the internal clock driver, and $\beta_2$ is the per-bit data
toggling energy. For the general case of a k-MBFF, let $\alpha_k$ be the energy of the MBFF's
internal clock driver and $\beta_k$ be the per-bit data toggling energy. Considering all the
combinations of toggling FFs, the expected energy is
$E_k = \sum_{j=0}^{k} \binom{k}{j} p^j (1-p)^{k-j} (\alpha_k + j\beta_k) = \alpha_k + k\beta_k p$ …………(3)
The equality in (3) is obtained by applying some rearrangements [8].
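The rearrangement behind (3) can be checked numerically. The sketch below is an illustration
only; the function names are hypothetical, and clock_energy and data_energy stand in for the
internal clock driver energy and the per-bit data toggling energy of the k-MBFF. It enumerates
every number j of toggling FFs, weights it by its binomial probability under the independence
assumption, and compares the sum with the closed form clock_energy + k * data_energy * p.

```python
from math import comb


def expected_energy_enumerated(k, p, clock_energy, data_energy):
    """Sum over j = 0..k toggling FFs, each case weighted by its probability."""
    return sum(
        comb(k, j) * p**j * (1 - p) ** (k - j) * (clock_energy + j * data_energy)
        for j in range(k + 1)
    )


def expected_energy_closed_form(k, p, clock_energy, data_energy):
    """Closed form: driver energy plus k times the expected per-bit data energy."""
    return clock_energy + k * data_energy * p


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        for p in (0.01, 0.1, 0.5):
            a = expected_energy_enumerated(k, p, 3.0, 1.0)
            b = expected_energy_closed_form(k, p, 3.0, 1.0)
            print(k, p, a, b)
```

The two values agree because the binomial probabilities sum to one and the expected number of
toggling FFs is k * p.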
[1] Digital Systems Power Management for High Performance Mixed Signal Platforms
High performance mixed signal (HPMS) platforms require stringent overall system and
subsystem performance. The ability to design ultra-low power systems is used in a wide range of
platforms including consumer, mobile, identification, healthcare products and microcontrollers.
In this paper we present an overview of low power design techniques, challenges and
opportunities faced in an industrial research environment. The paper presents strategies on the
deployment of low power techniques that span from power-performance optimization scenarios
accounting for active and standby operation modes to the development of multi-core
architectures suitable for low voltage operation.
[2] The Optimal Fan-Out of Clock Network for Power Minimization by Adaptive Gating
Gating of the clock signal in VLSI chips is nowadays a mainstream design methodology
for reducing switching power consumption. In this paper we develop a probabilistic model of the
clock gating network that allows us to quantify the expected power savings and the implied
overhead. Expressions for the power savings in a gated clock tree are presented and the optimal
gater fan-out is derived, based on flip-flops toggling probabilities and process technology
parameters. The resulting clock gating methodology achieves 10% savings of the total clock tree
switching power. The timing implications of the proposed gating scheme are discussed. The
grouping of FFs for a joint clocked gating is also discussed. The analysis and the results match
the experimental data obtained for a 3-D graphics processor and a 16-bit microcontroller, both
designed at 65-nanometer technology.
[6] Pulsed-Latch Replacement Using Concurrent Time Borrowing and Clock Gating
Flip-flops are the most common form of sequencing elements; however, they have a
significantly higher sequencing overhead than latches in terms of delay, power, and area. Hence,
pulsed latches are a promising option to reduce power for high-performance circuits. In this
paper, to save power and compensate for timing violations, we fully utilize the intrinsic time
borrowing property of pulsed latches and consider clock gating during pulsed-latch replacement.
Experimental results show that our approach can generate very power efficient results.
[8] Effective and Efficient Approach for Power Reduction by Using Multi-Bit Flip-Flops
Power has become a burning issue in modern VLSI design. In modern integrated
circuits, the power consumed by clocking gradually takes a dominant part. Given a design, we
can reduce its power consumption by replacing some flip-flops with fewer multi-bit flip-flops.
However, this procedure may affect the performance of the original circuit. Hence, the flip-flop
replacement without timing and placement capacity constraints violation becomes a quite
complex problem. To deal with the difficulty efficiently, we have proposed several techniques.
First, we perform a co-ordinate transformation to identify those flip-flops that can be merged and
their legal regions. Besides, we show how to build a combination table to enumerate possible
combinations of flip-flops provided by a library. Finally, we use a hierarchical way to merge
flip-flops. Besides power reduction, the objective of minimizing the total wirelength is also
considered. The time complexity of our algorithm is Θ(n^1.12), less than the empirical complexity
of Θ(n^2). According to the experimental results, our algorithm significantly reduces clock power
by 20-30% and the running time is very short. In the largest test case, which contains 1,700,000
flip-flops, our algorithm takes only about 5 min to replace flip-flops and the power reduction can
reach 21%.
CHAPTER 3
CLOCK GATING FLIP-FLOPS
Flip-Flops: Flip-flops are an application of logic gates. With the help of Boolean
logic, they can be used to build memory. Flip-flops can also be considered the most basic
building block of a Random Access Memory (RAM). When an input value is applied to a
flip-flop, it is stored and held, provided the logic gates are designed correctly. Flip-flops are
therefore useful in designing more capable electronic circuits.
The most common use of flip-flops is in the implementation of feedback circuits.
Since memory relies on the feedback concept, flip-flops can be used to design it.
MULTI-BIT FLIP-FLOPS
Multi-bit flip-flops reduce power consumption because the clock inverters inside the
flip-flop are shared. Clock skew is also reduced at the same time. Single-bit and multi-bit
flip-flops have the same clock condition, and the set and reset conditions are also the same.
An example of a multi-bit flip-flop is shown in Fig. 1.1. A 2-bit flip-flop is formed by merging
two single-bit flip-flops. The merged flip-flops share the clock buffer, and a power reduction is
thereby achieved.
Advantages of flip-flop
ALGORITHM.
The algorithm is split into three steps. The first step identifies the mergeable flip-flops.
The second step builds the combination table according to the overlapped regions found in the
first step. The combination table can be built in a binary tree representation for convenience. In
the third step, the flip-flops are merged based on the combination table.
IDENTIFICATION OF MERGEABLE FLIP-FLOPS.
Based on the flip-flops used in the digital circuit, the flip-flops that can be merged are
identified. During identification, each flip-flop still has its own separate clock.
COMBINATION TABLE.
To perform the merging efficiently, we build a combination table. Merging flip-flops
without a combination table would not be efficient, because the mergeable flip-flops would not
be matched to their intersection regions. The combination table is built from the flip-flop bit
widths available in the library: based on the library bit widths, we enumerate the possible
combinations of flip-flops. In the algorithm, the library is denoted by L and the combination
table by T; n_i denotes one combination in T, and b(n_i) denotes its bit width. The minimum size
is 1 bit, and the table is initialized from the library, because the merging starts from 1-bit
flip-flops.
Figure 3.1 shows an example of a dual-bit flip-flop cell. It has two data input pins, two data
output pins, one clock pin, and a reset pin. Using a dual-bit flip-flop gives lower power
consumption than single-bit flip-flops at almost no additional cost. Figure 3.2 shows the
truth table of the dual-bit flip-flop cell. When the clock is active, the values of D1 and D2 pass
to Q1 and Q2; otherwise, Q1 and Q2 retain their values.
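The combination table construction described in this section can be sketched as follows. This is
a simplified illustration, not the cited algorithm itself: the function name is hypothetical, the
library is reduced to a list of available bit widths, and a combination is recorded only when two
cells merge into a width that the library also provides. Each entry has exactly two children,
which mirrors the binary tree representation mentioned above.

```python
from itertools import combinations_with_replacement


def build_combination_table(library_widths):
    """Enumerate pairs of library cells that merge into a wider library cell.

    library_widths -- bit widths of the flip-flop cells in the library L
    Returns a sorted list of (merged_width, (width_a, width_b)) entries,
    a simplified stand-in for the combination table T.
    """
    lib = sorted(set(library_widths))
    table = []
    for a, b in combinations_with_replacement(lib, 2):
        if a + b in lib:  # the merged cell must exist in the library
            table.append((a + b, (a, b)))
    return sorted(table)


if __name__ == "__main__":
    # Library with 1-, 2- and 4-bit flip-flop cells.
    for merged, parts in build_combination_table([1, 2, 4]):
        print(f"{parts[0]}-bit + {parts[1]}-bit -> {merged}-bit")
```

For a library with 1-, 2- and 4-bit cells the sketch records that two 1-bit cells form a 2-bit
cell and two 2-bit cells form a 4-bit cell. Timing and placement constraints, which the text
handles in the identification step, are deliberately left out.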
DISADVANTAGES:
Power consumption is high
CHAPTER 4
PROPOSED SYSTEM
The multi-bit flip-flop method reduces the total number of inverters by sharing the
inverters among the flip-flops. Data-driven clock gating eliminates redundant clock pulses.
Combining multi-bit flip-flops with data-driven clock gating increases the power savings further.
The Xilinx software tool is used for implementing the proposed system. This paper studies
data-driven clock gating, employed for FFs at the gate level, which is the most aggressive gating
possible. The clock signal driving a FF is disabled (gated) when the FF's state is not subject to
change in the next clock cycle. Data-driven gating causes area and power overheads that must be
considered. In an attempt to reduce the overhead, it is proposed to group several FFs to be driven
by the same clock signal, generated by ORing the enabling signals of the individual FFs. This
may, however, lower the disabling effectiveness. It is therefore beneficial to group FFs whose
switching activities are highly correlated and derive a joint enabling signal. In a recent paper, a
model for data-driven gating was developed based on the toggling activity of the constituent FFs.
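The joint enabling signal can be illustrated with a small behavioral model. The Python sketch
below is an assumption-laden illustration, not the hardware implementation: the function name
and the representation of data as tuples of bits are hypothetical. Each FF's enable is taken as
the XOR of its input D and its current output Q, and the group clock is pulsed only when the OR
of those enables is true.

```python
def simulate_gated_group(data_stream, initial_state):
    """Behavioral model of one group of FFs sharing a data-driven clock gater.

    data_stream   -- iterable of tuples, one D value per FF for each clock cycle
    initial_state -- tuple with the initial Q value of each FF
    Returns the final state and the number of clock pulses actually delivered.
    """
    state = list(initial_state)
    clock_pulses = 0
    for d in data_stream:
        # Per-FF enable: D differs from Q (XOR); joint enable: OR over the group.
        enable = any(di != qi for di, qi in zip(d, state))
        if enable:
            clock_pulses += 1
            state = list(d)  # all FFs in the group are clocked together
    return state, clock_pulses


if __name__ == "__main__":
    stream = [(0, 0), (0, 1), (0, 1), (1, 1)]
    final_state, pulses = simulate_gated_group(stream, (0, 0))
    print(final_state, pulses, "of", len(stream), "cycles clocked")
```

In this example only two of the four cycles deliver a clock pulse. A FF that is clocked only
because another FF in its group toggles wastes that pulse, which is why grouping FFs with
correlated activity matters.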
With the limited power/thermal budgets of modern systems on chips (SOCs), which integrate
an increasing number of transistors, power minimization has become one of the most important
objectives in designing SOCs for various applications. High power dissipation of an SOC will
not only increase its system costs but also affect the product lifetime and reliability. To optimize
power consumption in electrical and physical design, many design methodologies have been
introduced, such as creating multi-supply-voltage (MSV) designs, replacing non-timing-critical
cells with their high-Vt counterparts, minimizing clock networks, and applying multi-bit registers.
Representative works are summarized in the following paragraphs.
For MSV designs, an electrical and physical design power optimization methodology and
design techniques developed to create an IC with an ARM 1136JF-S microprocessor in 90-nm
standard CMOS are presented. Design technology and methodology enhancements to enable
multiple supply voltage operation, leakage current and clock rate optimization, single-pass RTL
synthesis, VDD selection, power optimization, and timing and electrical closure in a multi-VDD
domain design are described. A 40% reduction in dynamic and a 46% reduction in leakage power
dissipation have been achieved while maintaining a 355-MHz operating clock rate under typical
conditions. Functional and electrical design requirements were achieved with the first silicon.
Power dissipation is quickly becoming one of the most important limiters in nanometer IC
design, since leakage increases exponentially as technology scales down. However, power and
timing are often conflicting objectives during optimization. One work proposes a total power
optimization flow under a performance constraint. Instead of using placement, gate sizing, and
multiple-Vt assignment techniques independently, it combines them through the concept of slack
distribution management to maximize the potential for power reduction. Linear programming
(LP) based placement and geometric programming (GP) based gate sizing formulations are used
to improve the slack distribution, which helps to maximize the total power reduction during the
Vt-assignment stage. The formulations include important practical design constraints, such as
slew, noise, and short-circuit power, which were often ignored previously. The algorithm was
tested on a set of industrial-strength, manually optimized circuits from a multi-GHz 65-nm
microprocessor, with very promising results. According to its authors, it is the first work that
combines placement, gate sizing, and Vt swapping systematically for total power (and in
particular leakage) management.
Workload placement on servers has traditionally been driven mainly by performance
objectives. Another work investigates the design, implementation, and evaluation of a
power-aware application placement controller in the context of an environment with
heterogeneous virtualized server clusters. The placement component of the application
management middleware takes into account the power and migration costs in addition to the
performance benefit while placing the application containers on the physical servers. The
contribution of that work is two-fold. First, it presents multiple ways to capture the cost-aware
application placement problem that may be applied to various settings. For each formulation, it
provides details on the kind of information required to solve the problem, the model
assumptions, and the practicality of the assumptions on real servers. Second, it presents the
pMapper architecture and placement algorithms to solve one practical formulation of the
problem: minimizing power subject to a fixed performance requirement.
For clock network minimization, an automatic register placement technique has been
presented that enables the synthesis of low-power clock trees for low-power ICs. On 7 industrial
designs, compared with (1) a commercial base flow and (2) an earlier power-aware placement
technique, the technique reduced clock-tree power by 19.0% and 14.9%, total power by 15.3%
and 5.2%, and WNS under on-chip variation (±10%) by 1.8% and 1.5% on average, respectively.
The progress of VLSI technology is facing two limiting factors: power and
variation. Minimizing clock network size can lead to reduced power consumption, less power
supply noise, fewer clock buffers, and therefore less vulnerability to variations. Previous
works on clock network minimization mostly focused on clock routing, and the improvements
are often limited by the input register placement. A further work proposes to navigate registers in
cell placement for further clock network size reduction. To resolve the conflict between clock
network minimization and traditional placement goals, it suggests the following techniques in a
quadratic placement framework: (1) Manhattan ring based register guidance; (2) center of gravity
constraints for registers; (3) pseudo pin and net; (4) register cluster contraction. These techniques
work for both zero-skew and prescribed-skew designs in both wire-length-driven and
timing-driven placement. Experimental results show that the method can reduce clock net wire
length by 16% to 33% with no more than a 0.5% increase in signal net wire length compared
with conventional approaches.
Merging 1-bit flip-flops into multi-bit flip-flops in the post-placement stage is one of the
most effective techniques for minimizing clock power. The obstacles that hinder the merging
process for multi-bit flip-flops are (1) the input and output timing constraints on every flip-flop
and (2) the area constraint on every partitioned bin in the placement plane. Among these
methodologies, applying multi-bit flip-flops, multibit registers [6], or register banks [4] is one of
the most effective in saving both chip area and power consumption.
The optimal fan-out of a clock gater yielding maximal power savings is derived based on
the average toggling statistics of the individual FFs, process technology, and cell library in use.
In general, the state transitions of FFs in digital systems depend on the data they process.
Assessing the effectiveness of data-driven clock gating therefore requires extensive simulations
and statistical analysis of the FFs' activity. Another grouping of FFs for clock switching power
reduction is the multi-bit FF (MBFF).
An MBFF attempts to physically merge FFs into a single cell such that the inverters driving
the clock pulse into its master and slave latches are shared among all FFs in a group. MBFF
grouping is mainly driven by the physical proximity of individual FFs, while grouping
for data-driven clock gating should combine toggling similarity with physical position
considerations. Given the group size that maximizes power savings, this paper studies the
questions of: 1) which FFs should be placed in a group to maximize the power reduction and
2) how to algorithmically derive those groups. We also describe a backend design flow
implementation.
…………………(1)
where CFF and Clatch are the clock input loads of a FF and a latch, respectively. The solution of (4)
for various activities is shown in Table 2 for typical CFF and Clatch .
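As a rough illustration of how the optimal group size depends on activity, the following sketch
assumes a simple cost model that is not taken from this text: each group of k FFs shares one
gater latch of load c_latch, and the group is clocked whenever at least one of its FFs toggles,
which happens with probability 1 - (1 - p)^k for independent FFs with toggling probability p.
The function names, the model, and the example load values are all assumptions.

```python
def per_ff_clock_load(k, p, c_ff, c_latch):
    """Assumed expected clock load per FF for a gated group of size k."""
    group_clocked = 1.0 - (1.0 - p) ** k  # at least one FF in the group toggles
    return c_latch / k + c_ff * group_clocked


def optimal_group_size(p, c_ff, c_latch, k_max=64):
    """Integer k in 1..k_max that minimizes the assumed per-FF clock load."""
    return min(range(1, k_max + 1),
               key=lambda k: per_ff_clock_load(k, p, c_ff, c_latch))


if __name__ == "__main__":
    for p in (0.01, 0.02, 0.05, 0.1):
        k = optimal_group_size(p, c_ff=1.0, c_latch=1.0)
        print(p, k, per_ff_clock_load(k, p, 1.0, 1.0))
```

Under this model the best group size shrinks as the activity grows: sharing a gater among many
FFs pays off only when it is unlikely that some FF in the group toggles in a given cycle.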
………(5)
Given $FF_i$, $FF_j$, $FF_k$ and $FF_l$, their pairing in two 2-MBFFs yields the energy waste
$W_{i,j} + W_{k,l} = \underbrace{(p_i + p_j + p_k + p_l)}_{(a)} - \underbrace{2 (p_i p_j + p_k p_l)}_{(b)}$ ……..(9)
While the term (a) of (9) is independent of the pairing, the term (b) depends on it. The expression
$W_{i,j} + W_{k,l}$ is minimized when (b) is maximized. If $p_i \le p_j \le p_k \le p_l$, then (b)
is largest for the pairing $\{FF_i, FF_j\}$, $\{FF_k, FF_l\}$ of successive FFs.
The generalization for pairing of n FFs is straightforward. Let n be even and let
$P: \{(FF_{s_i}, FF_{t_i})\}_{i=1}^{n/2}$ be a pairing of $FF_1, FF_2, \ldots, FF_n$ in $n/2$
2-MBFFs. The following energy waste $W(P)$ results in
$W(P) = \sum_{j=1}^{n} p_j - 2 \sum_{i=1}^{n/2} p_{s_i} p_{t_i}$ ……..(10)
Since $\sum_{j=1}^{n} p_j$ is independent of the pairing, $W(P)$ is minimized when
$\sum_{i=1}^{n/2} p_{s_i} p_{t_i}$ is maximized. The optimal pairing minimizing $W(P)$ is
defined by the following theorem [8].
Theorem 1. Let n be even and let $FF_1, FF_2, \ldots, FF_n$ be ordered such that their toggling
probabilities satisfy $p_1 \le p_2 \le \cdots \le p_n$. The pairing
$P: \{(FF_{2i-1}, FF_{2i})\}_{i=1}^{n/2}$ of successive FFs minimizes $W(P)$ given in (10).
The above result of grouping in 2-MBFFs is generalized for grouping in k-MBFFs as follows.
Theorem 2. Let n be divisible by k, and let $FF_1, FF_2, \ldots, FF_n$ be ordered such that their toggling ...
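Theorem 1 can be checked numerically on a small example. The sketch below is an illustration
under an assumed waste model, not the derivation of the text: a FF in a gated 2-MBFF is taken to
waste one unit of clock energy whenever its partner toggles and it does not, with independent
toggling. All function names are hypothetical. The brute-force search enumerates every pairing
and compares the best one with the pairing of successive FFs in sorted order.

```python
def waste(pairing, probs):
    """Assumed waste: one FF of a pair toggles while the other does not."""
    return sum(probs[i] * (1 - probs[j]) + probs[j] * (1 - probs[i])
               for i, j in pairing)


def all_pairings(indices):
    """Yield every way to split the index list into unordered pairs."""
    if not indices:
        yield []
        return
    first, rest = indices[0], indices[1:]
    for pos, partner in enumerate(rest):
        remaining = rest[:pos] + rest[pos + 1:]
        for tail in all_pairings(remaining):
            yield [(first, partner)] + tail


def sorted_successive_pairing(probs):
    """Pair successive FFs after ordering them by toggling probability."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    return [(order[i], order[i + 1]) for i in range(0, len(order), 2)]


if __name__ == "__main__":
    probs = [0.3, 0.02, 0.1, 0.05, 0.5, 0.01]
    best = min(all_pairings(list(range(len(probs)))),
               key=lambda pairing: waste(pairing, probs))
    print("brute force :", sorted(best), waste(best, probs))
    chosen = sorted_successive_pairing(probs)
    print("sorted pairs:", sorted(chosen), waste(chosen, probs))
```

For six FFs there are only 15 pairings, so exhaustive search is cheap; for realistic design sizes
the sorting-based pairing is what makes the grouping practical.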
where c_FF is the FF's clock input capacitance, c_W is the unit-size wire capacitance, and
c_latch is the latch capacitance including the wire capacitance of its clk input. Table I shows how
the optimal k depends on p. Such a gating scheme has considerable timing implications, which
are discussed in [9]. We will return to those when discussing the implementation of data-driven
gating as part of a complete design flow.
4.2 Implementation and Integration in a Design Flow.
In the following, we describe the implementation of data-driven clock gating as a part of
a standard backend design flow. It consists of the following steps.
1) Estimating the FFs toggling probabilities involves running an extensive test bench
representing typical operation modes of the system to determine the size k of a gated FF group
by solving (1).
2) Running the placement tool at hand to get preliminary preferred locations of the FFs in
the layout.
3) Employing a FF grouping tool to implement the model and algorithms presented in
Sections III and IV, using the toggling correlation data obtained in Step 1 and the FF location
data obtained in Step 2. The outcome of this step is k-size FF sets (with manual overrides if
required), where the FFs in each set will be jointly clocked by a common gater.
4) Introducing the data-driven clock gating logic into the hardware description (we use
Verilog HDL). This is done automatically by a software tool, adding appropriate Verilog code to
implement the logic described in Fig. 2. The FFs are connected according to the grouping
obtained in Step 3. A delicate practical question is whether to introduce the gating logic into
RTL or gate-level description. This depends on design methodology in use and its discussion is
beyond the scope of this paper. We have introduced the gating logic into the RTL description.
5) Re-running the test bench of Step 1 to verify the full identity of FFs’ outputs before
and after the introduction of gating logic. Although data driven gating, by its very definition,
should not change the logic of signals, and hence FFs toggling should stay identical, a robust
design flow must implement this step.
6) Ordinary backend flow completion. From this point, the backend design flow proceeds
by applying ordinary place and route tools.
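Steps 1 and 3 of the flow above can be sketched in a few lines. This is a simplified illustration
with hypothetical function names: the simulation trace is assumed to be a list of per-cycle
tuples of FF output values, the grouping uses activity only (increasing order of toggling
probability) and ignores the placement data of Step 2 and any toggling correlation, and the
group size k is supplied by the caller.

```python
def toggle_probabilities(trace):
    """Estimate each FF's data-to-clock toggling ratio from a simulation trace.

    trace -- list of tuples; trace[t][i] is the output of FF i at clock cycle t
    """
    cycles = len(trace) - 1
    counts = [0] * len(trace[0])
    for prev, cur in zip(trace, trace[1:]):
        for i, (a, b) in enumerate(zip(prev, cur)):
            counts[i] += a != b  # count one toggle when the value changes
    return [c / cycles for c in counts]


def group_by_activity(probs, k):
    """Form k-size FF sets in increasing order of toggling probability."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    return [order[i:i + k] for i in range(0, len(order), k)]


if __name__ == "__main__":
    trace = [(0, 0, 0, 0), (1, 0, 0, 1), (0, 0, 1, 1), (1, 0, 1, 0), (0, 0, 0, 0)]
    probs = toggle_probabilities(trace)
    print(probs)
    print(group_by_activity(probs, k=2))
```

A real flow would take the trace from the test bench run of Step 1 and would combine the
activity ordering with the preferred FF locations before committing to the groups.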
CHAPTER 5
HARDWARE REQUIREMENTS
5.1 GENERAL
Integrated circuit (IC) technology is the enabling technology for a whole host of
innovative devices and systems that have changed the way we live. Jack Kilby received the 2000
Nobel Prize in Physics for his part in the invention of the integrated circuit, which Robert Noyce
had invented independently; without the integrated circuit, neither transistors nor computers
would be as important as they are today. VLSI systems are much smaller and consume less
power than the discrete components used to build electronic systems before the 1960s.
Integration allows us to build systems with many more transistors, allowing much more
computing power to be applied to solving a problem. Integrated circuits are also much easier to
design and manufacture and are more reliable than discrete systems; that makes it possible to
develop special-purpose systems that are more efficient than general-purpose computers for the
task at hand.
5.2 ADVANTAGES OF VLSI:
While we will concentrate on integrated circuits in this book, the properties of integrated
circuits (what we can and cannot efficiently put in an integrated circuit) largely determine the
architecture of the entire system. Integrated circuits improve system characteristics in several
critical ways. ICs have three key advantages over digital circuits built from discrete components:
Size: Integrated circuits are much smaller—both transistors and wires are shrunk to micrometer
sizes, compared to the millimeter or centimeter scales of discrete components. Small size leads to
advantages in speed and power consumption, since smaller components have smaller parasitic
resistances, capacitances, and inductances.
Speed: Signals can be switched between logic 0 and logic 1 much quicker within a chip than
they can between chips. Communication within a chip can occur hundreds of times faster than
communication between chips on a printed circuit board.
The high speed of circuits on-chip is due to their small size—smaller components and wires have
smaller parasitic capacitances to slow down the signal.
Power consumption: Logic operations within a chip also take much less power. Once again,
lower power consumption is largely due to the small size of circuits on the chip: smaller parasitic
capacitances and resistances require less power to drive them.
5.3 VLSI AND SYSTEMS
These advantages of integrated circuits translate into advantages at the system level:
Smaller physical size: Smallness is often an advantage in itself—consider portable televisions
or handheld cellular telephones.
Lower power consumption: Replacing a handful of standard parts with a single chip reduces
total power consumption. Reducing power consumption has a ripple effect on the rest of the
system: a smaller, cheaper power supply can be used; since less power consumption means less
heat, a fan may no longer be necessary; a simpler cabinet with less electromagnetic
shielding may be feasible, too.
Reduced cost: Reducing the number of components, the power supply requirements, cabinet
costs, and so on, will inevitably reduce system cost. The ripple effect of integration is such that
the cost of a system built from custom ICs can be less, even though the individual ICs cost more
than the standard parts they replace. Understanding why integrated circuit technology has such
profound influence on the design of digital systems requires understanding both the technology
of IC manufacturing and the economics of ICs and digital systems.
CHAPTER 6
TOOLS
6.1 Introduction:
The main tools required for this project can be classified into two broad categories.
Hardware requirement
Software requirement
6.2 Hardware Requirements:
FPGA KIT
A standard computer capable of running Xilinx ISE 13.2 is required, with a minimum system
configuration of a Pentium III processor, 1 GB RAM, and a 20 GB hard disk.
6.3 Software Requirements:
XILINX 13.2
Xilinx ISE version 13.2 is required; the design is implemented from Verilog source code.
6.4 Introduction To XILINX ISE:
This tool can be used to create, implement, simulate, and synthesize Verilog designs
for implementation on FPGA chips.
ISE: Integrated Software Environment
Environment for the development and test of digital systems design
targeted to FPGA or CPLD
Integrated collection of tools accessible through a GUI
Based on a logic synthesis engine (XST: Xilinx Synthesis Technology)
XST supports different languages:
Verilog
VHDL
XST creates a netlist integrated with constraints
Supports all the steps required to complete the design:
Translate, map, place and route
Bit stream generation
In this case, it is possible to use Verilog to write a test bench to verify the
functionality of the design, using files on the host computer to define stimuli, to interact
with the user, and to compare results with those expected.
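The idea of a self-checking test bench (apply stimuli, then compare the results with those expected) can be sketched as follows. This is a Python stand-in rather than the Verilog test bench used in the project; the DFlipFlop model, the run_testbench helper and the stimulus vectors are hypothetical names and values introduced only for illustration.

```python
class DFlipFlop:
    """Behavioral model of a positive-edge-triggered D flip-flop."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def step(self, clk, d):
        # Capture d only on a rising clock edge (0 -> 1).
        if self._prev_clk == 0 and clk == 1:
            self.q = d
        self._prev_clk = clk
        return self.q


def run_testbench(dut, vectors):
    """Apply (clk, d, expected_q) vectors; return a list of mismatches."""
    failures = []
    for i, (clk, d, expected_q) in enumerate(vectors):
        got = dut.step(clk, d)
        if got != expected_q:
            failures.append((i, clk, d, expected_q, got))
    return failures


# Stimuli and expected responses; in a real flow these could be read from files.
VECTORS = [
    (0, 1, 0),  # clock low: q keeps its initial value
    (1, 1, 1),  # rising edge: q captures d = 1
    (1, 0, 1),  # clock high, no edge: q holds
    (0, 0, 1),  # falling edge: q holds
    (1, 0, 0),  # rising edge: q captures d = 0
]

failures = run_testbench(DFlipFlop(), VECTORS)
print("PASS" if not failures else f"FAIL: {failures}")
```

Keeping the expected value next to each stimulus means a mismatch is reported with the index of the failing vector, which is the same information a Verilog test bench would typically print.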
A Verilog model is translated into the "gates and wires" that are mapped onto a
programmable logic device, such as a CPLD or FPGA, and then it is the actual
hardware being configured, rather than the Verilog code being "executed" as if on some type
of processor chip.
6.4.1 Implementation:
Synthesis (XST)
– Produces a netlist file starting from an HDL description.
Translate (NGDBuild)
– Converts all input design netlists and then writes the results into a single merged
file that describes logic and constraints.
Mapping (MAP)
– Maps the logic onto device components.
– Takes a netlist and groups the logical elements into CLBs and IOBs (components of
the FPGA).
Place And Route (PAR)
– Places the FPGA cells and connects them.
Bit stream generation
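The implementation steps above form a chain in which each tool consumes the file produced by the previous one. The Python sketch below records that chain as plain data and checks that it is consistent. The file types .ngc, NGD, NCD and BIT appear elsewhere in this chapter; the tool name "BitGen" and the "routed .ncd" output of PAR are assumptions added for illustration.

```python
# Each stage: (step name, tool, input file type, output file type).
# XST, NGDBuild, MAP and PAR are named in the text; "BitGen" and the
# "routed .ncd" output of PAR are assumptions added for illustration.
FLOW = [
    ("Synthesis", "XST", "HDL source (.v/.vhd)", ".ngc"),
    ("Translate", "NGDBuild", ".ngc", ".ngd"),
    ("Map", "MAP", ".ngd", ".ncd"),
    ("Place and Route", "PAR", ".ncd", "routed .ncd"),
    ("Bit stream generation", "BitGen", "routed .ncd", ".bit"),
]


def check_chain(flow):
    """Return True if every stage consumes what the previous stage produced."""
    return all(prev[3] == cur[2] for prev, cur in zip(flow, flow[1:]))


for step, tool, src, dst in FLOW:
    print(f"{step:<22} {tool:<9} {src:<22} -> {dst}")
print("chain consistent:", check_chain(FLOW))
```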
XILINX Design Process
Step 1: Design entry
– HDL (Verilog or VHDL, ABEL for CPLD), Schematic Drawings, Bubble
Diagram
Step 2: Synthesis
– Translates .v, .vhd, .sch files into a netlist file (.ngc)
Step 3: Implementation
– FPGA: Translate/Map/Place & Route, CPLD: Fitter
Step 4: Configuration/Programming
– Download a BIT file into the FPGA
– Program JEDEC file into CPLD
– Program MCS file into Flash PROM
Simulation can occur after steps 1, 2, 3
The tools used in this thesis are Xilinx ISE 13.2 for simulation and synthesis. The
programs are written in the Verilog language.
For each of the properties given below, click on the ‘value’ area and select from the list of values
that appear.
Device Family: Family of the FPGA/CPLD used. In this thesis we will be using the
Spartan-3E FPGAs.
Device: The number of the actual device. For this lab you may enter XC3S100E (this can be
found on the attached prototyping board)
Package: The type of package with the number of pins. The Spartan FPGA used in this lab is
packaged in VQ100 package.
Speed Grade: The Speed grade is “-5”.
Synthesis Tool: XST [VHDL/Verilog]
Simulator: The tool used to simulate and verify the functionality of the design. The Modelsim
simulator is integrated in the Xilinx ISE. Hence choose “Modelsim-XE Verilog” as the simulator;
alternatively, the Xilinx ISE Simulator can be used.
Then click on NEXT to save the entries.
In the Port Name column, enter the names of all input and output pins and specify the Direction
accordingly. A Vector/Bus can be defined by entering appropriate bit numbers in the MSB/LSB
columns. Then click on Next > to get a window showing all the new source information.
Click on Finish to continue.
The source file will now be displayed in the Project Navigator window.
The source file window can be used as a text editor to make any necessary changes to the source
file. All the input/output pins will be displayed. Save your Verilog program periodically by
selecting the File->Save from the menu. You can also edit Verilog programs in any text editor
and add them to the project directory using “Add Copy Source”.
Here we can give the inputs. Right-click on the selected input, click on Force Constant,
enter the input value, and click on OK.
Click on the Run option in the tool bar to check the input and output waveforms.
Open the Synthesis Report in the Detailed Reports to see the Device utilization Summary and
Timing Report of the current project.
6.5.5. View RTL Schematic:
Expand Synthesize-XST, click on View RTL Schematic, and click OK.
A window with the top module opens; to view the internal modules, click on the top module.
6.6 FPGA DESIGN FLOW:
This part of the tutorial gives a short introduction to the FPGA design flow. A simplified
version of the design flow is given in the following diagram.
State machine entry is the better choice for designers who think of the design as a series
of states, but the tools for state machine entry are limited. In this documentation we are
going to deal with HDL-based design entry.
6.8 Synthesis:
Synthesis is the process which translates VHDL or Verilog code into a device netlist format, i.e. a
complete circuit with logical elements (gates, flip-flops, etc.) for the design. If the design
contains more than one sub-design, e.g. to implement a processor we need a CPU as one design
element and RAM as another and so on, then the synthesis process generates a netlist for each
design element. The synthesis process checks the code syntax and analyzes the hierarchy of the
design, which ensures that the design is optimized for the design architecture the designer has
selected. The resulting netlist(s) is saved to an NGC (Native Generic Circuit) file (for Xilinx®
Synthesis Technology (XST)).
6.9.2 Map:
The Map process divides the whole circuit with logical elements into sub-blocks such that they
can be fit into the FPGA logic blocks. That is, the Map process fits the logic defined by the NGD
file into the targeted FPGA elements (Configurable Logic Blocks (CLB), Input Output Blocks
(IOB)) and generates an NCD (Native Circuit Description) file which physically represents the
design mapped to the components of the FPGA.
The MAP program is used for this purpose.
RTL Schematic.
Technology Schematic.
Design Summary.
CHAPTER 8
CONCLUSION
Clock gating is used in a FIFO to reduce the power consumption. For further power saving,
data-driven clock gating and multi-bit flip-flops are used in sequential circuits. Common clock
gating is used for power saving, but it still leaves a large number of redundant clock
pulses. Multi-bit flip-flops are also used to reduce power consumption: the multi-bit flip-flop
method reduces the total number of inverters by sharing the inverters among the flip-flops.
Combining multi-bit flip-flops with data-driven clock gating yields further power
saving. The Xilinx software tool is used for implementing the proposed system.
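The benefit of grouping flip-flops by activity before gating them can be illustrated with a deliberately simplified Python model. It assumes that the FFs toggle independently and that one gate clocks a whole k-bit MBFF whenever at least one of its FFs toggles; it is not the power savings model developed in this work, and the toggling probabilities are hypothetical.

```python
from math import prod


def group_enable_probability(probs):
    """P(at least one FF in the group toggles) = 1 - prod(1 - p_i)."""
    return 1.0 - prod(1.0 - p for p in probs)


def expected_clocked_ffs(probs, k):
    """Expected number of FFs receiving a clock pulse per cycle when
    consecutive FFs are packed into k-bit MBFFs with one gate each."""
    total = 0.0
    for i in range(0, len(probs), k):
        group = probs[i:i + k]
        total += len(group) * group_enable_probability(group)
    return total


# Hypothetical data toggling probabilities of four FFs (assumed values).
activities = [0.50, 0.01, 0.60, 0.02]

unsorted_cost = expected_clocked_ffs(activities, k=2)
sorted_cost = expected_clocked_ffs(sorted(activities), k=2)

print(f"grouped as given    : {unsorted_cost:.3f} clocked FFs per cycle")
print(f"grouped by activity : {sorted_cost:.3f} clocked FFs per cycle")
```

Under these assumptions, pairing FFs in increasing order of activity keeps the rarely toggling FFs together, so their shared clock stays gated most of the time, whereas mixing a quiet FF with a busy one forces the quiet FF to be clocked almost as often as the busy one.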
REFERENCES
1. Kapoor, Ajay, Cas Groot, Gerard Villar Pique, Hamed Fatemi, Juan Echeverri, Leo Sevat,
Maarten Vertregt et al. “Digital systems power management for high performance mixed signal
platforms.” Circuits and Systems I: Regular Papers, IEEE Transactions on 61, no. 4 (2014): 961-
975.
2. Wimer, Shmuel, and Israel Koren. “The optimal fan-out of clock network for power
minimization by adaptive gating.” Very Large Scale Integration (VLSI) Systems, IEEE
Transactions on 20, no. 10 (2012): 1772-1780.
3. Santos, Cristiano, Ricardo Reis, Guilherme Godoi, Marcos Barros, and Fabio Duarte. “Multi-
bit flip-flop usage impact on physical synthesis.” In Integrated Circuits and Systems Design
(SBCCI), 2012 25th Symposium on, pp. 1-6. IEEE, 2012.
4. Yan, Jin-Tai, and Zhi-Wei Chen. “Construction of constrained multi-bit flip-flops for clock
power reduction.” In Green Circuits and Systems (ICGCS), 2010 International Conference on,
pp. 675-678. IEEE, 2010. 15
5. Jiang, IH-R., Chih-Long Chang, and Yu-Ming Yang. “INTEGRA: Fast multibit flip-flop
clustering for clock power saving.” Computer-Aided Design of Integrated Circuits and Systems,
IEEE Transactions on 31, no. 2 (2012): 192-204.
6. Chang, Chih-Long, and Iris Hui-Ru Jiang. “Pulsed-latch replacement using concurrent time
borrowing and clock gating.” IEEE Transactions on ComputerAided Design of Integrated
Circuits and Systems 32, no. 2 (2013): 242-246.
7. Lo, Shih-Chuan, Chih-Cheng Hsu, and Mark Po-Hung Lin. "Power optimization for clock
network with clock gate cloning and flip-flop merging." In Proceedings of the 2014 on
International symposium on physical design, pp. 77-84. ACM, 2014.
8. Wimer, Shmuel, Doron Gluzer and Uri Wimer. “Using well-solvable minimum cost exact
covering for VLSI clock energy minimization.” Operations Research Letters 42, no. 5 (2014):
332-336.
9. Wimer, Shmuel, and Israel Koren. “Design flow for flip-flop grouping in datadriven clock
gating.” Very Large Scale Integration (VLSI) Systems, IEEE Transactions on 22, no. 4 (2014):
771-778.
10. Wimer, Shmuel. “On optimal flip-flop grouping for VLSI power minimization.” Operations
Research Letters 41, no. 5 (2013): 486-489.
11.SpyGlass Power [Online]. Available: Using many advanced algorithms and analysis
techniques, the SpyGlass platform provides designers with insight about their design, early in the
process at RTL. It functions like an interactive guidance system for design engineers and
managers, finding the fastest and least expensive path to implementation for complex SoCs .
http://www.atrenta.com/solutions/spyglassfamily/spyglass-power.html