
JADAVPUR UNIVERSITY

DEPARTMENT OF ELECTRONICS AND TELECOMMUNICATION ENGINEERING

VLSI ASSIGNMENT

4th YEAR 1st SEMESTER

Submitted By-
NAME: ADWAY PAUL

ROLL NO: 002110701100

Submitted to: Prof. Dr. Subir Kumar Sarkar


1. Explain how parallelism can be used to achieve low power instead of high
performance in realizing digital circuits. Explain how multicore architecture
provides low power compared to the single core architecture of the same
performance.

Ans: In digital circuits, parallelism is often associated with enhancing performance by speeding up operations, but it can also be leveraged to reduce power consumption. This is based on the principle that performing tasks more slowly and at lower voltage while using more parallel resources can result in significant power savings. Here's how it works:

1. Parallelism to Reduce Clock Frequency

● By duplicating hardware (such as functional units or data paths), a system can execute more tasks simultaneously. This reduces the clock frequency required for the same throughput.
● Since dynamic power is proportional to the square of the supply voltage ($P_{dynamic} \propto V_{dd}^2 \cdot f \cdot C$, where $f$ is the clock frequency and $C$ is the switched capacitance), lowering the clock frequency allows a corresponding reduction in the supply voltage, significantly reducing power consumption.

2. Voltage Scaling

● When parallelism is used to process multiple operations concurrently, each unit can be operated at a lower clock frequency and voltage. The reduction of voltage has a quadratic effect on power savings, which is one of the biggest advantages of voltage scaling. Even though more hardware is active, significantly less power is consumed by each unit.
● This is referred to as Dynamic Voltage and Frequency Scaling (DVFS),
where parallelism enables reducing both frequency and voltage, leading to
large power reductions.
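A minimal numerical sketch of this trade-off, with illustrative values (normalized capacitance, assumed 1.2 V/0.9 V supplies) that are not from the original text:

```python
# Sketch: dynamic power P = C * Vdd^2 * f for a serial design versus a
# two-way parallel design with the same aggregate throughput.
# All numbers are assumed for illustration.

def dynamic_power(c_eff, vdd, freq):
    """Dynamic switching power: P = C * Vdd^2 * f."""
    return c_eff * vdd**2 * freq

# Serial: one unit at full frequency and nominal supply voltage.
p_serial = dynamic_power(c_eff=1.0, vdd=1.2, freq=1.0e9)

# Parallel: two units at half frequency each; the relaxed timing
# permits a lower supply voltage (0.9 V assumed here).
p_parallel = 2 * dynamic_power(c_eff=1.0, vdd=0.9, freq=0.5e9)

print(f"parallel/serial power ratio: {p_parallel / p_serial:.2f}")  # ~0.56
```

Even though the parallel version switches twice the capacitance in total, the quadratic voltage term more than compensates, cutting dynamic power by roughly 44% in this example.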

3. Energy Efficiency and Leakage Power

● In high-performance systems, operating at a high clock frequency leads to increased leakage power, which is a dominant factor in modern deep-submicron technologies. Spreading the workload across multiple parallel units operating at lower speeds and voltages reduces leakage, since leakage grows with supply voltage and with the higher die temperatures that fast switching produces.
● Lower frequencies also reduce switching activity, further contributing to
reduced dynamic power dissipation.

4. Parallelism in Low-Power Architectures


● In signal processing and communications, algorithms like filtering, FFTs, and
matrix operations can be created with parallel structures that carry out
different parts of the operation simultaneously but at lower speeds. These
architectures, although larger, consume less total power because of reduced
clock frequency and voltage.

● For example, instead of a single fast multiplier in a DSP (Digital Signal
Processor), several smaller multipliers operating in parallel at lower speeds can
complete the same task, but with reduced power usage.

5. Energy-Delay Product (EDP) Optimization

● Instead of focusing on minimizing delay, low-power systems aim to optimize the Energy-Delay Product (EDP). Increasing parallelism may add a modest per-operation delay, but it enables a significant reduction in energy consumption, resulting in a lower overall EDP, which is more desirable for low-power systems.
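With assumed (not measured) energy and delay figures, the EDP comparison reduces to simple arithmetic, as in this sketch:

```python
# Sketch: Energy-Delay Product comparison. Both operating points are
# hypothetical; the point is that a large energy saving can outweigh
# a modest delay penalty.

def edp(energy_j, delay_s):
    """Energy-Delay Product: lower is better for low-power design."""
    return energy_j * delay_s

fast_serial = edp(energy_j=2.0e-9, delay_s=1.0e-9)   # high-Vdd serial unit
parallel_lv = edp(energy_j=0.9e-9, delay_s=1.6e-9)   # low-Vdd parallel units

print(fast_serial, parallel_lv)   # 2.0e-18 vs ~1.44e-18: parallel wins on EDP
```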

6. Fine-Grained vs. Coarse-Grained Parallelism

● Fine-grained parallelism enables multiple instructions or operations to be processed simultaneously in each clock cycle (e.g., SIMD or vector processors). These operations occur at lower clock frequencies, resulting in a more power-efficient system.
● Coarse-grained parallelism, such as having multiple cores in a multi-core
processor, can reduce power consumption by distributing tasks across cores
running at lower frequencies, thus decreasing the energy required per
instruction.

7. Clock Gating and Power Gating with Parallelism

● When using parallel units, it is possible to implement clock gating, where the
clock signals to unused parts of the circuit are disabled, or power gating,
where power to certain sections is turned off. By employing multiple parallel
units, specific parts of the circuit can be selectively powered down or slowed
based on workload, resulting in substantial power savings.

In summary, parallelism enables digital circuits to function at lower clock frequencies and voltages, significantly decreasing power consumption without compromising overall throughput. This balance between area (more resources) and power (lower frequency and voltage) is crucial in designing energy-efficient systems, particularly for battery-powered or mobile applications.

Multicore architectures can achieve lower power consumption compared to single-core architectures while maintaining the same performance level. This power efficiency stems from the ability to distribute the workload across multiple cores running at lower clock frequencies and voltages, instead of relying on a single high-performance core. Here's how multicore architectures provide these power advantages:

1. Lower Clock Frequency per Core


● In a single-core architecture, achieving high performance often requires running the core at a high clock frequency, which increases power consumption. Since dynamic power is proportional to clock frequency ($P_{dynamic} \propto V_{dd}^2 \cdot f \cdot C$, where $f$ is the clock frequency and $C$ is the switched capacitance), increasing frequency results in significantly higher power.
● Multicore architectures distribute tasks across several cores, allowing each core
to operate at a lower clock frequency. The combined performance across all
cores matches that of the single-core architecture at a higher clock frequency.
Since power scales linearly with frequency, reducing the frequency per core
reduces total power consumption.

2. Voltage Scaling

● A critical advantage of using multiple cores is that you can reduce both clock frequency and supply voltage. The power consumed by a core is proportional to the square of the supply voltage ($P \propto V_{dd}^2$), so reducing the voltage leads to substantial power savings. Multicore systems allow each core to operate at lower voltage levels while still delivering the same performance, reducing power consumption more effectively than a single-core system running at a higher voltage.

3. Parallelism without Increased Power

● Multicore processors exploit parallelism to handle multiple tasks simultaneously (e.g., in data processing, multi-threaded applications). By distributing tasks across multiple cores, each core works less intensively, which leads to lower overall power consumption than pushing a single core to work harder and faster.
● For the same total performance, a multicore system can achieve the desired
result without the power cost of operating a single core at high speeds.

4. Reduced Heat Dissipation and Cooling Power

● High-performance single-core architectures generate more heat due to higher power consumption, requiring more aggressive cooling systems (which also consume power). In contrast, multicore systems spread the workload across several cooler-running cores, reducing the need for high-power cooling solutions, thereby saving energy in the cooling system as well.

5. Energy Efficiency through Task Specialization

● Multicore architectures often include heterogeneous cores, where different cores are optimized for specific tasks (e.g., some cores for high-performance, others for low-power tasks). This allows the system to allocate tasks efficiently, using low-power cores for lightweight tasks and high-performance cores only when necessary.
● This division leads to overall lower power consumption, as energy-hungry
high-performance cores are not always active, and less power-hungry cores
handle simpler operations.

6. Dynamic Voltage and Frequency Scaling (DVFS)

● In multicore systems, each core can individually implement Dynamic Voltage and Frequency Scaling (DVFS). This means that when certain cores are underutilized, their voltage and frequency can be lowered, leading to significant power savings. In a single-core system, this flexibility is limited, as the entire core must be scaled together.
● With multicore processors, tasks can be dynamically reassigned to different
cores based on workload, with idle cores powered down or set to low-power
states.

7. Reduced Switching Activity

● Multicore architectures can reduce switching activity, which is a key factor in dynamic power consumption. In a single-core system, the high clock frequency causes frequent switching in transistors. With multicore designs, each core switches less frequently, which leads to lower dynamic power consumption.
● Additionally, tasks can be parallelized across cores, reducing the need for
intense, high-speed computation in a single core, further reducing switching-
related power.

8. Power Gating and Clock Gating

● Multicore processors enable power gating and clock gating more effectively.
In power gating, unused cores can be completely powered down to save
energy. In clock gating, the clock signal to inactive cores can be turned off,
reducing unnecessary power consumption.
● Single-core processors cannot shut down parts of the core in the same way,
meaning they are less efficient at managing power dynamically.

Example: Performance-Energy Trade-off

● Suppose a single-core processor has to run at a clock frequency of 2 GHz to meet performance requirements, operating at a supply voltage of 1.2 V. A multicore architecture with 4 cores might be able to run each core at 1 GHz at 0.9 V. While the multicore system has more hardware, each core operates at a lower frequency and voltage, reducing total power.
● The multicore system achieves the same throughput while consuming less
energy due to lower frequency and voltage, despite the overhead of additional
cores.
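A back-of-the-envelope check using $P \propto V_{dd}^2 \cdot f$, under the added assumption of ideal parallel speedup (so the four cores need only 0.5 GHz each to match the single core's 2 GHz of aggregate throughput):

$$P_{single} \propto (1.2)^2 \times 2\ \text{GHz} = 2.88, \qquad P_{quad} \propto 4 \times (0.9)^2 \times 0.5\ \text{GHz} = 1.62$$

i.e., roughly 56% of the single-core power at the same throughput. If parallelization overhead forces the cores to run faster than this, the savings shrink, which is why the quadratic voltage term is the dominant lever.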

By distributing the workload across multiple cores, multicore architectures can reduce the clock frequency, enable lower voltage operation, and better manage power through techniques like DVFS, power gating, and clock gating. These factors contribute to significantly lower power consumption compared to single-core architectures of the same performance, making multicore designs ideal for energy-efficient computing.

2. List various sources of leakage currents. Why is leakage power an important issue in deep submicron technology? What is band-to-band tunneling current? What is the body effect? What is subthreshold leakage current? Briefly discuss the various mechanisms responsible for this leakage current.

Ans: Leakage currents in digital circuits, particularly in modern deep-submicron and nanometer-scale technologies, are a significant source of power consumption. Even when a transistor is not actively switching, leakage currents can still flow through the device, contributing to overall power dissipation. Here are the main sources of leakage currents:

1. Subthreshold Leakage: Current flowing when the transistor is off, due to weak inversion.
2. Gate Leakage: Quantum tunneling through the thin gate oxide.
3. Junction Leakage: Reverse-biased current in the drain-source PN junctions.
4. Gate-Induced Drain Leakage (GIDL): Tunneling at the gate-drain overlap
due to high electric fields.
5. Drain-Induced Barrier Lowering (DIBL): Reduced source-drain barrier in
short-channel devices, causing leakage.
6. Punch-Through Current: Source-to-drain current due to overlapping
depletion regions in short-channel devices.
7. Band-to-Band Tunneling: Tunneling between the valence and conduction
bands in highly doped regions.
8. Gate-Induced Junction Leakage: Junction leakage enhanced by high gate
voltage.
9. Hot Carrier Injection (HCI): High-energy carriers causing leakage and
reliability issues.

Leakage power is a critical issue in deep submicron technology (typically below 100 nm) due to the following reasons:

1. Increased Leakage Current: As transistors shrink, short-channel effects like subthreshold leakage and gate leakage become more pronounced. Thin gate oxides and lower threshold voltages lead to more leakage, even when the transistors are "off."
2. Higher Static Power Consumption: Leakage power, which is present even
when the circuit is idle, becomes a significant portion of total power
consumption in deep submicron designs. As active power reduces due to lower
supply voltages and clock gating techniques, leakage power becomes
dominant, affecting battery life and energy efficiency.
3. Scaling Challenges: Technology scaling reduces the physical dimensions of
transistors, increasing the electric field across smaller devices. This increases
leakage currents like gate oxide tunneling and junction leakage, making it
harder to maintain low-power designs.
4. Thermal and Reliability Concerns: Excessive leakage power leads to higher
heat generation, impacting the thermal management of chips. Overheating
reduces the reliability and lifetime of transistors, exacerbating aging effects
like hot carrier injection and electromigration.
5. Power-Efficient Designs: In mobile and battery-powered devices, leakage
power is crucial because it drains power continuously, even in standby modes.
Minimizing leakage is essential for achieving power efficiency and extending
battery life.

In summary, leakage power is a growing challenge in deep submicron technology due to increased leakage currents, higher static power consumption, scaling difficulties, and its impact on device reliability and efficiency.

Band-to-band tunneling (BTBT) current occurs when electrons tunnel directly from the valence band of a semiconductor material into the conduction band of an adjacent region. This typically happens across a reverse-biased PN junction. It doesn't require thermal excitation because the high electric field across the junction lowers the potential barrier between the two energy bands, allowing for quantum mechanical tunneling.

Fig: Band to Band Tunneling

Key Characteristics:

1. Quantum Tunneling Effect: In BTBT, electrons move through the energy barrier between the valence and conduction bands instead of overcoming it thermally, as in normal carrier movement.
2. Occurs in Reverse-Biased Junctions: It primarily happens at the drain
junction of transistors when the junction is heavily reverse-biased.
3. Enhanced by High Doping Levels: BTBT is more prominent in devices with
heavily doped regions, as higher doping increases the electric field at the
junction.
4. Impact in Advanced Technology Nodes: As transistor sizes shrink in deep
submicron technologies, the depletion regions become thinner, and BTBT
becomes a more significant source of leakage current.
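For reference (standard background, not from the original text), a commonly quoted analytical form for the BTBT current density is Kane's model:

$$J_{b2b} = A\,\frac{E\,V_{app}}{E_g^{1/2}}\,\exp\!\left(-B\,\frac{E_g^{3/2}}{E}\right)$$

where $E$ is the electric field at the junction, $E_g$ the bandgap, $V_{app}$ the applied reverse bias, and $A$, $B$ are material-dependent constants. The exponential dependence on $E$ is why the heavily doped, thin depletion regions of scaled devices make BTBT a significant leakage component.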

The body effect (also known as the back-gate effect) refers to the influence of
the substrate (or body) voltage on the threshold voltage (Vth) of a MOSFET (Metal-
Oxide-Semiconductor Field-Effect Transistor). This effect arises in MOSFETs where
the substrate is not at the same potential as the source terminal.

Key Characteristics:

1. Threshold Voltage Shift: The body effect causes the threshold voltage to increase as the source-to-body reverse bias increases. This can be expressed by the equation:
   $V_{th} = V_{th0} + \gamma\left(\sqrt{2\phi_F + V_{SB}} - \sqrt{2\phi_F}\right)$
   where:
   ○ $V_{th0}$ is the threshold voltage when the source-to-body voltage ($V_{SB}$) is zero.
   ○ $\gamma$ is the body effect coefficient.
   ○ $2\phi_F$ is the surface potential at strong inversion.
   ○ $V_{SB}$ is the source-to-body voltage.
   (A numeric illustration follows this list.)
2. Substrate Bias: When the substrate is connected to a different voltage level
than the source, it creates a source-to-body voltage (Vsb). A positive Vsb (for
NMOS) increases the threshold voltage, while a negative Vsb (for PMOS)
decreases it.
3. Impact on Device Performance: The body effect can influence:
   ○ Switching Speed: Higher threshold voltages slow down the switching characteristics of the transistor.
   ○ Power Consumption: Increased threshold voltages reduce leakage currents, but the slower switching may force a higher supply voltage to meet timing, increasing dynamic power in some scenarios.
4. Use in Circuit Design: Designers can use the body effect intentionally to
adjust the threshold voltage for specific circuit requirements, such as in analog
circuits or biasing techniques.
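As the numeric illustration promised above, with assumed values $\gamma = 0.4\ \mathrm{V^{1/2}}$, $2\phi_F = 0.6\ \mathrm{V}$, and $V_{SB} = 1\ \mathrm{V}$ (illustrative, not from the text):

$$\Delta V_{th} = 0.4\left(\sqrt{0.6 + 1} - \sqrt{0.6}\right) = 0.4\,(1.265 - 0.775) \approx 0.20\ \mathrm{V}$$

so a 1 V reverse body bias raises the threshold voltage by about 0.2 V in this example.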

Subthreshold leakage current is the small amount of current that flows through a MOSFET when it is in the "off" state, meaning that the gate-to-source voltage (Vgs) is below the threshold voltage (Vth). Despite the transistor being off, a weak inversion layer forms near the channel, allowing some current to flow from the source to the drain. This current becomes significant in deep submicron technologies, contributing to overall power consumption even when the circuit is not actively switching.
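The standard first-order model for this current (textbook background, included for completeness) is:

$$I_{sub} = I_0\,\exp\!\left(\frac{V_{GS}-V_{th}}{n\,V_T}\right)\left(1 - \exp\!\left(-\frac{V_{DS}}{V_T}\right)\right), \qquad V_T = \frac{kT}{q} \approx 26\ \mathrm{mV\ at\ 300\ K}$$

where $n$ is the subthreshold swing factor. The exponential dependence on $V_{GS} - V_{th}$ means that lowering $V_{th}$ by one subthreshold swing ($S = n\,V_T \ln 10 \approx 60n$ mV/decade) increases the off-state current tenfold, which is why threshold-voltage scaling drives leakage up so quickly.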

Mechanisms Responsible for Subthreshold Leakage Current

1. Weak Inversion: When Vgs is below Vth, the MOSFET enters weak inversion,
where thermal energy allows a small number of carriers to accumulate in the
channel. This leads to a gradual increase in current as the voltage approaches
Vth.
2. Thermal Generation: Thermal energy in the semiconductor can generate
electron-hole pairs. At elevated temperatures, the probability of these pairs
contributing to subthreshold conduction increases, leading to higher leakage
currents.
3. Short-Channel Effects: In short-channel devices, the control of the gate over
the channel decreases, making it easier for carriers to flow from source to drain
even when the transistor is off. This effect exacerbates subthreshold leakage as
the channel length shrinks.
4. Gate Oxide Tunneling: Thin gate oxides in modern MOSFETs can allow
electrons to tunnel through the oxide layer when the gate voltage is applied,
contributing to leakage current. This is particularly significant in advanced
technology nodes.
5. Drain-Induced Barrier Lowering (DIBL): In short-channel MOSFETs,
increasing the drain voltage can reduce the potential barrier between the source
and drain, facilitating subthreshold conduction even when the gate voltage is
below the threshold.
6. Body Effect: Changes in the substrate (or body) voltage shift the threshold voltage. A reverse body bias raises the threshold voltage and thus suppresses subthreshold leakage, while a forward body bias lowers the threshold and increases leakage in weak inversion.

3. Sketch the schematic diagram of an SRAM memory cell along with the sense amplifier and data write circuitry. Explain how read and write operations are performed in an SRAM. In what way do DRAMs differ from SRAMs? Explain the read and write operations for a one-transistor DRAM cell. Distinguish between Mealy and Moore machines.

Ans:
Fig: 6 transistor SRAM cell using CMOS and NMOS model

The operation of individual models is explained as under:

1. NMOS 6-Transistor (6T) SRAM Cell

This configuration uses six transistors (T1 to T6) to store a single bit of data.

● Transistors T1, T2 (Access Transistors): These are NMOS transistors that connect the storage nodes (A and B) to the bit lines (0-bit input and 1-bit input) when activated by the word line (WL). When WL is high, the access transistors turn on, allowing data to be written to or read from the cell.
● Transistors T3 and T4 (Latch Transistors): These NMOS transistors form a
latch, creating a feedback loop between nodes A and B. When one node is
charged, it helps to keep the other node discharged, ensuring that the cell
maintains its stored value.
● Operation:
○ Write Operation: To write data into the cell, the corresponding bit line
is set high or low (1 or 0). The word line (WL) is activated, turning on
T1 and T2, which allows the bit lines to control the state of nodes A and
B. The feedback from T3 and T4 ensures that the written data is
maintained.
○ Read Operation: To read data, the WL is activated, allowing the stored
value at A or B to affect the bit lines. The state of the cell is detected
based on whether one of the bit lines goes high or remains low.
2. CMOS 6-Transistor SRAM Cell

This configuration uses complementary MOS (CMOS) technology, incorporating both PMOS and NMOS transistors.

● Transistors T3, T4 (PMOS Latch Transistors): These PMOS transistors form a latch similar to T3 and T4 in the NMOS configuration but provide improved stability and lower leakage current.
● Transistors T5, T6 (Access Transistors): These NMOS transistors act
similarly to the NMOS RAM cell, connecting the storage nodes (A and B) to
the bit lines based on the word line activation.
● Operation:
○ Write Operation: The write process is similar to the NMOS
configuration. The appropriate bit line is driven high or low, and the WL
is activated to allow access to the storage nodes.
○ Read Operation: The read process also mirrors that of the NMOS
configuration. Activating the WL allows the cell to drive the bit lines,
which can then be sensed to determine the stored value.

Fig: Sense amplifier and Data Write circuitry

The read and write operations of this circuitry are described below.
Write Operation

1. Activation of Write Enable (WE):
   ○ When WE is high, the memory cell is in write mode.
2. Selection of Write Signal:
○ The WRITE 1 or WRITE 0 signals, determined by the control logic,
indicate whether to write a '1' or '0' to the memory cell.
○ This signal goes through the AND gates, which combine the WRITE
signals with the R/W control signal to create the necessary signals to
drive the bit lines.
3. Writing to the Storage Cell:
○ Depending on the WRITE signal, the appropriate bit line (1-bit or 0-bit)
is driven high or low, allowing the NMOS storage cell to update its
stored value.
Read Operation

1. Activation of Read Control:
   ○ When WE is low, the system is in read mode. The circuit generates the READ OR SENSE signals for each bit line.
2. Sensing the Stored Value:
○ The sense amplifiers detect the voltage levels on the bit lines and
convert them into digital signals, indicating whether the stored value is a
'1' or '0'.
○ The results of the read operation are then available on the outputs (A1
and A2), which reflect the stored values in the respective bit lines.

Dynamic Random-Access Memory (DRAM) and Static Random-Access Memory (SRAM) are two types of memory technologies used in computing systems. Here are the key differences between them:

Feature | Static Random Access Memory (SRAM) | Dynamic Random Access Memory (DRAM)
Structure | Uses a bistable latch (6T cell) | Uses a capacitor and a transistor
Data Retention | Retains data without refresh | Requires periodic refresh
Speed | Faster access times | Slower access times
Density and Size | Lower density, larger chip size | Higher density, smaller chip size
Power Consumption | Higher power consumption when idle | Lower power consumption when idle
Cost | More expensive to manufacture | Less expensive, cost-effective for high capacity
Use Cases | Cache memory in CPUs | Main memory in computers


Fig: One Transistor DRAM Cell

1T DRAM Write Operation

The write operation involves storing a bit of data (either a 0 or a 1) in the capacitor.
Here are the steps involved:

Step 1: Activate the Word Line (WL):

The word line is driven high (logic 1), turning on the transistor (T).

Step 2: Set the Bit Line (BL):

● If writing a '1': The bit line (BL) is driven high (charged) to provide a positive
voltage to the capacitor.
● If writing a '0': The bit line (BL) is driven low (discharged), effectively
grounding the capacitor.

Step 3: Store the Data:

● The capacitor charges or discharges based on the state of the bit line while the
transistor is on.
● The charge remains in the capacitor after the write operation, representing the
stored bit.

Step 4: Deactivate the Word Line (WL):

The word line is driven low (logic 0), turning off the transistor and isolating the
capacitor from the bit line. The stored charge remains in the capacitor, preserving the
data until the next write or refresh operation.

1T DRAM Read Operation


The read operation involves accessing the stored data from the capacitor. Here are the
steps involved:

Step 1: Activate the Word Line (WL):

The word line is driven high (logic 1), turning on the transistor (T).

Step 2: Sense the Charge on the Capacitor:

● The charge stored in the capacitor affects the voltage level on the bit line (BL).
● If the capacitor is charged (representing a '1'), it will pull the bit line high.
● If the capacitor is uncharged (representing a '0'), the bit line will remain low.

Step 3: Read and Amplify the Signal:

A sense amplifier connected to the bit line detects the voltage level and
amplifies the signal to distinguish between a '0' and a '1'.

Step 4: Deactivate the Word Line (WL):

The word line is driven low (logic 0), turning off the transistor and isolating the
cell again.
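In practice, the bit line moves only slightly rather than swinging rail-to-rail. A charge-sharing estimate (component values assumed for illustration, not from the text): when the word line opens, the cell capacitance $C_S$ shares charge with the much larger bit-line capacitance $C_{BL}$, precharged to $V_{pre}$:

$$\Delta V_{BL} = \frac{C_S}{C_S + C_{BL}}\,\left(V_{cell} - V_{pre}\right)$$

With $C_S = 30$ fF, $C_{BL} = 300$ fF, $V_{cell} = 1.2$ V (stored '1'), and $V_{pre} = 0.6$ V, the swing is only $\Delta V_{BL} \approx (30/330) \times 0.6 \approx 55$ mV, which is why the sense amplifier is essential. The read also disturbs the stored charge, so the cell must be rewritten (refreshed) after each read.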

Mealy and Moore machines are two types of finite state machines (FSMs) whose
differences are discussed below:

Feature | Mealy Machine | Moore Machine
Output Generation | Output depends on both the current state and the input. | Output depends only on the current state.
Output Timing | Outputs can change immediately with input changes (between state transitions). | Outputs change only on state transitions, after the clock cycle.
State Diagram | Typically has fewer states because outputs can be associated with transitions. | Typically has more states as outputs are associated with states.
Complexity | May be simpler in design for some applications due to fewer states. | May be more complex due to more states for the same behavior.
Timing Diagram | Outputs can change in the middle of a clock cycle. | Outputs change at the clock edge (rising or falling).
Design | More reactive, as output is immediate with input. | More predictable, as outputs are tied to state changes.
Application | Suitable for applications requiring fast response to inputs. | Suitable for applications where output stability is essential.
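A minimal behavioral sketch of the distinction (all state names hypothetical; the task is flagging a '1' that arrives right after a '0') shows the Moore machine's one-transition output lag and its extra dedicated output state:

```python
# Task: assert the output when a '1' arrives right after a '0'.

def mealy_step(state, x):
    # Output is a function of BOTH current state and input, so the
    # detection is flagged in the same step the '1' arrives.
    output = 1 if (state == "SAW0" and x == 1) else 0
    next_state = "SAW0" if x == 0 else "SAW1"
    return next_state, output

def moore_step(state, x):
    # Output is a function of the current state ONLY, so a dedicated
    # 'DETECTED' state is needed and the flag appears one step later.
    output = 1 if state == "DETECTED" else 0
    if state == "SAW0" and x == 1:
        next_state = "DETECTED"
    elif x == 0:
        next_state = "SAW0"
    else:
        next_state = "SAW1"
    return next_state, output

ms = mo = "SAW1"
for x in [0, 1, 1]:
    ms, m_out = mealy_step(ms, x)
    mo, o_out = moore_step(mo, x)
    print(x, m_out, o_out)   # Mealy flags on the '1'; Moore one step later
```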

4. (i) A sequence detector produces a '1' for each occurrence of the input sequence '1001' at its input.

a. Draw the state-transition diagram of the FSM realizing the sequence detector.
b. Obtain the state table from the state transition diagram.
c. Realize the FSM using D FFs and a PLA.

Ans: For a 1001 sequence detector for detecting overlapping sequences, the complete
state transition diagram is

State Table for this State Transition Diagram is:

Present State (y) | Binary Code (y1 y0) | Next State/Output for x = 0 (Y1Y0/Z) | Next State/Output for x = 1 (Y1Y0/Z)
A | 00 | A (00)/0 | B (01)/0
B | 01 | C (10)/0 | B (01)/0
C | 10 | D (11)/0 | B (01)/0
D | 11 | A (00)/0 | B (01)/1

Flip Flop Input and Output Equations

$Y_1 = x'\,(y_1 \oplus y_0) \qquad Y_0 = x + x'\,y_1\,y_0' \qquad Z = x\,y_1\,y_0$

To implement the combinational logic required for the next state logic and output
using a Programmable Logic Array (PLA):

● PLA Input: Current state bits (y1, y0) and input x.


● PLA Output: The next state bits (Y1, Y0) and output Z.

The PLA would be programmed with the following minterms based on the state
transition table:

Inputs (y1 y0 x) | Y1 | Y0 | Z
0 0 0 | 0 | 0 | 0
0 0 1 | 0 | 1 | 0
0 1 0 | 1 | 0 | 0
0 1 1 | 0 | 1 | 0
1 0 0 | 1 | 1 | 0
1 0 1 | 0 | 1 | 0
1 1 0 | 0 | 0 | 0
1 1 1 | 0 | 1 | 1
Fig: Circuit diagram realized using PLA and D flip-flops
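As a quick sanity check (a behavioral Python sketch, not part of the assignment's design flow), the equations above can be simulated directly; note that this detector is a Mealy machine, since Z depends on both the state and the input x:

```python
# Simulate the next-state/output logic from the PLA table:
#   Y1 = x'(y1 XOR y0),  Y0 = x + x' y1 y0',  Z = x y1 y0

def step(y1, y0, x):
    nx = 1 - x                       # x'
    Y1 = nx & (y1 ^ y0)
    Y0 = x | (nx & y1 & (1 - y0))
    Z = x & y1 & y0
    return Y1, Y0, Z

y1 = y0 = 0                          # reset into state A (00)
for x in [1, 0, 0, 1, 0, 0, 1]:      # overlapping '1001' occurrences
    y1, y0, z = step(y1, y0, x)
    print(x, z)                      # z = 1 on the 4th and 7th bits
```

The overlap handling is visible in the trace: the trailing '1' of the first detection doubles as the leading '1' of the next.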

5. Explain the clock skew problem of dynamic CMOS circuits. How is the clock skew problem overcome in domino CMOS circuits? How is it overcome in NORA CMOS circuits?

Ans: The clock skew problem in dynamic CMOS circuits refers to the undesired
variation or misalignment in the arrival time of clock signals at different parts of a
circuit. This issue can significantly impact dynamic CMOS logic's performance and
reliability, which relies on the precise timing of clock signals for charging and
discharging the dynamic nodes.

Dynamic CMOS Circuits and Clocking

Fig: Dynamic CMOS circuit

Dynamic CMOS circuits use clock signals to control the precharge and evaluate
phases of logic gates. The operation of these circuits typically involves two phases:
1. Precharge Phase (when the clock is low): The dynamic node is precharged to
a known value (usually high).
2. Evaluate Phase (when the clock is high): The output is evaluated based on
the inputs.

Clock Skew Problem

Clock skew occurs when the clock signal arrives at different times at different parts of
the circuit due to:

● Unequal wire lengths or routing paths.


● Buffer delays or variations in clock distribution networks.
● Manufacturing variations leading to different transistor delays.

In dynamic CMOS circuits, this clock skew can lead to two main issues:

1. Premature Evaluation (Race Condition):


○ If the clock signal arrives earlier at a certain gate than at others, the
circuit may enter the evaluate phase before the inputs are stable.
○ This can cause incorrect outputs, as the logic gate evaluates based on
incomplete or unstable inputs, leading to timing violations and
unpredictable behavior.
2. Delayed Precharge:
○ If the clock arrives late at certain gates, those gates may stay in the
evaluate phase longer than intended.
○ This delay could prevent proper precharging in the next clock cycle,
leaving the dynamic nodes in an undefined state and potentially leading
to functional failures in the subsequent cycles.

Impact of Clock Skew in Dynamic Circuits

● Functional Failures: As dynamic CMOS circuits rely on precise timing between precharge and evaluate phases, even small skew can cause incorrect operation.
● Timing Violations: Clock skew may cause some parts of the circuit to operate
faster or slower than others, leading to setup and hold time violations, which
may affect the overall timing and reliability of the system.
● Power Consumption: Incorrect precharging and evaluating due to clock skew
can result in unnecessary switching, leading to increased power dissipation.

In domino CMOS circuits, the clock skew problem is mitigated through careful design techniques that address the specific challenges associated with the timing of precharge and evaluate phases. Domino logic is a dynamic form of CMOS logic that uses a clock to control when gates precharge and evaluate, making it particularly sensitive to clock skew.
Fig: Domino CMOS logic Gate

Here’s how the problem is typically addressed:

1. Non-Overlapping Clock Phases

Domino logic relies on two distinct clock phases:

● Precharge Phase: When the clock is low, dynamic nodes are precharged.
● Evaluate Phase: When the clock is high, logic is evaluated based on the
inputs.

To avoid timing issues due to clock skew, circuits are designed with non-
overlapping clock phases. This ensures that no unintended overlap between
precharge and evaluate phases occurs, preventing premature evaluation and race
conditions. Any skew that might cause part of the circuit to enter the evaluate phase
early can be minimized by making sure there is a sufficient gap between the two
phases.

2. Skew-Tolerant Domino Stages

Domino circuits are designed with cascaded logic stages, where each stage
depends on the output of the previous one. To reduce the impact of skew, careful
timing is applied between these stages:

● Each stage must complete its evaluate phase before the next stage begins.
● The logic blocks are synchronized so that even if there is minor skew in the
clock signal, the stages still evaluate in the correct order.
● Adding buffers or delay elements between stages can also help ensure that the
timing aligns properly, preventing race conditions or incorrect evaluations.

Fig: Domino Logic cascaded with NMOS logic

3. Footed Domino Logic

One common technique to improve the robustness of domino logic against clock skew is to use footed domino logic. This involves adding an extra transistor (foot transistor) in the pull-down network of the dynamic logic:

● The foot transistor is controlled by the clock, and it ensures that the evaluation
phase does not start until the clock signal is fully asserted.
● This reduces the chances of premature evaluation due to skew by ensuring the
dynamic node cannot discharge unless the clock signal is stable and high.

Fig: a. Normal Domino Logic b. Footed Domino Logic
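A switch-level sketch of this idea (an illustrative Python model under simplifying assumptions, not a circuit simulation) shows what the foot transistor buys when skew leaves inputs high during precharge:

```python
# Behavioral model of a domino AND gate. The foot transistor keeps the
# pull-down path off whenever the clock is low, so inputs that are still
# high during precharge can neither fight the precharge PMOS nor
# discharge the dynamic node prematurely.

def domino_and(clk, a, b, node, footed=True):
    """One step of a dynamic AND stage; returns (node, out, contention)."""
    # The discharge path conducts only if the inputs are high and, in the
    # footed variant, the clock-controlled foot NMOS is also on.
    pulldown_on = bool(a and b and (clk == 1 or not footed))
    contention = False
    if clk == 0:                     # precharge: PMOS drives the node high
        contention = pulldown_on     # only an unfooted gate can fight it
        node = 1
    elif pulldown_on:                # evaluate: discharge the dynamic node
        node = 0
    out = 0 if node else 1           # static output inverter
    return node, out, contention

# Inputs still high while a (skewed) clock has already entered precharge:
print(domino_and(clk=0, a=1, b=1, node=1, footed=False))  # contention=True
print(domino_and(clk=0, a=1, b=1, node=1, footed=True))   # contention=False
```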

4. Proper Clock Distribution Networks


To reduce clock skew, domino circuits are designed with well-balanced clock
distribution networks:

● Clock trees or H-trees are used to distribute the clock signal evenly across the
entire circuit, minimizing skew between different logic blocks.
● The clock paths are equalized, and delay buffers are used if necessary, ensuring
that all parts of the circuit receive the clock signal at the same time.

5. Skew-Tolerant Buffers

Special buffers known as skew-tolerant buffers are sometimes used in domino circuits. These buffers are designed to be more resistant to clock skew, ensuring that the logic evaluates correctly even if small variations in clock timing occur.

6. Dual-Phase Clocking

Domino circuits can also implement dual-phase clocking to reduce clock skew
effects:

● Instead of relying on a single clock signal for precharge and evaluate, two
complementary clocks (phases) are used, where one controls the precharge
phase and the other controls the evaluate phase.
● This technique can minimize timing mismatches and reduce the vulnerability to
skew since the transitions between the phases are more controlled.

7. Pipelining and Timing Margins

Domino logic is often designed with pipelining to ensure that each stage
completes evaluation before the next clock cycle. Sufficient timing margins are added
between stages, giving enough room to handle small clock skews without affecting
functionality.

Pipelining improves overall timing control by reducing the critical path, thus
limiting the potential for skew-related issues.

8. Use of Latches or Flip-Flops at Critical Points

In some domino circuits, latches or flip-flops are inserted at critical points to isolate stages and mitigate the impact of clock skew. These elements can help store intermediate results and ensure that evaluation only occurs when the clock signals are properly aligned.

NORA (No Race) logic is a type of dynamic logic that interleaves NMOS and
PMOS dynamic stages to avoid race conditions, allowing both precharge and evaluate
phases to alternate between these stages. The design alternates between precharged
PMOS and NMOS stages, eliminating the need for separate clock phases for
precharge and evaluation, as both operations occur simultaneously in different parts of
the circuit.

Fig: NORA CMOS Logic

Overcoming Clock Skew in NORA CMOS

1. Alternating NMOS and PMOS Stages (No Separate Precharge and Evaluation Phases):
○ In NORA CMOS, dynamic stages are constructed in alternating
sequences of NMOS and PMOS transistors.
○ One stage (e.g., NMOS) evaluates while the next stage (PMOS) is in
precharge, and vice versa. This eliminates the strict reliance on the clock
signal to define separate precharge and evaluation phases, inherently
reducing the susceptibility to clock skew. Since each stage is either
precharging or evaluating, clock skew does not directly cause
overlapping precharge or evaluation phases, thus preventing timing-
related issues.
2. Single-Phase Clocking:
○ NORA CMOS uses a single-phase clock for all logic stages, which
simplifies clock distribution and reduces the risk of skew. Since there's
only one clock signal, skew is less likely to cause discrepancies between
different stages, as opposed to designs that use multiple clocks.
○ By avoiding multi-phase clocking schemes (as seen in some other
dynamic logic families), NORA simplifies the clocking structure, which
directly helps in reducing clock skew.
3. Race-Free Operation:
○ The alternation between NMOS and PMOS stages ensures that at any
given time, only one type of transistor network is evaluating while the
other is precharging, thus avoiding race conditions.
○ The circuit naturally becomes less sensitive to clock skew since there is
no overlap between evaluation and precharge phases of adjacent stages,
even if there is a slight mismatch in clock signal arrival times.
4. Clock Buffers and Balanced Clock Distribution:
○ Similar to other dynamic logic families, NORA CMOS circuits employ
buffered clock signals and balanced clock distribution (such as using
clock trees or grids). This ensures that the clock signal reaches all parts
of the circuit simultaneously, reducing clock skew.
○ By carefully designing the clock network to minimize delays, NORA
CMOS circuits ensure that the clock skew is kept within a tolerable
range, preventing it from affecting the evaluate and precharge
operations.
5. Reduced Dependency on Precharge Phase Timing:
○ Since each NORA stage can operate independently, with NMOS and
PMOS stages evaluating and precharging alternately, the circuit does not
rely on strict timing between the precharge and evaluation phases, which
minimizes the impact of clock skew.
○ The natural alternation of these stages reduces the circuit's dependency
on precise clock alignment for correct operation, making it more tolerant
to skew.
6. Skew-Tolerant Design with Pipelining:
○ NORA CMOS designs often incorporate pipelining to further mitigate
the effects of clock skew. By introducing buffers or latches between
stages, the circuit can absorb any minor variations in clock signal timing
without causing errors.
○ The use of pipelining creates clear boundaries between stages, which
can help manage and isolate the effects of skew.
7. Use of Domino-Like Logic for Timing Control:
○ Some variations of NORA circuits combine aspects of Domino logic to
ensure that the outputs of one stage cannot interfere with the next until
the clock has synchronized them. This prevents issues where skew
might otherwise cause premature evaluation or precharge in adjacent
stages.
