
ARM 1176-JZFS CPU-Based Low-Power Subsystem

Copyright
© Attribution Non-Commercial (BY-NC)
A Practical Guide to Low-Power Design: User Experience with CPF

ARM 1176-JZFS CPU-Based Low-Power Subsystem: Methodology to Reduce Electrical and Functional Failure in a Low-Power Design
By David Flynn, Fellow, ARM; Sachin Idgunji, Architect, ARM; Felix Jen, Manager, Design Implementation, UMC; Wen-Pin Lin, Senior Manager, UMC; and Vivek Shukla, Architect, Cadence, Bangalore.

Abstract

Leakage control has become a major design issue because leakage currents drain a battery's charge even when a device is inactive or in standby mode. Due to transistor scaling effects, transistors in each new process generation leak more than those in previous generations, only exacerbating the problem. A few years ago, designers began using power shut-off in their designs, and EDA suppliers provided low-power methodology solutions. However, power shut-off created next-level issues such as performance, wear-out of power switches, more complex power switch analysis, managing system-level performance given the power-up time, test, and reliability. Addressing these issues required an accompanying ASIC implementation and verification methodology to reduce the risk of chip failure, both functional and electrical. We demonstrate the application of these techniques and the methodology on an ARM1176-JZFS CPU-based system targeted at a 65nm technology node, which achieves higher speed with lower leakage, using a methodology that reduces post-silicon electrical failure.

Figure 1: Collaboration and Contributors (overview of the Ulterior project: collaboration between leaders to deliver the low-power solution, covering dual-Vt technology, 65nm technology models, and SoC implementation; IP and power management; a memory compiler with memory shut-offs, standard cells, and the PMK library; and a complete implementation methodology for the Ulterior CPU)


Joint Collaboration Contributors: This effort was jointly executed by ARM, UMC, and Cadence to accomplish the following tapeout and silicon measurements.
- UMC: 65nm standard process; looking for performance and yield on the LP implementation
- ARM: ARM1176JZFS-based SoC to demonstrate power management on a high-performance design; low-power architecture; power management and low-power memory IP for managing leakage
- Cadence: CPU implementation; complete low-power tool and methodology support

UMC Technology Trends and Process Selection for Project


This section discusses the process parameters and process selection. Figure 2 illustrates the process node used in this project and its evolution from the 90nm process.
Technology node | L90 | L65
Process | 1P9M SP/LL | 1P10M SP/LL
Lithography | 193nm dry | 193nm dry
Core voltage (V) | 1.0/1.2 | 1.0/1.2
tox, core (Å) | 16/22 | 12/19
tox, IO (Å) (IO VDD) | 30/52/65 (1.8/2.5/3.3V) | 30/52/65 (1.8/2.5/3.3V)
Physical gate length (nm) | 70/80 | 40/55
Salicide | CoSi2 | NiSi
Interconnect | Cu | Cu
Inter/intra-metal dielectric | Low-k (k=2.9) | Low-k (k=2.9)
1X metal pitch (nm) | 280 | 200
6T SRAM cell size (µm²) | 1.16/0.99* | 0.499*/0.525*
*Cell non-shrinkable

Figure 2: UMC Technology Trends

The Low Leakage (LL) process at 1.2V delivers approximately half the performance of the Standard Process (SP) running at 1.0V (Figure 3).



Figure 3: Managing Performance per Watt (normalized Ioff in pA/µm, log scale, vs. intrinsic ring-oscillator delay in ps/stage, for 65SP 1.0V, 90SP 1.0V, 65LL 1.2V, and 90LL 1.2V)

As shown in Figure 4, Low Leakage (LL) nodes gain significantly in leakage (>80x) across the process space. This leakage is highly sensitive to temperature (sub-threshold component). High-performance nodes gain an average of 25% in drive strength, a gain dominated by the process spread.
Figure 4: Low Leakage vs. Performance Node Tradeoffs (panels: leakage gain, LL vs. SP, on a log scale; drive-strength gain, SP vs. LL; both show NMOS and PMOS ratios across the corners 25 TT, 25 SS, 25 FF, 25 SF, 25 FS, 125 TT, and -40 FF)

High-performance nodes gain significantly (30% on average) across the corners. Power dissipation can be managed effectively with voltage scaling, and multichannel devices can be used to reduce leakage.



Figure 5: Performance Gain and Impact of Voltage Scaling (panels: performance gain, SP vs. LL, for Block 1, Block 2, and speed across the corners 25 TT to -40 FF; voltage scaling on performance and power, showing the SP-to-LL dynamic power ratio vs. voltage from 0.5V to 1.1V)

Delay for some key structures follows a power law in the supply voltage V, with an exponent whose magnitude lies in the range 1.5–2. As shown in Figure 6 and Figure 7, temperature sensitivity decreases at lower voltage (the zero temperature coefficient point for the block is around 0.78V). Variability is highly sensitive to voltage and increases drastically at lower voltages, impacting the functionality of the design.
Figure 6: Delay Dependencies (panels: voltage vs. average normalized delay, with delay sensitivity to temperature, over 0.85V–1.1V; delay slope with temperature vs. voltage over 0.75V–1.25V for Block 1, Block 2, and Block 3)



Figure 7: Variability (observed standard deviation, arbitrary units, vs. core voltage from 0.7V to 1.2V; dispersion data with the power-law fit y = 16.396x^-3.0055)
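As a quick worked example, the trend line in Figure 7 can be evaluated directly. The sketch below assumes the plot label reads y = 16.396·x^-3.0055, with x the core voltage in volts; it is only a reading of the plotted fit, meaningful within the 0.7V–1.2V range shown, and the units stay arbitrary. The name `sigma_fit` is hypothetical.

```python
# Power-law trend line read off Figure 7 (arbitrary units); coefficients from the plot label.
A_FIT = 16.396
B_FIT = -3.0055


def sigma_fit(v_core: float) -> float:
    """Delay spread predicted by the Figure 7 trend line at core voltage v_core (V)."""
    if v_core <= 0:
        raise ValueError("core voltage must be positive")
    return A_FIT * v_core ** B_FIT


# Spread at each voltage, and relative to a 1.0V reference point
for v in (0.7, 0.8, 0.9, 1.0, 1.1, 1.2):
    rel = sigma_fit(v) / sigma_fit(1.0)
    print(f"{v:.1f} V: sigma = {sigma_fit(v):5.1f}, relative to 1.0 V = {rel:.2f}x")
```

Under this reading, the spread roughly doubles between 1.0V and 0.8V, which is in line with the observation that variability increases drastically at lower voltages.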

ARM1176JZFS-Based SoC Overview and Advanced Leakage Control


Figure 8 illustrates the block diagram of the ARM1176_1616 CPU and its subsystem (the Ulterior chip). The key design features are:
a. ARM1176 CPU with dual 16K caches
   - State Retention Power Gating (SRPG) leakage management
   - Multi-voltage design (VRAM, VCPU, VSOC) but not DVS, although it includes level shifters
   - Support for independent power/energy analysis
   - Diagnostic SRPG error-rate analysis
b. ARM AXI-based system-on-chip support logic
   - SDRAM and Flash memory controllers
   - IEM-based performance and leakage controllers
   - Level-2 RAM on-chip memory (with BIST)
c. Linux OS port and peripherals
   - Demonstration of the entire system running real applications


Figure 8: Architecture of ARM 1176 (ARM 1176_1616 SoC block diagram: TrustZone-enabled ARM11 core with 16K instruction and data caches, TCRAM 0/1 interfaces, memory management, and AMBA AXI instruction, data, DMA, and peripheral ports, forming a 64-bit wide SRPG CPU/cache; 64-bit AXI interconnection matrix; 64-bit wide banked level-2 SRAM; 32-bit SDRAM and 16-bit Flash interfaces; ALVC leakage control; AHB/APB bridge; PLL and clock generation; two timers, two UARTs, INTC, and GPIO; JTAG debug interface; flexible DFT/MBIST; compressor controller)

Ulterior Power Strategy


Ulterior consists of two switchable domains and one always-on domain. The VDDCore power net is a switched power grid derived from VDDCPU. VDDCPU itself can be switched off externally and can run at a different typical voltage than VDDSOC. VDDSOC is the always-on power for the chip and feeds the small amount of logic that must remain always-on. VDDCore, which can be switched off, has multi-stage turn-on and turn-off control coming from the Advanced Leakage Controller (ALC). All the flops in the VDDCore domain are retention flops, while all the memories in VDDCPU can work in three low-power modes. Figure 9 shows the logical power domain definition for the ARM1176_1616 core.
Figure 9: Power Domains on Ulterior (logical voltage domains of the Ulterior ARM1176 overlaid on the macrocell floorplan: the data-side and instruction-side cache RAMs (data, tag, dirty, and valid), TCM data RAMs, TLB RAM, and BTAC RAMs; VDDCPU, which can be switched off externally; VVDDCPU for gated logic; VDDSOC for always-on logic)
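The supply dependency described above can be captured in a few lines. This is a minimal sketch with a hypothetical function name: it reduces each supply to on/off and ignores ramp-up, voltage levels, and the memory low-power modes. It only encodes the fact that VDDCore is switched from VDDCPU, so VDDCore can be live only while VDDCPU is on.

```python
def live_supplies(vddcpu_on: bool, core_switches_closed: bool) -> dict[str, bool]:
    """Which Ulterior supplies are powered, under a simplified on/off model."""
    return {
        "VDDSOC": True,                                  # always-on supply
        "VDDCPU": vddcpu_on,                             # can be switched off externally
        "VDDCore": vddcpu_on and core_switches_closed,   # switched grid derived from VDDCPU
    }


for cpu, sw in [(True, True), (True, False), (False, True)]:
    print(f"VDDCPU on={cpu}, core switches closed={sw}: {live_supplies(cpu, sw)}")
```

The third case shows why closing the on-chip switches is not enough on its own: with VDDCPU switched off externally, VDDCore is off as well.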


To reduce wear-out of the power switches while maintaining performance, Ulterior uses two kinds of power switch matrices: the weak network and the strong network. The Advanced Leakage Controller (ALC) controls the two networks separately: the weak network has eight power shut-off control requests, each with its own acknowledge, while the strong network has a single shut-off enable request. The weak, resistive network brings up the virtual grid gently, with sufficient current to ensure that VVDD reaches 0.95×VDD at high temperature. The strong matrix is turned on once the virtual grid reaches 0.95×VDD, to reduce the IR drop. The weak network is implemented with eight enables so that wear-out experiments can be carried out with 1, 2, 4, or 8 enables. The acknowledge signal for each power control was selected based on STA measurements. Figure 10 shows one example sequence, where 8 ≥ N ≥ 1.
Figure 10: Power Switch Request and Acknowledge (8 ≥ N ≥ 1). Signals shown: VDDCPU, CLOCK, N_ISOLATE, VVDDCPU, PWR_REQ[N], PWR_ACK[N], N_RESET, N_PWR_REQ, N_PWR_ACK.
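The ordering constraint between the two networks can be sketched as a small sequencer. This is an illustration under stated assumptions, not the ALC's actual logic: it assumes the weak enables are requested one at a time, each waiting for its acknowledge, and it omits the isolation and reset release shown in Figure 10. The function name is hypothetical; the signal names follow the figure labels.

```python
VALID_WEAK_COUNTS = (1, 2, 4, 8)   # wear-out experiments use 1/2/4/8 weak enables
STRONG_THRESHOLD = 0.95            # strong network closes once VVDD >= 0.95 * VDD


def power_up_sequence(n_weak: int, vvdd_over_vdd: float) -> list[str]:
    """Hypothetical ALC power-up order for the switched VDDCore grid.

    n_weak: number of weak enables used for this run.
    vvdd_over_vdd: virtual-rail level reached through the weak network, as a fraction of VDD.
    """
    if n_weak not in VALID_WEAK_COUNTS:
        raise ValueError(f"n_weak must be one of {VALID_WEAK_COUNTS}")
    events: list[str] = []
    # Weak network: request each enable and wait for its acknowledge before the next.
    for n in range(n_weak):
        events += [f"PWR_REQ[{n}]", f"PWR_ACK[{n}]"]
    # Strong network (lower IR drop) closes only after the weak network raised VVDD.
    if vvdd_over_vdd < STRONG_THRESHOLD:
        raise RuntimeError("VVDD below 0.95*VDD: strong network must stay off")
    events += ["REQ_STRONG", "ACK_STRONG"]
    return events


print(power_up_sequence(2, 0.97))
```

Keeping the 0.95×VDD check as a hard gate mirrors the text: the strong matrix exists to lower IR drop in normal operation, not to charge the grid.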

The memory subsystem contains 37 single-port memories; each memory can work in three low-power modes (Figure 11):
a. Standby mode (HALT): CEN disables the memory.
b. Retention mode (SRPG): power is supplied to the core array to retain state, while the periphery is powered off for reduced leakage; outputs are clamped to zero.


c. Shutdown mode (HIBERNATE): power is off for both the core and the periphery for reduced leakage; outputs are clamped to zero. This mode is possible through both integrated MTCMOS and separate power sources for the core and periphery.
Figure 11: Memory Retention Power Gating (HVt power switches gated by PGEN and RETN on the CORE VDD and LOGIC VDD supplies, and by PGEN_ and RETN_ on the ground side; array word lines, column decoders, sense amps, and I/O)
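The three memory modes differ only in which parts stay powered and whether the outputs are clamped, which fits naturally in a small lookup table. The sketch below is a summary of the list above, not a memory compiler model. One entry is an assumption and is marked as such: clamping is not stated for standby, so it is recorded as unclamped. The type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemModeSpec:
    core_array_powered: bool
    periphery_powered: bool
    outputs_clamped_low: bool
    contents_retained: bool


# Low-power modes of each single-port memory, summarizing the list above.
MEM_MODES = {
    # Standby: CEN disables the memory; both supplies stay up.
    # Output clamping is not stated for this mode; False is an assumption.
    "HALT": MemModeSpec(True, True, False, True),
    # Retention: array kept powered to hold state, periphery off, outputs clamped to 0.
    "SRPG": MemModeSpec(True, False, True, True),
    # Shutdown: array and periphery off, outputs clamped to 0, so contents are lost.
    "HIBERNATE": MemModeSpec(False, False, True, False),
}


def modes_keeping_data() -> list[str]:
    """Modes a power manager could use without reloading memory contents afterwards."""
    return [name for name, spec in MEM_MODES.items() if spec.contents_retained]


print(modes_keeping_data())
```

A power manager choosing between SRPG and HIBERNATE trades the remaining array leakage against having to restore memory contents on wake-up.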

Implementation Overview
Figure 12 illustrates the Cadence CPF-based low-power implementation flow. Its key highlights are:
- A single CPF used from synthesis through backend, power, and timing sign-off
- Leakage optimizations in the synthesis and backend flows
- CPF-based MMMC flow in the Encounter platform
- PSO planning flow to meet performance, electrical, and power goals
- Automated power switch network simulation for multiple combinations
- CRC-model-based SPICE simulation to reduce turnaround time (TAT) for complex power switch analysis
- Comprehensive IR/EM checks
- Low-power verification throughout the flow


While addressing low-power implementation and its verification, it is also important that the methodology be adequate to deal with the challenges of maintaining performance and reliability. The key issues addressed by the implementation methodology are:
- Power shut-off and MSV implementation
  - Maintaining the system performance is a challenge
  - A CPF-based methodology simplifies low-power insertion
- Low-power verification
  - Verifying low power through RTL and gate simulations
  - Verifying low power through formal checks
- Ensuring reliability
  - New power structures and strategies may lead to defects at t > 0, which need to be taken care of in the design
  - ARM has come up with a new design approach to avoid electrical failures
  - The implementation needs to support such mechanisms

Figure 12: Ulterior Implementation Flow (a single CPF shared across the stages: CPF integration and quality check, and LEC plus power checks, with Conformal Low Power; RTL low-power simulation and automatic assertion generation/checks with Incisive Enterprise Simulator; PD-aware logic synthesis and DFT with Encounter RTL Compiler; PD-aware physical implementation with SoC Encounter; timing and SI sign-off with Encounter Timing System; IR-drop and power sign-off with VoltageStorm-PE and DC; physical verification)

Ulterior Implementation
Low-power Verification
Low-power verification is the backbone of any low-power flow. Verification can be performed through dynamic simulation on the RTL and gate-level netlists, as well as through static checks. Cadence Encounter Conformal Low Power verifies the correct implementation of low-power design techniques and validates the design using formal techniques (rather than simulation) throughout the design process, which decreases the risk of missed bugs before a product ships. Conformal Low Power accepts RTL or gate-level netlists, with or without explicit power and ground nets, together with the CPF file as input. It performs structural and rule-based checks to verify that the low-power implementation matches the power specification defined in the CPF file.

Under low-power equivalence checking, Conformal Low Power ensures that low-power optimizations do not introduce a technology-mapping bug or a logical bug in the design netlist. It reads the golden and revised designs along with their CPF files and checks logical equivalence without setting any constraints on the low-power control signals.

The RTL Conformal Low Power flow is used to verify the CPF: it reads the RTL and CPF as input and reports missing and redundant low-power rules with respect to the power architecture of the design. The Conformal Low Power flows for the synthesis and physical netlists verify the low-power implementation against the power specification defined in the CPF file. Since instances in the synthesis netlist do not have power and ground pins, their power domains are assigned based on the CPF definition; in the physical netlist, power domains are assigned based on power and ground pin connectivity. The power domain consistency check (PDCIC) performs power-aware equivalence checking and checks low-power cells. The PDCIC between the synthesis and physical netlists performs power-aware equivalence checking between the golden and revised designs. In this case, it assigns power domains for the synthesis netlist using the CPF definitions, while the power domains for the physical netlist come from the power and ground pin connectivity.
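The two ways of assigning domains can be illustrated with a toy consistency check. This is not how Conformal Low Power implements PDCIC (which also performs power-aware equivalence checking); it only shows the domain-assignment comparison. The instance names, the net names (for example `VVDDCORE`), the default domain, and the longest-prefix matching rule are all hypothetical.

```python
def domain_from_cpf(inst: str, cpf_map: dict[str, str], default: str) -> str:
    """Synthesis netlist: no PG pins, so use the CPF instance-to-domain mapping.

    The longest matching hierarchical prefix wins (an assumption of this sketch).
    """
    matches = [p for p in cpf_map if inst == p or inst.startswith(p + "/")]
    return cpf_map[max(matches, key=len)] if matches else default


def domain_from_pg(inst: str, power_pin_net: dict[str, str], net_domain: dict[str, str]) -> str:
    """Physical netlist: use the net that the instance's power pin is connected to."""
    return net_domain[power_pin_net[inst]]


def domain_mismatches(instances, cpf_map, default, power_pin_net, net_domain) -> list[str]:
    """Instances whose CPF-derived and PG-derived domains disagree."""
    return [i for i in instances
            if domain_from_cpf(i, cpf_map, default)
            != domain_from_pg(i, power_pin_net, net_domain)]


# Hypothetical example data: all instance and net names are illustrative only.
cpf_map = {"u_cpu": "PDcpu", "u_cpu/u_core": "PDcore"}
net_domain = {"VDDSOC": "PDsoc", "VDDCPU": "PDcpu", "VVDDCORE": "PDcore"}
power_pin_net = {
    "u_cpu/u_core/r0": "VVDDCORE",
    "u_cpu/ram0": "VDDCPU",
    "u_uart/ff1": "VDDCPU",   # wired to the wrong rail: the CPF places it in PDsoc
}
print(domain_mismatches(list(power_pin_net), cpf_map, "PDsoc", power_pin_net, net_domain))
```

A mismatch such as `u_uart/ff1` is exactly the kind of error that is invisible in the synthesis netlist, because that netlist has no power pins to inspect.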



Figure 13: Verification (flow from RTL through logic synthesis and DFT to the gate netlist, then physical implementation to the physical netlist, with Conformal Low Power checks at each stage: CLP with RTL and CPF consistency checks; LEC and power-aware LEC, RTL vs. synthesis netlist; CLP on the synthesis netlist; LEC and power-aware LEC, synthesis vs. backend netlist; power-aware LEC including PDCIC, synthesis vs. physical netlist; CLP on the physical netlist; unified CLP (hierarchical + top level); front-end and back-end sign-off)


Figure 14: Ulterior Backend Flow Overview (design import; load CPF and create RC and timing-optimization MMMC views; floorplanning of power domains and relative macro placement; end-cap, well-tap, and power switch cell placement; power planning (PSO, well-tap hookup); placement (isolation, SRPG); always-on-net synthesis for RETAIN and PSO control signals; power routing (sroute for level-shifter secondary and standard-cell preroute, NanoRoute for SRPG secondary and always-on nets); multi-mode pre-CTS optimization; domain-aware CTS; multi-mode post-CTS optimization; SI- and domain-aware NanoRoute; multi-mode post-route optimization; multi-mode post-route SI optimization; multi-mode hold optimization; multi-mode leakage optimization; SoC sign-off timing and checks)


Ulterior PnR Flow


It is important that the methodology and flow used for place and route (P&R) capture the additional complexity introduced by the power strategy. The backend tool takes the power architecture information from the CPF file and performs all the steps in an automated manner. Figure 14 captures the CPF-based automated low-power P&R flow used in the project. The flow starts with design import and loading of the CPF file. Inside the Cadence SoC Encounter® RTL-to-GDSII System, the CPF is loaded and committed on the design through the loadCPF and commitCPF commands, respectively. The loadCPF command mainly captures the following information from the CPF:
- Low-power cells such as level shifters, isolation cells, power switch cells, SRPG cells, and always-on buffers
- All the power and ground nets
- The power domains with their switchable attributes and global connections
- The libraries attached to the appropriate power domains
- Rules such as power switch, level shifter, isolation cell, and SRPG rules
- Different analysis views

The commitCPF command mainly performs the following according to the loaded CPF:
- Creates power domains and defines their global connections
- Checks and inserts level shifters and isolation cells based on the rules
- Checks and replaces flip-flops with SRPGs based on the rules
- Creates the analysis views

Once the design is imported and the CPF is loaded, the following are the key steps performed by SoC Encounter:
(i) Low-power CPF flow and the MMMC settings
(ii) Different kinds of power shut-off (PSO) for the design
   - On-chip PSO: column-based checkerboard PSO; PSO for hard macros (memory)
   - Off-chip PSO: VDDCPU, the secondary domain for VDDCore, can also be shut off externally
(iii) Different kinds of level-shifter implementation for the design
   - Low-to-high level shifter
(iv) Isolation implementation for the PSO power domains
(v) State retention for the PSO power domains
(vi) Always-on net synthesis
(vii) Secondary power pin connection for the SRPG/always-on buffer/level shifter
(viii) Placement in MMMC
(ix) On-chip variation (OCV) timing analysis mode
(x) Timing optimization and analysis in MMMC
(xi) Clock tree synthesis in MMMC
(xii) Domain-aware routing
(xiii) MMMC SI closure
(xiv) Hold timing optimization in MMMC
(xv) MMMC leakage optimization
(xvi) Running multiple-CPU processing to reduce the runtime for multiple-mode analysis

Figure 15: Floorplan and power plan

Figure 15 illustrates the floorplan and the power switch columns of the Ulterior design.


The key to the implementation is power network planning. As described in the power strategy section, two types of power switch network are needed: the weak network and the strong network. The weak network itself has eight different enables and the same number of acknowledgements.

Weak Network Enable


As shown in Figure 16, eight weak enables feed 16 columns that are spread uniformly and interleaved; this was done to reduce the rush current and to bring the power grid up gently to 95% of VDD. Every vertical weak column skips a certain number of cell rows (13 rows, to be precise), and the skipped rows hold either a strong network switch cell or a weak network return-path switch cell.

Figure 16: Weak Network Enable (request signals REQ_WEAK_0 through REQ_WEAK_7)
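To make the interleaving concrete, the sketch below maps the 16 weak columns onto the 8 enables. The exact column-to-enable assignment is not given in the text, so `col % 8` is an assumption, chosen so that every enable drives two columns on opposite halves of the core; the function names are hypothetical.

```python
N_WEAK_ENABLES = 8
N_WEAK_COLUMNS = 16


def enable_of_column(col: int) -> int:
    """Hypothetical interleave: column c is driven by enable c mod 8."""
    return col % N_WEAK_ENABLES


def columns_of_enable(en: int) -> list[int]:
    """Columns switched by a single weak enable."""
    return [c for c in range(N_WEAK_COLUMNS) if enable_of_column(c) == en]


def active_columns(n_enables_on: int) -> list[int]:
    """Columns that conduct when enables 0 .. n_enables_on-1 are asserted."""
    return [c for c in range(N_WEAK_COLUMNS) if enable_of_column(c) < n_enables_on]


for k in (1, 2, 4, 8):
    print(k, active_columns(k))
```

With this mapping, even a single enable turns on switches in both halves of the core, so the virtual grid is charged from two sides rather than from one corner.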

Weak Network Acknowledgement


As shown in Figure 17, there are 16 separate acknowledgements on the return path of the weak network. Of these 16 acknowledgements, 8 have been connected to the ALC state machine based on STA measurements.

Figure 17: Weak Network Acknowledgement (return-path acknowledge signals ACK_WEAK_0 through ACK_WEAK_7 and ACK_WEAK_0_1 through ACK_WEAK_7_1)
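The text says 8 of the 16 acknowledgements were chosen based on STA, but not by which criterion. The sketch below assumes one plausible rule: for each enable, keep the acknowledge with the latest STA arrival time, so that the ALC hears back only once the slowest column of that enable has switched. The delays and function names are hypothetical; the signal names follow Figure 17.

```python
def enable_of_ack(name: str) -> int:
    """'ACK_WEAK_3' and 'ACK_WEAK_3_1' both belong to weak enable 3."""
    return int(name.split("_")[2])


def pick_acks(sta_delay_ns: dict[str, float]) -> dict[int, str]:
    """Per enable, keep the acknowledge with the largest STA arrival time (assumed rule)."""
    chosen: dict[int, str] = {}
    for name, delay in sta_delay_ns.items():
        en = enable_of_ack(name)
        if en not in chosen or delay > sta_delay_ns[chosen[en]]:
            chosen[en] = name
    return chosen


# Hypothetical STA arrival times (ns) for two of the eight enables.
delays = {"ACK_WEAK_0": 1.2, "ACK_WEAK_0_1": 1.5, "ACK_WEAK_1": 2.0, "ACK_WEAK_1_1": 1.1}
print(pick_acks(delays))
```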


Strong Network Request


The strong network is used to reduce the IR drop once the power has ramped up to 95% of VDD through the weak power network. The strong network has more PSO switches than the weak network, but the number must not be so high that leakage through these switches becomes an issue. As shown in Figure 18, a single request for the strong network feeds 16 uniformly spread columns, each with 351 strong network cells. The implementation therefore ends up with approximately 5,600 (16 × 351 = 5,616) strong power switches, and the total leakage through the PSO is 0.6mW.

Figure 18: Strong Network Request (single request signal REQ_STRONG)
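A quick arithmetic check of the strong network numbers follows. The per-switch figure is a rough estimate under one stated assumption: it attributes the whole 0.6mW PSO leakage to the strong switches, so it is an upper bound if the weak switches also contribute.

```python
STRONG_COLUMNS = 16
CELLS_PER_COLUMN = 351
PSO_LEAKAGE_W = 0.6e-3     # total leakage through the PSO switches, from the text

total_strong = STRONG_COLUMNS * CELLS_PER_COLUMN
# Upper-bound estimate: all PSO leakage attributed to the strong switches.
leak_per_switch_nw = PSO_LEAKAGE_W / total_strong * 1e9

print(f"strong switches: {total_strong}")
print(f"leakage per switch: about {leak_per_switch_nw:.0f} nW")
```

Around a hundred nanowatts per switch is small individually, but it multiplies with switch count, which is why the number of strong switches cannot simply be increased to lower the IR drop further.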

Strong Network Acknowledgment


Figure 19 captures the return path of the strong network with its 16 acknowledge signals; STA measurements were used to choose one of the 16 to connect to the ALC.

Figure 19: Strong Network Acknowledge Return Path (acknowledge signals ACK_STRONG_0 through ACK_STRONG_7 and ACK_STRONG_0_1 through ACK_STRONG_7_1)


Well Tap Cells


The standard cell library from ARM supports the back-bias technique for reducing leakage. Although this technique is not used in this project, proper well taps still need to be inserted for the bulk connections. Figure 20 shows the well tap placement in the design. The SoC Encounter system automatically handles the domain association of the well tap cells while inserting them.

Figure 20: Well Tap Cells

Isolation Cells Insertion and Placement


Isolation cells are inserted between the always-on and switchable domains based on the isolation rules and cells specified in the CPF file. As shown in Figure 21, in the Ulterior design, isolation cells have been placed between PDsoc and PDcore, PDcpu and PDcore, and PDsoc and PDcpu. These isolation cells are placed in the ON domain.

Figure 21: Isolation Cells
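The three isolation boundaries follow from which domains can be off while a neighbor is still on. The sketch below is a toy version of that reasoning; in the real flow the cells come from the CPF isolation rules. The domain names are from the text, while encoding PDcore's supply as derived from PDcpu (VDDCore is switched from VDDCPU) is this sketch's modeling choice.

```python
# Supply dependencies: PDcore's grid is switched from VDDCPU; PDsoc is always on.
PARENT = {"PDsoc": None, "PDcpu": None, "PDcore": "PDcpu"}
SWITCHABLE = {"PDcpu", "PDcore"}     # PDcpu via the external VDDCPU shut-off


def supply_ancestors(domain: str) -> set[str]:
    """Domains whose supply must be on for `domain` to be on."""
    out: set[str] = set()
    parent = PARENT[domain]
    while parent is not None:
        out.add(parent)
        parent = PARENT[parent]
    return out


def needs_isolation(driver: str, receiver: str) -> bool:
    """True if the driving domain can be off while the receiving domain is still on."""
    if driver == receiver or driver not in SWITCHABLE:
        return False
    # If the receiver's supply is derived from the driver's, they switch off together.
    return driver not in supply_ancestors(receiver)


pairs = sorted((d, r) for d in PARENT for r in PARENT if needs_isolation(d, r))
print(pairs)   # (driver, receiver) pairs that need isolation
```

The result is exactly the three boundaries listed above. Signals from PDcpu into PDcore need no isolation in this model, because PDcore is necessarily off whenever VDDCPU is off.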


SRPG Cells
There are ~40k state retention flops in the PDcore PSO power domain (Figure 22). While all the flops are inserted during synthesis, their placement and power connections were performed during P&R. Figure 23 shows the secondary power pin connection of the state-retention flops to the always-on power; the SRPG is a double-height cell with VSS at the bottom. The entire secondary power hookup for the SRPGs was done using the Cadence NanoRoute router.

Figure 22: SRPG Cells Spread

Figure 23: SRPG Standard Cell
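The reason the secondary pin must reach the always-on supply is easiest to see in a behavioral model. The sketch below is a generic toy model of a state-retention flop, not the ARM cell: the separate `save` and `restore` steps are assumptions (real retention cells are commonly driven by a single retain control), and the class and method names are hypothetical.

```python
class RetentionFlop:
    """Behavioral toy model of a state-retention flip-flop (SRPG cell)."""

    def __init__(self) -> None:
        self.q = 0               # main flop on the switchable supply
        self.shadow = None       # retention latch on the always-on secondary supply
        self.powered = True

    def clock(self, d: int) -> None:
        if self.powered:
            self.q = d

    def save(self) -> None:
        self.shadow = self.q     # copy state into the always-on latch

    def power_off(self) -> None:
        self.powered = False
        self.q = None            # main flop contents are lost

    def power_on(self) -> None:
        self.powered = True

    def restore(self) -> None:
        if self.shadow is None:
            raise RuntimeError("no saved state to restore")
        self.q = self.shadow


ff = RetentionFlop()
ff.clock(1)
ff.save()
ff.power_off()
ff.power_on()
ff.restore()
print(ff.q)
```

If the shadow latch were powered from the switched rail instead, `power_off` would wipe it too, which is exactly the failure the always-on secondary hookup prevents.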


Ulterior Power Sign-Off


With multiple power switch networks and multiple enable conditions for the weak network, this project required extensive power switch simulations, as illustrated in Figure 24. Here is a brief summary of the analysis performed:
- Power calculation
  - Using the Common Power Engine (CPE) to calculate power for different modes and corners
  - Using Powermeter to generate dynamic current waveforms with a vectorless approach
- Static and dynamic IR-drop analysis
  - Using Cadence VoltageStorm power analysis for static IR-drop analysis and EM checks, with power from CPE
  - Using VoltageStorm for dynamic IR-drop analysis with current waveforms from Powermeter
- Power gating analysis for the multiple request enable conditions
  - Using Powermeter to generate SPICE decks and UltraSim to run power-up simulations
  - Using VoltageStorm for dynamic IR-drop analysis with the rush current from power-up
  - Performing ECOs on the power switch network to meet the specs for ramp-up time, rush current, and request-to-acknowledge time
- Decap ECO flow and PSO ECO flow
Figure 24: Power Analysis Flow (inputs: SPICE models and .cl, .lib, and .spice library data, plus DEF, LEF, SPEF, and TWF design data from the front end; steps: Powermeter power calculation and power-up deck generation, UltraSim PSO SPICE simulation, VoltageStorm static IR/EM analysis, and VoltageStorm dynamic IR analysis; outputs: .tm waveforms, .ptipeak rush-current file, plots, reports, decap ECO file, and a block .cl for top-level analysis). A window-based decap ECO was used to get a rough idea of how much decap was needed.

*Except for the VoltageStorm dynamic IR-drop analysis, which was done in the TT corner only, all runs were done in three corners (FF, SS, TT).


Figure 25 illustrates the static IR-drop plots and numbers. The average drop across the PSO switches varies between 15% and 25% of the total IR drop, so the PSO strategy must be analyzed carefully when it is defined in order to maintain performance.

Figure 25: Static IR-Drop Analysis

Figures 26 and 27 illustrate the power switch network simulation results and waveforms. For simplicity, Figure 27 shows a limited, optimal set of simulation results from the end of the project; during the power switch network optimization, several combinations and corners were tried to arrive at an optimized network. The rush current (Ipeak and Iavg) is minimal when one weak enable is turned on, but the ramp-up time is 12X longer than with all eight weak enables on. Conversely, the eight-weak-enable condition has 12X more rush current than the one-weak-enable condition.


Figure 26: Power Switch Network Simulation Results

Figure 27: Power-Up Simulation Waveforms
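The symmetric 12X tradeoff is what first-order charge conservation predicts: the switched grid must receive roughly the same charge Q ≈ C·ΔV however many enables are on, so the average rush current scales inversely with the ramp-up time. The sketch below assumes a hypothetical grid capacitance, supply voltage, and ramp time, and ignores switch resistance, leakage, and the shape of the peak current.

```python
C_DOMAIN_F = 4e-9        # hypothetical capacitance of the switched VDDCore grid (F)
VDD_V = 1.0              # hypothetical supply voltage (V)
RAMP_TARGET = 0.95       # ramp counted as complete at 0.95 * VDD, as in the text


def avg_rush_current_a(t_ramp_s: float) -> float:
    """Average in-rush current needed to charge the grid to 0.95*VDD in t_ramp_s."""
    charge_c = C_DOMAIN_F * RAMP_TARGET * VDD_V    # Q = C * dV
    return charge_c / t_ramp_s


t_eight = 10e-9              # hypothetical ramp time with 8 weak enables on
t_one = 12 * t_eight         # one enable ramps about 12x slower (from the text)
ratio = avg_rush_current_a(t_eight) / avg_rush_current_a(t_one)
print(f"rush current ratio, 8 enables vs. 1: {ratio:.1f}")
```

The reported ranges in the results (8ns to 107ns ramp-up, 39mA to 481mA rush current) span factors of similar size, about 13.4x and 12.3x, which fits this first-order picture.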


Assembly and Packaging


Figure 28 illustrates the bond diagram of the Ulterior SoC. It uses a 352-pin package for the 4mm × 4mm die, with the following pin allocation:
- 180/244 signal pins
- 40 mixed-signal (power measurement) pins
- 52 power/ground pins

Figure 28: Assembly and Packaging


Ulterior Implementation Results


While silicon measurements are in progress, here are the statistics from the tape-out data:
- Gate count: 1.1M gates
- Instance count: ~300K
- 37 memories
- 3.09 mm²; utilization is 80% for the core
- Performance: 615 MHz in the worst-case corner (0.9V, SS, 125C)
- Power savings:
  - Ulterior total leakage power savings: 3X
  - VDDCore domain leakage power savings: 100X
- Two types of power switch network:
  - Weak: 14 columns, 8 enables, 280 total switches, 20 per column
  - Strong: 16 columns, 1 enable (16 acks), 5616 total switches, 351 per column
- Extensive power analysis:
  - Ramp-up time for every corner (with combinations of weak enables) ranges from 8ns to 107ns
  - Rush current controlled with multiple combinations (ranging from 39mA to 481mA)
  - The current limit spec for the HEADER follow pin was met through multiple iterations
  - Drop through the switch: 6mV (average), range ±3mV
_____________________________________________________
David Flynn, a Fellow in R&D at ARM Ltd, has been with the company since 1991, specializing in System-on-Chip IP deployment and methodology. He is the original architect behind ARM's synthesizable CPU family and the AMBA on-chip interconnect standard. His current research focus is low-power system-level design. He holds a number of patents in on-chip bus, low-power, and embedded processing sub-system design, and holds a BSc in Computer Science from Hatfield Polytechnic, UK, and a Doctorate in Electronic Engineering from Loughborough University, UK. He is currently a Visiting Professor with the Electronics and Computer Science Department at Southampton University, UK.

Sachin Idgunji is a Principal Engineer in the Research Group at ARM Inc., specializing in systems/circuit design and analysis. His current research focus is variation analysis, low-power design, and statistical techniques. Prior to joining ARM, he was at Synopsys Inc., where he led several projects ranging from design specification through tape-out in the areas of graphics, networking, and embedded processing. Prior to Synopsys, Sachin worked at IBM Labs (India) and PCS-Data General, and he has over 18 years of industry experience. Sachin holds a BE in Electronics Engineering from Shivaji University, India.


Felix Jen, a section manager in the Design Technology Support Section of the IP Development and Design Support Division at UMC, has been with the company since 2002, with expertise focusing mainly on IC design implementation and design methodology.

Wen-Pin Lin, a Senior Technical Manager and Staff in the IP Development and Design Support Division at UMC, joined the company in 2007. His expertise focuses mainly on deep sub-micron IC design implementation and design methodology.

Vivek Shukla serves as an R&D Architect at Cadence Design Systems, Bangalore. Before Cadence, Vivek worked at Beceem Communications, a WiMAX startup, where he led multiple tape-outs of WiMAX 802.16e standard-compliant chips. Prior to Beceem, he had a five-year stint at Intel, during which he worked on and led efforts for Ethernet chips, multi-core processors, and CSI development, and was responsible for processor methodologies in a wide range of areas, including timing closure, custom design, and mixed-signal design. Prior to Intel, he worked at Motorola on DSP processors and led chip designs for automotive applications. He holds a B.Tech in Electronics and Communications Engineering from IT-BHU, India, and has two design patents in the area of low power and high-speed interconnect.
