Systems On Chip (SoC)
FOR EMBEDDED APPLICATIONS
OUTLINE
• What is an embedded SoC?
• SoC Intellectual Property (IP)
• ARM processors
• ARM support modules
• SoC Design Flow
• Modeling and simulation
• Physical design
EMBEDDED SYSTEM
• “System”
• Set of components needed to perform a function
• Hardware + software + ….
• “Embedded”
• Main function not computing
• Usually not autonomous
• Usually a computer inside a system
• Application specific
• Subject to constraints
SYSTEM ON CHIP
• Definition
• (nearly) complete embedded system on a single chip
• Usually includes
• Programmable processor(s)
• Memory
• Accelerating function units
• Input/output interfaces
• Software
• Re-usable intellectual property blocks (HW + SW)
SOC DESIGN GOAL
[Figure: T.I. smartphone reference design, before and after, with the main SoC labeled]
2/29/2012 VLSI D&T Seminar - Victor P. Nelson
TEXAS INSTRUMENTS OMAP44X
APPLE “A5” SOC
• Used in iPad 2 and iPhone 4S
• Manufactured by Samsung
• 45nm, 12.1 x 10.1 mm
• Elements (unofficial):
• ARM Cortex-A9 MPCore CPU - 1 GHz
• NEON SIMD accelerator
• Dual core PowerVR SGX543MP2 GPU
• Image signal processor (ISP)
• Audience “EarSmart” unit for noise canceling
• 512 MB DDR2 RAM @ 533MHz
NVIDIA TEGRA 2 SOC
Tablet Applications:
• Asus Eee Pad
• Motorola Xoom
• Samsung Galaxy
• Acer Iconia Tab
SOC CHALLENGES
• SoC Designs
• More complex, more functions, higher gate counts
• Faster, cheaper, smaller
• More reliable
• How to handle complexity?
• System design at multiple abstraction levels
• Integration of heterogeneous technologies & tools
• Signal integrity & timing
• Power management
• SoC test methodology
ARM SOC-BASED PRODUCTS
ARM CORTEX PROCESSORS
ARM CORTEX-A9 MPCORE
RELATIVE PERFORMANCE
LICENSING ARM IP
• Perpetual (implementation) license
• ARM partner may perpetually design and manufacture ARM-based products
• Term license
• Design a limited number of ARM-based products within a specified time period (usually 3 years)
• Perpetual manufacturing rights
• Per use license
• Selected ARM IP, right to design a single ARM-technology product within a specified time frame (3 years)
• Manufacturing rights perpetual
• University “DesignStart” Program
• Some physical and selected processor IP downloadable for academic study
CORTEX A9 SYSTEM IP
Interconnect SoC components

Description                            AMBA Bus   System IP Components
Advanced AMBA3 Interconnect IP         AXI        NIC-301, PL301
DMA Controller                         AXI        DMA-330, PL330
Level 2 Cache Controller               AXI        L2C-310, PL310
Dynamic Memory Controller              AXI        DMC-340, PL340
DDR2 Dynamic Memory Controller         AXI        DMC-342
Static Memory Controller               AXI        SMC-35x, PL35x
TrustZone Address Space Controller     AXI        PL380
CoreSight™ Design Kit                  ATB        CDK-11
AMBA BUSES
• Promotes re-use by defining a common backbone for SoC modules using standard bus architectures
• AHB – Advanced High-performance Bus (system backbone)
• High-performance, high clock freq. modules
• Processors to on-chip memory, off-chip memory interfaces
• APB – Advanced Peripheral Bus
• Low-power peripherals
• Reduced interface complexity
• ASB – Advanced System Bus
• High-performance alternative to AHB
• AXI – Advanced eXtensible Interface
• ACE – AXI Coherency Extension
• ATB – Advanced Trace Bus
EXAMPLE AMBA SYSTEM
[Figure: NXP LPC2292 microcontroller, showing the ARM buses]
ARM DEBUG ARCHITECTURE
(FOR CORTEX: “CORESIGHT” DEBUG & TRACE IP)
SOC DESIGN FLOW
Jouni Tomberg/TUT
HIGH-LEVEL PERFORMANCE MODELING
• Identify Workloads
• Based on target market
• Standard benchmarks (SPEC, EEMBC)
• O/S based “real” benchmarks – browser, real apps
• Performance Models
• C-based, highly configurable
• Internally developed (no EDA vendor)
• Fast Instruction-Set Based Model
• No timing information, but very fast
• Used for statistics collection and coarse algorithm development (e.g. branch prediction scheme, load/store address patterns)
• Abstracted Pipeline
• Reasonably accurate, longer development time
• More specific to microarchitecture
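A fast instruction-set model of the kind described above can be sketched in a few lines of Python. The toy three-instruction ISA, the `run` function, and the sample program are all hypothetical and invented for illustration. The sketch only shows the idea: instructions are executed functionally, with no timing, while statistics such as branch counts are collected.

```python
from collections import Counter

def run(program, max_steps=10_000):
    """Functionally execute a toy program; return (registers, stats).

    Toy ISA (hypothetical, for illustration only):
      ("addi", rd, rs, imm)   rd = rs + imm
      ("bnez", rs, target)    branch to target if rs != 0
      ("halt",)               stop
    """
    regs = [0] * 4
    stats = Counter()
    pc = 0
    for _ in range(max_steps):
        op = program[pc]
        stats["instructions"] += 1
        if op[0] == "addi":
            _, rd, rs, imm = op
            regs[rd] = regs[rs] + imm
            pc += 1
        elif op[0] == "bnez":
            _, rs, target = op
            stats["branches"] += 1
            if regs[rs] != 0:
                stats["taken"] += 1   # the kind of data a predictor study needs
                pc = target
            else:
                pc += 1
        elif op[0] == "halt":
            break
        else:
            raise ValueError(f"unknown opcode {op[0]!r}")
    return regs, stats

# Count down from 5 with a single loop-closing branch.
program = [
    ("addi", 0, 0, 5),    # r0 = 5
    ("addi", 0, 0, -1),   # r0 -= 1
    ("bnez", 0, 1),       # loop while r0 != 0
    ("halt",),
]
regs, stats = run(program)
print(dict(stats))
```

Because nothing models pipeline stages or memory latency, a model like this runs quickly, but it can only answer "how often" questions (how many branches, how many taken), not "how long" questions.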
UNIT RTL
• Synthesizable HDL models
• Split work into units based on functionality
• Verilog language of choice
• Write low-level constructs only (assign, case)
• Why?
• Portability; we target multiple partners and have to target ‘lowest-common-denominator’ design tools
• Know your RTL! Easier to count gates “on-the-fly”
VALIDATION
• Stimulus
• random – RIS, traffic generators for system
• directed – assembly language tests to check specific features
• Irritators
• artificial constraints to stress certain features
• e.g. 1-entry TLB to test table walk logic, forced stalls to test queues
• Assertions
• constantly check that logic cannot do things specified as “illegal”
• e.g. bus protocol checkers
• Formal
• uses properties to check that the system can’t get into a certain state via any sequence (lots of constraints required)
• formal verification is an area of ongoing research
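The interplay of random stimulus, irritators, and assertions can be illustrated with a small Python sketch. The handshake rule, the `check_handshake` checker, and the `well_behaved_sender` generator are assumptions made up for this example and are not taken from any specific bus specification. The `stall_prob` parameter plays the role of an irritator by forcing long stalls.

```python
import random

def check_handshake(trace):
    """Return the cycle numbers where the (hypothetical) rule is violated.

    Rule: once `valid` is high, `valid` and `data` must hold steady until a
    cycle in which `ready` is also high.
    `trace` is a list of (valid, ready, data) tuples, one per clock cycle.
    """
    violations = []
    pending = None                      # data still waiting to be accepted
    for cycle, (valid, ready, data) in enumerate(trace):
        if pending is not None and (not valid or data != pending):
            violations.append(cycle)    # sender dropped or changed a transfer
        if valid and not ready:
            pending = data              # transfer not yet accepted
        else:
            pending = None              # accepted this cycle, or bus idle
    return violations

def well_behaved_sender(n_cycles, stall_prob, seed=0):
    """Random traffic from a sender that obeys the rule.

    `stall_prob` is the irritator: the chance that the receiver is not ready.
    """
    rng = random.Random(seed)
    trace, data, valid = [], 0, False
    for _ in range(n_cycles):
        if not valid and rng.random() < 0.5:
            valid, data = True, rng.randrange(256)
        ready = rng.random() >= stall_prob
        trace.append((valid, ready, data))
        if valid and ready:
            valid = False               # transfer completed
    return trace

# Heavy stalling should not provoke violations from a correct sender.
good = check_handshake(well_behaved_sender(1000, stall_prob=0.9))

# A directed "bad" trace: data changes while stalled, then valid is dropped.
bad = check_handshake([(True, False, 1), (True, False, 2), (False, True, 0)])
print(good, bad)
```

In a real flow the checker would be written as assertions bound to the RTL rather than run over a recorded trace, but the division of labor is the same: stimulus drives the design, irritators push it into rare corners, and the checker flags anything illegal.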
VALIDATION - COVERAGE
• How do I know that I’ve tested everything?
• Coverage provides validation metrics
• Line Coverage
• hit all RTL lines (easy to run)
• Condition Coverage
• hit all RTL terms (easy to run)
• Functional Coverage
• Only way to determine sequences hit
• Require coverage points to be written
• For example, test pipeline flush under other stall conditions (cache miss, TLB miss…)
• RTL unit designer and validation person collaborate to determine “points” or “matrices”
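A coverage "matrix" such as pipeline flush crossed with stall condition can be sketched as follows. The `CrossCoverage` class, the event names, and the sampled events are hypothetical; in practice the samples would come from monitors attached to the simulation.

```python
from itertools import product

class CrossCoverage:
    """Track which (row, column) combinations have been observed."""

    def __init__(self, rows, cols):
        self.hits = {point: 0 for point in product(rows, cols)}

    def sample(self, row, col):
        self.hits[(row, col)] += 1

    def holes(self):
        """Coverage points that were never hit, in sorted order."""
        return sorted(pt for pt, n in self.hits.items() if n == 0)

    def percent(self):
        hit = sum(1 for n in self.hits.values() if n > 0)
        return 100.0 * hit / len(self.hits)

# Hypothetical coverage matrix: pipeline event x stall condition.
cov = CrossCoverage(rows=["flush", "no_flush"],
                    cols=["none", "cache_miss", "tlb_miss"])

# Events a test run might have produced (made up for illustration).
for event in [("flush", "none"), ("no_flush", "cache_miss"),
              ("no_flush", "tlb_miss"), ("flush", "cache_miss")]:
    cov.sample(*event)

print(cov.holes())
print(f"{cov.percent():.1f}% of points hit")
```

The list of holes is the useful output: it tells the validation engineer which sequences, here a flush during a TLB miss, still need a directed test or a new irritator.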
VALIDATION – SIMULATION/EMULATION
• Software simulation – compile RTL and testbench
• testbench can be Verilog/VHDL or newer extensions (SystemVerilog)
• slow – 100s of Hz, slower if simulating a multi-core system
• full visibility (dump waveforms) – but this is even slower
• best for debugging/rapid turnaround
• Emulation
• FPGA – moderate $, large setup time, single-digit MHz, low to no visibility into failures
• Quickturn/hardware emulator – big $, large capacity, 1 MHz
• Can do O/S boots and complex system validation
• Visibility – can see all RTL signals
• Can use virtual device drivers for peripherals (LCD)
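The gap between the speeds above is easiest to appreciate with a little arithmetic. The sketch below assumes a workload of one billion clock cycles, a round number chosen only for illustration and not a figure from the source, and uses 100 Hz and 1 MHz as representative rates for simulation and emulation.

```python
def wall_clock_seconds(cycles, hz):
    """Time needed to run `cycles` clock cycles at an effective rate of `hz`."""
    return cycles / hz

WORKLOAD_CYCLES = 1_000_000_000   # assumed workload size, for illustration only

platforms = {
    "software simulation (100 Hz)": 100,
    "emulation (1 MHz)": 1_000_000,
}

results = {name: wall_clock_seconds(WORKLOAD_CYCLES, hz)
           for name, hz in platforms.items()}

for name, seconds in results.items():
    print(f"{name}: {seconds:,.0f} s = {seconds / 86_400:.2f} days")
```

Under these assumptions the simulation would need roughly 116 days and the emulator under 17 minutes, which is why O/S boots are run on emulators while software simulation is kept for short, fully visible debugging runs.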
Joe Bungo: CPU Design Concept to SoC
[Figure label: Compiled, with HDL “wrappers”]
IMPLEMENTATION STYLE
• Custom
• Squeeze last % of performance from design
• Larger teams required
• Best suited to datapath elements:
• adders
• memories
• search [CAM] structures; reservation stations
• Automated Flow
• Synthesis, place and route
• Time to market
• Tools have gotten pretty good
• Dependent on good libraries (standard cell)
• Initial RTL is portable between processes
These are not mutually exclusive; even teams that use “custom” design
will use automated flow for non-critical or control blocks.
STANDARD CELLS
• Important to have rich set of standard cells for synthesis tool
• drive strengths
• complex gates (for area)
• biased threshold (faster P, faster N)
• Static Power (leakage) minimization
• This power is burned when doing NOTHING!
• Multi-Vt
• Incremental cost; 4-5 more masks out of 45 (implant)
• higher Vt – slower, but less leakage
• lower Vt – faster, but MUCH more leakage (use sparingly if at all)
• Long channel
• Characterization
• Synthesis – function, pins, timing, power
• Place and route – pin location, blockages, power rails (abstracts)
IMPLEMENTATION STEP - SYNTHESIS
• Maps RTL to gates (standard cells) – “netlist”
• Constraints – input/output arrival time, drive strength, load, physical floorplan
• Logic optimization – moves early arriving signals to front of stack, late to back of stack
• Selects arithmetic operators based on timing requirements (e.g. adder types)
• Cost function – can end up in local minima
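Why input ordering matters for timing can be shown with a deliberately simplified model: a linear chain of two-input gates in which the first inputs pass through the most gates. The `chain_arrival` function, the unit gate delay, and the arrival times below are assumptions for illustration; a real synthesis tool works from characterized cell timing and a much richer cost function.

```python
def chain_arrival(arrivals, gate_delay=1.0):
    """Output arrival time of a linear chain of 2-input gates.

    arrivals[0] and arrivals[1] feed the first gate; each later input joins
    one gate further down the chain, so earlier list positions pass through
    more gates before reaching the output.
    """
    t = arrivals[0]
    for a in arrivals[1:]:
        t = max(t, a) + gate_delay   # a gate fires after its slowest input
    return t

# Hypothetical input arrival times (arbitrary units); one signal is late.
arrivals = [5.0, 0.0, 1.0, 0.0]

naive = chain_arrival(arrivals)               # late signal enters first
reordered = chain_arrival(sorted(arrivals))   # late signal enters last

print(naive, reordered)
```

In this model, placing the late-arriving signal closest to the output means it passes through one gate instead of three, so the output settles two gate delays sooner with exactly the same logic function.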
IMPLEMENTATION FEEDBACK
SOC INTEGRATION
• Once the core is built, it is integrated with other cores into a chip
• Many millions of gates; can we abstract this out?
• System Design
• SystemC model – transaction level, no timing
• Can chain processor/peripheral models together to test OS
• Cycle-level system simulation
• compiled model
• no internal visibility
• faster runtimes
• smaller, simulator won’t run out of memory!
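The idea of a transaction-level model with no timing can be sketched in Python rather than SystemC. Every name here (`Bus`, `Memory`, `Uart`) and the address map are hypothetical. Each read or write is a single function call, so there are no clocks or signals to simulate, and models can be chained together simply by attaching them to the bus.

```python
class Memory:
    """Byte-addressable RAM model."""
    def __init__(self, size):
        self.data = bytearray(size)

    def read(self, offset):
        return self.data[offset]

    def write(self, offset, value):
        self.data[offset] = value & 0xFF

class Uart:
    """Hypothetical peripheral: every write appends a character to a log."""
    def __init__(self):
        self.log = []

    def read(self, offset):
        return 0

    def write(self, offset, value):
        self.log.append(chr(value & 0xFF))

class Bus:
    """Routes each transaction to the target that owns the address."""
    def __init__(self):
        self.regions = []

    def attach(self, base, size, target):
        self.regions.append((base, size, target))

    def _decode(self, addr):
        for base, size, target in self.regions:
            if base <= addr < base + size:
                return target, addr - base
        raise ValueError(f"no target at address {addr:#x}")

    def read(self, addr):
        target, offset = self._decode(addr)
        return target.read(offset)

    def write(self, addr, value):
        target, offset = self._decode(addr)
        target.write(offset, value)

# Assumed address map, for illustration only.
bus, ram, uart = Bus(), Memory(0x1000), Uart()
bus.attach(0x0000, 0x1000, ram)
bus.attach(0x8000, 0x10, uart)

# "Software" running against the model: store a byte, then print "OK".
bus.write(0x0010, 0x5A)
for ch in "OK":
    bus.write(0x8000, ord(ch))

print(hex(bus.read(0x0010)), "".join(uart.log))
```

A model at this level is fast enough to run driver or O/S code against, at the cost of saying nothing about cycle counts or bus contention.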
ARM DEVELOPMENT TOOLS
• Software development
• ARM Development Studio 5 (DS-5)
• For ASICs and ASSPs
• Compilers, debugger, system performance analyzer, real-time system simulator
• Keil Microcontroller Development Kit (MDK)
• For embedded microcontrollers
• Cortex-M, Cortex-R4, ARM7, ARM9 devices
• Compilers, debugger, simulators
• Models
• ARM Fast Models – virtual platforms for software development before silicon
CONCLUSIONS
• SoC design requires a different design approach than traditional ASICs
• More modeling & simulation at higher abstraction levels
• Heavy use of IP, re-usable modules, platform-based design
• SoC design team must work with IP vendor and foundry
• Use platform design & standard interfaces between IP
• Hardware/software co-design
• Many design challenges