01-System Architecture

The document provides an overview of System-on-Chip (SoC) architecture and design, detailing the components such as processors, memory, and interconnects, as well as the economic factors influencing SoC development. It discusses the differences between SoC and traditional processors, the various types of processors used in SoCs, and the importance of understanding application requirements for effective design. Additionally, it highlights strategies for reducing design complexity through the use of Intellectual Property (IP) and reconfigurable technology.

Uploaded by

jayasakthi.ece
Copyright
© All Rights Reserved

Introduction to the Systems Approach
SOC architecture and design
• system-on-chip (SOC)
• processors: become components in a system
• SOC covers many topics
• processors, cache, memory, interconnect, design tools
• need to know
• user view: variety of processors
• basic information: technology and tools
• processor internals: effect on performance
• storage: cache, embedded and external memory
• interconnect: buses, network-on-chip
• evaluation: processor, cache, memory, interconnect
• advanced: specialized processors, reconfiguration
• design productivity: system modelling, design exploration
System on a Chip: driven by semiconductor advances
SOC vs processors on chip
• with lots of transistors, designs move in 2 ways:
• complete system on a chip
• multi-core processors with lots of cache

                 System on chip                    Processors on chip
processor        multiple, simple, heterogeneous   few, complex, homogeneous
cache            one level, small                  2-3 levels, extensive
memory           embedded, on chip                 very large, off chip
functionality    special purpose                   general purpose
interconnect     wide, high bandwidth              often through cache
power, cost      both low                          both high
operation        largely stand-alone               need other chips

iPhone: has System-on-Chip
[Figure: iPhone SOC, showing a processor (1 GHz ARM Cortex A8), memory, and I/O blocks. Source: UC Berkeley]
SOC design: key ideas
• to design and evaluate an SOC, designers need to
understand:
– its components: processors, memory, interconnect
– applications that it targets
• SOC economics heavily dependent on:
– costs: initial design, marginal production
– volume: applicability, lifetime
• reducing design complexity
– Intellectual Property (IP)
– reconfigurable technology
SOC processors
• usually a mix of special and general purpose (GP)
• can be proprietary design or purchased IP
• commonly GP processor is purchased IP
• includes OS and compiler support
• GP processor optimized for an application
• additional instructions
• vector units
Some processors for SOCs
SOC                                  Basic ISA     Processor description
Freescale e600: signal processing    PowerPC       Superscalar with vector extension
ClearSpeed CSX600: general           Proprietary   Array processor with 96 processing elements
PlayStation 2: gaming                MIPS          Pipelined with 2 vector coprocessors
ARM VFP11: general                   ARM           Configurable vector coprocessor
Basic system-on-chip model
• Some of the basic elements of an SOC system. These include a
number of heterogeneous processors interconnected to one or more
memory elements with possibly an array of reconfigurable logic.
• Frequently, the SOC also has analog circuitry for managing sensor
data and analog-to-digital conversion, or to support wireless data
transmission.
• SOC for a smart phone would need to support, in addition to audio
input and output capabilities for a traditional phone, Internet access
functions and multimedia facilities for video communication,
document processing, and entertainment such as games and movies.
Basic system-on-chip model
• A possible configuration for the elements would have the core
processor being implemented by several ARM Cortex-A9 processors
for application processing, and the media processor being
implemented by a Mali-400MP graphics processor and a Mali-VE
video engine.
• The system components and custom circuitry would interface with
peripherals such as the camera, the screen, and the wireless
communication unit.
• The elements would be connected together by AXI (Advanced
eXtensible Interface) interconnects.
Basic system-on-chip model
• If all the elements cannot be contained on a single chip, the implementation
is probably best referred to as a system on a board, but often is still called a
SOC.
• What distinguishes a system on a board (or chip) from the conventional
general-purpose computer plus memory on a board is the specific nature of
the design target.
• The application is assumed to be known and specified so that the elements
of the system can be selected, sized, and evaluated during the design
process.
• The emphasis on selecting, parameterizing, and configuring system
components tailored to a target application distinguishes a system architect
from a computer architect.
PROCESSOR ARCHITECTURES
• Typically, processors are characterized either by their application or by
their architecture (or structure).
• The requirements space of an application is often large, and there is a
range of implementation options.
• Thus, it is usually difficult to associate a particular architecture with a
particular application.
• In addition, some architectures combine different implementation
approaches, as seen in the PlayStation 2.
• There, the graphics processor consists of a four-element SIMD array of
vector processing functional units (FUs).
• Other SOC implementations consist of multiprocessors using very long
instruction word (VLIW) and/or superscalar processors.
PROCESSOR ARCHITECTURES
Processor types: overview
Processor type   Architecture / Implementation approach
SIMD             Single instruction applied to multiple functional units
Vector           Single instruction applied to multiple pipelined registers
VLIW             Multiple instructions issued each cycle under compiler control
Superscalar      Multiple instructions issued each cycle under hardware control
Adding instructions
• additional instructions to support specialized resources
• exception: superscalar, with hardware control
• instructions can be added to base processor for coprocessor control
• VLIW: Very Long Instruction Word
• Array
• Vector
Sequential and parallel machines
• basic single stream processors
• pipelined: basic sequential
• superscalar: transparently concurrent
• VLIW: compiler generated concurrency
• multiple stream
• array processors
• vector processors
• multiprocessors
Sequential processors
• operation
• generally transparent to sequential programmer
• appear as in order instruction execution
• pipeline processor
• execution in order
• limited to one instruction execution / cycle
• superscalar processor
• multi instructions / cycle, managed by hardware
• VLIW
• multi op execution / cycle, managed by compiler
Simple Sequential Processors
• Sequential processors directly implement the sequential execution
model.
• These processors process instructions sequentially from the
instruction stream.
• The next instruction is not processed until all execution for the
current instruction is complete and its results have been committed.
• The semantics of an instruction determine the sequence of actions
that must be performed to produce the specified result.
• These actions can be overlapped, but the result must appear in the
specified serial order.
Simple Sequential Processors
1. fetching the instruction into the instruction register (IF),
2. decoding the opcode of the instruction (ID),
3. generating the address in memory of any data item residing there
(AG),
4. fetching data operands into executable registers (DF),
5. executing the specified operation (EX), and
6. writing back the result to the register file (WB).
Simple Sequential Processors
• During execution, a sequential processor executes one or more operations per
clock cycle from the instruction stream.
• An instruction is a container that represents the smallest execution packet
managed explicitly by the processor.
• One or more operations are contained within an instruction.
• The distinction between instructions and operations is crucial to distinguish
between processor behaviors.
• Scalar and superscalar processors consume one or more instructions per cycle,
where each instruction contains a single operation.
• Although conceptually simple, executing each instruction sequentially has
significant performance drawbacks:
• A considerable amount of time is spent on overhead and not on actual execution.
• Thus, the simplicity of directly implementing the sequential execution model has significant
performance costs.
Pipelining
• Pipelining is a powerful technique that is used in almost all current processor implementations.
• Techniques to extract and exploit the inherent parallelism in the code at compile time or run
time are also widely used.
• Instruction-level parallelism (ILP) means that multiple operations can be executed in parallel
within a program.
• ILP may be achieved with hardware, compiler, or operating system techniques.
• At the loop level, consecutive loop iterations are ideal candidates for parallel execution,
provided that there is no data dependency between subsequent loop iterations.
• Next, there is parallelism available at the procedure level, which depends largely on the
algorithms used in the program.
• Finally, multiple independent programs can execute in parallel.
• Different computer architectures have been built to exploit this inherent parallelism.
• In general, a computer architecture consists of one or more interconnected processor
elements (PEs) that operate concurrently, solving a single overall problem.
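As a minimal sketch of the loop-level case, the Python below contrasts a loop whose iterations are independent with one that carries a dependency from each iteration to the next. The function names and the data are illustrative assumptions, not part of the original slides.

```python
from concurrent.futures import ThreadPoolExecutor


def scale(x):
    """Work for one loop iteration; it reads only its own input."""
    return 2 * x + 1


def independent_loop(data):
    # No iteration reads a value written by another iteration,
    # so the iterations may run in any order or concurrently.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(scale, data))


def dependent_loop(data):
    # Each iteration needs the running total left by the previous one
    # (a loop-carried dependency), so this form must run in order.
    out, total = [], 0
    for x in data:
        total += x
        out.append(total)
    return out


data = [1, 2, 3, 4]
parallel_result = independent_loop(data)
serial_result = dependent_loop(data)
```

Only the first loop is a candidate for parallel execution as written; the second would need to be restructured before its iterations could overlap.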
Pipelined processor
Instruction #1  IF ID AG DF EX WB
Instruction #2     IF ID AG DF EX WB
Instruction #3        IF ID AG DF EX WB
Instruction #4           IF ID AG DF EX WB
                Time →
• IF: Instruction Fetch
• ID: Instruction Decode
• AG: Address Generation
• DF: Data Fetch
• EX: Execution
• WB: Write Back
Pipelined Processor
• For a simple pipelined machine, there is only one operation in each phase at any given time;
• one operation is being fetched (IF);
• one operation is being decoded (ID);
• one operation is generating an address (AG);
• one operation is accessing operands (DF);
• one operation is in execution (EX); and
• one operation is storing results (WB).
• The most rigid form of a pipeline, sometimes called the static pipeline, requires the processor
to go through all stages or phases of the pipeline whether required by a particular instruction
or not.
• A dynamic pipeline allows the bypassing of one or more pipeline stages, depending on the
requirements of the instruction.
• The more complex dynamic pipelines allow instructions to complete out of (sequential) order,
or even to initiate out of order.
• Out-of-order processors must ensure that the sequential consistency of the program is
preserved.
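The benefit of a simple static pipeline can be sketched with a small cycle count. The code assumes an idealized machine: every stage takes exactly one cycle and there are no stalls or hazards. The function names are illustrative, not taken from the slides.

```python
STAGES = ["IF", "ID", "AG", "DF", "EX", "WB"]


def sequential_cycles(n, stages=len(STAGES)):
    # Without pipelining, each instruction passes through every stage
    # before the next instruction starts.
    return n * stages


def pipelined_cycles(n, stages=len(STAGES)):
    # With pipelining, the first instruction needs `stages` cycles and
    # every later instruction completes one cycle after its predecessor.
    return stages + (n - 1) if n > 0 else 0


def timing_diagram(n):
    # One row per instruction, shifted right by one stage per instruction.
    rows = []
    for i in range(n):
        cells = ["  "] * i + STAGES
        rows.append(f"#{i + 1} " + " ".join(cells))
    return rows


for row in timing_diagram(4):
    print(row)
```

Under these assumptions, four instructions take 24 cycles when executed one at a time and 9 cycles when pipelined.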
Pipelined Processor
Superscalar and VLIW processors
[Figure: pipeline timing for Instructions #1 to #6, each passing through IF ID AG DF EX WB over time.]
[Figures: Superscalar; VLIW]
Parallel processors
• execution managed by programmer
• array processors
• single instruction stream, multiple data streams: SIMD
• vector processors
• SIMD
• multiprocessors
• multiple instruction streams, multiple data streams: MIMD
Array processors
• perform op if condition = mask
• operand can come from neighbour

• instruction format: mask | op | dest | sr1 | sr2
• n PEs, each with memory; neighbour communications
• one instruction issued to all PEs
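A minimal sketch of one array-processor step follows: a single instruction is issued to all PEs, and each PE performs the operation only if its condition matches the mask. The wrap-around neighbour connection and all names are assumptions made for illustration.

```python
def array_step(op, mask, condition, sr1, sr2, dest):
    """Issue one instruction to every PE; PE i executes it only if
    condition[i] equals the instruction's mask."""
    for pe in range(len(dest)):
        if condition[pe] == mask:
            dest[pe] = op(sr1[pe], sr2[pe])
    return dest


def from_left_neighbour(values):
    # Operand supplied by a neighbour: PE i reads the value held by PE i-1.
    # The wrap-around from the last PE to PE 0 is an assumption.
    return values[-1:] + values[:-1]


local = [1, 2, 3, 4]        # one value per PE
condition = [1, 0, 1, 0]    # per-PE condition bit
dest = [0, 0, 0, 0]
result = array_step(lambda a, b: a + b, 1, condition,
                    local, from_left_neighbour(local), dest)
```

PEs whose condition does not match the mask leave their destination untouched, which is how data-dependent branching is expressed on this kind of machine.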
Vector processors
• vector registers, e.g. 8 regs x 64 words x 64 bits
• vector instructions: VR3 <- VR2 VOP VR1
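The vector instruction form above can be modelled as an element-by-element operation over whole registers. This is a behavioural sketch only; the class and method names are hypothetical, and the register file size follows the example of 8 registers of 64 words of 64 bits.

```python
import operator

VECTOR_LENGTH = 64            # words per vector register
WORD_MASK = (1 << 64) - 1     # keep results within 64-bit words


class VectorRegisterFile:
    def __init__(self, num_regs=8):
        self.regs = [[0] * VECTOR_LENGTH for _ in range(num_regs)]

    def vop(self, op, dst, src_a, src_b):
        # VR[dst] <- VR[src_a] op VR[src_b], applied to every element pair.
        self.regs[dst] = [op(a, b) & WORD_MASK
                          for a, b in zip(self.regs[src_a], self.regs[src_b])]


vrf = VectorRegisterFile()
vrf.regs[1] = list(range(VECTOR_LENGTH))
vrf.regs[2] = [10] * VECTOR_LENGTH
vrf.vop(operator.add, 3, 2, 1)    # VR3 <- VR2 + VR1
```

One vector instruction here stands for 64 scalar additions; real hardware would stream the elements through pipelined functional units rather than loop over them.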
Array Processors

Vector Processors
SOC multiprocessors
Memory and addressing
• many SOC memory designs use simple embedded
memory
• a single level cache
• real (rather than virtual) addressing
• as SOCs become more complex
• their designs are expected to use more complex memory and
addressing configurations
Three levels of addressing
User view of memory: addressing
• a program: process address (offset + base + index)
• virtual address: process address + process id
• a process: assigned a segment base and bound
• system address: segment base + process address
• pages: active localities in main/real memory
• virtual address: translated by table lookup to real address
• page miss: virtual pages not in page table
• TLB (translation look-aside buffer): recent translations
• TLB entry: corresponding real and (virtual, id) address
• a few hashed virtual address bits select a TLB entry
• if (virtual, id) = TLB (virtual, id), then use the translation
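The lookup described in this list can be sketched as follows. The page size, the TLB size, the direct-mapped organization, and the use of a simple modulo in place of a hash are all assumptions chosen to keep the example short; the class name is hypothetical.

```python
PAGE_BITS = 12    # assumed 4 KiB pages
TLB_SETS = 16     # assumed direct-mapped TLB with 16 entries


class Translator:
    def __init__(self, page_table):
        self.page_table = page_table       # {(pid, virtual_page): real_page}
        self.tlb = [None] * TLB_SETS       # entry: ((pid, virtual_page), real_page)
        self.hits = 0
        self.misses = 0

    def translate(self, pid, process_address):
        vpage = process_address >> PAGE_BITS
        offset = process_address & ((1 << PAGE_BITS) - 1)
        index = vpage % TLB_SETS           # a few virtual address bits pick the entry
        entry = self.tlb[index]
        if entry is not None and entry[0] == (pid, vpage):
            self.hits += 1                 # (virtual, id) matches: use the translation
            real_page = entry[1]
        else:
            self.misses += 1               # fall back to the page table
            if (pid, vpage) not in self.page_table:
                raise LookupError("page miss: virtual page not in page table")
            real_page = self.page_table[(pid, vpage)]
            self.tlb[index] = ((pid, vpage), real_page)
        return (real_page << PAGE_BITS) | offset


mmu = Translator({(7, 0x2): 0x40})
first = mmu.translate(7, 0x2ABC)     # TLB miss, filled from the page table
second = mmu.translate(7, 0x2DEF)    # same page: TLB hit
```

The second access to the same page is served from the TLB, which is the point of caching recent translations.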


The TLB and the MMU
SOC interconnect
• interconnecting multiple active agents requires
• bandwidth: capacity to transmit information (bps)
• protocol: logic for non-interfering message transmission
• bus
• AMBA (Advanced Microcontroller Bus Architecture) from ARM,
widely used for SOC
• bus performance: can determine system performance
• network on chip
• array of switches
• statically switched: e.g. mesh
• dynamically switched: e.g. crossbar
Bus based SOC
Network on a Chip
SOC design approach
• understand application (compiler, OS, memory and real
time constraints)
• select initial die area, power, performance targets;
select initial processors, memory, interconnect
• assume target processor and interconnect performance,
design and evaluate memory
• evaluate and redesign processors with memory
• design interconnect to support processors and memory
• repeat and iterate to optimize
SOC design approach
Processor optimization example
• given embedded ARM processor
• in an SOC chip
• 1 IALU vs 2 IALU vs 3 IALU vs 4 IALU
• instructions per cycle?
• 16k L1 instruction cache vs 32k L1 i-cache
• how much improvement? less power?
• branch predictor: taken vs not-taken
• misprediction rate?

• aim: explore this large design space
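A minimal sketch of enumerating this design space is shown below. The parameter names are hypothetical, and no performance or power figures are assumed: the `evaluate` function is left to the caller, for example a wrapper around a simulator.

```python
from itertools import product

# Options taken from the example above; the dictionary keys are illustrative.
DESIGN_SPACE = {
    "ialus": [1, 2, 3, 4],
    "icache_kb": [16, 32],
    "branch_predictor": ["taken", "not-taken"],
}


def enumerate_configs(space):
    # Every combination of one choice per parameter.
    names = list(space)
    return [dict(zip(names, values)) for values in product(*space.values())]


def explore(space, evaluate):
    """Return (score, config) for the best configuration, where
    evaluate(config) is supplied by the caller and returns a number."""
    scored = ((evaluate(c), c) for c in enumerate_configs(space))
    return max(scored, key=lambda pair: pair[0])


configs = enumerate_configs(DESIGN_SPACE)
```

Even this small example yields 4 x 2 x 2 = 16 configurations, which is why exhaustive evaluation quickly becomes impractical as parameters are added.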


Design cost: product economics
• increasingly product cost determined by
• design costs, including verification
• not marginal cost to produce
• manage complexity in die technology by
• engineering effort
• engineering cleverness
Design complexity
Cost: product program vs engineering
[Figure: product cost split into fixed costs and variable costs. Labels recovered from the figure: chip design; verify & test; software; CAD support; labor costs; CAD programs; mask costs; capital equipment; engineering costs; fixed project costs; marketing, sales, administration; manufacturing costs.]
Two scenarios
• fixed costs K_f, support costs 0.1 x fct(n), and
variable costs K_v * n, so
$\text{Costs} = K_f + (0.1 \times K_f) \times \sqrt[3]{n} + K_v \times n$
• designs get more complex while production costs decrease
• K_f increases while K_v decreases
• implicitly requires higher volumes to break even
• when compared with 1995, in 2005
• K_f increased by 10 times
• K_v decreased by the same factor
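The effect on break-even volume can be sketched numerically, reading the cost model as fixed costs, plus support costs of 0.1 times the fixed costs times the cube root of n, plus variable costs per unit. All dollar figures and the selling price in the usage lines are hypothetical values chosen only to show a tenfold rise in fixed costs alongside a tenfold fall in variable costs.

```python
def total_cost(n, kf, kv):
    """Costs = Kf + 0.1 * Kf * n**(1/3) + Kv * n for n units."""
    return kf + 0.1 * kf * n ** (1.0 / 3.0) + kv * n


def break_even_volume(price, kf, kv, limit=10**12):
    """Smallest unit count n at which revenue price * n covers total cost,
    or None if no such n is found up to `limit`."""
    if price <= kv:
        return None                       # every unit sold loses money

    def covered(n):
        return price * n >= total_cost(n, kf, kv)

    hi = 1
    while not covered(hi):                # double until revenue covers cost
        hi *= 2
        if hi > limit:
            return None
    lo = hi // 2                          # last volume known not to break even
    while hi - lo > 1:                    # bisect between lo and hi
        mid = (lo + hi) // 2
        if covered(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical scenarios: fixed costs rise 10x, variable costs fall 10x.
earlier = break_even_volume(price=100, kf=1_000_000, kv=50)
later = break_even_volume(price=100, kf=10_000_000, kv=5)
```

With these made-up inputs the later scenario needs a larger volume to break even, which matches the qualitative claim above; the actual volumes depend entirely on the numbers chosen.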
Two scenarios
Product volume dictates design effort
[Figure: design time and effort versus basic physical tradeoffs; the balance point depends on n, the number of units.]
Reduce complexity: use IP

• IP: Intellectual Property


Reduce complexity: reconfigurable technology
• reconfigurable technology: no fabrication costs
– lower non-recurring engineering (NRE) costs
• reconfigurable design: faster and cheaper
– improve time-to-market
• reconfigurability
– in-system upgrade: improve time-in-market
– run-time adaptation: respond to run-time conditions
– compile-time reconfiguration: retarget accelerator
• overhead: performance, area, energy efficiency
– less effective than ASIC (application-specific IC)
Summary
• to design and evaluate an SOC, designers need to
understand:
– its components: processors, memory, interconnect
– applications that it targets
• SOC economics heavily dependent on:
– costs: initial design, marginal production
– volume: applicability, lifetime
• reducing design complexity
– Intellectual Property (IP)
– reconfigurable technology
