L20 Embedded Multiprocessors
System on a chip
An SoC integrates a microcontroller (or microprocessor)
with advanced peripherals such as a graphics processing
unit (GPU), a Wi-Fi module, or a coprocessor. Just as
a microcontroller integrates a microprocessor with
peripheral circuits and memory, an SoC can be seen as
integrating a microcontroller with such advanced
peripherals.
An SoC may include analog, digital, mixed-signal, and
radio-frequency (RF) functions.
Why multiprocessor systems-on-chips
Introduction to Multiprocessors
Definition: a multiprocessor is a parallel processor
with a single shared address space.
The microprocessor is now the most cost-effective
processor.
Multiprocessors have the highest absolute
performance: they are faster than the fastest
uniprocessor.
Multiprocessor example applications
Network Security
Telecommunication
Multimedia applications
Key concepts
Parallel processing program: a single program
that runs on multiple processors simultaneously.
Parallel computing and concurrency.
Cluster: a set of computers connected over a local
area network (LAN) that function as a single large
multiprocessor.
Shared memory: a memory for a parallel processor with a single address space,
typically implying implicit communication through loads and stores.
Multiprocessor communication modes
Shared address: offers the programmer a single
memory address space that all processors share.
Processors communicate through shared
variables in memory, and every processor can
access any memory location via loads and
stores.
Message passing: processors communicate by
explicitly sending and receiving information.
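The two communication modes can be contrasted in a small Python sketch. Threads stand in for processors here (an assumption for illustration; on a real multiprocessor these would be separate cores), and both versions compute the same sum. In the shared-address version, workers update a common variable with ordinary reads and writes and need a lock; in the message-passing version, they share nothing and explicitly send their results over a channel (a `queue.Queue`).

```python
import queue
import threading


def shared_address_sum(chunks):
    """Shared address: workers update one variable in common memory."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)      # purely local computation
        with lock:                # serialize the read-modify-write of shared data
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(chunks):
    """Message passing: workers share nothing and explicitly send results."""
    channel = queue.Queue()

    def worker(chunk):
        channel.put(sum(chunk))   # explicit send

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    total = sum(channel.get() for _ in chunks)  # one explicit receive per worker
    for t in threads:
        t.join()
    return total


chunks = [range(0, 250), range(250, 500), range(500, 750), range(750, 1000)]
print(shared_address_sum(chunks), message_passing_sum(chunks))
```

Both calls return 499500, the sum of 0 through 999; the difference lies only in how the partial results travel between "processors".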
Two types of single-address-space access
Uniform memory access (UMA) multiprocessors (or symmetric
multiprocessors, SMPs): accessing main memory takes the same
time no matter which processor requests it and no matter
which word is requested.
One copy of the OS runs on all the processors.
Two types of single-address-space access (continued)
Non-uniform memory access (NUMA) multiprocessors:
some memory accesses are faster than others, depending on
which processor asks for which word.
Performance therefore depends on where running tasks and their data are placed.
A processor can access its own local memory faster than non-local memory.
Inter-processor memory transfers may be costly.
NUMA machines can scale to larger sizes and hence
potentially offer higher performance.
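The NUMA trade-off can be captured with a simple expected-latency model (a common approximation, not taken from the slides): if a fraction f of a processor's accesses go to its own memory, the average access time is f · t_local + (1 - f) · t_remote. The latencies used below are hypothetical.

```python
def average_access_time(local_fraction, t_local_ns, t_remote_ns):
    """Expected memory access time for one processor in a NUMA machine.

    local_fraction: share of accesses that go to the processor's own memory.
    t_local_ns / t_remote_ns: latency of a local / non-local access.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * t_local_ns + (1.0 - local_fraction) * t_remote_ns


# Hypothetical latencies: 100 ns for local memory, 300 ns for remote memory.
for f in (1.0, 0.9, 0.5):
    print(f"{f:.0%} local -> {average_access_time(f, 100, 300):.0f} ns")
```

As locality falls from 100% to 50%, the average access time doubles from 100 ns to 200 ns, which is why task and data placement matter on NUMA machines. With t_local equal to t_remote the model reduces to the UMA case, where placement has no effect on access time.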
Two basic physical organizations
Processors connected by a single bus
Processors connected by a network
Options in communication style and physical connection for multiprocessors as the number
of processors varies.
Multiprocessors connected by a single bus
Multiprocessors connected by a network
Multiprocessor Programming
Why is it difficult to write multiprocessor programs that are
fast, especially as the number of processors increases?
Because of the programming difficulty, most parallel
processing success stories are a result of software wizards
developing a parallel subsystem that presents a
sequential interface.
You must get good performance and efficiency from the
parallel program on a multiprocessor; otherwise, a uniprocessor,
which is easier to program, would do.
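One standard way to see why performance gets harder to achieve as processors are added is Amdahl's law (a textbook model, not stated on this slide): if a fraction f of the work parallelizes perfectly over p processors and the rest stays serial, speedup = 1 / ((1 - f) + f / p), and efficiency = speedup / p. The 95% parallel fraction below is a hypothetical value.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup predicted by Amdahl's law for a fixed problem size."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def efficiency(parallel_fraction, processors):
    """Fraction of the ideal (linear) speedup that is actually achieved."""
    return amdahl_speedup(parallel_fraction, processors) / processors


# Even with 95% of the work parallel, efficiency drops as processors are added.
for p in (2, 8, 32, 128):
    print(p, round(amdahl_speedup(0.95, p), 2), round(efficiency(0.95, p), 2))
```

With f = 0.95 the speedup can never exceed 1 / (1 - f) = 20, however many processors are used, so the serial 5% increasingly dominates.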
System-on-chip design constraints:
Not simply high computation rates, but
real-time performance that meets
deadlines.
Low power or energy consumption.
Low cost.
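To make "real-time performance that meets deadlines" concrete, here is a sketch for a streaming pipeline in which each stage is assumed to run on its own processor. With one frame in flight per stage, a new frame can be accepted every max(stage time), so the bottleneck stage must fit within the frame period; end-to-end latency is the sum of the stage times. The stage names, times, frame rate, and latency budget are hypothetical, and the model ignores communication and memory contention.

```python
def check_pipeline(stage_times_ms, frame_period_ms, latency_budget_ms):
    """Check a one-stage-per-processor streaming pipeline against its deadlines.

    Returns (meets_throughput, meets_latency, bottleneck_stage).
    """
    bottleneck = max(stage_times_ms, key=stage_times_ms.get)
    meets_throughput = stage_times_ms[bottleneck] <= frame_period_ms
    meets_latency = sum(stage_times_ms.values()) <= latency_budget_ms
    return meets_throughput, meets_latency, bottleneck


# Hypothetical video pipeline at 25 frames/s (40 ms per frame).
stages = {"decode": 18.0, "scale": 9.0, "enhance": 31.0, "display": 4.0}
print(check_pipeline(stages, frame_period_ms=40.0, latency_budget_ms=100.0))
```

Here every stage fits within 40 ms and the total latency is 62 ms. Run back to back on a single processor, the same four stages would need 62 ms per frame and miss the 40 ms period, which illustrates why such workloads are spread across several processors.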
Need for MPSoCs
Concept of MPSoC
Multiprocessor systems-on-chips (MPSoCs) are not chip
multiprocessors.
Chip multiprocessors are components that take
advantage of increased transistor densities to
put more processors on a single chip, but they don’t
try to tailor the architecture to an application’s needs.
MPSoCs are custom architectures that balance the
constraints of VLSI technology with an
application’s needs.
Memory system of MPSoC
Heterogeneous memory systems: some blocks of memory
may be accessible by only one or a few processors.
Heterogeneous memory systems are harder to program because
the programmer must keep track of which processors can access
which memory blocks.
Irregular memory structures are often necessary in MPSoCs.
One reason that designers resort to specialized memory is to
support real-time performance.
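To illustrate why heterogeneous memory systems are harder to program, the sketch below records which processors can reach which memory block and flags task-to-processor mappings that would touch unreachable memory. The processor names, block names, and access sets are all hypothetical.

```python
# Hypothetical MPSoC memory map: the set of processors that can access each block.
ACCESS = {
    "shared_dram": {"cpu", "dsp", "video"},
    "dsp_local_sram": {"dsp"},
    "video_line_buffer": {"video", "dsp"},
}


def can_access(processor, block):
    """True if the processor can reach the given memory block."""
    return processor in ACCESS.get(block, set())


def check_mapping(task_blocks, task_processor):
    """Return (task, block) pairs where the assigned processor cannot reach the data."""
    violations = []
    for task, blocks in task_blocks.items():
        proc = task_processor[task]
        for block in blocks:
            if not can_access(proc, block):
                violations.append((task, block))
    return violations


tasks = {"filter": ["dsp_local_sram", "video_line_buffer"], "ui": ["shared_dram"]}
print(check_mapping(tasks, {"filter": "dsp", "ui": "cpu"}))  # []
print(check_mapping(tasks, {"filter": "cpu", "ui": "cpu"}))
```

Moving the filter task from the DSP to the CPU makes both of its memory blocks unreachable; in a uniform memory system that move would be legal, which is exactly the extra bookkeeping the slide describes.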
Challenges and Opportunities
MPSoCs combine the difficulties of building complex
hardware systems and complex software systems.
Methodology is critical to MPSoC design. Methodologies that
work offer many advantages. They decrease the time it takes
to design a system; they also make it easier to predict how
long the design will take and how many resources it will
require. Methodologies also codify techniques for improving
performance and power consumption that developers can
apply to many different designs.
Methodology will necessarily be a moving target for the next
decade.
Increasing productivity through
Raise level of abstraction
Structured design
IP reuse
EDA support
Example: TV application
Example: MPSoC hardware
Example: MPSoC software stack
Example: MPSoC integration
Current practice
Ad hoc approaches
Low-level interfaces
Examples
Synchronization via low-level primitives
• Interrupts, MMIO, semaphores
Data access services partly in IP
• Buffering, DMA control, address generation
Consequence
Part of the IP is specific to the underlying communication infrastructure
• IP just wants the next pixel or block or …
• But also knows about burst transfers, interrupts, semaphores
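The integration problem can be illustrated with a producer/consumer pair written the way the slide describes current practice: each "IP" manages the shared buffer's indices and the semaphores itself, even though all it wants is the next pixel. This is a sketch under assumptions: Python threads stand in for hardware/software IP blocks, and the 4-entry buffer is hypothetical.

```python
import threading

BUF_SIZE = 4  # hypothetical size of a shared on-chip buffer

buffer = [None] * BUF_SIZE
empty_slots = threading.Semaphore(BUF_SIZE)  # counts free slots
full_slots = threading.Semaphore(0)          # counts filled slots


def producer_ip(pixels):
    write_idx = 0                               # address generation inside the IP
    for px in pixels:
        empty_slots.acquire()                   # low-level synchronization inside the IP
        buffer[write_idx] = px
        write_idx = (write_idx + 1) % BUF_SIZE  # buffer wrap-around inside the IP
        full_slots.release()


def consumer_ip(count, out):
    read_idx = 0
    for _ in range(count):
        full_slots.acquire()
        out.append(buffer[read_idx])
        read_idx = (read_idx + 1) % BUF_SIZE
        empty_slots.release()


pixels = list(range(10))
received = []
t_prod = threading.Thread(target=producer_ip, args=(pixels,))
t_cons = threading.Thread(target=consumer_ip, args=(len(pixels), received))
t_prod.start()
t_cons.start()
t_prod.join()
t_cons.join()
print(received == pixels)  # True
```

Both functions hard-code the buffer layout and the synchronization scheme, so changing the infrastructure (for example, replacing the semaphores with interrupts or the buffer with a hardware FIFO) means rewriting the IP itself.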
Example: MPSoC integration
Low-level interfaces
Hardware / software IP designer must deal with low-level issues
• Increases design effort
• Same problems solved again and again: error prone
IP becomes specific to a particular use
• Hampers reusability
IP integrator must deal with low-level issues
• Increases design effort
Infrastructures cannot evolve
• Changes in infrastructure affect hardware / software IP
Task Transaction Level Interface (TTL)
TTL Requirements
Well-defined semantics for application modeling
Focus: stream processing applications
Make concurrency and communication explicit
High-level interface
Make high-level services available
• Inter-task communication
• Multi-tasking
Easy to use for IP development
Facilitate reuse and integration of IP
Provide implementation freedom
Allow efficient and cheap implementations
E.g. supporting fine grain synchronization for on-chip memory
Support integration of hardware and software tasks
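A minimal sketch of the idea behind a task-level interface: tasks see only blocking read/write on channels, while buffering and synchronization live inside the channel implementation, which can change without touching the tasks. The names `Channel`, `write`, and `read` are hypothetical illustrations, not the actual TTL API, and Python threads stand in for hardware or software tasks.

```python
import queue
import threading


class Channel:
    """Hypothetical task-level FIFO channel: tasks see only read/write.

    Buffering and synchronization are hidden inside; this version uses a
    bounded queue.Queue, but a hardware FIFO or a shared-memory ring buffer
    could be substituted without changing the task code.
    """

    def __init__(self, capacity=4):
        self._fifo = queue.Queue(maxsize=capacity)

    def write(self, token):
        self._fifo.put(token)    # blocks while the channel is full

    def read(self):
        return self._fifo.get()  # blocks while the channel is empty


# Tasks are written against the interface only: they just want the next token.
def producer_task(out_port, pixels):
    for px in pixels:
        out_port.write(px)


def scale_task(in_port, out_port, count, factor):
    for _ in range(count):
        out_port.write(in_port.read() * factor)


def consumer_task(in_port, count, sink):
    for _ in range(count):
        sink.append(in_port.read())


pixels = list(range(8))
a, b = Channel(), Channel(capacity=2)
result = []
tasks = [
    threading.Thread(target=producer_task, args=(a, pixels)),
    threading.Thread(target=scale_task, args=(a, b, len(pixels), 2)),
    threading.Thread(target=consumer_task, args=(b, len(pixels), result)),
]
for t in tasks:
    t.start()
for t in tasks:
    t.join()
print(result)  # [0, 2, 4, 6, 8, 10, 12, 14]
```

Compared with the semaphore-based producer/consumer of current practice, no task here knows the buffer size or the synchronization mechanism. The requirement of fine-grain synchronization for on-chip memory suggests that a real interface offers richer variants than this simplest blocking read/write style.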
TTL in Example Architecture