Multiprocessors I

Computer architecture

Uploaded by oliviaclark2905

Introduction

Multiprocessor System

Professor: Dr. Tran Ngoc Thinh


Group Members: Mr. Ibtasam Rehman
Mr. La Minh Tuan Kiet
Mr. Nguyen Thanh Loc

Content

● Introduction to Multiprocessors
● Historical Context
● Differences between MIMD and MISD
● Multiprocessor - 1
● Flynn Classification
● Memory Consistency
● Sequential Consistency
● Relaxed Memory Models
● Thread & Multithread
● Superpipeline & Superscalar
● Multithreading & Superscalar
● Conclusion

Introduction to Multiprocessors

Definition

● Multiprocessor systems: integrate multiple processors within a single computing device.


● These systems enable parallel execution of tasks across multiple processors.

Significance

● Revolutionized computing landscape.


● Enabled smaller, faster, and more powerful devices.
● Democratized access to computing power:
○ Increased affordability.
○ Widened accessibility to advanced computing capabilities.
Impact

● Facilitated advancements in various fields:


○ Artificial intelligence.
○ Data analytics.
○ Scientific research.

Symmetric Multiprocessing
● Processing is done by multiple processors that share a common OS and memory.
● The processors share the same input/output (I/O) bus or data path.
● This shared-memory architecture is fundamental to the operation of SMP systems and is typically implemented through shared physical memory and system-bus arbitration.

Working of a Multiprocessor

Flynn Classification

● Single Instruction, Single Data (SISD)
● Single Instruction, Multiple Data (SIMD)
● Multiple Instruction, Single Data (MISD)
● Multiple Instruction, Multiple Data (MIMD)

Difference

Classification   Instruction Stream   Data Stream   Example
SISD             One                  One           Traditional von Neumann architecture
SIMD             One                  Multiple      Vector processors such as GPUs, or SIMD extensions in CPUs
MISD             Multiple             One           Space shuttle flight control system
MIMD             Multiple             Multiple      Multi-core CPUs, distributed systems, clusters

Multiprocessor - 1
Synchronization
● A spin lock is a synchronization technique used in concurrent programming. When a thread wants to access a shared resource protected by a spin lock and finds it already locked, it does not block; instead, it repeatedly checks the lock in a loop (busy-waiting) until the lock becomes available.
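As a concrete illustration, here is a minimal spin lock sketch in C11 built on `atomic_flag`; the function names, thread count, and iteration count are illustrative, not from the slides:

```c
#include <stdatomic.h>
#include <pthread.h>

static atomic_flag lk = ATOMIC_FLAG_INIT;   // clear = unlocked
static long counter = 0;                    // shared resource the lock protects

static void spin_acquire(void) {
    // Busy-wait: keep testing-and-setting until we observe the flag clear.
    while (atomic_flag_test_and_set_explicit(&lk, memory_order_acquire))
        ;   // spin, exactly as described above
}

static void spin_release(void) {
    atomic_flag_clear_explicit(&lk, memory_order_release);
}

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        spin_acquire();
        counter++;          // critical section guarded by the spin lock
        spin_release();
    }
    return NULL;
}

long spinlock_demo(void) {
    pthread_t t[4];
    counter = 0;
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    return counter;         // 4 * 100000 if the lock excludes correctly
}
```

Without the lock, the `counter++` increments from different threads would race and some updates would be lost; with it, all 400,000 increments survive.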

Multiprocessor - 1
Barrier
● A synchronization construct used in concurrent programming to ensure that a group of threads or
processes reach a designated point (the barrier) in their execution before any of them are allowed to
proceed further

Intuitive Model: Sequential Consistency (SC)

A multiprocessor is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.
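One way to make the definition concrete: under SC, every outcome must be producible by some interleaving of the processors' operations, each processor in program order. The sketch below (illustrative, not from the slides) enumerates every interleaving of P0: x = 1; r1 = y and P1: y = 1; r2 = x, and confirms that r1 == r2 == 0 can never occur under SC:

```c
static int saw_00;   // set if any SC interleaving yields r1 == r2 == 0

// i0, i1: next operation index for P0 and P1; x, y: memory; r1, r2: registers
static void run(int i0, int i1, int x, int y, int r1, int r2) {
    if (i0 == 2 && i1 == 2) {            // both programs finished
        if (r1 == 0 && r2 == 0) saw_00 = 1;
        return;
    }
    if (i0 < 2) {                        // P0 takes its next step...
        if (i0 == 0) run(1, i1, 1, y, r1, r2);   // P0 op 0: x = 1
        else         run(2, i1, x, y, y,  r2);   // P0 op 1: r1 = y
    }
    if (i1 < 2) {                        // ...or P1 takes its next step
        if (i1 == 0) run(i0, 1, x, 1, r1, r2);   // P1 op 0: y = 1
        else         run(i0, 2, x, y, r1, x);    // P1 op 1: r2 = x
    }
}

int sc_forbids_00(void) {
    saw_00 = 0;
    run(0, 0, 0, 0, -1, -1);             // x = y = 0 initially
    return !saw_00;     // 1 means no interleaving produced r1 == r2 == 0
}
```

Intuitively: for r1 to read 0, P0 must finish before P1's store to y, but then P1's load of x must see 1, so (0, 0) is impossible under SC.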

Relaxed Memory Models
Why do we need relaxed consistency?
- To keep hardware simple and performance high, relax the ordering requirements
- This is the motivation for relaxed memory models


Relaxed Consistency Models

- SC maintains all memory access orderings -> very strict
- Can we relax some or all of these orderings?
Local Ordering: No Relaxing (SC)
- All prior LOADs and STOREs must be performed before a LOAD is performed
- All prior LOADs and STOREs must be performed before a STORE is performed

SC: perform memory operations in program order
- No Out-of-Order (OoO) execution for memory operations
- Any miss will stall the memory operations behind it


Local Ordering: Relaxing W→R

- Initially proposed for processors with in-order pipelines -> allows post-retirement store buffers
- Later loads can bypass earlier stores to independent addresses
- TSO (Total Store Ordering) and Processor Consistency are two examples of memory models with this relaxation
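The classic store-buffering litmus test shows what this relaxation permits. The toy simulation below is a hand-written sketch, not real hardware: it models each core's pending store sitting in a one-entry store buffer while that core's load bypasses it, producing the r1 == r2 == 0 outcome that SC forbids but TSO allows:

```c
// Program:  P0: x = 1; r1 = y;      P1: y = 1; r2 = x;
// Under SC at least one of r1, r2 must be 1. With the W->R relaxation,
// each load can bypass its core's buffered store and both can read 0.
int sb_litmus_relaxed(int *r1_out, int *r2_out) {
    int x = 0, y = 0;           // shared memory, initially zero
    int buf_x = 1, buf_y = 1;   // each store parked in its core's store buffer
    // Both loads execute before either buffered store drains to memory:
    int r1 = y;                 // P0's load bypasses its pending store to x
    int r2 = x;                 // P1's load bypasses its pending store to y
    x = buf_x;                  // store buffers drain afterwards
    y = buf_y;
    *r1_out = r1;
    *r2_out = r2;
    return (r1 == 0 && r2 == 0);   // forbidden by SC, allowed by TSO
}
```

Real hardware would only sometimes exhibit this outcome; the sketch forces the one scheduling that the relaxation newly permits.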

Local Ordering: Relaxing W→W & R→RW
- In Processor Consistency and TSO, the W→W and R→RW orderings are still enforced: reads execute in the order issued, and writes complete in the order issued
- Relaxing these orderings allows independent writes, and operations after a read, to complete out of order


Relax Constraints on Memory Orders

Memory Model: Weak Ordering

- In a well-synchronized program, all reorderings inside a critical section should be allowed
- Data-race freedom ensures that no other thread can observe the order of execution
- Instructions used for synchronization are explicitly marked
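A C11 sketch of "marking" synchronization (the names are illustrative): the data write is an ordinary access that the hardware may reorder freely, while fences around the marked flag operation restore only the ordering that matters:

```c
#include <stdatomic.h>

static int payload;        // ordinary data: freely reorderable by hardware
static atomic_int flag;    // marked synchronization variable

void wo_producer(void) {
    payload = 42;                                  // data write...
    atomic_thread_fence(memory_order_release);     // ...ordered before the sync op
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}

int wo_consumer(void) {
    while (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        ;                                          // wait for the marked sync op
    atomic_thread_fence(memory_order_acquire);     // sync op ordered before data read
    return payload;                                // guaranteed to observe 42
}

int wo_demo(void) {        // single-threaded driver just to exercise the code
    wo_producer();
    return wo_consumer();
}
```

Only the fenced flag operations constrain ordering; everything between synchronization points may be reordered, which is exactly the weak-ordering bargain.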


Memory Model: Release Consistency


- Similar to Weak Ordering but distinguishes between:
- SYNCH op used to start a critical section (Acquire)
- SYNCH op used to end a critical section (Release)
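In C11 atomics this distinction maps directly onto `memory_order_acquire` and `memory_order_release`; a minimal sketch (function names are illustrative):

```c
#include <stdatomic.h>

static atomic_int rc_lock;   // 0 = free, 1 = held
static int shared_data;

void rc_acquire(void) {      // SYNCH op starting the critical section
    while (atomic_exchange_explicit(&rc_lock, 1, memory_order_acquire))
        ;   // acquire: later accesses may not move above this point
}

void rc_release(void) {      // SYNCH op ending the critical section
    atomic_store_explicit(&rc_lock, 0, memory_order_release);
        // release: earlier accesses may not move below this point
}

int rc_demo(void) {
    rc_acquire();
    shared_data = 7;         // critical-section accesses stay between the pair
    rc_release();
    return shared_data;
}
```

Because acquire only blocks downward motion and release only blocks upward motion, accesses outside the critical section may still slide into it, which is weaker (and cheaper) than ordering every access as Weak Ordering's undifferentiated SYNCH ops do.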

Instruction Level Parallelism - ILP
Definition
● Refers to executing multiple operations of a single process in parallel, within that process's single set of resources – address space, registers, identifiers, state, and program counter.
● Covers both the compiler techniques and the processor designs that execute operations, such as memory loads and stores, integer addition, and floating-point multiplication, in parallel to improve processor performance.

Example
1. y1 = x1*1010
2. y2 = x2*1100
3. z1 = y1+0010
4. z2 = y2+0101
5. t1 = t1+1
6. p = q*1000
7. clr = clr+0010
8. r = r+0001


Instruction Level Parallelism - ILP


Advantages of Instruction-Level Parallelism
● Improved Performance: ILP allows multiple instructions to execute simultaneously or out of order, leading to faster program execution and better system throughput.
● Efficient Resource Utilization: executing several instructions at once keeps functional units busy and reduces resource wastage.
● Reduced Impact of Instruction Dependencies: hardware techniques such as register renaming remove false dependencies that would otherwise limit the exploitable parallelism, reducing bottlenecks.
● Increased Throughput: more instructions complete per cycle, which benefits multithreaded applications and other parallel processing tasks.

Instruction Level Parallelism - ILP

Disadvantages of Instruction-Level Parallelism

● Increased Complexity: ILP requires additional hardware (e.g., dependency-checking and reordering logic), which increases processor complexity and cost.
● Instruction Overhead: the extra bookkeeping can slow down the execution of some instructions and reduce performance.
● Data Dependency: true data dependences limit the amount of instruction-level parallelism that can be exploited, lowering performance and throughput.
● Reduced Energy Efficiency: the additional hardware and overhead increase power consumption and can result in higher energy costs.


Thread Level Parallelism - TLP

Motivation:
● A single thread leaves a processor underutilized
● Doubling the processor area barely improves single-thread performance

-> Letting multiple threads share the same large processor reduces underutilization and allows efficient resource allocation.

Strategies for thread-level parallelism:
● Simultaneous Multi-Threading - SMT: multiple threads executed simultaneously on one core.
● Chip Multi-Processing - CMP: multiple cores on the same die.

Multithreading allows multiple threads to share the functional units of one processor via overlapping, by duplicating the independent state of each thread, e.g., a separate copy of the register file and a separate PC. Memory is shared through the virtual memory mechanisms, which already support multiple processes. Hardware support makes a thread switch much faster than a full process switch.

Thread Level Parallelism - TLP

Fine-Grained Multithreading:
● Switches between threads on each instruction, so the execution of multiple threads is interleaved
● Usually done in a round-robin fashion, skipping any stalled threads

Coarse-Grained Multithreading:
● Switches threads only on costly stalls, such as cache misses


Thread Level Parallelism - TLP

Multithreading vs Superscalar


