0% found this document useful (0 votes)

89 views84 pages

COA - Unit 4

This document discusses parallelism in computer organization and architecture. It covers the need for parallelism, types of parallelism including hardware and software parallelism, applications of parallelism, and challenges in parallelism. It also describes Flynn's classification of parallel architectures including SISD, SIMD, MIMD, and architectures like multi-core processors.

Uploaded by

AARAV DOGRA (RA2011028010025)

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

89 views84 pages

COA - Unit 4

Uploaded by

AARAV DOGRA (RA2011028010025)

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

You are on page 1/ 84

18CSC203J-Computer Organization

and Architecture
UNIT - 4
Parallelism

2
Course Outcome
• CLR-3 : Understand the concepts of Pipelining and
basic processing units
• CLR-4 : Study about parallel processing and
performance considerations.

• CLO-3 : Analyze the detailed operation of Basic

Processing units and the performance of Pipelining
• CLO-4 : Analyze concepts of parallelism and multi-
core processors.
Contents
• Parallelism
• Need for parallelism
• Types of Parallelism
• Applications of Parallelism
• Parallelism in Software
– Instruction level parallelism
– Data level parallelism
• Challenges in Parallelism
• Architecture of Parallel system
– Flynn’s Classification
– SISD , SIMD
– MIMD, MIMD
• Hardware Multi threading
– Coarse grain Parallelism
– Fine grain Parallelism
• Uni-Processor and MultiProcessor
• Muti-core Processor
• Memory in multi-processor system
• Cache Coherency in multi-processor system
• MESI Protocol for multi-processor system

4
Parallelism
• Executing two or more operations at the same time is known
as parallelism.
• Parallel processing is a method to improve computer system
performance by executing two or more instructions
simultaneously
• A parallel computer is a set of processors that are able to
work cooperatively to solve a computational problem.
• Two or more ALUs in CPU can work concurrently to increase
throughput
• The system may have two or more processors operating
concurrently

5
Goals of parallelism

• To increase the computational speed (ie) to reduce

the amount of time that you need to wait for a
problem to be solved
• To increase throughput (ie) the amount of processing
that can be accomplished during a given interval of
time
• To improve the performance of the computer for a
given clock speed
• To solve bigger problems that might not fit in the
limited memory of a single CPU

6
Applications of Parallelism
• Numeric weather prediction
• Socio economics
• Finite element analysis
• Artificial intelligence and automation
• Genetic engineering
• Weapon research and defence
• Medical Applications
• Remote sensing applications
7
Applications of Parallelism

8
Types of parallelism
1. Hardware Parallelism
2. Software Parallelism

• Hardware Parallelism :
The main objective of hardware parallelism is to increase the processing speed. Based on the
hardware architecture, we can divide hardware parallelism into two types: Processor
parallelism and memory parallelism.
• Processor parallelism
Processor parallelism means that the computer architecture has multiple nodes, multiple CPUs
or multiple sockets, multiple cores, and multiple threads.
• Memory parallelism means shared memory, distributed memory, hybrid distributed shared
memory, multilevel pipelines, etc. Sometimes, it is also called a parallel random access machine
(PRAM). “It is an abstract model for parallel computation which assumes that all the processors
operate synchronously under a single clock and are able to randomly access a large shared
memory. In particular, a processor can execute an arithmetic, logic, or memory access operation
within a single clock cycle”. This is what we call using overlapping or pipelining instructions to
achieve parallelism.

9
Hardware Parallelism
• One way to characterize the parallelism in a processor is by the
number of instruction issues per machine cycle.
• If a processor issues k instructions per machine cycle, then it is
called a k-issue processor.
• In a modern processor, two or more instructions can be issued per
machine cycle.
• A conventional processor takes one or more machine cycles to
issue a single instruction. These types of processors are called one-
issue machines, with a single instruction pipeline in the processor.
• A multiprocessor system which built n k-issue processors should be
able to handle a maximum of nk threads of instructions
simultaneously

10
Software Parallelism
• It is defined by the control and data dependence of
programs.
• The degree of parallelism is revealed in the program flow
graph.
• Software parallelism is a function of algorithm,
programming style, and compiler optimization.
• The program flow graph displays the patterns of
simultaneously executable operations.
• Parallelism in a program varies during the execution period .
• It limits the sustained performance of the processor.

11
12
13
14
15
16
17
Software Parallelism - types
Parallelism in Software
Instruction level parallelism
Task-level parallelism
Data parallelism
Transaction level parallelism

18
Instruction level parallelism
• Instruction level Parallelism (ILP) is a measure of
how many operations can be performed in parallel
at the same time in a computer.

• Parallel instructions are set of instructions that do

not depend on each other to be executed.
• ILP allows the compiler and processor to overlap
the execution of multiple instructions or even to
change the order in which instructions are executed.

19
Eg. Instruction level parallelism
Consider the following example
1. x= a+b
2. y=c-d
3. z=x * y
Operation 3 depends on the results of 1 & 2
So ‘Z ‘ cannot be calculated until X & Y are calculated
But 1 & 2 do not depend on any other. So they can
be computed simultaneously.

20
• If we assume that each operation can be
completed in one unit of time then these 3
operations can be completed in 2 units of
time .
• ILP factor is 3/2=1.5 which is greater than
without ILP.
• A superscalar CPU architecture implements ILP
inside a single processor which allows faster
CPU throughput at the same clock rate.
21
Data-level parallelism (DLP)

• Data parallelism is parallelization across

multiple processors in parallel computing
environments.
• It focuses on distributing the data across
different nodes, which operate on the data in
parallel.
• Instructions from a single stream operate
concurrently on several data

22
DLP - example
• Let us assume we want to sum all the elements
of the given array of size n and the time for a
single addition operation is Ta time units.
• In the case of sequential execution, the time
taken by the process will be n*Ta time unit
• if we execute this job as a data parallel job on 4
processors the time taken would reduce to
(n/4)*Ta + merging overhead time units.

23
DLP in Adding elements of array

24
DLP in matrix multiplication

• A[m x n] dot B [n x k] can be finished in O(n) instead of O(m ∗n ∗k )

when executed in parallel using m*k processors.

25
Flynn’s Classification

 Was proposed by researcher Michael J. Flynn in 1966.

 It is the most commonly accepted taxonomy of computer
organization.
 In this classification, computers are classified by whether it
processes a single instruction at a time or multiple
instructions simultaneously, and whether it operates on
one or multiple data sets.

27
Flynn’s Classification

• This taxonomy distinguishes multi-processor computer

architectures according to the two independent dimensions of
Instruction stream and Data stream.
• An instruction stream is sequence of instructions executed by
machine.
• A data stream is a sequence of data including input, partial or
temporary results used by instruction stream.
• Each of these dimensions can have only one of two possible
states: Single or Multiple.
• Flynn’s classification depends on the distinction between the
performance of control unit and the data processing unit rather
than its operational and structural interconnections.

28
Flynn’s Classification
• Four category of Flynn classification

29
SISD
• They are also called scalar • SISD computer having
processor i.e., one instruction at a
time and each instruction have only one control unit, one
one set of operands. processor unit and
• Single instruction: only one
instruction stream is being acted on
single memory unit.
by the CPU during any one clock •
cycle.
• Single data: only one data stream is
being used as input during any one
clock cycle.
• Deterministic execution.
• Instructions are executed
sequentially.

30
SIMD
• A type of parallel computer. • single instruction is
• Single instruction: All processing
executed by different
units execute the same
instruction issued by the control processing unit on
unit at any given clock cycle . different set of data
• Multiple data: Each processing
unit can operate on a different
data element as shown if figure
below the processor are
connected to shared memory or
interconnection network
providing multiple data to
processing unit

31
MISD
• A single data stream is fed • same data flow through
into multiple processing units. a linear array of
• Each processing unit operates
processors executing
on the data independently via
independent instruction. different instruction
• A single data stream is streams
forwarded to different
processing unit which are
connected to different control
unit and execute instruction
given to it by control unit to
which it is attached.

32
MIMD
• Multiple Instruction: every • Different processor
processor may be executing each processing
a different instruction
different task.
stream.
• Multiple Data: every
processor may be working
with a different data stream.
• Execution can be
synchronous or
asynchronous, deterministic
or nondeterministic

33
Hardware Multithreading
• Hardware multithreading allows multiple threads to share the
functional units of a single processor in an overlapping fashion to try to
utilize the hardware resources efficiently.
• To permit this sharing, the processor must duplicate the independent
state of each thread. It Increases the utilization of a processor
• For example, each thread would have a separate copy of register file
and program counter. The memory itself can be shared through virtual
memory mechanisms, which already support multi-programming.
• In addition, hardware must support to change to a different thread
relatively quickly. In particular, a thread switch should be much more
efficient than a process switch, which typically requires hundreds to
thousands of processor cycles while a thread switch can be
instantaneous.

34
Fine grained multi threading
• Fine-grained multithreading switches between threads on each
instruction, resulting in interleaved execution of multiple threads.
• This interleaving is often done in a round-robin fashion, skipping any
threads that are stalled at that clock cycle.
• To make fine-grained multithreading practical, processor must be able
to switch threads on every clock cycle.
• One advantage of fine-grained multithreading is that it can hide
throughput losses that arise from both short and long stalls, since
instructions from or threads can be executed when one thread stalls.
• The primary disadvantage of fine-grained multithreading is that it slows
down execution of individual threads, since a thread that is ready to
execute without stalls will be delayed by instructions from or threads.

35
Coarse grained multi threading
• Coarse-grained multithreading was invented as an alternative to fine-grained
multithreading.
• Coarse-grained multithreading switches threads only on costly stalls, such as last-level
cache misses.
• This change relieves need to have thread switching be extremely fast and is much less
likely to slow down execution of an individual thread, since instructions from or threads
will only be issued when a thread encounters a costly stall.
• Drawback: it is limited in its ability to overcome throughput losses, especially from shorter
stalls.
• This limitation arises from pipeline start-up costs of coarse-grained multithreading.
Because a processor with coarse-grained multithreading issues instructions from a single
thread, when a stall occurs, pipeline must be emptied or frozen.
• The new thread that begins executing after stall must fill pipeline before instructions will
be able to complete. Due to start-up overhead, coarse-grained multithreading is much
more useful for reducing penalty of high-cost stalls, where pipeline refill is negligible
compared to stall time.

36
Comparison

37
Single-core computer

38
Single-core CPU chip
the single core

39
Multi-core architectures
• Replicate multiple processor cores
on a single die.
Core 1 Core 2 Core 3
Core 4

Multi-core CPU chip 40

Multi-core CPU chip
• The cores fit on a single processor socket
• Also called CMP (Chip Multi-Processor)

c c c c

o o o o

r r r r

e e e e

1 2 3 4
41
The cores run in parallel
thread 1 thread 2 thread 3 thread 4

c c c c
o o o o
r r r r

e e e e
1 2 3 4

42
Within each core, threads are time-sliced (just
like on a uniprocessor)
several several several several

threads threads threads threads

c c c c
o o o o
r r r r

e e e e
1 2 3 4

43
Memory in Multiprocessor System
• Two architectures:
– Shared common memory
– Unshared Distributed memory.

44
Shared memory multiprocessors

• Shared memory multiprocessors

• A system with multiple CPUs “sharing” the same main memory
is called Shared memory multiprocessor.
• In a multiprocessor system all processes on the various CPUs
share a unique logical address space.
• Multiple processors can operate independently but share the
same memory resources.
• Changes in a memory location effected by one processor are
visible to all other processors.
• Shared memory machines can be divided into two main
classes based upon memory access times: UMA , NUMA.

45
Uniform Memory Access (UMA)

• Most commonly represented today by Symmetric

Multiprocessor (SMP) machines.
• Identical processors .
• Equal access and access times to memory .
• Sometimes called CC-UMA - Cache Coherent UMA. Cache
coherent means if one processor updates a location in
shared memory, all the other processors know about the
update. Cache coherency is accomplished at the hardware
level.
• It can be used to speed up the execution of a single large
program in time critical applications

46
Non-Uniform Memory Access (NUMA)
• these systems have a shared logical address space, but
physical memory is distributed among CPUs, so that
access time to data depends on data position, in local
or in a remote memory (thus the NUMA
denomination)
• •These systems are also called Distributed Shared
Memory (DSM)architecture
• Memory access across link is slower
• If cache coherency is maintained, then may also be
called CC-NUMA - Cache Coherent NUMA
47
• The COMA model : The COMA model is a special
case of NUMA machine in which the distributed
main memories are converted to caches. All caches
form a global address space and there is no memory
hierarchy at each processor node.
• Data have no specific “permanent” location (no
specific memory address) where they stay and when
they can be read (copied into local caches) and/or
modified (first in the cache and then updated at
their “permanent” location).
48
Shared Memory

Uniform Memory Access Non-Uniform Memory Access

49
Distributed memory systems
• Distributed memory systems require a communication network to
connect inter-processor memory.
• Processors have their own local memory.
• Memory addresses in one processor do not map to another processor,
so there is no concept of global address space across all processors.
• Because each processor has its own local memory, it operates
independently.
• Changes it makes to its local memory have no effect on the memory of
other processors. Hence, the concept of cache coherency does not
apply.
• When a processor needs access to data in another processor, it is
usually the task of the programmer to explicitly define how and when
data is communicated.
• Synchronization between tasks is likewise the programmer's
responsibility. 50
The memory hierarchy
• If simultaneous multithreading only:
– all caches shared
• Multi-core chips:
– L1 caches private
– L2 caches private in some architectures
and shared in others
• Memory is always shared

51
“Fish” machines
hyper-threads
• Dual-core
Intel Xeon processors

CORE1

CORE0
• Each core is
L1 cache L1 cache
hyper-threaded
L2 cache

• Private L1 caches
memory
• Shared L2 caches

52
Designs with private L2 caches

CORE0

CORE1

CORE0
CORE1

L1 cache L1 cache L1 cache L1 cache

L2 cache L2 cache L2 cache L2 cache

L3 cache L3 cache
memory
memory

Both L1 and L2 are private

A design with L3 caches
Examples: AMD Opteron,
AMD Athlon, Intel Pentium D Example: Intel Itanium 2
53
Private vs shared caches?
• Advantages/disadvantages?

54
Private vs shared caches
• Advantages of private:
– They are closer to core, so faster access
– Reduces contention
• Advantages of shared:
– Threads on different cores can share the
same cache data
– More cache space available if a single (or a
few) high-performance thread runs on the
system
55
The cache coherence problem
• Since we have private caches:
How to keep the data consistent across caches?
• Each core should perceive the memory as a
monolithic array, shared by all the cores

56
The cache coherence problem
Suppose variable x initially contains 15213

Core 1 Core 2 Core 3 Core 4

One or more One or more One or more One or more

levels of levels of levels of levels of
cache cache cache cache

multi-core chip
Main memory
x=15213 57
The cache coherence problem
Core 1 reads x

Core 1 Core 2 Core 3 Core 4

One or more One or more One or more One or more

levels of levels of levels of levels of
cache cache cache cache
x=15213

multi-core chip
Main memory
x=15213 58
The cache coherence problem
Core 2 reads x

Core 1 Core 2 Core 3 Core 4

One or more One or more One or more One or more

levels of levels of levels of levels of
cache cache cache cache
x=15213 x=15213

multi-core chip
Main memory
x=15213 59
The cache coherence problem
Core 1 writes to x, setting it to 21660

Core 1 Core 2 Core 3 Core 4

One or more One or more One or more One or more

levels of levels of levels of levels of
cache cache cache cache
x=21660 x=15213

multi-core chip
Main memory assuming
x=21660 write-through
60
caches
The cache coherence problem
Core 2 attempts to read x… gets a stale copy

Core 1 Core 2 Core 3 Core 4

One or more One or more One or more One or more

levels of levels of levels of levels of
cache cache cache cache
x=21660 x=15213

multi-core chip
Main memory
x=21660
61
Solutions for cache coherence
• This is a general problem with
multiprocessors, not limited just to multi-core
• There exist many solution algorithms,
coherence protocols, etc.

• A simple solution:
invalidation-based protocol with
snooping

62
Inter-core bus

Core 1 Core 2 Core 3 Core 4

One or more One or more One or more One or more

levels of levels of levels of levels of
cache cache cache cache

multi-core chip
Main memory
inter-core
bus 63
Invalidation protocol with
snooping
• Invalidation:
If a core writes to a data item, all other
copies of this data item in other caches
are invalidated
• Snooping:
All cores continuously “snoop” (monitor)
the bus connecting the cores.

64
Bus Snooping
• Each CPU (cache system) ‘snoops’ (i.e. watches
continually) for write activity concerned with
data addresses which it has cached.• This
assumes a bus structure which is ‘global’, i.e all
communication can be seen by all.

65
The cache coherence problem
Revisited: Cores 1 and 2 have both read x

Core 1 Core 2 Core 3 Core 4

One or more One or more One or more One or more

levels of levels of levels of levels of
cache cache cache cache
x=15213 x=15213

multi-core chip
Main memory
x=15213
66
The cache coherence problem
Core 1 writes to x, setting it to 21660

Core 1 Core 2 Core 3 Core 4

One or more One or more One or more One or more

levels of levels of levels of levels of
cache cache cache cache
x=21660 x=15213

sends INVALIDATED
invalidation
multi-core chip
request
Main memory assuming inter-core
x=21660 write-through bus 67
caches
The cache coherence problem
After invalidation:

Core 1 Core 2 Core 3 Core 4

One or more One or more One or more One or more

levels of levels of levels of levels of
cache cache cache cache
x=21660

multi-core chip
Main memory
x=21660 68
The cache coherence problem
Core 2 reads x. Cache misses, and loads the new copy.

Core 1 Core 2 Core 3 Core 4

One or more One or more One or more One or more

levels of levels of levels of levels of
cache cache cache cache
x=21660 x=21660

multi-core chip
Main memory
x=21660 69
Alternative to invalidate protocol:
update protocol
Core 1 writes x=21660:

Core 1 Core 2 Core 3 Core 4

One or more One or more One or more One or more

levels of levels of levels of levels of
cache cache cache cache
x=21660 x=21660
UPDATED

broadcasts
multi-core chip
updated
value Main memory assuming inter-core
x=21660 write-through bus 70
caches
Which do you think is better?
Invalidation or update?

71
Invalidation vs update
• Multiple writes to the same location
– invalidation: only the first time
– update: must broadcast each write
(which includes new variable value)

• Invalidation generally performs better:

it generates less bus traffic

72
MESI Protocol

For Multiprocessor Systems

73
• The MESI protocol is an Invalidate-based cache
coherence protocol, and is one of the most common
protocols which support write-back caches.
• Write back caches can save a lot on bandwidth that is
generally wasted on a write through cache.
• There is always a dirty state present in write back
caches which indicates that the data in the cache is
different from that in main memory

74
MESI Protocol (2)
Any cache line can be in one of 4 states (2 bits)
• Modified - The cache line is present only in the current cache, and is dirty
- it has been modified (M state) from the value in main memory. The
cache is required to write the data back to main memory at some time in
the future, before permitting any other read of the (no longer valid) main
memory state. The write-back changes the line to the Shared state(S).
• Exclusive – The cache line is present only in the current cache, but is clean
- it matches main memory. It may be changed to the Shared state at any
time, in response to a read request. Alternatively, it may be changed to
the Modified state when writing to it.
• Shared – Indicates that this cache line may be stored in other caches of
the machine and is clean - it matches the main memory. The line may be
discarded (changed to the Invalid state) at any time
• Invalid – Indicates that this cache line is invalid (unused).

75
Operation

• A processor P1 has a Block X in its Cache, and there is a request

from the processor to read or write from that block.
• The second stimulus comes from other processors, which
doesn't have the Cache block or the updated data in its Cache.
• The bus requests are monitored with the help of Snoopers
which snoops all the bus transactions.
• Following are the different type of Processor requests and Bus
side requests:
• Processor Requests to Cache includes the following operations:
• PrRd: The processor requests to read a Cache block.
• PrWr: The processor requests to write a Cache block

76
• Bus side requests are the following:
• BusRd: Snooped request that indicates there is a read request to a Cache
block made by another processor
• BusRdX: Snooped request that indicates there is a write request to a Cache
block made by another processor which doesn't already have the block.
• BusUpgr: Snooped request that indicates that there is a write request to a
Cache block made by another processor but that processor already has
that Cache block resident in its Cache.
• Flush: Snooped request that indicates that an entire cache block is written
back to the main memory by another processor.
• FlushOpt: Snooped request that indicates that an entire cache block is
posted on the bus in order to supply it to another processor(Cache to
Cache transfers).

77
78
• Snooping Operation: In a snooping system, all caches
on the bus monitor (or snoop) all the bus transactions.
Every cache has a copy of the sharing status of every
block of physical memory it has stored. The state of
the block is changed according to the State Diagram of
the protocol used.(Refer image above for MESI state
diagram). The bus has snoopers on both sides:
• Snooper towards the Processor/Cache side.
• The snooping function on the memory side is done by
the Memory controller.
79
State Transitions and response to various
Processor Operations

80
81
82
Illustration of MESI protocol operations
• Let us assume that the following stream of read/write references. All the references are to
the same location and the digit refers to the processor issuing the reference.
• The stream is : R1, W1, R3, W3, R1, R3, R2.
• Initially it is assumed that all the caches are empty.

83
• Step 1: As the cache is initially empty, so the main memory
provides P1 with the block and it becomes exclusive state.
• Step 2: As the block is already present in the cache and in
an exclusive state so it directly modifies that without any
bus instruction. The block is now in a modified state.
• Step 3: In this step, a BusRd is posted on the bus and the
snooper on P1 senses this. It then flushes the data and
changes its state to shared. The block on P3 also changes
its state to shared as it has received data from another
cache. There is no main memory access here.

84
• Step 4: Here a BusUpgr is posted on the bus and the snooper on P1
senses this and invalidates the block as it is going to be modified by
another cache. P3 then changes its block state to modified.
• Step 5: As the current state is invalid, thus it will post a BusRd on the
bus. The snooper at P3 will sense this and so will flush the data out.
The state of the both the blocks on P1 and P3 will become shared now.
Notice that this is when even the main memory will be updated with
the previously modified data.
• Step 6: There is a hit in the cache and it is in the shared state so no bus
request is made here.
• Step 7: There is cache miss on P2 and a BusRd is posted. The snooper
on P1 and P3 sense this and both will attempt a flush. Whichever gets
access of the bus first will do that operation.

Weekend Windfalls: Trading Manual & Quick-Start Guide
No ratings yet
Weekend Windfalls: Trading Manual & Quick-Start Guide
33 pages
Complete Mesocolic Excision and Extent of Lymphadenectomy For The Treatment of Colon Cancer
No ratings yet
Complete Mesocolic Excision and Extent of Lymphadenectomy For The Treatment of Colon Cancer
14 pages
Facts & Formulae Chemistry
100% (2)
Facts & Formulae Chemistry
53 pages
BMS Procedure
100% (3)
BMS Procedure
138 pages
Computer Organization: - by Rama Krishna Thelagathoti (M.Tech CSE From IIT Madras)
No ratings yet
Computer Organization: - by Rama Krishna Thelagathoti (M.Tech CSE From IIT Madras)
118 pages
JLG-860SJ - en
No ratings yet
JLG-860SJ - en
142 pages
Final hmt-1 PDF
No ratings yet
Final hmt-1 PDF
211 pages
Cs405-Computer System Architecture: Module - 1 Parallel Computer Models
No ratings yet
Cs405-Computer System Architecture: Module - 1 Parallel Computer Models
72 pages
CSC580 Quick Notes Lect1and2
100% (1)
CSC580 Quick Notes Lect1and2
18 pages
Introduction To Parallel Programming
No ratings yet
Introduction To Parallel Programming
129 pages
Unit V
No ratings yet
Unit V
95 pages
(Out of 100 Marks) (Out of 40 Marks) : Mathematics (Paper 1)
No ratings yet
(Out of 100 Marks) (Out of 40 Marks) : Mathematics (Paper 1)
38 pages
Cs405-Computer System Architecture: Module - 1 Parallel Computer Models
No ratings yet
Cs405-Computer System Architecture: Module - 1 Parallel Computer Models
72 pages
Computer Achitecture II - Parallel - Computing
No ratings yet
Computer Achitecture II - Parallel - Computing
46 pages
Final Demo
100% (1)
Final Demo
47 pages
Instruction Level Parallelism
No ratings yet
Instruction Level Parallelism
19 pages
Week1 - Parallel and Distributed Computing
100% (1)
Week1 - Parallel and Distributed Computing
46 pages
Unit VI Parallel Programming Concepts
No ratings yet
Unit VI Parallel Programming Concepts
90 pages
Flynns
No ratings yet
Flynns
41 pages
Cloud Computing - Lecture 3
No ratings yet
Cloud Computing - Lecture 3
22 pages
Introduction To Parallel Computing
No ratings yet
Introduction To Parallel Computing
38 pages
01 Intro Parallel Computing
No ratings yet
01 Intro Parallel Computing
40 pages
Volume Shockers (Stocks With Rising Volumes), Technical Analysis Scanner
No ratings yet
Volume Shockers (Stocks With Rising Volumes), Technical Analysis Scanner
2 pages
07 - Chapter 1 PDF
No ratings yet
07 - Chapter 1 PDF
27 pages
Parallel Processing
No ratings yet
Parallel Processing
31 pages
CS326 Parallel and Distributed Computing: SPRING 2021 National University of Computer and Emerging Sciences
No ratings yet
CS326 Parallel and Distributed Computing: SPRING 2021 National University of Computer and Emerging Sciences
47 pages
Project - ParallelComputing BSR v2
No ratings yet
Project - ParallelComputing BSR v2
40 pages
Food Rules, 2027 (1970) : Amendments: 1. 2. 3. 4. 5
No ratings yet
Food Rules, 2027 (1970) : Amendments: 1. 2. 3. 4. 5
50 pages
Advanced Computer Architecture
No ratings yet
Advanced Computer Architecture
28 pages
Introduction To Parallel Processing
No ratings yet
Introduction To Parallel Processing
49 pages
New Microsoft Office Word Document
No ratings yet
New Microsoft Office Word Document
34 pages
Chapter 1 (Parallel Computer Models)
No ratings yet
Chapter 1 (Parallel Computer Models)
20 pages
Lecture-13-14 Parallel and Distributed Systems Programming Models-Jameel
No ratings yet
Lecture-13-14 Parallel and Distributed Systems Programming Models-Jameel
70 pages
Parallel Programming Module 1
No ratings yet
Parallel Programming Module 1
71 pages
Parallel Computing
No ratings yet
Parallel Computing
28 pages
CS 213: Parallel Processing Architectures: Laxmi Narayan Bhuyan
No ratings yet
CS 213: Parallel Processing Architectures: Laxmi Narayan Bhuyan
26 pages
Skin Rejuvenation Regimens
No ratings yet
Skin Rejuvenation Regimens
5 pages
StudM1p1Parallel Computer Modelsppt1shared
No ratings yet
StudM1p1Parallel Computer Modelsppt1shared
107 pages
Parallelism
No ratings yet
Parallelism
22 pages
Introduction Mod1
No ratings yet
Introduction Mod1
120 pages
Parallel Processing Parallel Processing
No ratings yet
Parallel Processing Parallel Processing
64 pages
Acetyline Cylinder PDF
No ratings yet
Acetyline Cylinder PDF
2 pages
Unit 7 - Parallel Processing Paradigm
No ratings yet
Unit 7 - Parallel Processing Paradigm
26 pages
Chapter 1 - Parallel Architectures
No ratings yet
Chapter 1 - Parallel Architectures
60 pages
Architecture of Parallel Computing
No ratings yet
Architecture of Parallel Computing
6 pages
Unit 5
No ratings yet
Unit 5
96 pages
Coa Chapter 5
No ratings yet
Coa Chapter 5
96 pages
Giancoli Chap 3 Vectors Kinematics in 2 Dimensions
No ratings yet
Giancoli Chap 3 Vectors Kinematics in 2 Dimensions
37 pages
612 D Fig702 Flanged y Type Strainer Ul
No ratings yet
612 D Fig702 Flanged y Type Strainer Ul
2 pages
Analysis2 Final Exam 2022 PDF
No ratings yet
Analysis2 Final Exam 2022 PDF
3 pages
Coa Unit 04
No ratings yet
Coa Unit 04
85 pages
Financial Math Assignment
No ratings yet
Financial Math Assignment
2 pages
04 - RPi Pico - Measure Distance With Ultrasonic Sensor HC-SR04
No ratings yet
04 - RPi Pico - Measure Distance With Ultrasonic Sensor HC-SR04
6 pages
Unit 5
No ratings yet
Unit 5
66 pages
COA - Module-5
No ratings yet
COA - Module-5
35 pages
792224FT-371008 - 207-00011
No ratings yet
792224FT-371008 - 207-00011
13 pages
J24 Jimmys Combo
No ratings yet
J24 Jimmys Combo
54 pages
Module 1 Chapter2
No ratings yet
Module 1 Chapter2
98 pages
Lec1 Introduction To Parallel Computing
No ratings yet
Lec1 Introduction To Parallel Computing
40 pages
HPC Unit 2
No ratings yet
HPC Unit 2
72 pages
Chapter-4 Bullet
No ratings yet
Chapter-4 Bullet
5 pages
Status and Prospects For Helicopter Apus in Russia Gavrilov V.V., Ponomarev B.A
No ratings yet
Status and Prospects For Helicopter Apus in Russia Gavrilov V.V., Ponomarev B.A
16 pages
Magic and The Mind
No ratings yet
Magic and The Mind
379 pages
HPC Unit 1
No ratings yet
HPC Unit 1
65 pages
Parallel Computing
No ratings yet
Parallel Computing
32 pages
COA UNIT 5 (AutoRecovered)
No ratings yet
COA UNIT 5 (AutoRecovered)
14 pages
Week1 Parallel and Distributed Computing
No ratings yet
Week1 Parallel and Distributed Computing
55 pages
Lecture-2-06 01 2025
No ratings yet
Lecture-2-06 01 2025
21 pages
CNS Unit 3
No ratings yet
CNS Unit 3
94 pages
Downloadfile
No ratings yet
Downloadfile
16 pages
Climate Change
No ratings yet
Climate Change
5 pages
HPC - Unit-1 Insem Notes
No ratings yet
HPC - Unit-1 Insem Notes
76 pages
Unit Iv Parallelism
No ratings yet
Unit Iv Parallelism
80 pages
HPA - Notes
No ratings yet
HPA - Notes
5 pages
Javascriptinterviewquestions 240713104909 D9bedd8b
No ratings yet
Javascriptinterviewquestions 240713104909 D9bedd8b
25 pages
08 Parallel Algorithms Approches
No ratings yet
08 Parallel Algorithms Approches
12 pages
Chapter 02 - Asynchronous and Parallel Programming in
No ratings yet
Chapter 02 - Asynchronous and Parallel Programming in
55 pages
Ict Lesson 9 Notes
No ratings yet
Ict Lesson 9 Notes
1 page
Unit 4
No ratings yet
Unit 4
16 pages
On The Wine-Dark Sea
No ratings yet
On The Wine-Dark Sea
1 page
Key Differences Between Industrial All Risk (IAR) and PAR
No ratings yet
Key Differences Between Industrial All Risk (IAR) and PAR
2 pages
Multiprocessing Vs Multithreading 2
No ratings yet
Multiprocessing Vs Multithreading 2
16 pages
Module - 4 - Parallel Processing
No ratings yet
Module - 4 - Parallel Processing
32 pages
PAG Unit1
No ratings yet
PAG Unit1
64 pages
CC Unit 1.2
No ratings yet
CC Unit 1.2
39 pages
Corrosion of Aluminium 2nd Edition Christian Vargel Instant Download
100% (1)
Corrosion of Aluminium 2nd Edition Christian Vargel Instant Download
62 pages
IGNOU BCA Operating System Concepts and Networking Management Previous Year Solved Papers MCS 022
From Everand
IGNOU BCA Operating System Concepts and Networking Management Previous Year Solved Papers MCS 022
Manish Soni
No ratings yet
Zig Programming: From Zero to Systems Master
From Everand
Zig Programming: From Zero to Systems Master
Niklas Hoffmann
No ratings yet
Kubernetes Made Easy
From Everand
Kubernetes Made Easy
Pankaj Joshi
No ratings yet
Advanced Backend Code Optimization
From Everand
Advanced Backend Code Optimization
Sid Touati
No ratings yet

COA - Unit 4

Uploaded by

COA - Unit 4

Uploaded by

18CSC203J-Computer Organization

• CLO-3 : Analyze the detailed operation of Basic

• To increase the computational speed (ie) to reduce

• Parallel instructions are set of instructions that do

• Data parallelism is parallelization across

• A[m x n] dot B [n x k] can be finished in O(n) instead of O(m ∗n ∗k )

 Was proposed by researcher Michael J. Flynn in 1966.

• This taxonomy distinguishes multi-processor computer

Multi-core CPU chip 40

threads threads threads threads

• Shared memory multiprocessors

• Most commonly represented today by Symmetric

Uniform Memory Access Non-Uniform Memory Access

L1 cache L1 cache L1 cache L1 cache

L2 cache L2 cache L2 cache L2 cache

Both L1 and L2 are private

Core 1 Core 2 Core 3 Core 4

One or more One or more One or more One or more

Core 1 Core 2 Core 3 Core 4

One or more One or more One or more One or more

Core 1 Core 2 Core 3 Core 4

One or more One or more One or more One or more

Core 1 Core 2 Core 3 Core 4

One or more One or more One or more One or more

Core 1 Core 2 Core 3 Core 4

One or more One or more One or more One or more

Core 1 Core 2 Core 3 Core 4

One or more One or more One or more One or more

Core 1 Core 2 Core 3 Core 4

One or more One or more One or more One or more

Core 1 Core 2 Core 3 Core 4

One or more One or more One or more One or more

Core 1 Core 2 Core 3 Core 4

One or more One or more One or more One or more

Core 1 Core 2 Core 3 Core 4

One or more One or more One or more One or more

Core 1 Core 2 Core 3 Core 4

One or more One or more One or more One or more

• Invalidation generally performs better:

For Multiprocessor Systems

• A processor P1 has a Block X in its Cache, and there is a request

You might also like