ACA Unit 1
Module I
Parallel Processing
Jyoti Kumari
1
Parallel Processing
Definition:
• Parallel processing can be described as a class of techniques which
enables the system to achieve simultaneous data-processing tasks
to increase the computational speed of a computer system.
• A parallel processing system can carry out simultaneous
data-processing to achieve faster execution time.
• For instance, while an instruction is being processed in the ALU
component of the CPU, the next instruction can be read from
memory.
Why?
• It enhances the computer’s processing capability
• increases its throughput, i.e. the amount of processing that can be
accomplished during a given interval of time.
2
Parallel Computer Models
3
Parallel/Vector Computers
• Parallel computers are those that execute programs in MIMD mode.
• There are two major classes of parallel computers:
Shared-memory multiprocessors
Message passing multicomputers
• The major distinction between multiprocessors and multicomputers lies in memory sharing
and the mechanisms used for interprocessor communication.
• A vector processor is equipped with multiple vector pipelines that can be concurrently used
under hardware or firmware control.
There are two families of pipelined vector processors:
Memory-to-memory:
This architecture supports the pipelined flow of vector operands directly from the memory
to pipelines and then back to the memory.
Register-to-register:
This architecture uses vector registers to interface between the memory and functional
pipelines.
4
System Attributes to Performance
• The ideal performance of a computer system demands a perfect match between machine
capability and program behaviour.
• Machine capability can be enhanced with better hardware technology, innovative architectural
features, and efficient resources management.
• Program behaviour is difficult to predict due to its heavy dependence on application and
run-time conditions.
• Other factors affecting program behaviour include:
algorithm design, data structures,
language efficiency, programmer skill,
and compiler technology.
• These attributes/performance indicators guide system architects in designing better machines
and help programmers and compiler writers optimize code for more efficient execution by the
hardware.
5
• Computer architects have come up with a variety of metrics to describe the computer
performance:
Clock rate and CPI :
• Since I/O and system overheads frequently overlap processing by other programs, it is fair to
consider only the CPU time used by a program; the user CPU time is the most important factor.
• The CPU is driven by a clock with a constant cycle time τ (usually measured in nanoseconds),
which controls the rate of internal operations in the CPU.
• The inverse of the cycle time is the clock rate (f = 1/τ, measured in megahertz).
• A shorter clock cycle time, or equivalently a larger number of cycles per second, implies more
operations can be performed per unit time.
• The size of the program is determined by the instruction count (Ic), the number of machine
instructions to be executed by the program.
• Different machine instructions require different numbers of clock cycles to execute. CPI
(cycles per instruction) is thus an important parameter.
6
• Performance Factors:
Let Ic be the number of instructions in a given program, or the
instruction count.
The CPU time ( T in seconds/program) needed to execute the program is
estimated by finding the product of three contributing factors:
T = Ic * CPI * τ (1)
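• As a quick illustration (the numbers here are hypothetical, not from the text): for a program
with Ic = 2,000,000 instructions, an average CPI of 1.5, and a cycle time τ = 2.5 ns
(f = 400 MHz), Eq. 1 gives T = 2,000,000 × 1.5 × 2.5 ns = 7.5 ms.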
7
• The time required to access memory is called the memory cycle, which is
usually k times the processor cycle time τ.
• The value of k depends on the memory technology and the processor-memory
interconnection scheme.
• The processor cycles required for each instruction (CPI) can be attributed to
– cycles needed for instruction decode and execution (p),
– and cycles needed for memory references (m* k).
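• Combining this with Eq. 1, the CPU time can be rewritten (in the standard formulation) as
T = Ic × (p + m × k) × τ, where m is the number of memory references per instruction.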
8
System Attributes:
• The five performance factors (Ic, p, m, k, τ) are influenced by four system
attributes:
Instruction-set architecture: affects Ic and p
Compiler technology: affects Ic, p, and m
CPU implementation and control: determines the total processor time needed (p × τ)
Cache and memory hierarchy: affects the memory access latency (k × τ)
The attributes are shown in Table 1.2.
9
10
MIPS Rate:
• The processor speed is often measured in millions of instructions per second (MIPS), i.e. the
MIPS rate of a given processor.
• The MIPS rate is obtained by dividing the number of instructions executed in a running program
by the time required to run the program (and scaling by 10^6).
• Let C be the total number of clock cycles needed to execute a given program.
• Then the CPU time in Eq. 1 can be estimated as T = C × τ = C/f.
• Also, CPI = C/Ic, so T = Ic × CPI × τ = Ic × CPI/f.
• MIPS rate = Ic / (T × 10^6) = f / (CPI × 10^6) = (f × Ic) / (C × 10^6)        (3)
• Based on Eq. 3, the CPU time in Eq. 1 can also be written as T = Ic × 10^-6 / (MIPS rate).
• The MIPS rate is directly proportional to the clock rate (f) and inversely proportional to the CPI.
• All four systems attributes (instruction set, compiler, processor, and memory technologies) affect
the MIPS rate.
11
Floating Point Operations per Second
• Most compute-intensive applications in science and engineering make heavy use of floating
point operations.
• Compared to instructions per second, for such applications a more relevant measure of
performance is floating point operations per second, which is abbreviated as flops.
• With the prefixes mega (10^6), giga (10^9), tera (10^12) and peta (10^15), this is written as
megaflops (mflops), gigaflops (gflops), teraflops or petaflops.
Throughput Rate
The number of programs a system can execute per unit time is called the system throughput Ws (in
programs/second).
In a multiprogrammed system, the system throughput is often lower than the CPU throughput
Wp, defined by:
Wp = f / (Ic* CPI)
(4)
• Note that Wp = (MIPS rate) × 10^6 / Ic, from Eq. 3.
• The unit for Wp is programs/second.
• The CPU throughput is a measure of how many programs can be executed per second, based
only on the MIPS rate and average program length (Ic).
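• A minimal sketch tying Eqs. 1, 3 and 4 together (the numeric values below are hypothetical,
chosen only for illustration):

    # hypothetical values for illustration only
    f = 400e6        # clock rate in Hz (400 MHz)
    Ic = 2_000_000   # instruction count of the program
    CPI = 1.5        # average cycles per instruction

    T = Ic * CPI / f          # Eq. 1: CPU time in seconds      -> 0.0075 s
    mips = f / (CPI * 1e6)    # Eq. 3: MIPS rate                -> 266.67
    Wp = f / (Ic * CPI)       # Eq. 4: CPU throughput (prog/s)  -> 133.33
    print(T, mips, Wp)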
12
Implicit Parallelism
• An implicit approach uses a
conventional language, such as C, C++,
Fortran, or Pascal, to write the source
program.
• The sequentially coded source program
is translated into parallel object code by
a parallelizing compiler. As illustrated in
Fig. 1.a, this compiler must be able to
detect parallelism and assign target
machine resources.
• This compiler approach has been
applied in programming shared-memory
multiprocessors. With parallelism being
implicit, success relies heavily on the
“intelligence” of a parallelizing compiler.
• This approach requires less effort on
the part of the programmer.
Fig. 1.a.
13
• Explicit Parallelism This
approach (Fig. 1.b.) requires more
effort by the programmer to
develop a source program using
parallel dialects of C, C++,
Fortran, or Pascal.
• Parallelism is explicitly specified
in the user programs. This reduces
the burden on the compiler to
detect parallelism.
• Instead, the compiler needs to
preserve parallelism and, where
possible, assign target machine
resources.
Fig. 1.b.
14
Multiprocessors and Multicomputers
• Shared-Memory Multiprocessors
– UMA Model
– NUMA Model
– COMA Model
– Discussed in Flynn's Classification PPT
• Distributed-Memory Multicomputers
15
Multivector and SIMD computers
Vector Supercomputers
• A vector computer is often built on
top of a scalar processor.
• As shown in Fig. 2., the vector
processor is attached to the scalar
processor as an optional feature.
• Program and data are first loaded
into the main memory through a host
computer.
• All instructions are first decoded by
the scalar control unit.
• If the decoded instruction is a scalar
operation or a program control
operation, it will be directly executed
by the scalar processor using the
scalar functional pipelines.
Fig. 2. Architecture of a vector processor
16
• If the instruction is decoded as a vector operation, it will be sent to the
vector control unit.
• This control unit will supervise the flow of vector data between the main memory
and vector functional pipelines.
• The vector data flow is coordinated by the control unit. A number of
vector functional pipelines may be built into a vector processor.
• Two pipelined vector supercomputer models are described below.
• Vector processor models:
– Register-to-register architecture
– Memory-to-memory architecture
Register-to-register
• Vector registers are used to hold the vector operands, intermediate and final vector
results.
• The vector functional pipelines retrieve operands from and put results into
the vector registers. All vector registers are programmable in user
instructions.
• Each vector register is equipped with a component counter which keeps track of the
component registers used in successive pipeline cycles.
17
• The length of each vector register is usually fixed, say, sixty-four 64-bit
component registers in a vector register in a Cray Series supercomputer.
• Other machines, like the Fujitsu VP2000 Series, use reconfigurable vector
registers to dynamically match the register length with that of the vector
operands.
• In general, there are fixed numbers of vector registers and functional pipelines
in a vector processor. Therefore, both resources must be reserved in advance to
avoid resource conflicts between different vector operations.
18
SIMD Supercomputers
An operational model of an SIMD computer is specified by a 5-tuple M = (N, C, I, M, R),
where –
1. N is the number of processing elements (PEs) in
the machine.
For example, the Illiac IV had 64 PEs and the
Connection Machine CM-2 had 65,536 PEs.
2. C is the set of instructions directly executed
by the control unit (CU), including scalar and
program flow control instructions.
3. I is the set of instructions broadcast by the CU to
all PEs for parallel execution. These include
arithmetic, logic, data routing, masking, and
other local operations executed by each active PE
over data within that PE.
4. M is the set of masking schemes, where each
mask partitions the set of PEs into enabled and
disabled subsets.
5. R is the set of data-routing functions, specifying
various patterns to be set up in the
interconnection network for inter-PE
communications.
Fig. 3. Operational model of SIMD
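• A toy sketch of this operational model (the names and data below are hypothetical, purely for
illustration): the CU broadcasts one instruction per step, and only the PEs enabled by the
current mask apply it to their local data.

    # one hypothetical SIMD step: broadcast an operation to all enabled PEs
    def simd_step(local_data, op, mask):
        # local_data: one value per PE; mask: True means the PE is enabled
        return [op(x) if enabled else x
                for x, enabled in zip(local_data, mask)]

    data = [1, 2, 3, 4]                  # 4 PEs, each holding one local element
    mask = [True, False, True, True]     # the second PE is disabled this step
    data = simd_step(data, lambda x: x * 10, mask)
    print(data)                          # [10, 2, 30, 40]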
19
PRAM and VLSI Models
Parallel Random Access Machines (PRAM) Models
• Theoretical model
• These models are often used by algorithm designers and VLSI device/chip developers.
• The ideal models provide a convenient framework for developing parallel algorithms
without worrying about the implementation details or physical constraints.
20
• The time complexity of an algorithm is a function of the problem size s, counting the number of
operational steps the algorithm requires; the space complexity can be similarly defined as a
function of s.
• The asymptotic space complexity refers to the data storage of large problems.
• Note that the program (code) storage requirement and the storage for input data
are not considered in this.
• The time complexity of a serial algorithm is simply called serial complexity.
• The time complexity of a parallel algorithm is called parallel complexity.
• Intuitively, the parallel complexity should be lower than the serial
complexity, at least asymptotically.
• We consider only deterministic algorithms, in which every operational step is
uniquely defined in agreement with the way programs are executed on real
computers.
• A nondeterministic algorithm contains operations resulting in one outcome from a
set of possible outcomes. There exist no real computers that can execute
nondeterministic algorithms.
21
NP-Completeness
• An algorithm has polynomial complexity if there exists a polynomial p(s)
such that the time complexity is O(p(s)) for problem size s.
• The set of problems having polynomial-complexity algorithms is called
P-class (for polynomial class).
• The set of problems solvable by nondeterministic algorithms in
polynomial time is called NP-class (for non deterministic polynomial
class).
• Since deterministic algorithms are special cases of the nondeterministic
ones, we know that P is a subset of NP.
• The P-class problems are computationally tractable, while the NP - P-class
problems are intractable. But we do not know whether P = NP or P != NP.
This is still an open problem in computer science.
• To simulate a nondeterministic algorithm with a deterministic algorithm
may require exponential time. Therefore, intractable NP-class problems are
also said to have exponential-time complexity.
22
PRAM Models
• A parallel random access machine (PRAM) is
a model of idealized parallel
computers with zero synchronization or
memory access overhead.
• This PRAM model will be used for parallel
algorithm development and for scalability and
complexity analysis.
• An n-processor PRAM (Fig. 4) has a globally
addressable memory.
• The shared memory can be distributed among
the processors or centralized in one place.
• The n processors—also called processing
elements (PEs)—operate on a synchronized
read-memory, compute, and write-memory
cycle.
• With shared memory, the model must specify
how concurrent read and concurrent write of
memory are handled.
Fig. 4. PRAM
23
Four memory-update options are possible:
• Exclusive read (ER): This allows at most one processor to read from any memory
location in each cycle, a rather restrictive policy.
• Exclusive write (EW): This allows at most one processor to write into a
memory location at a time.
• Concurrent read (CR): This allows multiple processors to read the same information
from the same memory cell in the same cycle.
• Concurrent write (CW): This allows simultaneous writes to the same memory
location. In order to avoid confusion, some policy must be set up to resolve the
write conflicts.
24
• Since CR does not create a conflict problem, various PRAM variants differ mainly in how
they handle the CW conflicts.
PRAM variant:
• Described below are four variants of the PRAM model, depending on how the memory reads
and writes are handled.
1. EREW-PRAM model—This model forbids more than one processor from reading or writing
the same memory cell simultaneously. This is the most restrictive PRAM model proposed.
2. CREW-PRAM Model—The write conflicts are avoided by mutual exclusion. Concurrent
reads to the same memory location are allowed.
3. ERCW-PRAM model—This allows exclusive read or concurrent writes to the same memory
location.
4. CRCW-PRAM model—This model allows either concurrent reads or concurrent writes to
the same memory location.
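• A small sketch (illustrative only, with hypothetical helper names) of how one PRAM cycle could
be checked against the read/write rules of each variant:

    # check one PRAM cycle against a variant's access policy (illustrative only)
    def check_cycle(reads, writes, variant):
        # reads/writes: memory addresses accessed by the processors this cycle
        def exclusive(addrs):
            return len(addrs) == len(set(addrs))   # no address touched twice
        rules = {
            "EREW": exclusive(reads) and exclusive(writes),
            "CREW": exclusive(writes),   # concurrent reads allowed
            "ERCW": exclusive(reads),    # concurrent writes allowed
            "CRCW": True,                # both allowed; a write-conflict policy is still needed
        }
        return rules[variant]

    # two processors read address 7 in the same cycle: legal for CREW, not for EREW
    print(check_cycle(reads=[7, 7], writes=[1, 2], variant="EREW"))  # False
    print(check_cycle(reads=[7, 7], writes=[1, 2], variant="CREW"))  # True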
25
1. CPI = total clock cycle count / total instruction count (Ic)
   = ((450000 × 1) + (320000 × 2) + (150000 × 2) + (80000 × 2)) / 1000000
   = 1550000 / 1000000
   = 1.55
2. MIPS rate = Ic / (T × 10^6)
   = Ic / (Ic × CPI × τ × 10^6)
   = 1 / (CPI × (1/f) × 10^6)
   = f / (CPI × 10^6)
   With f = 400 MHz and CPI = 1.55:
   MIPS rate = (400 × 10^6) / (1.55 × 10^6) = 258.06
3. Execution time T = Ic × CPI × (1/f)
   = (1000000 × 1.55) / (400 × 10^6) = 3.875 ms
26
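• A short sketch that reproduces the arithmetic above (using the per-type instruction counts and
cycle counts given in this example, and the 400 MHz clock):

    # instruction mix from the example: (count, cycles per instruction)
    mix = [(450000, 1), (320000, 2), (150000, 2), (80000, 2)]
    f = 400e6                                  # 400 MHz clock

    Ic = sum(count for count, _ in mix)        # 1,000,000 instructions
    cycles = sum(count * c for count, c in mix)
    CPI = cycles / Ic                          # 1.55
    mips = f / (CPI * 1e6)                     # ~258.06
    T = Ic * CPI / f                           # ~0.003875 s
    print(CPI, round(mips, 2), T)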
a. What is the relative size of the instruction count of the machine code for this benchmark
   program running on the two machines?
b. What are the CPI values for the two machines?

a. The MIPS rate can be used to compute the instruction count:
   MIPS rate = Ic / (T × 10^6), so Ic = T × (MIPS rate) × 10^6
   Computing the ratio of the instruction count of S2 to S1:
   [x × 1800] / [12x × 100] = 1800x / 1200x = 1.5
b. CPI = f / (MIPS rate × 10^6)
   For S1, CPI = (500 MHz) / (100 MIPS) = 5
   For S2, CPI = (2.5 GHz) / (1800 MIPS) ≈ 1.4
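• A brief sketch of the same derivation (machine parameters taken from the solution above; the
run times are the symbolic x and 12x, so only their ratio matters):

    def instr_count(mips, T):
        return mips * 1e6 * T          # Ic = MIPS rate * 10^6 * T

    def cpi(f, mips):
        return f / (mips * 1e6)        # CPI = f / (MIPS rate * 10^6)

    f1, mips1, T1 = 500e6, 100, 12.0   # S1: 500 MHz, 100 MIPS, run time 12x
    f2, mips2, T2 = 2.5e9, 1800, 1.0   # S2: 2.5 GHz, 1800 MIPS, run time x

    print(instr_count(mips2, T2) / instr_count(mips1, T1))  # 1.5
    print(cpi(f1, mips1), cpi(f2, mips2))                   # 5.0, ~1.39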
27
Level of Parallelism
Hardware and Software Parallelism
Implementing parallelism requires special hardware and software support. Beyond the theoretical
conditions for parallelism, joint efforts between hardware designers and software programmers
are needed to exploit parallelism and improve computer performance.
Hardware Parallelism
• This refers to the type of parallelism defined by the machine architecture and hardware
multiplicity.
• It is a function of cost and performance tradeoffs.
• It displays the resource utilization patterns of simultaneously executable operations.
• It also indicates the peak performance of the processor resources.
One way to characterize the parallelism in a processor is by the number of instruction issues per
machine cycle.
• If a processor issues k instructions per machine cycle, then it is called a k-issue processor.
• A conventional pipelined processor takes one machine cycle to issue a single instruction. These
types of processors are called one-issue machines, with a single instruction pipeline in the processor.
28
• In a modern processor, two or more instructions can be issued per machine cycle.
• For example, the Intel i960CA was a three-issue processor with one arithmetic, one memory
access, and one branch instruction issued per cycle.
• The IBM RISC/System 6000 is a four-issue processor capable of issuing one arithmetic, one
memory access, one floating-point, and one branch operation per cycle.
Software parallelism
• This type of parallelism is revealed in the program profile or in the program flow graph.
• It is a function of algorithm, programming style, and program design.
• The program flow graph displays the patterns of simultaneously executable operations.
29
Mismatch between software parallelism
and hardware parallelism
• There are eight instructions (four
loads and four arithmetic operations)
to be executed in three consecutive
machine cycles as shown in Fig. 5.a.
• Four load operations are performed
in the first cycle, followed by two
multiply operations in the second
cycle and two add/subtract
operations in the third cycle.
• Therefore, the parallelism varies
from 4 to 2 in three cycles.
• The average software parallelism is
equal to 8/3 = 2.67 instructions per
cycle in this example.
Fig. 5.a.
30
Consider execution of the same
program by a two-issue processor
which can execute one memory
access (load or write) and one
arithmetic (add, subtract, multiply
etc.) operation simultaneously.
31
• Let us try to match the software
parallelism shown in Fig. 5.a in a
hardware platform of a dual-processor
system, where single-issue processors
are used.
• The achievable hardware parallelism is
shown in Fig. 6, where L/S stands for
load/store operations.
• Note that six processor cycles are needed
to execute the 12 instructions by two
processors.
• S1 and S2 are two inserted store
operations, and l5 and l6 are two
inserted load operations. These added
instructions are needed for
interprocessor communication through
the shared memory.
Fig. 6. Dual Processor Execution of Program in
Fig. 5.a.
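• From the numbers above, the parallelism actually exploited works out to 12 instructions /
6 cycles = 2 instructions per cycle across the two processors.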
32
Types of Software Parallelism
33
a. Avg. CPI = total cycles / Ic
   A & L  = 60% × 2 × 10^6 = 1,200,000 instructions, 1 cycle each
   Load   = 18% × 2 × 10^6 = 360,000 instructions, 2 cycles each
   Branch = 12% × 2 × 10^6 = 240,000 instructions, 4 cycles each
   Mem    = 10% × 2 × 10^6 = 200,000 instructions, 8 cycles each
   CPI = ((1,200,000 × 1) + (360,000 × 2) + (240,000 × 4) + (200,000 × 8)) / Ic
       = 4,480,000 / 2,000,000
       = 2.24
b. MIPS rate = f / (CPI × 10^6) = (400 × 10^6) / (2.24 × 10^6) = 178.57
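• A compact sketch of the weighted-CPI calculation above (instruction mix and cycle counts as
given; the 400 MHz clock is inferred from the stated MIPS rate):

    Ic = 2_000_000
    f = 400e6                      # inferred: 178.57 MIPS x 2.24 CPI ~ 400 MHz
    # (fraction of instructions, cycles per instruction)
    mix = [(0.60, 1), (0.18, 2), (0.12, 4), (0.10, 8)]

    CPI = sum(frac * c for frac, c in mix)     # 2.24
    mips = f / (CPI * 1e6)                     # ~178.57
    print(CPI, round(mips, 2))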
34