ACA Mod2


Processor and memory hierarchy

Design Space of Processors


• Processors can be “mapped” to a space that has clock rate and cycles
per instruction (CPI) as coordinates. Each processor type occupies a
region of this space.

• Newer technologies are enabling higher clock rates.

• Manufacturers are also trying to lower the number of cycles per instruction.

• Thus the “future processor space” is moving toward the lower right
of the processor design space.
• Processor families can be mapped onto a coordinate space of clock rate
versus CPI
o Clock rates have moved from lower
to higher speeds
o CPI ratings have been lowered

• Broad Categorization
o CISC
o RISC
CISC and RISC Processors
• Complex Instruction Set Computing (CISC) processors, such as the Intel
80486, the Motorola 68040, the VAX 8600, and the IBM S/390,
typically use microprogrammed control units and have lower clock rates
and higher CPI figures.
• Reduced Instruction Set Computing (RISC) processors, such as the Intel
i860, SPARC, MIPS R3000, and IBM RS/6000, use hardwired control
units and have higher clock rates and lower CPI figures.
VLIW Machines
• Very Long Instruction Word machines typically have many
more functional units than superscalars (and thus need longer
instructions, 256 to 1024 bits, to provide control for them).

• These machines mostly use microprogrammed control units
with relatively slow clock rates because of the need to use
ROM to hold the microcode.
Superpipelined Processors
• These processors typically use a multiphase clock (actually several clocks
that are out of phase with each other, each phase perhaps controlling the
issue of another instruction) running at a relatively high rate.
• The CPI in these machines tends to be relatively high (unless multiple
instruction issue is used).
• Processors in vector supercomputers are mostly superpipelined and use
multiple functional units for concurrent scalar and vector operations
Instruction Pipelines
• Typical instruction execution includes four phases:
– fetch
– decode
– execute
– write-back
• These four phases are frequently performed in
an overlapped, or “assembly line,” manner.
Pipeline Definitions
• Instruction pipeline cycle – the time required for each phase
to complete its operation (assuming equal delay in all phases)
• Instruction issue latency – the time (in cycles) required
between the issuing of two adjacent instructions
• Instruction issue rate – the number of instructions issued per
cycle (the degree of a superscalar)
• Simple operation latency – the delay (after the previous
instruction) associated with the completion of a simple
operation (e.g. integer add) as compared with that of a
complex operation (e.g. divide).
• Resource conflicts – when two or more instructions demand
use of the same functional unit(s) at the same time.
Pipelined Processors
• A base scalar processor:
– issues one instruction per cycle
– has a one-cycle latency for a simple operation
– has a one-cycle latency between instruction issues
– can be fully utilized if instructions can enter the pipeline at a rate of
one per cycle
• For a variety of reasons, instructions might not be able to be
pipelined as aggressively as in a base scalar processor. In these
cases, we say the pipeline is underpipelined.
• CPI rating is 1 for an ideal pipeline. Underpipelined systems
will have higher CPI ratings, lower clock rates, or both.
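
As a rough illustration (not part of the original slides), the sketch below shows why a base scalar processor's CPI approaches 1: with a k-stage pipeline and one instruction entering per cycle, N instructions finish in k + (N − 1) cycles. The stage names and the instruction count are assumed values for the example.

```python
# Minimal sketch: timing of a base scalar, 4-stage pipeline
# (fetch, decode, execute, write-back) issuing one instruction per cycle.
# Total cycles for N instructions in a k-stage pipeline = k + (N - 1),
# so CPI approaches 1 as N grows.

STAGES = ["F", "D", "E", "W"]          # fetch, decode, execute, write-back

def pipeline_cycles(n_instructions: int, k_stages: int = len(STAGES)) -> int:
    """Cycles needed when one instruction enters the pipeline per cycle."""
    return k_stages + (n_instructions - 1)

if __name__ == "__main__":
    n = 100                             # assumed instruction count
    cycles = pipeline_cycles(n)
    print(f"{n} instructions take {cycles} cycles, CPI = {cycles / n:.2f}")
```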
Data path and control unit of scalar processor

In a typical, simple scalar processor that does not employ an
instruction pipeline:
• Main memory, I/O controllers, etc. are connected to the external bus.
• The control unit generates the control signals required for the fetch,
decode, ALU operation, memory access, and write-result phases of
instruction execution.
Instruction-Set Architectures
Architectural Distinctions
CISC
• Earlier CISC processors used microprogrammed control units, with the
microcode held in ROM.
• Conventional CISC architecture uses a unified cache for holding both
instructions and data; therefore instructions and data must share the same
path.
• Some later CISC processors also use split caches.
• A microprogrammed control unit is a relatively simple logic circuit that
is capable of sequencing through microinstructions and generating the control
signals to execute each microinstruction.
RISC
• In a RISC processor, separate instruction and data caches are used with
different access paths.
• Split caches and hardwired control units are used in today's RISC machines.
CISC Scalar Processors
• Early systems had only integer fixed point facilities.
• Modern machines have both fixed and floating
point facilities, sometimes as parallel functional
units.
• Many CISC scalar machines are underpipelined.
• Representative systems:
– VAX 8600
– Motorola MC68040
– Intel Pentium
RISC Scalar Processors
• Designed to issue one instruction per cycle
• RISC and CISC scalar processors should have the same
performance if clock rate and program lengths are equal.
• RISC moves less frequent operations into software, thus
dedicating hardware resources to the most frequently used
operations.
• Representative systems:
– Sun SPARC
– Intel i860
– Motorola M88100
– AMD 29000
• SPARC family chips have been produced by Cypress Semiconductor, Inc.
Figure 4.7 shows the architecture of the Cypress CY7C601 SPARC
processor and of the CY7C602 FPU.
• The Sun SPARC instruction set contains 69 basic instructions.
• The SPARC runs each procedure with a set of thirty-two 32-bit IU
registers.
• Eight of these registers are global registers shared by all
procedures; the remaining 24 are window registers associated
with each procedure.
• The concept of using overlapped register windows is the most
important feature introduced by the Berkeley RISC architecture.
• The Cypress 601 implements eight overlapping windows (formed with 64 local
registers and 64 overlapped registers) plus eight globals, for a total of
136 registers.
• Each register window is divided into three eight-register sections, labeled Ins,
Locals, and Outs.
• The Local registers are only locally addressable by each procedure. The Ins
and Outs are shared among procedures.
• The calling procedure passes parameters to the called procedure via its Outs
(r8 to r15) registers, which are the Ins registers of the called procedure
(see the sketch after this list).
• The window of the currently running procedure is called the active window,
pointed to by a current window pointer.
• A window invalid mask is used to indicate which windows are invalid. The trap
base register serves as a pointer to a trap handler.
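
The overlapping-window idea above can be sketched in code. This is a minimal illustration only: the physical register numbering and the window_registers helper are assumptions made for the example, not the actual CY7C601 register-file wiring; it shows only that a caller's Outs are the callee's Ins and that 8 windows plus 8 globals give 136 registers.

```python
# Minimal sketch of overlapped register windows (8 windows,
# 8 globals + 64 locals + 64 overlapped Ins/Outs = 136 registers).
# The physical numbering below is illustrative, not the real hardware layout.

N_WINDOWS = 8
GLOBALS = list(range(0, 8))                    # shared by all procedures

def window_registers(w: int) -> dict:
    """Map logical Ins/Locals/Outs of window w to illustrative physical registers."""
    base = 8 + (w % N_WINDOWS) * 16            # 16 new physical registers per window
    ins     = list(range(base, base + 8))      # shared with the caller's Outs
    locals_ = list(range(base + 8, base + 16)) # private to this procedure
    # The Outs of window w are the Ins of the next window (the callee),
    # which is how parameters are passed without copying.
    nxt = 8 + ((w + 1) % N_WINDOWS) * 16
    outs = list(range(nxt, nxt + 8))
    return {"globals": GLOBALS, "ins": ins, "locals": locals_, "outs": outs}

if __name__ == "__main__":
    caller, callee = window_registers(0), window_registers(1)
    assert caller["outs"] == callee["ins"]     # overlap: parameters passed in place
    print(caller["outs"])
```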
Superscalar, Vector Processors
• Scalar processor: executes one instruction per cycle, with only one instruction pipeline.
• Superscalar processor: multiple instruction pipelines, with multiple instructions issued per
cycle, and multiple results generated per cycle.
• Vector processors issue one instruction that operates on multiple data items (arrays). This
is conducive to pipelining, with one result produced per cycle.

Superscalar Pipelines
Superscalar processors were originally developed as an alternative to vector
processors, with a view to exploiting a higher degree of instruction-level
parallelism.
A superscalar processor of degree m can issue m instructions per cycle.

The base scalar processor, implemented either in RISC or CISC, has m = 1.

In order to fully utilize a superscalar processor of degree m, m instructions
must be executable in parallel. This situation may not be true in all clock
cycles.
• In that case, some of the pipelines may be stalling in a wait state.
• In a superscalar processor, the simple operation latency should
require only one cycle, as in the base scalar processor.
• Due to the desire for a higher degree of instruction-level parallelism
in programs, the superscalar processor depends more on an
optimizing compiler to exploit parallelism
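
As a rough illustration (not from the slides), the sketch below shows what happens to throughput when not every cycle can issue m instructions; the degree m = 4, the 60% full-issue fraction, and the assumption that the remaining cycles issue only one instruction are all invented for the example.

```python
# Minimal sketch: achieved throughput of a degree-m superscalar when only a
# fraction of cycles can actually issue all m instructions (assumed numbers).

def effective_ipc(m: int, full_issue_fraction: float) -> float:
    """IPC when 'full_issue_fraction' of cycles issue m instructions and the
    remaining cycles issue only one (a simplifying assumption for illustration)."""
    return full_issue_fraction * m + (1.0 - full_issue_fraction) * 1.0

if __name__ == "__main__":
    ipc = effective_ipc(m=4, full_issue_fraction=0.6)   # assumed values
    print(f"IPC = {ipc:.2f}, CPI = {1 / ipc:.2f}")
```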
Typical Superscalar Architecture
• A typical superscalar will have
– multiple instruction pipelines
– an instruction cache that can provide multiple instructions per fetch
– multiple buses among the function units
• In theory, all functional units can be simultaneously active.
VLIW Architecture
• VLIW = Very Long Instruction Word
• Instructions are usually hundreds of bits long.
• Each instruction word essentially carries multiple “short instructions.”
• Each of the “short instructions” is effectively issued at the same time.
• (This is related to the long words frequently used in microcode.)
• Compilers for VLIW architectures should optimally try to predict branch
outcomes to properly group instructions.
Pipelining in VLIW Processors

• Decoding of instructions is easier in VLIW than in superscalars, because
each “region” of an instruction word is usually limited as to the type of
instruction it can contain.
• Code density in VLIW is less than in superscalars, because if a “region” of
a VLIW word isn’t needed in a particular instruction, it must still exist and
be filled with a “no-op” (see the sketch after this list).
• Superscalars can be object-code compatible with scalar processors; achieving
such compatibility between VLIW (parallel) and non-parallel architectures is
difficult.
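
To make the code-density point concrete, here is a minimal packing sketch; the slot names and operations are invented for illustration, and any slot the compiler cannot fill costs a no-op.

```python
# Minimal sketch of VLIW code density: each long word has a fixed slot per
# functional unit, and unused slots must still be filled with a no-op.

SLOTS = ["int_alu", "fp_alu", "load_store", "branch"]   # one slot per functional unit

def pack_vliw_word(ops: dict) -> list:
    """Build one VLIW word: an operation per slot, 'nop' where none is scheduled."""
    return [ops.get(slot, "nop") for slot in SLOTS]

if __name__ == "__main__":
    # The compiler found only two independent operations for this cycle,
    # so half of the word is wasted on no-ops (the code-density cost noted above).
    word = pack_vliw_word({"int_alu": "add r1,r2,r3", "load_store": "ld r4,0(r5)"})
    print(word)   # ['add r1,r2,r3', 'nop', 'ld r4,0(r5)', 'nop']
```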
VLIW Opportunities
• “Random” parallelism among scalar operations is exploited in VLIW, instead
of regular parallelism in a vector or SIMD machine.
• The efficiency of the machine is entirely dictated by the success, or
“goodness,” of the compiler in planning the operations to be placed in the
same instruction words.
• Different implementations of the same VLIW architecture may not be
binary-compatible with each other, resulting in different latencies
VLIW Summary
• VLIW reduces the effort required to detect parallelism using hardware or
software techniques.
• The main advantage of VLIW architecture is its simplicity in hardware
structure and instruction set.
• Unfortunately, VLIW does require careful analysis of code in order to
“compact” the most appropriate ”short” instructions into a VLIW word.
Vector Processors

• A vector processor is a coprocessor designed to perform vector computations.

• A vector is a one-dimensional array of data items (each of the same data type).

• Vector processors are often used in multipipelined supercomputers.

• Architectural types include:

– Register-to-Register (with shorter instructions and register files)

– Memory-to-Memory (longer instructions with memory addresses)


Register-to-Register Vector Instructions
• Assume Vi is a vector register of length n,
• si is a scalar register,
• M(1:n) is a memory array of length n, and “ο” is a vector operation.
• Typical instructions include the following (a sketch of these forms follows the list):
– V1 ο V2 → V3 (element-by-element operation)
– s1 ο V1 → V2 (scaling of each element)
– V1 ο V2 → s1 (binary reduction, e.g. sum of products)
– M(1:n) → V1 (load a vector register from memory)
– V1 → M(1:n) (store a vector register into memory)
– ο V1 → V2 (unary vector, e.g. negation)
– ο V1 → s1 (unary reduction, e.g. sum of vector)
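
A minimal Python sketch of the register-to-register forms listed above, using lists to stand in for vector registers; taking “ο” to be addition or multiplication is an assumption made only for illustration.

```python
# Minimal sketch of register-to-register vector instruction forms,
# modeled with Python lists; "o" is addition/multiplication purely for illustration.

def vbinary(v1, v2):        # V1 o V2 -> V3   (element-by-element operation)
    return [a + b for a, b in zip(v1, v2)]

def vscale(s1, v1):         # s1 o V1 -> V2   (scaling of each element)
    return [s1 * a for a in v1]

def vreduce(v1, v2):        # V1 o V2 -> s1   (binary reduction, e.g. sum of products)
    return sum(a * b for a, b in zip(v1, v2))

def vunary(v1):             # o V1 -> V2      (unary vector, e.g. negation)
    return [-a for a in v1]

def vsum(v1):               # o V1 -> s1      (unary reduction, e.g. sum of vector)
    return sum(v1)

if __name__ == "__main__":
    V1, V2 = [1, 2, 3], [4, 5, 6]
    print(vbinary(V1, V2), vscale(2, V1), vreduce(V1, V2), vunary(V1), vsum(V1))
```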
Memory-to-Memory Vector Instructions

• Typical memory-to-memory vector instructions (using the
same notation as given in the previous slide) include these:

– M1(1:n) ο M2(1:n) → M3(1:n) (binary vector)

– s1 ο M1(1:n) → M2(1:n) (scaling)

– ο M1(1:n) → M2(1:n) (unary vector)

– M1(1:n) ο M2(1:n) → M(k) (binary reduction)


Pipelines in Vector Processors
• Vector processors can usually make effective use of multiple pipelines in
parallel; the number of such parallel pipelines is effectively
limited by the number of functional units.
• As usual, the effectiveness of a pipelined system depends on
the availability and use of an effective compiler to generate
code that makes good use of the pipeline facilities
Symbolic Processors
• Symbolic processors are somewhat unique in that their architectures are
tailored toward the execution of programs in languages similar to LISP,
Scheme, and Prolog.
• In effect, the hardware provides a facility for the manipulation of the
relevant data objects with “tailored” instructions.
• These processors (and programs of these types) may invalidate
assumptions made about more traditional scientific and business
computations
Hierarchical Memory Technology
Memory in a system is usually characterized as appearing at various levels (0, 1, …) in a
hierarchy, with level 0 being CPU registers and level 1 being the cache closest to the
CPU.
Each level is characterized by five parameters:
• access time ti (round-trip time from CPU to ith level)
• memory size si (number of bytes or words in the level)
• cost per byte ci
• transfer bandwidth bi (rate of transfer between levels)
• unit of transfer xi (grain size for transfers)
Memory devices at a lower level are:
• Faster to access,
• Smaller in capacity,
• More expensive per byte,
• Higher in bandwidth, and
• Smaller in unit of transfer.
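
A minimal sketch of the five per-level parameters as a data structure; all numeric values below are invented placeholders used only to show the trend just stated, not figures from the slides.

```python
# Minimal sketch of the five parameters per level; the numbers are placeholders.

memory_hierarchy = [
    # level, access time t_i (ns), size s_i (bytes), cost c_i ($/byte),
    # bandwidth b_i (MB/s), unit of transfer x_i (bytes)
    {"level": "registers",   "t": 0.5, "s": 512,     "c": 1e-1,  "b": 80_000, "x": 8},
    {"level": "cache",       "t": 2,   "s": 512_000, "c": 1e-3,  "b": 20_000, "x": 64},
    {"level": "main memory", "t": 50,  "s": 8e9,     "c": 1e-8,  "b": 10_000, "x": 4096},
    {"level": "disk",        "t": 5e6, "s": 1e12,    "c": 1e-10, "b": 500,    "x": 65_536},
]

# Moving down the list: t, s and x grow while c and b shrink, matching the
# "faster / smaller / costlier / higher-bandwidth" trend stated above.
for lower, higher in zip(memory_hierarchy, memory_hierarchy[1:]):
    assert lower["t"] < higher["t"] and lower["s"] < higher["s"]
    assert lower["c"] > higher["c"] and lower["b"] > higher["b"]
```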

Registers and Caches


Registers
• The registers are parts of the processor;
• Register assignment is made by the compiler.
• Register transfer operations are directly controlled by the processor after
instructions are decoded.
• Register transfer is conducted at processor speed, in one clock cycle.
Caches
• The cache is controlled by the MMU and is programmer-transparent.
• The cache can also be implemented at one or multiple levels, depending on
the speed and application requirements.
• Multi-level caches are built either on the processor chip or on the processor
board.
• Multi-level cache systems have become essential to deal with memory
access latency.

Main Memory (Primary Memory)
• It is usually much larger than the cache and often implemented with the most
cost-effective RAM chips, such as DDR SDRAMs (double data rate
synchronous dynamic RAMs).
• The main memory is managed by an MMU in cooperation with the operating
system.
Disk Drives and Backup Storage
• The disk storage is considered the highest level of on-line memory.
• It holds the system programs such as the OS and compilers, and user
programs and their data sets.
• Optical disks and magnetic tape units are off-line memory for use as
archival and backup storage.
• They hold copies of present and past user programs and processed
results and files.
• Disk drives are also available in the form of RAID arrays.

Peripheral Technology
• Peripheral devices include printers, plotters, terminals, monitors,
graphics displays, optical scanners, image digitizers, output microfilm
devices etc.
• Some I/O devices are tied to special-purpose or multimedia applications.
Inclusion, Coherence, and Locality

• Information stored in a memory hierarchy (M1, M2,…, Mn) satisfies 3 important properties:
– Inclusion
– Coherence
– Locality

• The inclusion property is stated as:
M1 ⊆ M2 ⊆ … ⊆ Mn
The implication of the inclusion property is that all items of information in the
“innermost” memory level (the cache) also appear in the outer memory levels.

• The inverse, however, is not necessarily true. That is, the presence of a
data item in level Mi+1 does not imply its presence in level Mi. We call a
reference to a missing item a “miss.”
The Coherence Property
The requirement that copies of data items at successive memory levels
be consistent is called the “coherence property.”
Write-through
As soon as a data item in Mi is modified, immediate update of the
corresponding data item(s) in Mi+1, Mi+2, … Mn is required.
This is the most aggressive (and expensive) strategy.
Write-back
The data item in Mi+1 corresponding to a modified item in Mi is
not updated until it (or the block/page/etc. in Mi that
contains it) is replaced or removed.
This is the most efficient approach, but cannot be used (without
modification) when multiple processors share Mi+1, …, Mn.
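
A minimal sketch contrasting the two strategies for a two-level hierarchy (M1 and M2); the dictionary-based “memories” and the TwoLevel class are illustrative assumptions, not a real cache design.

```python
# Minimal sketch: write-through updates M2 on every write to M1;
# write-back defers the update until the modified block is evicted.

class TwoLevel:
    def __init__(self, write_back: bool):
        self.m1, self.m2 = {}, {}
        self.dirty = set()                  # items modified in M1 but not yet in M2
        self.write_back = write_back

    def write(self, addr, value):
        self.m1[addr] = value
        if self.write_back:
            self.dirty.add(addr)            # defer the update of M2
        else:
            self.m2[addr] = value           # write-through: update M2 immediately

    def evict(self, addr):
        if self.write_back and addr in self.dirty:
            self.m2[addr] = self.m1[addr]   # copy back only on replacement
            self.dirty.discard(addr)
        self.m1.pop(addr, None)

if __name__ == "__main__":
    wb = TwoLevel(write_back=True)
    wb.write(0x10, 42)
    print(wb.m2)        # {}: M2 is stale until the block is evicted
    wb.evict(0x10)
    print(wb.m2)        # {16: 42}
```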
Locality of References
Memory references are generated by the CPU for either instruction or
data access.

Temporal locality – if location M is referenced at time t, then it
(location M) will be referenced again at some time t + Δt.

Spatial locality – if location M is referenced at time t, then another
location M ± m will be referenced at time t + Δt.

Sequential locality – if location M is referenced at time t, then
locations M+1, M+2, … will be referenced at times t + Δt, t + Δt′, etc.

In each of these patterns, both m and Δt are “small.”
Hit Ratios
• When a needed item (instruction or data) is found in the level of the memory
hierarchy being examined, it is called a hit.

• Otherwise (when it is not found), it is called a miss (and the item must be
obtained from a lower level in the hierarchy).

• The hit ratio, hi, for Mi is the probability (between 0 and 1) that a needed data
item is found when sought in memory level Mi.

• The miss ratio is obviously just 1-hi.

• We assume h0 = 0 and hn = 1.
Access Frequencies
• The access frequency fi to level Mi is
fi = (1 − h1)(1 − h2) … (1 − hi−1) hi

• Note that f1 = h1, and Σi=1..n fi = 1.
Effective Access Times
• There are different penalties associated with misses at
different levels in the memory hierarchy.
– A cache miss is typically 2 to 4 times as expensive as a cache hit
(assuming success at the next level).
– A page fault (miss) is 3 to 4 orders of magnitude as costly as a page hit.
• The effective access time of a memory hierarchy can be
expressed as
Teff = Σi=1..n fi ti
     = h1t1 + (1 − h1)h2t2 + … + (1 − h1)(1 − h2) … (1 − hn−1)hntn


The first few terms in this expression dominate, but the effective access
time is still dependent on program behavior and memory design choices
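
A minimal sketch that computes the access frequencies fi and Teff from the formulas above; the hit ratios and access times are assumed values chosen only to illustrate how the lower levels can still dominate the result.

```python
# Minimal sketch: access frequencies f_i and effective access time T_eff
# from per-level hit ratios h_i and access times t_i (assumed values).

def access_frequencies(h):
    """f_i = (1 - h_1)(1 - h_2) ... (1 - h_{i-1}) * h_i, with h_n = 1."""
    f, miss_so_far = [], 1.0
    for hi in h:
        f.append(miss_so_far * hi)
        miss_so_far *= (1.0 - hi)
    return f

def effective_access_time(h, t):
    return sum(fi * ti for fi, ti in zip(access_frequencies(h), t))

if __name__ == "__main__":
    h = [0.95, 0.99, 1.0]        # cache, main memory, disk (h_n = 1)
    t = [2, 50, 5_000_000]       # access times in ns (illustrative)
    f = access_frequencies(h)
    print(f, sum(f))             # the frequencies sum to 1
    print(effective_access_time(h, t), "ns")
```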
Hierarchy Optimization

• The total cost of a memory hierarchy is estimated as Ctotal = Σi=1..n ci si.

• This implies that the cost is distributed over n levels. Since c1 > c2 > c3 > … > cn, we
have to choose s1 < s2 < s3 < … < sn.

• The optimal design of a memory hierarchy should result in a Teff close to the t1 of M1
and a total cost close to the cost of Mn.
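
A minimal sketch of the cost estimate above, with invented cost-per-byte and size figures.

```python
# Minimal sketch: total cost C_total = sum(c_i * s_i) over the n levels,
# using invented cost-per-byte and size figures.

def total_cost(c, s):
    return sum(ci * si for ci, si in zip(c, s))

if __name__ == "__main__":
    c = [1e-3, 1e-8, 1e-10]          # $/byte: cache, main memory, disk
    s = [512e3, 8e9, 1e12]           # bytes:  cache, main memory, disk
    print(f"C_total = ${total_cost(c, s):.2f}")
```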
