Lec 14
Instruction-Level Parallelization of Processing
Dr.-Ing. Jorge Castro-Godínez
Escuela de Ingeniería Electrónica
EL4314 – Arquitectura de Computadoras I

Contents: Scalar Processors · Superscalar Processors · References
Scalar Processors

• Best possible CPI = 1, i.e., a best possible throughput of 1 IPC.
• IPC > 1 implies a superscalar processor.
• Traditional sequential processors permit the (true) parallel execution of only one instruction per cycle.
• Scalar processors are pipelined processors that are designed to fetch and issue at most one instruction every machine cycle.
• Superscalar processors are those that are designed to fetch and issue multiple instructions every machine cycle.

1.4.1.1 Processor Performance. In Section 1.3.1 we introduced the iron law of processor performance, as shown in Equation (1.1). That equation actually represents the inverse of performance as a product of instruction count, average CPI, and the clock cycle time. We can rewrite that equation to directly represent performance as a product of the inverse of instruction count, average IPC (IPC = 1/CPI), and the clock frequency, as shown in Equation (1.2):

    Performance = (1 / instruction count) × IPC × frequency    (1.2)

Looking at this equation, we see that performance can be increased by increasing the IPC, increasing the frequency, or decreasing the instruction count.
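The iron law and its IPC form can be checked numerically. A minimal sketch in Python (the function names are illustrative, not from the text):

```python
def execution_time(instruction_count, cpi, cycle_time_s):
    """Equation (1.1), the iron law: time = instructions x CPI x cycle time."""
    return instruction_count * cpi * cycle_time_s

def performance(instruction_count, ipc, frequency_hz):
    """Equation (1.2): performance = (1 / instruction count) x IPC x frequency.
    Since IPC = 1/CPI and frequency = 1/cycle time, this is 1/execution time."""
    return (1.0 / instruction_count) * ipc * frequency_hz

# A 1e9-instruction program at CPI = 2 on a 1-GHz clock:
t = execution_time(1e9, 2.0, 1e-9)     # about 2.0 seconds
p = performance(1e9, 1 / 2.0, 1e9)     # about 0.5 programs per second
assert abs(p - 1 / t) < 1e-12          # the two forms agree
```

Raising IPC, raising the frequency, or shrinking the instruction count each increases `performance`, exactly as the text states.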
• The use of CPI was popular during the days of scalar pipelined
processors.
• The performance penalties due to various forms of pipeline
stalls can be cleanly stated as different CPI overheads.
• The ultimate performance goal for scalar pipelined processors
was to reduce the average CPI to 1.
• In the superscalar domain, it becomes more convenient to use
IPC. The new performance goal for superscalar processors is to
achieve an average IPC > 1.
1.4.1.2 Parallel Processor Performance. The performance of a parallel machine is measured by the overall utilization of the N processors, or the fraction of time the N processors are busy. This efficiency E can be modeled as

    E = [h + N(1 - h)] / N = (h + N - Nh) / N    (1.3)

where h is the fraction of time the machine spends in scalar computation on a single processor.

Traditional supercomputers are parallel processors that perform both scalar and vector computations. During scalar computation only one processor is used; during vector computation all N processors are used to perform operations on array data. The computation performed by such a parallel machine can be depicted as in Figure 1.5 (Scalar + Vector Processing in a Traditional Supercomputer), where f represents the fraction of the program that can be parallelized to run in vector computation mode. Therefore, 1 - f represents the fraction of the program that must be executed sequentially. If T is the total time required to run the program, then the relative speedup S can be represented as

    S = T / [T(1 - f) + T(f/N)]    (1.4)

where T is the sum of T(1 - f), the time required to execute the sequential part, and T(f/N), the time required to execute the vectorizable part of the program [Amdahl, 1967].

As the number of processors N becomes very large, the efficiency E approaches 1 - h, which is the fraction of time the machine spends in vector computation. But as N becomes large, the amount of time spent in vector computation, T(f/N), becomes smaller and smaller and approaches zero; the fraction h then approaches 1, and the efficiency E approaches zero. This means that almost all the computation time is taken up with sequential scalar computation.
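Equations (1.3) and (1.4) can be sketched directly in Python to check the limiting behavior described above (a toy check, with h and f as defined in the text):

```python
def efficiency(h, n):
    """Equation (1.3): E = [h + N(1 - h)] / N, where h is the fraction of
    time spent in scalar computation on a single processor."""
    return (h + n * (1 - h)) / n

def speedup(f, n):
    """Equation (1.4): S = T / [T(1 - f) + T(f/N)] = 1 / [(1 - f) + f/N],
    where f is the fraction of the program that can run in vector mode."""
    return 1.0 / ((1 - f) + f / n)

# As N grows, E approaches 1 - h (the vector-computation time fraction) ...
assert abs(efficiency(0.3, 10**6) - 0.7) < 1e-5
# ... and S stays below the Amdahl bound 1 / (1 - f).
assert speedup(0.9, 10**6) < 1 / (1 - 0.9)
```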
1.4.1.3 Pipelined Processor Performance. Harold Stone proposed that a performance model similar to that for parallel processors can be developed for pipelined processors [Stone, 1987]. A typical execution profile of a pipelined processor is shown in Figure 1.6(a). The machine parallelism parameter N is now the depth of the pipeline.

[Figure 1.6: (a) execution profile of a pipelined processor, with pipeline filling, steady-state, and draining phases; (b) the idealized profile.]

Instead of remaining in the pipeline full phase for the duration of the entire execution, this steady state is interrupted by pipeline stalls, as shown in Figure 1.7(a). Each stall effectively induces a new pipeline draining phase and a new pipeline filling phase, due to the break in the pipeline full phase.

[Figure 1.7: (a) a real execution profile, in which each pipeline stall breaks the full phase into new draining and filling phases (Real -> stalling cycles); (b) the idealized profile (Ideal).]

With similar modifications, for pipelined processors N is now the number of pipeline stages, or the maximum speedup possible. The parameter g now becomes the fraction of time when the pipeline is filled, and the parameter 1 - g now represents the fraction of time when the pipeline is stalled. The speedup is then

    S = 1 / [(1 - g) + g/N]    (1.5)

The fraction of time when the pipeline is full, g, is analogous to f, the vectorizability of the program in the parallel processor model. Therefore, Amdahl's law applies to pipelined processors as well; in other words, the actual performance gain that can be obtained is limited by the fraction of time the pipeline is stalled.
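Equation (1.5) behaves like Amdahl's law with g in place of f. A quick numerical sketch:

```python
def pipeline_speedup(g, n):
    """Equation (1.5): S = 1 / [(1 - g) + g/N], with N the pipeline depth
    and g the fraction of time the pipeline is full."""
    return 1.0 / ((1 - g) + g / n)

# With no stalls (g = 1), a 6-stage pipeline achieves the full speedup N ...
assert abs(pipeline_speedup(1.0, 6) - 6.0) < 1e-9
# ... but if the pipeline is full only 80% of the time, speedup is halved.
assert abs(pipeline_speedup(0.8, 6) - 3.0) < 1e-9
```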
Superscalar proposal

Looking at the curve for Equation (1.9) in Figure 1.8, we see that the speedup is N when the vectorizability f is 100%, that is, when the program is perfectly vectorizable. As f drops off from 100%, the speedup drops off very quickly; as f becomes 0%, the speedup is one, that is, no speedup is obtained. With higher values of N, this speedup drop-off rate gets significantly steeper, and as f approaches 0%, all the speedups approach one, regardless of the value of N. Now assume that a minimum degree of parallelism of 2 can be achieved for the nonvectorizable portion of the program. The speedup now becomes

    S = 1 / [(1 - f)/2 + f/N]    (1.10)

[Figure 1.8: Easing of the Sequential Bottleneck with Instruction-Level Parallelism for Nonvectorizable Code (speedup vs. vectorizability f). Source: Agerwala and Cocke, 1987.]
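Comparing Equations (1.9) and (1.10) numerically shows how even a minimum parallelism of 2 in the sequential part eases the bottleneck (a sketch; the function names are mine, not from the text):

```python
def speedup_amdahl(f, n):
    """Equation (1.9): S = 1 / [(1 - f) + f/N]."""
    return 1.0 / ((1 - f) + f / n)

def speedup_min2(f, n):
    """Equation (1.10): the nonvectorizable fraction executes with a
    minimum degree of parallelism of 2 instead of serially."""
    return 1.0 / ((1 - f) / 2 + f / n)

# For f = 50% and N = 100, the speedup roughly doubles:
s9, s10 = speedup_amdahl(0.5, 100), speedup_min2(0.5, 100)
assert 1.9 < s9 < 2.0 and 3.9 < s10 < 4.0
```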
Pipelined processor

This form of speedup is restricted to comparisons within the domain of scalar processors; it focuses on the increased throughput that a (scalar) pipelined processor can obtain with respect to a (scalar) nonpipelined processor, one that does not overlap the processing of multiple instructions.

[Figure 1.9: Instruction Processing Profile of the Baseline Scalar Pipelined Machine, showing successive instructions overlapped in time.]
Superpipelined machine

On the other hand, a superpipelined machine issues instructions faster than they are executed. A superpipelined machine of degree m, that is, one that takes m minor cycles to execute a simple operation, can potentially achieve better performance than the baseline machine by a factor of m.

• In a superpipelined machine, the machine cycle time is shorter than that of the baseline machine and is referred to as the minor cycle time.
• A simple instruction still requires one baseline cycle, equal to m minor cycles, to execute, but the machine can issue a new instruction in every minor cycle.

Technically, traditional pipelined computers that require multiple cycles for executing simple operations should be classified as superpipelined. For example, the latency for performing fixed-point addition is three cycles in both the CDC 6600 [Thornton, 1964] and the CRAY-1 [Russell, 1978], and new instructions can be issued in every cycle. Hence, these are really superpipelined machines. In a way, the classification of superpipelined machines is somewhat artificial, because it depends on the choice of the baseline cycle and the definition of a simple operation. The key characteristic of a superpipelined machine is that it issues a new instruction in every minor cycle.
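The factor-of-m claim can be illustrated with a toy cycle count (a sketch under my own simplifying assumptions: k independent simple operations, and only the last instruction's latency counted beyond the issue stream):

```python
def baseline_cycles(k):
    """Baseline machine: one instruction issued per baseline cycle."""
    return k

def superpipelined_minor_cycles(k, m):
    """Degree-m superpipelined machine: one issue per minor cycle, but a
    simple operation still takes m minor cycles (one baseline cycle):
    k - 1 issue cycles plus m cycles for the last instruction to finish."""
    return (k - 1) + m

# Expressed in baseline cycles (each worth m minor cycles), throughput
# approaches m times the baseline for large k:
k, m = 1000, 4
t_super = superpipelined_minor_cycles(k, m) / m
assert baseline_cycles(k) / t_super > 3.9   # close to m = 4
```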
Example: MIPS R4000

The MIPS R4000 has an internal clock at double the external frequency; it simulates an 8-stage pipeline (Figure 1.11: The "Superpipelined" MIPS R4000 8-Stage Pipeline). Two minor cycles are required for D-cache access. These are noninterruptible operations; no data forwarding can involve the buffers between the IF and IS stages or the buffers between the DF and DS stages. Cache accesses, here considered "simple" operations, are pipelined and require an operation latency of two (minor) cycles.

Prof. Ing. Jeferson González G., Lección 9
Superscalar Processors

Superscalar pipeline: parallel pipelining
[Figure 4.4: (a) the five-stage i486 scalar pipeline; (b) the five-stage Pentium parallel pipeline of width s = 2, with U and V pipes. Both pipelines use the stages IF, D1, D2, EX, and WB.]
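For a parallel pipeline of width s, the best-case issue rate is s instructions per cycle. A trivial sketch of that bound:

```python
def ideal_ipc(width):
    """Best-case IPC of a parallel pipeline equals its width s
    (i486: s = 1; Pentium with U and V pipes: s = 2)."""
    return width

def min_cycles(instruction_count, width):
    """Lower bound on cycles, ignoring fill, drain, and stalls:
    ceil(instructions / width)."""
    return -(-instruction_count // width)

assert ideal_ipc(2) == 2
assert min_cycles(101, 2) == 51   # 50 dual issues plus 1 single issue
```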
Diversified pipelines

[Figure 4.5: A Diversified Parallel Pipeline with Four Execution Pipes.]

[Figure 4.6: The CDC 6600 with 10 Diversified Functional Units in Its CPU.]

[Figure: a diversified superscalar datapath with a bus interface unit, instruction and target-instruction caches, an instruction sequencer and branch unit, general and floating-point register files, a history buffer, a data cache, and diversified execution units (two integer units, a bit-field unit, a multiplier, a floating-point add unit, a divider, graphics add and pack units, and a load/store unit) connected by writeback busses.]
Dynamic pipelines

• Parallel pipelines must provide multientry buffers between stages.
• Each buffer entry can be accessed individually, which allows data to pass between stages independently (shift-register or FIFO organization). But what happens when there are dependences?
• Multientry buffers with reordering capability can reorder the instructions they hold, permitting out-of-order execution (OoOE).

[Figure: interstage buffering — (a) a single-entry buffer between stage i and stage i + 1 (in order); (b) a multientry buffer (in order); (c) a multientry buffer with reordering, from which instructions can leave out of order.]
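The contrast between a FIFO interstage buffer and a reordering multientry buffer can be sketched with a toy issue model (entirely illustrative: the instruction tuples, register names, and readiness rule are my assumptions, not from the text):

```python
from collections import deque

# Each instruction is (name, destination register, source registers);
# a source is "ready" once some earlier instruction has produced it.

def inorder_issue(instrs, ready):
    """FIFO buffer: a dependence at the head stalls everything behind it."""
    buf, order = deque(instrs), []
    while buf:
        name, dest, srcs = buf[0]
        if all(s in ready for s in srcs):
            buf.popleft(); ready.add(dest); order.append(name)
        else:
            break  # head blocked -> the whole FIFO waits
    return order

def outoforder_issue(instrs, ready):
    """Multientry buffer with reordering: any ready entry may issue (OoOE)."""
    buf, order, progress = list(instrs), [], True
    while buf and progress:
        progress = False
        for ins in list(buf):
            name, dest, srcs = ins
            if all(s in ready for s in srcs):
                buf.remove(ins); ready.add(dest); order.append(name)
                progress = True
    return order

# i1 waits on a not-yet-ready register r9; i2 is independent of it.
prog = [("i1", "r1", ("r9",)), ("i2", "r2", ("r3",))]
assert inorder_issue(prog, {"r3"}) == []           # head blocks the FIFO
assert outoforder_issue(prog, {"r3"}) == ["i2"]    # i2 slips past i1
```

The same program stalls completely in the FIFO buffer but makes progress in the reordering buffer, which is exactly the motivation for out-of-order execution given above.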
Instruction scheduling
References

SL – Chapters 1 and 4 (J. P. Shen and M. H. Lipasti, Modern Processor Design)