Lec 14

Escuela de Ingeniería Electrónica
Tecnológico de Costa Rica

EL4314 – Arquitectura de Computadoras I

Instruction-Level Parallel Processing
Dr.-Ing. Jorge Castro-Godínez

Scalar Processors

• Instruction-level parallel processing can be informally defined as the concurrent processing of multiple instructions.
• Traditional sequential processors execute one instruction at a time: a leading instruction is completed before the next instruction is processed.
• Pipelined processors, however, achieve a form of instruction-level parallel processing by overlapping the processing of multiple instructions.

• With pipelined (RISC) processors, even though each instruction may still require multiple cycles to complete, overlapping the processing of multiple instructions in the pipeline can reduce the effective average CPI to close to one, provided a new instruction can be initiated every machine cycle.
• With scalar pipelined processors, there is still the limitation of fetching and initiating at most one instruction into the pipeline every machine cycle.
• Best possible CPI = 1, i.e., best possible throughput of 1 IPC.
Scalar Processors

• Best possible CPI = 1, i.e., best possible throughput of 1 IPC.
• IPC > 1 means a superscalar processor.
• Traditional sequential processors allow (true) parallel execution of only one instruction per cycle, even with pipelining.


Scalar and Superscalar Processors

• Scalar processors are pipelined processors that are designed to fetch and issue at most one instruction every machine cycle.
• Superscalar processors are those that are designed to fetch and issue multiple instructions every machine cycle.

1.4.1.1 Processor Performance. In Section 1.3.1 we introduced the iron law of processor performance, as shown in Equation (1.1). That equation actually represents the inverse of performance as a product of instruction count, average CPI, and the clock cycle time. We can rewrite that equation to directly represent performance as a product of the inverse of instruction count, average IPC (IPC = 1/CPI), and the clock frequency, as shown in Equation (1.2). Looking at this equation, we see that performance can be increased by increasing the IPC, increasing the frequency, or decreasing the instruction count.

    Performance = (1 / instruction count) x (instructions / cycle) x (1 / cycle time)
                = (IPC x frequency) / instruction count    (1.2)

Instruction count is determined by three contributing factors: the instruction set architecture, the compiler, and the operating system. The ISA and the amount of work encoded into each instruction can strongly influence the total number of instructions executed for a program.
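As an illustration, Equation (1.2) can be sketched numerically; the instruction counts and clock rates below are hypothetical, chosen only to show the effect of IPC:

```python
def performance(instruction_count, ipc, frequency_hz):
    # Equation (1.2): performance = (IPC x frequency) / instruction count.
    # This is the inverse of execution time, i.e., program runs per second.
    return ipc * frequency_hz / instruction_count

# Hypothetical program: 1e9 instructions on a 1-GHz machine.
t_scalar = 1 / performance(1e9, 1.0, 1e9)  # scalar pipeline, IPC = 1 -> 1.0 s
t_super = 1 / performance(1e9, 2.0, 1e9)   # superscalar, IPC = 2 -> 0.5 s
```

Doubling the sustained IPC halves the execution time, exactly as the equation predicts.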

CPI vs. IPC

• The use of CPI was popular during the days of scalar pipelined processors.
• The performance penalties due to various forms of pipeline stalls can be cleanly stated as different CPI overheads.
• The ultimate performance goal for scalar pipelined processors was to reduce the average CPI to one.
• In the superscalar domain, it becomes more convenient to use IPC. The new performance goal for superscalar processors is to achieve an average IPC > 1.
Amdahl's law

[Figure 1.5: Scalar and Vector Processing in a Traditional Supercomputer.]

[Amdahl, 1967]. Traditional supercomputers are parallel processors that perform both scalar and vector computations. During scalar computation only one processor is used; during vector computation all N processors are used to perform operations on array data. The computation performed by such a parallel machine can be depicted as shown in Figure 1.5, where N is the number of processors in the machine and h is the fraction of time the machine spends in scalar computation. Conversely, 1 - h is the fraction of the time the machine spends in vector computation.

One formulation of Amdahl's law states that the efficiency E of the parallel machine is measured by the overall utilization of the N processors, or the fraction of time the N processors are busy. Efficiency E can be modeled as

    E = (h + N(1 - h)) / N = (h + N - Nh) / N    (1.3)

As the number of processors N becomes very large, the efficiency E approaches 1 - h, which is the fraction of time the machine spends in vector computation. As N becomes large, the amount of time spent in vector computation becomes smaller and smaller and approaches zero, since the fixed vector workload completes ever faster on more processors. Hence, as N becomes very large, the efficiency E approaches zero. This means that almost all the computation time is taken up with scalar computation, and further increase of N makes very little impact on reducing the overall execution time. "The overall speedup due to parallel processing is strongly dictated by the sequential part of the program as the machine parallelism increases."

Another formulation of this same principle is based on the amount of work that can be done in the vector computation mode, or the vectorizability of the program. As shown in Figure 1.5, f represents the fraction of the program that can be executed in vector computation mode; therefore, 1 - f represents the fraction of the program that must be executed sequentially. If T is the total time required to run the program, then the relative speedup S can be represented as

    S = T / (T(1 - f) + T(f/N)) = 1 / ((1 - f) + f/N)    (1.4)

where T(1 - f) is the time required to execute the sequential part and T(f/N) is the time required to execute the vector part of the program.
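Equations (1.3) and (1.4) can be checked with a short sketch; the parameter values below are hypothetical, chosen to expose the sequential bottleneck:

```python
def efficiency(n, h):
    # Equation (1.3): E = (h + N(1 - h)) / N, the utilization of N processors
    # when a fraction h of the time is spent in scalar (one-processor) mode.
    return (h + n * (1 - h)) / n

def speedup(n, f):
    # Equation (1.4): S = 1 / ((1 - f) + f/N), with vectorizable fraction f.
    return 1.0 / ((1.0 - f) + f / n)

# The sequential bottleneck: even with f = 90%, speedup can never exceed 10.
s_small = speedup(10, 0.9)        # about 5.26
s_huge = speedup(1_000_000, 0.9)  # approaches 10, not 1,000,000
```

With a million processors the speedup saturates near 1/(1 - f) = 10, illustrating why the sequential part dominates as N grows.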
Pipelined processor performance

1.4.1.3 Pipelined Processor Performance. Harold Stone proposed that a performance model similar to that for parallel processors can be developed for pipelined processors [Stone, 1987]. A typical execution profile of a pipelined processor is shown in Figure 1.6(a). The machine parallelism parameter N is now the depth of the pipeline, that is, the number of stages in the pipeline. There are three phases in the profile: pipeline filling, steady state (pipeline full), and pipeline draining.

[Figure 1.6: (a) execution profile of a pipelined processor, showing the filling, steady-state, and draining phases; (b) the ideal profile with parallelism N.]

Instead of remaining in the pipeline full phase for the duration of the entire execution, the steady state is interrupted by pipeline stalls. Each stall effectively induces a new pipeline draining phase and a new pipeline filling phase, as shown in Figure 1.7(a), due to the break in the pipeline full phase. A similar modification can be performed on this execution profile, attributing part of the work done in the pipeline to the filling phases, to produce the modified profile of Figure 1.7(b), which resembles the execution profile of parallel processors shown in Figure 1.5.

[Figure 1.7: (a) execution profile of a pipelined processor with stalls; each stall breaks the pipeline full phase into a draining and a filling phase; (b) the modified profile.]

With this modification of the execution profiles, we can now borrow the performance model of parallel processors and apply it to pipelined processors. Instead of the number of processors, N is now the number of pipeline stages, or the maximum speedup possible. The parameter g now becomes the fraction of time when the pipeline is full, and the parameter 1 - g now represents the fraction of time when the pipeline is stalled. The speedup S that can be obtained is now

    S = 1 / ((1 - g) + g/N)    (1.5)

The parameter g, the fraction of time when the pipeline is full, is analogous to f, the vectorizability of the program in the parallel processor model. Therefore, Amdahl's law can be applied to pipelined processors. As g drops off just slightly from 100%, the speedup or the performance of a pipelined processor can drop off very quickly. In other words, the actual performance gain that can be obtained through pipelining can be strongly degraded by just a small fraction of stall cycles. As the degree of pipelining N increases, the fraction of stall cycles will become increasingly devastating to the actual speedup that can be achieved by a pipelined processor. Stall cycles in pipelined processors are now the key adversary and are analogous to the sequential bottleneck for parallel processors.
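The sensitivity of Equation (1.5) to stall cycles can be shown with a small sketch (the stage count and stall fraction are hypothetical):

```python
def pipeline_speedup(n_stages, g):
    # Equation (1.5): S = 1 / ((1 - g) + g/N), where g is the fraction of
    # time the N-stage pipeline is full and 1 - g the fraction it is stalled.
    return 1.0 / ((1.0 - g) + g / n_stages)

ideal = pipeline_speedup(10, 1.0)    # 10.0: no stalls, full speedup of N
stalled = pipeline_speedup(10, 0.9)  # about 5.26: only 10% stall time
                                     # costs nearly half the ideal speedup
```

This is the quantitative form of the claim above: a 10% stall fraction does not cost 10% of the speedup, it costs almost 50%.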

Pipelined processor performance

• The performance gain that can be obtained through pipelining can be strongly degraded by just a small fraction of stall cycles.
• As the degree of pipelining N increases, the fraction of stall cycles becomes increasingly devastating to the actual speedup that can be achieved by a pipelined processor.
• Stall cycles constitute the pipelined processor's sequential bottleneck.
• With clever design of the pipeline, such as with the use of forwarding paths to resolve a hazard that causes a pipeline stall, the number of penalty cycles incurred is not necessarily N, and is most likely less than N.

Equation (1.5) is a simple performance model for pipelined processors based on Amdahl's law for parallel processors. It is assumed in this model that whenever the pipeline is stalled, there is only one instruction in the pipeline, or it effectively becomes a sequential nonpipelined processor. The implication is that when a pipeline is stalled, no overlapping of instructions is allowed; this is effectively equivalent to stalling the pipeline for N cycles to allow the instruction causing the stall to completely traverse the pipeline. We know, however, that with clever design of the pipeline, such as with the use of forwarding paths, the number of penalty cycles incurred is not necessarily N and is most likely less than N. Based on this observation, a refinement of Equation (1.5) is possible, in which the single stall fraction 1 - g is split into fractions g1, g2, ..., gN, where gi is the fraction of time when i instructions are present in the pipeline (Equation 1.6).
Superscalar proposal

Figure 1.8, by Agerwala and Cocke [1987], plots the speedup as a function of the vectorizability f of a program, for several values of N, the maximum parallelism. Take the example of the case when N = 6. The speedup is

    S = 1 / ((1 - f) + f/6)    (1.9)

Examining the curve for Equation (1.9) in Figure 1.8, we see that the speedup is 6 if f is 100%, that is, perfectly vectorizable. As f drops off from 100%, the speedup drops off very quickly; as f becomes 0%, the speedup is one, that is, no speedup is obtained. With higher values of N, this speedup drop-off rate gets significantly worse, and as f approaches 0%, all the speedups approach one, regardless of the value of N. Now assume that a minimum degree of parallelism of 2 can be achieved for the nonvectorizable portion of the program. The speedup now becomes

    S = 1 / ((1 - f)/2 + f/6)    (1.10)

Examining the curve for Equation (1.10) in Figure 1.8, we see that it also starts at a speedup of 6 when f is 100%, but drops off more slowly than the curve for Equation (1.9) when f is lowered from 100%. In fact this curve crosses over the curve for Equation (1.8) with N = 100 when f is approximately 75%. This means that for f less than 75%, it is more beneficial to have a system with maximum parallelism of only 6, that is N = 6, but a minimum parallelism of two for the nonvectorizable portion, than a system with maximum parallelism of N = 100 with minimum parallelism of one.

[Figure 1.8: Easing of the Sequential Bottleneck with Instruction-Level Parallelism for Nonvectorizable Code. Source: Agerwala and Cocke, 1987.]
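The crossover described above can be checked numerically; this is a sketch of the two speedup formulas with the N values used in the text:

```python
def speedup_min1(f, n):
    # Speedup when the nonvectorizable part runs with parallelism 1.
    return 1.0 / ((1.0 - f) + f / n)

def speedup_min2(f, n):
    # Speedup when the nonvectorizable part runs with parallelism 2.
    return 1.0 / ((1.0 - f) / 2.0 + f / n)

# Below roughly f = 75%, N = 6 with minimum parallelism 2 beats
# N = 100 with minimum parallelism 1; above it, the N = 100 machine wins.
low_f = speedup_min2(0.70, 6) > speedup_min1(0.70, 100)
high_f = speedup_min2(0.90, 6) < speedup_min1(0.90, 100)
```

Both comparisons come out true, matching the crossover near f = 75% read off Figure 1.8.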
Pipelined processor

When we discuss the performance or speedup of ILP machines, a baseline machine is used as the reference. Earlier in this chapter we referred to the speedup that can be obtained by a pipelined processor over that of a sequential nonpipelined processor that does not overlap the processing of multiple instructions. This form of speedup is restricted to comparison within the domain of scalar processors, and focuses on the increased throughput that can be obtained by a (scalar) pipelined processor with respect to a (scalar) nonpipelined processor.

[Figure 1.9: Instruction Processing Profile of the Baseline Scalar Pipelined Machine, plotting successive instructions against time in cycles of the baseline machine.]
Superpipelined machine

• A superpipelined machine issues instructions faster than they are executed. A superpipelined machine of degree m, that is, one that takes m minor cycles to execute a simple operation, can potentially achieve better performance than that of the baseline machine by a factor of m.
• In a superpipelined machine, the machine cycle time is shorter than that of the baseline machine and is referred to as the minor cycle time. A simple instruction still requires one baseline cycle, equal to m minor cycles, for execution, but the machine can issue a new instruction in every minor cycle.
• Technically, traditional pipelined computers that require multiple cycles for executing simple operations should be classified as superpipelined. For example, the latency for performing fixed-point addition is three cycles in both the CDC 6600 [Thornton, 1964] and the CRAY-1 [Russell, 1978], and new instructions can be issued in every cycle; hence, these are really superpipelined machines.
• In a way, the classification of superpipelined machines is somewhat artificial, because it depends on the choice of the baseline cycle and the definition of a simple operation. The key characteristic of a superpipelined machine is that it can issue a new instruction in every minor cycle.
Superpipelined machine

• Superpipelined processors exploit the fact that most stage implementations take less than half a clock cycle to execute.
• The next instruction is scheduled half a cycle later (IPC = 1.5).

Example: MIPS R4000

• The MIPS R4000 has an internal clock at twice the external frequency, simulating an 8-stage pipeline.
• Two minor cycles are required for I-cache access and two for D-cache access. These are noninterruptible operations; no data forwarding can involve the buffers between the IF and IS stages or the buffers between the DF and DS stages.
• Cache accesses, here considered "simple" operations, are pipelined and require an operation latency of two (minor) cycles. The issue latency for the entire pipeline is one (minor) cycle; that is, one new instruction can be issued every minor cycle.

[Figure 1.11: The "Superpipelined" MIPS R4000 8-Stage Pipeline.]
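A rough timing sketch of the superpipelining idea, comparing an n-instruction stream on a k-stage baseline pipeline against a superpipelined machine of degree m (the parameter values are hypothetical, and hazards are ignored):

```python
def baseline_cycles(n_instr, k_stages):
    # Baseline scalar pipeline: k cycles to fill, then one completion per cycle.
    return k_stages + (n_instr - 1)

def superpipelined_cycles(n_instr, k_stages, m):
    # Superpipelined machine of degree m, measured in baseline cycles:
    # the same k-stage latency, but a new instruction issues every minor
    # cycle, i.e., every 1/m of a baseline cycle.
    return k_stages + (n_instr - 1) / m

# For long instruction streams the speedup approaches m:
ratio = baseline_cycles(10_000, 5) / superpipelined_cycles(10_000, 5, 2)
```

The fill latency is unchanged, so the factor-of-m gain only materializes once the stream is long enough to amortize it.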
Superscalar Processors

• A superscalar processor has the ability to execute multiple instructions per cycle.
• Replicated functional units.
• Depth: number of independent pipelines for each instruction.
• IPC > 1, depending on the depth. Higher performance.
• More complex; higher power consumption.
Superscalar pipeline

• Duplicated hardware: "2 simultaneous pipelines".

Comparison

From scalar to superscalar

• Superscalar pipelines can be viewed as natural descendants of scalar pipelines, designed to alleviate the limitations of scalar pipelines.
• Superscalar pipelines are parallel pipelines, instead of scalar pipelines, in that they are able to initiate the processing of multiple instructions in every machine cycle.
• In addition, superscalar pipelines are diversified pipelines, employing multiple and heterogeneous functional units in their execution stage(s).
• Superscalar pipelines can be implemented as dynamic pipelines in order to achieve the best possible performance without requiring reordering of instructions by the compiler.
Parallel pipelines

• Temporal parallelism: a pipelined scalar processor can achieve a degree of parallelism of k, where k is the number of pipeline stages.
• Spatial parallelism: a nonpipelined processor can achieve a degree of parallelism of k by implementing k copies of the nonpipelined processor.
• Parallel pipelines: the combination of temporal and spatial parallelism, i.e., replicated pipelines.
[Figure: (a) no parallelism; (b) temporal parallelism; (c) spatial parallelism; (d) parallel pipeline.]

Parallel pipelines

• Parallel pipeline of width s = 3.
• Can potentially advance up to s instructions per cycle.
• Logic complexity is increased.

[Figure 4.3: A Parallel Pipeline of Width s = 3.]
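A throughput sketch for a parallel pipeline of width s (hypothetical parameters; it assumes s independent instructions can always be advanced together, with no hazards):

```python
import math

def parallel_pipeline_cycles(n_instr, k_stages, s):
    # Width-s parallel pipeline: k cycles to fill, then up to s
    # instructions complete per cycle.
    return k_stages + math.ceil(n_instr / s) - 1

scalar = parallel_pipeline_cycles(9, 5, 1)  # 13 cycles at width 1
wide = parallel_pipeline_cycles(9, 5, 3)    # 7 cycles at width s = 3
```

The width-3 pipeline finishes the same nine instructions in roughly half the cycles; real machines fall short of this bound exactly because of the hazards the model ignores.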

Example: Intel Pentium

[Figure 4.4: (a) the five-stage i486 scalar pipeline (IF, D1, D2, EX, WB); (b) the five-stage Pentium parallel pipeline of width s = 2, with U and V pipes.]
Diversified pipelines

• The hardware required to execute different instructions can vary significantly from one instruction to another.
• Accommodating all instruction types in a single pipeline would be very inefficient, since not all instructions would use all of the hardware.
• The same applies to parallel pipelines.
• A diversified pipeline uses identical hardware for some stages while diversifying in others, allowing the parallel execution of a greater variety of instructions.
• Typically the difference is found in the execution stage.
[Figure 4.5: A Diversified Parallel Pipeline with Four Execution Pipes.]

Example: CDC 6600

[Figure 4.6: The CDC 6600 with 10 Diversified Functional Units in Its CPU.]
Example: Motorola 88110

[Block diagram: bus interface unit, target instruction cache, instruction cache, general register file, history buffer, instruction sequencer and branch unit, floating-point register file, data cache; execution units: two integer units, bit field unit, multiplier, floating-point add unit, divider, graphics add unit, graphics pack unit, load/store unit; writeback busses.]
Dynamic pipelines

• In scalar pipelines, a register (buffer) is placed between stages to keep each stage isolated from the next and to hold control signals and data between them.
• In parallel pipelines, multientry buffers are needed (for each stage, for each pipe).
• Each buffer entry can be accessed separately, allowing data to pass independently between stages (shift-register or FIFO organization). Watch out for the case of dependences!
• In dynamic pipelines, the multientry buffers have the ability to reorder instructions, permitting out-of-order execution (OoOE).

[Figure 4.8: (a) a single-entry buffer between stage i and stage i + 1; (b) a multientry buffer whose entries leave in order; (c) a reordering multientry buffer whose entries may leave out of order.]
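The effect of a reordering multientry buffer can be sketched with a toy model (entirely illustrative: the instruction names, single-issue assumption, and latencies are hypothetical, not any real machine's scheduler):

```python
def issue_sequence(program):
    # program: list of (name, sources, latency) tuples in program order.
    # One instruction issues per cycle; a younger instruction may issue
    # past an older one still waiting for its source operands.
    produced = {name for name, _, _ in program}
    ready_at = {}          # result name -> cycle it becomes available
    pending = list(program)
    order, cycle = [], 0

    def is_ready(src, now):
        # Sources not produced inside the program are register values
        # assumed available from the start.
        return src not in produced or (src in ready_at and ready_at[src] <= now)

    while pending:
        for instr in pending:
            name, sources, latency = instr
            if all(is_ready(s, cycle) for s in sources):
                ready_at[name] = cycle + latency
                order.append(name)
                pending.remove(instr)
                break
        cycle += 1
    return order

# "add" depends on the 4-cycle "mul"; the independent "sub" overtakes it:
# issue_sequence([("mul", [], 4), ("add", ["mul"], 1), ("sub", [], 1)])
# -> ["mul", "sub", "add"]
```

With a plain in-order FIFO buffer, "sub" would have waited behind the stalled "add"; the reordering buffer lets it proceed, which is exactly the behavior of Figure 4.8(c).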
Instruction scheduling

• In a superscalar processor, the execution order of instructions in the pipelines is dynamic (typically OoOE). Specialized hardware determines the order, and avoids and resolves hazards.
• Dynamic pipelining permits out-of-order execution.

A dynamic pipeline achieves out-of-order execution via the use of complex multientry buffers that allow instructions to enter and leave the buffers in different orders. Such a reordering multientry buffer is shown in Figure 4.8(c). Figure 4.9 illustrates a parallel diversified pipeline of width s = 3 that is a dynamic pipeline; the execution portion of the pipeline consists of four pipelined functional units.

• There are three types of scheduling policies, according to how instructions are fetched and completed:
  • in-order fetch, in-order completion
  • in-order fetch, out-of-order completion
  • out-of-order fetch, out-of-order completion
Reading – Week 13

SL – Chapters 1 and 4