Mod5 1
One-cycle execution time: RISC processors aim for a CPI (clocks per instruction)
of one. This is achieved by keeping each instruction simple enough to complete
in one cycle and by pipelining, which overlaps the execution of successive
instructions.
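The effect of pipelining on CPI can be checked with a little arithmetic. The sketch below assumes an ideal pipeline with no stalls; the function names and the 5-stage depth are illustrative choices, not taken from these notes.

```python
def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed to run a stall-free instruction stream on an ideal pipeline."""
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles to pass through the
    # pipeline; each later instruction completes one cycle after the previous.
    return num_stages + (num_instructions - 1)


def effective_cpi(num_instructions: int, num_stages: int) -> float:
    """Average clocks per instruction for the stream above."""
    return pipelined_cycles(num_instructions, num_stages) / num_instructions


if __name__ == "__main__":
    # Assumed 5-stage pipeline: the CPI approaches 1 as the stream grows.
    for n in (1, 10, 1000):
        print(n, effective_cpi(n, num_stages=5))
```

For long instruction streams the start-up cost of filling the pipeline becomes negligible, which is why the average approaches one clock per instruction.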
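In the von Neumann architecture, program and data share one memory, so the CPU
can read program memory and perform any operation on data memory over the same
path.
In the Harvard architecture, memory is split into two parts, program memory and
data memory, each with its own bus. In a pure Harvard design the CPU cannot
treat one part as the other; for example, it cannot operate on program memory
as if it were data.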
RISC Features
• Hardwired Control
EPIC (explicitly parallel instruction computing):
The EPIC architecture encodes its instructions into 128-bit-wide bundles. Each
bundle contains three instructions encoded in 41 bits each and a 5-bit
template field (3 × 41 + 5 = 128). The template field contains information about
the types of instructions in the bundle and which instructions can be executed
in parallel. Because the template marks where parallel execution must stop, all
the slots of a bundle can be filled even when enough independent instructions
cannot be found: dependent instructions can share a bundle. The template also
specifies whether one or more instructions in this bundle can be executed in
parallel with at least the first instruction of the next bundle.
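The bundle layout described above can be sketched as bit packing. This is a minimal illustration, not a real encoder: it assumes the template occupies the 5 low-order bits with the three 41-bit slots above it, and the function names `pack_bundle` and `unpack_bundle` are hypothetical.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
NUM_SLOTS = 3
BUNDLE_BITS = TEMPLATE_BITS + NUM_SLOTS * SLOT_BITS  # 5 + 3 * 41 = 128


def pack_bundle(template: int, slots: list[int]) -> int:
    """Pack a template and three instruction slots into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != NUM_SLOTS:
        raise ValueError("a bundle holds exactly three instructions")
    bundle = template  # assumption: template sits in the low-order bits
    for i, slot in enumerate(slots):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError("each instruction must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle: int) -> tuple[int, list[int]]:
    """Split a 128-bit bundle back into (template, [slot0, slot1, slot2])."""
    if not 0 <= bundle < (1 << BUNDLE_BITS):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
        for i in range(NUM_SLOTS)
    ]
    return template, slots


if __name__ == "__main__":
    packed = pack_bundle(0b10000, [1, 2, 3])
    print(hex(packed), unpack_bundle(packed))
```

The template and slot values used here are arbitrary numbers; the meaning of each template code is defined by the architecture and is not modeled.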
VLIW (very long instruction word):
Superscalar processors of the 1990s had the functional units to execute multiple
instructions in parallel. However, they used a great deal of die area on scheduling
circuits used to determine which instructions could execute in parallel. One
suggested solution to this was Very Long Instruction Word (VLIW) architectures.
VLIW architectures bundle multiple instructions that can be executed in parallel
into a single long instruction. The compiler performs the scheduling, so that the
processor avoids wasting run time and silicon area determining which instructions
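to execute in parallel. Intel's Streaming SIMD Extensions (SSE), by contrast, are
SIMD instructions: each applies one operation to several data elements, which is
a different technique from VLIW bundling.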
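The compile-time scheduling described above can be sketched as a greedy packer that places each operation in the earliest long instruction word allowed by its inputs. This is a simplified illustration under stated assumptions: every operation has unit latency, every register is written at most once (so only true dependencies matter), and any functional unit can run any operation. The names `Op` and `schedule_vliw` are hypothetical.

```python
from typing import NamedTuple


class Op(NamedTuple):
    dest: str               # register written by the operation
    srcs: tuple[str, ...]   # registers read by the operation


def schedule_vliw(ops: list[Op], width: int) -> list[list[Op]]:
    """Pack ops into long instruction words holding at most `width` ops each."""
    words: list[list[Op]] = []
    ready_word: dict[str, int] = {}  # register -> first word allowed to read it
    for op in ops:
        # An op may not be placed before all of its producers have executed.
        w = max((ready_word.get(s, 0) for s in op.srcs), default=0)
        # Skip words that are already full.
        while w < len(words) and len(words[w]) >= width:
            w += 1
        while len(words) <= w:
            words.append([])
        words[w].append(op)
        ready_word[op.dest] = w + 1  # unit latency: usable in the next word
    return words


if __name__ == "__main__":
    program = [
        Op("a", ("x", "y")),
        Op("b", ("x", "z")),
        Op("c", ("a", "b")),
        Op("d", ("y", "z")),
        Op("e", ("c", "d")),
    ]
    for i, word in enumerate(schedule_vliw(program, width=2)):
        print(i, [op.dest for op in word])
```

Slots left empty in a word would be filled with no-ops on a real VLIW machine, which is one source of code-size growth.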
Advantages of VLIW
A drawback of traditional VLIW is that the code ends up tightly tied to the pipeline it was scheduled
for. It is difficult or impossible to change the pipeline depth or the mix of functional units without
recompiling the code to match the new pipeline.
EPIC encodes runs of instructions, each categorized into a broad class (memory, integer, etc.), with
the runs separated by stops. Instructions grouped together between two stops must be independent
(i.e. no register dependencies) and can be safely issued in parallel. Different groups of instructions
may be dependent on each other.
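The grouping rule above can be sketched as a pass over a sequential instruction list that inserts a stop whenever the next instruction depends on a register written earlier in the current group. This is a simplification: only read-after-write and write-after-write register dependencies are checked, instruction classes and bundle templates are ignored, and `insert_stops` is a hypothetical helper name.

```python
def insert_stops(instrs):
    """Split (dest, srcs) instructions into groups of mutually independent ones.

    A new group (that is, a stop) begins whenever an instruction reads or
    writes a register already written in the current group.
    """
    groups = [[]]
    written = set()  # registers written so far in the current group
    for dest, srcs in instrs:
        if dest in written or any(s in written for s in srcs):
            groups.append([])  # stop: later instructions form a new group
            written = set()
        groups[-1].append((dest, srcs))
        written.add(dest)
    return [g for g in groups if g]


if __name__ == "__main__":
    program = [
        ("r1", ("r2", "r3")),
        ("r4", ("r5", "r6")),
        ("r7", ("r1", "r4")),  # reads r1 and r4, so a stop precedes it
        ("r8", ("r7",)),       # reads r7, so another stop precedes it
    ]
    for group in insert_stops(program):
        print([dest for dest, _ in group])
```

Everything inside one printed group could be issued in parallel; the dependencies only cross group boundaries.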
The EPIC pipeline is protected, meaning that the pipeline will stall if you try to use an instruction result
before it's ready. This is in contrast to the exposed pipelines of a traditional VLIW.
EPIC nevertheless leverages VLIW scheduling techniques to simplify its pipeline. A given EPIC
processor has a certain number of functional units, and those functional units have certain latencies.
An EPIC compiler can then schedule code as if it were running on a VLIW machine with equal or fewer
resources, avoiding dependency stalls and register hazards and keeping the pipeline full.
Because EPIC pipelines are fully protected, it's possible to run code compiled for one machine
configuration on a differently configured device. You can change the latency of the instructions and/or
the number of functional units. The code may not run optimally on a differently-configured machine,
but it will run correctly.
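The difference between a protected and an exposed pipeline can be illustrated with a toy in-order simulator. This is a sketch under assumed simplifications: one instruction issues per cycle, a result reaches the register file `latency` cycles after issue, and an exposed pipeline simply reads whatever value is currently in the register. The names `run`, `PROGRAM`, and `INIT` and the register contents are invented for the example.

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}

# Code scheduled for a machine where mul takes 2 cycles and add takes 1.
PROGRAM = [
    ("mul", "r3", "r1", "r2"),
    ("add", "r4", "r1", "r2"),  # independent filler hiding the mul latency
    ("add", "r5", "r3", "r4"),
]
INIT = {"r1": 2, "r2": 3, "r3": 0, "r4": 0, "r5": 0}


def run(program, latency, init_regs, protected):
    """Execute program in order and return the final register values."""
    regs = dict(init_regs)
    pending = []  # in-flight results as (ready_cycle, dest, value)
    cycle = 0
    for opcode, dest, a, b in program:
        while True:
            # Write back every result whose latency has elapsed.
            for item in [p for p in pending if p[0] <= cycle]:
                regs[item[1]] = item[2]
                pending.remove(item)
            busy = {p[1] for p in pending}
            if protected and (a in busy or b in busy):
                cycle += 1  # interlock: stall until the operands are ready
                continue
            break
        value = OPS[opcode](regs[a], regs[b])  # exposed: may read a stale value
        pending.append((cycle + latency[opcode], dest, value))
        cycle += 1
    for _, dest, value in sorted(pending):  # drain the pipeline
        regs[dest] = value
    return regs


if __name__ == "__main__":
    slow = {"mul": 4, "add": 1}  # a differently configured machine
    print("exposed:  ", run(PROGRAM, slow, INIT, protected=False)["r5"])
    print("protected:", run(PROGRAM, slow, INIT, protected=True)["r5"])
```

On the machine the code was scheduled for, both modes agree. When the multiply latency grows, the exposed pipeline consumes a stale `r3`, while the protected pipeline stalls and still produces the right sum, at the cost of extra cycles.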
Comparison: CISC, RISC, VLIW
Superscalar Implementation
• Simultaneously fetch multiple instructions
• Logic to determine true dependencies involving register values
• Mechanisms to communicate these values
• Mechanisms to initiate multiple instructions in parallel
• Resources for parallel execution of multiple instructions
• Mechanisms for committing process state in correct order
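The dependency logic in the list above can be sketched as a pairwise check between an earlier and a later instruction. This is an illustrative model only: real hardware compares register fields across every instruction in the issue window, and the names `Instr` and `classify_dependencies` are hypothetical.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str     # register written
    srcs: tuple   # registers read


def classify_dependencies(first: Instr, second: Instr) -> set:
    """Return the register dependencies of `second` on the earlier `first`."""
    kinds = set()
    if first.dest in second.srcs:
        kinds.add("RAW")  # true dependency: second needs first's result
    if second.dest in first.srcs:
        kinds.add("WAR")  # anti-dependency: second overwrites an input of first
    if first.dest == second.dest:
        kinds.add("WAW")  # output dependency: both write the same register
    return kinds


def has_true_dependency(first: Instr, second: Instr) -> bool:
    """Only a RAW dependency forces second to wait for a value from first."""
    return "RAW" in classify_dependencies(first, second)


if __name__ == "__main__":
    i1 = Instr("r1", ("r2", "r3"))
    i2 = Instr("r4", ("r1", "r5"))
    print(classify_dependencies(i1, i2), has_true_dependency(i1, i2))
```

WAR and WAW dependencies arise only from reusing register names, which is why techniques such as register renaming can remove them, whereas a true dependency requires the value to be communicated.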
Superscalar Execution
Example Architectures
• PowerPC 604
  – six independent execution units:
    • Branch execution unit
    • Load/Store unit
    • 3 Integer units
    • Floating-point unit
  – in-order issue
  – register renaming
• PowerPC 620
  – adds out-of-order issue to the features of the 604
• Pentium
  – three independent execution units:
    • 2 Integer units
    • Floating-point unit
  – in-order issue