Lecture 5 Bsit
Superblocks:
Superblocks are regions of code with only one entry point and one or more exit points. A widely used alternative to the trace is the superblock (Hwu et al. 1993);
by construction, superblocks have only one entrance at the top and no side
entrances. In contrast, a trace may have both side entrances and side exits. For
example, in Figure 4.21, the trace ADEG contains a side exit from block D and
two side entrances into block G. As we shall see in Section 4.5.3, disallowing
side entrances simplifies later code optimizations.
It might first appear that if superblocks are formed using a method like the
one described for traces, the result is relatively small superblocks; for example,
in Figure 4.22a, ADE, BC, F, and G form a complete set of superblocks. These
blocks are smaller than the traces and in some cases may be too small to
provide many opportunities for optimizations. However, larger superblocks
can be formed by allowing some basic blocks to appear more than once. This is
illustrated in the accompanying figure, where larger superblocks have been formed. Here,
the superblock ADEG contains the most common sequence of basic blocks
(according to the profile information given in Figure 4.12). Now, because
block G in superblock ADEG could otherwise be reached only via a side entrance, block
G is replicated for the superblocks that contain BCG and FG. The process of
replicating code that appears at the end of a superblock in order to form other
superblocks is referred to as tail duplication.
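Tail duplication can be sketched in a few lines. The following is an illustrative Python sketch (the function name and the renaming convention are assumptions, not from the text): blocks already placed in one superblock are replicated, here simply by renaming, when they appear again in a later superblock, so no superblock needs a side entrance.

```python
def tail_duplicate(superblocks):
    """Give each superblock a private copy of any basic block that
    already appears in an earlier superblock (replication is modeled
    by appending a prime to the block name)."""
    seen = set()
    result = []
    for sb in superblocks:
        copy = []
        for block in sb:
            if block in seen:
                # Replicate the shared tail block instead of creating
                # a side entrance into an existing superblock.
                copy.append(block + "'")
            else:
                seen.add(block)
                copy.append(block)
        result.append(copy)
    return result

# The hot paths from the ADEG / BCG / FG example above:
hot_paths = [["A", "D", "E", "G"], ["B", "C", "G"], ["F", "G"]]
print(tail_duplicate(hot_paths))
# [['A', 'D', 'E', 'G'], ['B', 'C', "G'"], ['F', "G'"]]
```

Block G is duplicated for the BCG and FG superblocks, matching the example in the text.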
Starting Points:
A superblock should start at a heavily used basic block. Consequently,
as code is initially being emulated, either through interpretation or
simple basic block translation, profile information is collected in order to determine
those heavily used basic blocks where superblock formation should be
started. For this purpose, there are two methods for determining profile points.
One is simply to profile all basic blocks. Another is to use heuristics based on
program structure to select a narrower set of good candidate start points and
then to profile only at those points.
Another heuristic is to use an exit arc from an existing superblock. These arcs
are good candidates because, by definition, the existing superblocks are known
to be hot, and some exit points will also be hot (although perhaps somewhat
less so than the original basic block).
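The profile-all-blocks method can be sketched as a counter per basic block with a hotness threshold. This is a minimal sketch with assumed names (`HOT_THRESHOLD`, `on_block_executed`); real systems tune the threshold and may profile only selected candidate points, as described above.

```python
from collections import Counter

HOT_THRESHOLD = 50  # assumed value; tuned per system in practice

profile = Counter()

def on_block_executed(block, start_points):
    """Called by the interpreter (or basic block translator) each time
    a basic block is entered; a block becomes a superblock start point
    once its execution count crosses the threshold."""
    profile[block] += 1
    if profile[block] == HOT_THRESHOLD:
        start_points.add(block)  # hot enough: begin superblock formation here

starts = set()
for b in ["A"] * 60 + ["B"] * 10:   # simulated execution trace
    on_block_executed(b, starts)
print(starts)   # {'A'} -- only A crossed the threshold
```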
Continuation:
After a superblock is begun at an initial basic block, the next consideration is which subsequent blocks should be collected and added as the superblock is grown. This can be done using either node or edge profile information. There are two basic heuristics for using this information: one is most-frequently-used and the other is most-recently-used.
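The two continuation heuristics can be contrasted in a short sketch (function names and the `edge_counts` / `last_taken` data shapes are assumptions): most-frequently-used consults accumulated edge profile counts, while most-recently-used simply follows the successor that execution took last time.

```python
def next_block_mfu(block, edge_counts):
    """Most-frequently-used: pick the successor with the highest
    profiled edge count out of `block` (None if no successors)."""
    succs = {dst: n for (src, dst), n in edge_counts.items() if src == block}
    return max(succs, key=succs.get) if succs else None

def next_block_mru(block, last_taken):
    """Most-recently-used: follow the successor that execution took
    the last time `block` ran; no counts are needed."""
    return last_taken.get(block)

edge_counts = {("D", "E"): 90, ("D", "F"): 10}
print(next_block_mfu("D", edge_counts))    # E (the hotter edge)
print(next_block_mru("D", {"D": "F"}))     # F (the last path taken)
```

The two heuristics can disagree, as here: the profile favors E, but the most recent execution went to F.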
Stopping Points:
1. The start point of the same superblock is reached. This indicates the
closing of a loop that was started with this superblock. In some systems,
superblock formation can continue even after a loop is closed, which in
effect leads to dynamic loop unrolling.
2. A start point of some other superblock is reached. When this occurs,
superblock formation stops and the two superblocks can be linked together
(Section 2.7).
3. A superblock has reached some maximum length. This maximum length
may vary from a few tens to hundreds of instructions. A reason for having
a maximum length is that it will keep code expansion in check. Because a
basic block can be used in more than one superblock, there may be multiple
copies of a given basic block. The longer superblocks grow, the more basic
block replication there will be.
4. When using the most-frequently-used heuristic, there are no more
candidate basic blocks that have reached the candidate threshold.
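The four stopping rules above can be combined into a single superblock-growing loop. The sketch below assumes its names (`grow_superblock`, `hottest_successor` standing in for either continuation heuristic, `threshold_ok` for the candidate threshold of rule 4) and, for brevity, measures the length limit in blocks rather than instructions.

```python
MAX_LEN = 8  # rule 3: maximum superblock length (blocks here, for brevity)

def grow_superblock(start, hottest_successor, other_starts, threshold_ok):
    """Grow a superblock from `start`, stopping per the four rules."""
    sb = [start]
    block = start
    while True:
        nxt = hottest_successor(block)
        if nxt is None or not threshold_ok(nxt):
            break                   # rule 4: no hot candidate remains
        if nxt == start:
            break                   # rule 1: closed a loop on our own start
        if nxt in other_starts:
            break                   # rule 2: reached another superblock's start
        if len(sb) >= MAX_LEN:
            break                   # rule 3: length limit reached
        sb.append(nxt)
        block = nxt
    return sb

succ = {"A": "D", "D": "E", "E": "G", "G": "A"}.get
print(grow_superblock("A", succ, other_starts=set(), threshold_ok=lambda b: True))
# ['A', 'D', 'E', 'G'] -- growth stops when the loop back to A closes
```

At rule 2 the stop point is also where the two superblocks would be linked together; at rule 1 a system could instead continue growing, yielding the dynamic loop unrolling mentioned above.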
Tree Groups:
Although traces and superblocks (as well as dynamic basic blocks) are the
most commonly used units for translation and optimization, there are other
possibilities. Traces and superblocks are based on the principle that conditional
branches are predominantly decided one way. However, there are
some branches for which this is not the case. For example, almost
20% of branches have taken-versus-not-taken ratios between 30–70 and 70–30, and
almost 10% are split roughly 50–50. For branches that tend to split
their decisions, a superblock or trace side exit is frequently taken. When this
happens, there is often overhead involved in compensation code.
Optimization Framework:
We begin with traces and superblocks and consider ways of optimizing code within these large
translation blocks. Dynamic optimizations are performed in addition to any optimizations the
original compiler may have done. Because optimization is being performed
at run time, however, there are new optimization opportunities that may not
have been available to the static compiler. In general, these new opportunities
involve optimizations along frequently followed paths that cross basic block
boundaries.
Code Reordering:
An important optimization performed in a number of virtual machine applications
is code reordering. In many microarchitectures, performance is affected
by the order in which instructions are issued and executed. The most significant
examples are simple pipelined microarchitectures that execute instructions
strictly in program order. This was done in many of the early RISC processors
and is still done in a number of embedded processors.
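On such in-order pipelines, a classic reordering is to move an independent instruction into the slot between a load and its first use, hiding the load-use latency. The following is an illustrative sketch only (the instruction tuple format, dependence checks, and function name are all assumptions, not a description of any particular system):

```python
def fill_load_delay(insns):
    """Instructions are (op, dest, [sources]) tuples. If insns[i] is a
    load whose result is used immediately by insns[i+1], and insns[i+2]
    is independent of both, swap i+1 and i+2 so the dependent use is
    pushed one slot later and the pipeline does not stall."""
    out = list(insns)
    i = 0
    while i + 2 < len(out):
        op, dst, _ = out[i]
        _, ndst, nsrcs = out[i + 1]
        _, cdst, csrcs = out[i + 2]
        if (op == "load" and dst in nsrcs       # load-use stall at i+1
                and dst not in csrcs            # candidate doesn't read the load
                and ndst not in csrcs           # ...nor the stalled result
                and cdst not in nsrcs           # no anti-dependence
                and cdst != dst and cdst != ndst):  # no output clobber
            out[i + 1], out[i + 2] = out[i + 2], out[i + 1]
        i += 1
    return out

before = [
    ("load", "r1", ["r2"]),        # r1 = MEM[r2]
    ("add",  "r3", ["r1", "r4"]),  # uses r1 immediately: load-use stall
    ("sub",  "r5", ["r6", "r7"]),  # independent: can fill the delay slot
]
print(fill_load_delay(before))
# [('load', 'r1', ['r2']), ('sub', 'r5', ['r6', 'r7']), ('add', 'r3', ['r1', 'r4'])]
```

The dependence checks matter: the candidate may be moved earlier only if it neither reads nor clobbers any register the two preceding instructions produce or consume.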
Code Optimizations:
There are a number of optimizations that can be applied within translation
blocks to reduce execution time. Even if the original source binary code was
optimized when it was produced, additional optimization opportunities are
often present in the dynamic environment. For example, superblock formation
removes control flow join points, creating a locally different control flow than
in the original code. Partial procedure inlining converts what would originally
be interprocedural analysis into intrablock analysis.
Compatibility Issues:
An optimization is safe with respect to traps if, after the optimization is
performed, every trap in the original source ISA code is detected in the translated
target code and either traps to a handler in the runtime or branches
directly to the runtime. Furthermore, the precise architected state at that point
must be recoverable. There are no hard-and-fast rules for determining exactly
which optimizations are safe, but optimizations that do not remove trapping
operations typically tend to be safe. For example, copy-propagation, constant-propagation,
and constant-folding optimizations are usually safe. There may
be some end cases where compatibility becomes an issue, for example, if constant
folding happens to result in overflow. This case can be determined at
optimization time and can be disabled in those rare cases where it occurs.
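The overflow check described above can be sketched directly: fold a constant add only when the result fits the source ISA's integer range, so any trap the original code would raise is preserved. This is a minimal sketch assuming a 32-bit signed source ISA; the function name is an assumption.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def try_fold_add(a, b):
    """Return the folded constant, or None to disable the optimization
    when folding would hide a potential overflow in the source code."""
    result = a + b
    if INT32_MIN <= result <= INT32_MAX:
        return result   # safe: same value, no trap in either version
    return None         # unsafe: detected at optimization time, keep the add

print(try_fold_add(1, 2))             # 3
print(try_fold_add(INT32_MAX, 1))     # None -- folding disabled
```

Because both operands are constants, the overflow case is fully decidable at optimization time, which is exactly why the rare unsafe cases can simply be disabled.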
Inter-superblock Optimizations:
All the optimizations discussed thus far have a scope that is restricted to a
single superblock. During the optimization process only a single superblock is
buffered and analyzed. The register state at the time the superblock is entered
and exited is precisely consistent with the register state at the same points
in the source code. Even though superblocks are relatively large and provide
good optimization opportunities, there may be room for additional optimizations
that span superblock boundaries. One solution is to use tree groups, as
described above. Another solution is to optimize across superblocks
incrementally.
At the time two superblocks are linked together, they can both be reexamined
and reoptimized, based on the new knowledge of control flow and a
more complete context. In theory, this could be a complete reoptimization and
reformulation of both superblocks that are being linked. However, this would
lead to additional optimization overhead and could affect the recovery of precise
state (i.e., superblock exits may no longer be precise exception points with
a consistent register mapping). Furthermore, the original superblocks are optimized
to take into account the most common control flow path, and modifying
some of these optimizations to account for early exit paths may be counterproductive.
Consequently, it is probably a better approach to stick with the original
superblock optimizations and optimize across superblocks only at the “seams.”
Instruction-Set-Specific Optimizations:
Each instruction set has its own features and quirks that could lead to special
optimizations that are instruction-set specific.