18 Code Optimization 07-02-2025
Basic compilation techniques can generate inefficient code. Compilers use a wide
range of algorithms to optimize the code they generate.
Function inlining replaces a call to a function with code equivalent to the function
body. By substituting the call’s arguments into the body, the compiler can generate a
copy of the code that performs the same operations without the call overhead. C++
provides an inline qualifier that requests inline substitution of the function; the
compiler is free to honor or ignore the request. In C, programmers can perform inlining
manually or by using a preprocessor macro to define the code body; C99 and later also
provide an inline keyword.
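As a minimal sketch of the macro approach in C, the two functions below compute the same sum, once through a call and once with the body substituted in place. The names square, SQUARE, sum_squares_call and sum_squares_inlined are hypothetical and chosen only for this illustration.

```c
/* Hypothetical helper: an ordinary function, invoked through a call. */
static int square(int x) { return x * x; }

/* Macro form: the preprocessor pastes the body at each use.
   The parentheses guard against operator-precedence surprises. */
#define SQUARE(x) ((x) * (x))

/* Calls square() once per element. */
int sum_squares_call(const int *v, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += square(v[i]);
    return sum;
}

/* Same computation with the body substituted in place of the call. */
int sum_squares_inlined(const int *v, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += SQUARE(v[i]);
    return sum;
}
```

Note that a macro evaluates its argument once per appearance in the body, so an argument with side effects (such as i++) behaves differently from a real function call.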
Although inlining eliminates function call overhead, it also increases program size. Inlining also
inhibits sharing of the function code in the cache: because the inlined copies are distinct pieces of
code, they cannot be represented by the same code in the cache. Outlining is sometimes useful to
improve the cache behavior of common functions.
Outlining is the opposite of inlining: a set of similar sections of code is replaced with
calls to an equivalent function.
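A small sketch of outlining, assuming a hypothetical helper named clamp: the repeated range checks in scale_before are replaced in scale_after by calls to one shared function.

```c
/* Hypothetical outlined helper: clamps v into the range [lo, hi]. */
static int clamp(int v, int lo, int hi) {
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

/* Before outlining: the same clamping logic is written out twice. */
void scale_before(int *r, int *g) {
    *r = *r * 2;
    if (*r < 0) *r = 0;
    if (*r > 255) *r = 255;
    *g = *g * 2;
    if (*g < 0) *g = 0;
    if (*g > 255) *g = 255;
}

/* After outlining: both sites share one copy of the code. */
void scale_after(int *r, int *g) {
    *r = clamp(*r * 2, 0, 255);
    *g = clamp(*g * 2, 0, 255);
}
```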
Loop Transformations:
Loops are important program structures - although they are compactly described in the source
code, they often use a large fraction of the computation time. Many techniques have been
designed to optimize loops.
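A simple but useful transformation is known as loop unrolling, which replicates the loop
body so that each pass through the loop performs the work of several iterations. Loop
unrolling reduces loop overhead and is important because it helps expose parallelism that
can be used by later stages of the compiler.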
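The sketch below shows one way a loop might be unrolled by a factor of four. The function names and the unroll factor are assumptions for illustration; a cleanup loop handles trip counts that are not a multiple of four.

```c
/* Original loop: one element per iteration. */
void scale(int *a, const int *b, int c, int n) {
    for (int i = 0; i < n; i++)
        a[i] = b[i] * c;
}

/* Unrolled by four: the four statements in the body are independent,
   so later compiler stages can schedule them in parallel. */
void scale_unrolled(int *a, const int *b, int c, int n) {
    int i = 0;
    for (; i + 3 < n; i += 4) {
        a[i]     = b[i]     * c;
        a[i + 1] = b[i + 1] * c;
        a[i + 2] = b[i + 2] * c;
        a[i + 3] = b[i + 3] * c;
    }
    /* Cleanup loop for the remaining 0 to 3 elements. */
    for (; i < n; i++)
        a[i] = b[i] * c;
}
```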
Loop fusion combines two or more loops into a single loop. For this transformation to be legal, two
conditions must be satisfied. First, the loops must iterate over the same values. Second, the loop
bodies must not have dependencies that would be violated if they are executed together.
For example, if the second loop’s ith iteration depends on the results of the (i+1)th iteration of the
first loop, the two loops cannot be combined.
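A minimal sketch of a legal fusion, with hypothetical array names: both loops run over the same index range, and the second body reads only a[i], which the first body produces in the same iteration.

```c
/* Two separate loops over the same index range. */
void separate(int *a, int *d, const int *b, const int *c, int n) {
    for (int i = 0; i < n; i++)
        a[i] = b[i] + 1;
    for (int i = 0; i < n; i++)
        d[i] = a[i] * c[i];
}

/* Fused version: d[i] needs only a[i], already computed in this
   iteration. If d[i] read a[i + 1] instead, fusion would be illegal,
   because a[i + 1] would not have been written yet. */
void fused(int *a, int *d, const int *b, const int *c, int n) {
    for (int i = 0; i < n; i++) {
        a[i] = b[i] + 1;
        d[i] = a[i] * c[i];
    }
}
```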
Loop distribution is the opposite of loop fusion, that is, decomposing a single loop into multiple
loops.
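As a sketch of loop distribution (function and variable names are assumptions), the single loop below does two unrelated jobs; splitting it gives two simpler loops, each of which touches fewer arrays.

```c
/* One loop performing two independent computations. */
void combined(int *a, int *sum, const int *b, int n) {
    *sum = 0;
    for (int i = 0; i < n; i++) {
        a[i] = b[i] * 2;
        *sum += b[i];
    }
}

/* Distributed version: each computation gets its own loop. */
void distributed(int *a, int *sum, const int *b, int n) {
    for (int i = 0; i < n; i++)
        a[i] = b[i] * 2;
    *sum = 0;
    for (int i = 0; i < n; i++)
        *sum += b[i];
}
```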
Dead code is code that can never be executed. Dead code can be generated by programmers,
either inadvertently or purposefully. Dead code can also be generated by compilers. Dead code
can be identified by reachability analysis, which finds the statements or instructions from
which a given piece of code can be reached. If a piece of code cannot be reached, or can be
reached only from code that is itself unreachable from the main program, then it can be eliminated.
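The sketch below shows two common sources of dead code in C, assuming a hypothetical DEBUG flag: a branch guarded by a constant that is always false, and a statement that follows an unconditional return. process_clean is what remains after elimination.

```c
#define DEBUG 0  /* hypothetical flag, deliberately set to 0 */

int process(int x) {
    if (DEBUG) {
        x = -x;      /* dead: the condition is the constant 0 */
    }
    return x + 1;
    x = 0;           /* dead: follows an unconditional return */
}

/* The same function after dead code elimination. */
int process_clean(int x) {
    return x + 1;
}
```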
Code motion lets us move code out of a loop when it does not need to be re-executed on
every iteration. If a computation’s result does not depend on operations performed in the
loop body, then we can safely move it out of the loop. Code motion opportunities can arise
because programmers may find some computations clearer and more concise when put in the
loop body, even though they are not strictly dependent on the loop iterations.
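A minimal sketch of code motion, with hypothetical names: the loop bound n * m does not change inside the loop, so it can be computed once before the loop starts.

```c
/* Before: n * m is written in the loop test and, taken literally,
   is re-evaluated on every iteration. */
void add_before(int *z, const int *a, const int *b, int n, int m) {
    for (int i = 0; i < n * m; i++)
        z[i] = a[i] + b[i];
}

/* After code motion: the invariant product is hoisted out of the loop. */
void add_after(int *z, const int *a, const int *b, int n, int m) {
    int limit = n * m;  /* computed once */
    for (int i = 0; i < limit; i++)
        z[i] = a[i] + b[i];
}
```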
An induction variable is a variable whose value is derived from the loop iteration
variable’s value. The compiler often introduces induction variables to help it implement the loop.
With the proper transformations, we may be able to eliminate some of these variables and apply
strength reduction (replacing an expensive operation such as a multiplication with a cheaper
one such as an addition) to others.
A nested loop that walks over two-dimensional arrays is a good example of the use of
induction variables. The compiler uses induction variables to help it address the arrays,
and the loop can be rewritten in C using induction variables and pointers.
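As an illustration, here is a sketch of such a nested loop and two rewritten forms. The array names z and b, the sizes N and M, and the variable names zbinduct and zbptr are assumptions made for this example, not taken from the notes.

```c
#define N 4  /* rows (assumed size) */
#define M 3  /* columns (assumed size) */

/* Simple nested loop with two-dimensional indexing. */
void copy_indexed(int z[N][M], int b[N][M]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            z[i][j] = b[i][j];
}

/* Rewritten with explicit induction variables and pointers:
   zbinduct tracks the start of row i, zbptr the element offset. */
void copy_induction(int *zptr, const int *bptr) {
    for (int i = 0; i < N; i++) {
        int zbinduct = i * M;
        for (int j = 0; j < M; j++) {
            int zbptr = zbinduct + j;
            *(zptr + zbptr) = *(bptr + zbptr);
        }
    }
}

/* zbinduct eliminated and the multiplication strength-reduced:
   zbptr simply increases by one on every inner iteration. */
void copy_reduced(int *zptr, const int *bptr) {
    int zbptr = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++) {
            *(zptr + zbptr) = *(bptr + zbptr);
            zbptr++;
        }
}
```

The third version relies on the rows being stored contiguously, which C guarantees for a two-dimensional array.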
Assume that the a and b arrays are sized with M at 265 and N at 4, and that the cache is a
256-line, four-way set-associative cache with four words per line. The starting location for
a[ ] is 1024 and the starting location for b[ ] is 4099.
Although a[0][0] and b[0][0] do not map to the same word in the cache, they do map to the same
block.
Once the a[0][1] access brings that line into the cache, it remains there for the a[0][2] and a[0][3]
accesses because the b[] accesses are now on the next line. However, the scenario repeats itself at
a[1][0] and at every fourth iteration after that. One way to eliminate the cache conflicts is to
move one of the arrays. We do not have to move it far. If we move b’s start to 4100, we
eliminate the cache conflicts.
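The mapping arithmetic can be checked with a short sketch. It assumes word addresses and a simple modulo mapping over all 256 lines, ignoring associativity; the function names are hypothetical.

```c
#define WORDS_PER_LINE 4    /* four words per cache line */
#define NUM_LINES      256  /* 256-line cache */

/* Cache line that a word address maps to (simple modulo mapping). */
unsigned cache_line(unsigned word_addr) {
    return (word_addr / WORDS_PER_LINE) % NUM_LINES;
}

/* Position of the word within its cache line. */
unsigned word_in_line(unsigned word_addr) {
    return word_addr % WORDS_PER_LINE;
}
```

Under these assumptions, address 1024 (a[0][0]) maps to line 0, word 0, and address 4099 (b[0][0]) maps to line 0, word 3: different words in the same block. With b moved to 4100, b[0][0] maps to line 1, word 0, so the two arrays no longer share a line at the same point in the loop.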
However, that fix will not work in more complex situations. Moving one array may only
introduce cache conflicts with another array. In such cases, we can use another technique called
padding. If we extend each of the rows of the arrays to have four elements rather than three, with
the padding word placed at the beginning of the row, we eliminate the cache conflicts.
In this case, the padding places b[0][0] at 4100. Although padding wastes memory, it
substantially improves memory performance. In complex situations with multiple arrays and
sophisticated access patterns, we have to use a combination of techniques, relocating arrays and
padding them, to minimize cache conflicts.
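The address calculation for a padded array can be sketched as follows, assuming rows of three data words with one padding word at the start of each row, as described above. The macro and function names are hypothetical.

```c
#define COLS 3  /* data elements per row (assumed, per the text) */
#define PAD  1  /* padding words placed at the start of each row */

/* Word address of element [row][col] in a padded array at base. */
unsigned padded_addr(unsigned base, unsigned row, unsigned col) {
    return base + row * (COLS + PAD) + PAD + col;
}
```

With a base of 4099, element [0][0] lands at 4100 and element [1][0] at 4104, so every row begins on a four-word boundary.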
Loop tiling breaks up a loop into a set of nested loops, with each inner loop performing the
operations on a subset of the data. Loop tiling changes the order in which array elements are
accessed, thereby allowing us to better control the behavior of the cache during loop execution.
The next example illustrates the use of loop tiling.
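As a sketch of tiling, the matrix-vector product below is written first as a plain nested loop and then with both loops tiled. The problem size N, the tile size TILE, and all function names are assumptions chosen for illustration.

```c
#define N    6  /* matrix dimension (assumed) */
#define TILE 4  /* tile size (assumed); need not divide N */

static int min_int(int a, int b) { return a < b ? a : b; }

/* Untiled: each row sweeps across the whole of x. */
void matvec(int y[N], int a[N][N], int x[N]) {
    for (int i = 0; i < N; i++) {
        y[i] = 0;
        for (int j = 0; j < N; j++)
            y[i] += a[i][j] * x[j];
    }
}

/* Tiled: the inner two loops work on a TILE x TILE block, so the
   same few elements of x are reused before moving to the next block. */
void matvec_tiled(int y[N], int a[N][N], int x[N]) {
    for (int i = 0; i < N; i++)
        y[i] = 0;
    for (int i = 0; i < N; i += TILE)
        for (int j = 0; j < N; j += TILE)
            for (int ii = i; ii < min_int(i + TILE, N); ii++)
                for (int jj = j; jj < min_int(j + TILE, N); jj++)
                    y[ii] += a[ii][jj] * x[jj];
}
```

Both versions perform the same additions for each y[i]; only the order in which array elements are visited changes.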