18 Code Optimization 07-02-2025

The document discusses various compiler optimization techniques that improve code efficiency, including function inlining, outlining, loop transformations, dead code elimination, and register allocation. It also covers software performance optimization strategies, such as basic loop optimizations, cache-oriented loop optimizations, and techniques like data realignment, array padding, and loop tiling to enhance cache behavior. These methods aim to reduce execution time and memory usage in compiled programs.

Uploaded by

lanoxof509

Optimization Techniques

Compilation involves translation and optimization.

I. Compiler Optimization Techniques

Basic compilation techniques can generate inefficient code. Compilers use a wide
range of algorithms to optimize the code they generate.

The available techniques are:


1. Function inlining
2. Outlining
3. Loop transformations
- Loop unrolling
- Loop fusion and distribution
4. Dead code elimination
5. Register allocation
6. Compile-time evaluation (constant folding, constant propagation)

Function inlining replaces a subroutine call to a function with code equivalent to the function
body. By substituting the function call’s parameters into the body, the compiler can generate a
copy of the code that performs the same operations but without the subroutine overhead. C++
provides an inline qualifier that allows the compiler to substitute an inline version of the
function. In C, programmers can perform inlining manually or by using a preprocessor macro to
define the code body.
Although inlining eliminates function call overhead, it also increases program size. Inlining also
inhibits sharing of the function code in the cache: because the inlined copies are distinct pieces of
code, they cannot be represented by the same code in the cache. Outlining is sometimes useful to
improve the cache behavior of common functions.
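
As a minimal sketch of what inlining does, consider a small helper and its caller. The names square, sum_of_squares_call, and sum_of_squares_inlined are illustrative and do not come from these notes. The second function shows the kind of code a compiler could produce once the call is replaced by the function body. C99 and later also accept the inline keyword, although the compiler remains free to ignore it.

```c
#include <stddef.h>

/* Hypothetical helper: small enough that call overhead rivals its work. */
static inline int square(int x) {
    return x * x;
}

/* Version with an explicit call in the loop body. */
int sum_of_squares_call(const int *v, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += square(v[i]);
    return sum;
}

/* What the loop looks like once the call is replaced by the function body. */
int sum_of_squares_inlined(const int *v, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += v[i] * v[i];
    return sum;
}
```

Both functions return the same value; the inlined form simply trades a call per element for a slightly larger loop body.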

Outlining is the opposite operation to inlining: a set of similar sections of code is replaced with
calls to an equivalent function.
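
The idea can be sketched with a hypothetical pixel-clamping routine (the function names and the 0..255 range are assumptions made for illustration). The first version repeats the same logic three times; the outlined version shares one copy of that logic, so only one copy has to occupy the instruction cache.

```c
/* Before outlining: the same clamping logic is repeated for each channel. */
void clamp_pixel_repeated(int *r, int *g, int *b) {
    if (*r < 0) *r = 0; else if (*r > 255) *r = 255;
    if (*g < 0) *g = 0; else if (*g > 255) *g = 255;
    if (*b < 0) *b = 0; else if (*b > 255) *b = 255;
}

/* After outlining: the repeated section becomes one shared function. */
static int clamp_byte(int v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return v;
}

void clamp_pixel_outlined(int *r, int *g, int *b) {
    *r = clamp_byte(*r);
    *g = clamp_byte(*g);
    *b = clamp_byte(*b);
}
```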

Loop Transformations:

Loops are important program structures - although they are compactly described in the source
code, they often use a large fraction of the computation time. Many techniques have been
designed to optimize loops.
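A simple but useful transformation is known as loop unrolling, which replicates the loop body so
that each iteration of the new loop does the work of several iterations of the original. Loop
unrolling is important because it reduces loop overhead and helps expose parallelism that can be
used by later stages of the compiler.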
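
A minimal sketch of unrolling by a factor of four follows. The function names and the unroll factor are assumptions chosen for illustration; a compiler would pick the factor itself.

```c
#include <stddef.h>

/* Original loop: one element per iteration. */
void scale_rolled(int *a, const int *b, int c, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] * c;
}

/* Unrolled by a factor of 4: four independent statements per iteration,
   followed by a cleanup loop for the leftover elements when n % 4 != 0. */
void scale_unrolled(int *a, const int *b, int c, size_t n) {
    size_t i = 0;
    for (; i + 3 < n; i += 4) {
        a[i]     = b[i]     * c;
        a[i + 1] = b[i + 1] * c;
        a[i + 2] = b[i + 2] * c;
        a[i + 3] = b[i + 3] * c;
    }
    for (; i < n; i++)
        a[i] = b[i] * c;
}
```

The four statements in the unrolled body do not depend on each other, which is what gives a scheduler room to overlap them.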
Loop fusion combines two or more loops into a single loop. For this transformation to be legal, two
conditions must be satisfied. First, the loops must iterate over the same values. Second, the loop
bodies must not have dependencies that would be violated if they are executed together.
For example, if the second loop’s ith iteration depends on the results of the (i+1)th iteration of the
first loop, the two loops cannot be combined.

Loop distribution is the opposite of loop fusion, that is, decomposing a single loop into multiple
loops.
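
Both transformations can be sketched with one pair of functions: reading from the first to the second is fusion, and reading from the second to the first is distribution. The array names and the size FUSE_N are hypothetical.

```c
#define FUSE_N 8   /* assumed common trip count of both loops */

/* Distributed form: two separate loops over the same range. */
void separate_loops(int *a, int *d, const int *b, const int *c) {
    for (int i = 0; i < FUSE_N; i++)
        a[i] = b[i] + c[i];
    for (int i = 0; i < FUSE_N; i++)
        d[i] = a[i] * 2;      /* needs only a[i], from the same index */
}

/* Fused form: legal because d[i] uses a[i], which is already computed
   earlier in the same iteration. */
void fused_loop(int *a, int *d, const int *b, const int *c) {
    for (int i = 0; i < FUSE_N; i++) {
        a[i] = b[i] + c[i];
        d[i] = a[i] * 2;
    }
}
```

If the second statement read a[i + 1] instead, the fused loop would use a value that had not been written yet, so the fusion would not be legal.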

Dead Code Elimination

Dead code is code that can never be executed. Dead code can be generated by programmers,
either inadvertently or purposefully. Dead code can also be generated by compilers. Dead code
can be identified by reachability analysis, finding the other statements or instructions from
which it can be reached. If a given piece of code cannot be reached, or it can be reached only by
a piece of code that is unreachable from the main program, then it can be eliminated.
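
A small sketch of the two most common cases follows. The DEBUG flag and the function names are assumptions made for illustration. A compiler may warn about the unreachable statements in the first function; that is expected here.

```c
#define DEBUG 0   /* assumed compile-time flag, for illustration only */

/* Before: one branch can never run and one statement follows a return. */
int scale_before(int x) {
    if (DEBUG) {
        x = x + 1000;   /* unreachable: the condition is the constant 0 */
    }
    return x * 2;
    x = 0;              /* unreachable: follows an unconditional return */
}

/* After: what remains once the unreachable statements are removed. */
int scale_after(int x) {
    return x * 2;
}
```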

Register allocation is a very important compilation phase. Given a block of code,
we want to choose assignments of variables (both declared and temporary) to registers to
minimize the total number of required registers.
If a section of code requires more registers than are available, we must spill some
of the values out to memory temporarily. After computing some values, we write the values to
temporary memory locations, reuse those registers in other computations, and then reread the
old values from the temporary locations to resume work.

Register allocation can be solved using graph coloring: variables whose lifetimes overlap conflict
with each other and must be assigned to different registers.

Example: Perform register allocation using a lifetime graph
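
The following is a sketch of lifetime-based allocation under stated assumptions, not the worked example these notes refer to. Each variable is described by a hypothetical Lifetime record (first and last statement where it is live), two variables conflict when their lifetimes overlap, and a simple greedy coloring assigns register numbers. Greedy coloring is a heuristic: it does not guarantee the minimum number of registers for an arbitrary conflict graph.

```c
#include <stdbool.h>

/* Hypothetical lifetime: the variable is live from statement `start`
   through statement `end`, inclusive. */
typedef struct { int start, end; } Lifetime;

/* Two variables conflict when their lifetimes overlap. */
static bool overlaps(Lifetime a, Lifetime b) {
    return a.start <= b.end && b.start <= a.end;
}

/* Greedy coloring of the conflict graph: give each variable the lowest
   register number not held by an earlier variable it conflicts with.
   Writes the register of variable v into reg[v] and returns the number
   of registers used. */
int allocate_registers(const Lifetime *life, int n, int *reg) {
    int used = 0;
    for (int v = 0; v < n; v++) {
        int r = 0;
        bool clash = true;
        while (clash) {
            clash = false;
            for (int u = 0; u < v; u++) {
                if (reg[u] == r && overlaps(life[u], life[v])) {
                    clash = true;   /* register r is taken; try the next */
                    r++;
                    break;
                }
            }
        }
        reg[v] = r;
        if (r + 1 > used)
            used = r + 1;
    }
    return used;
}
```

For five variables with lifetimes [1,3], [2,4], [3,5], [5,6], and [6,7], this sketch uses three registers, because the first three variables are all live at statement 3 and the last two can reuse registers freed earlier.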


II. Software performance optimization
In this section we will look at several techniques for optimizing software
performance, including basic loop and cache-oriented loop optimizations as well as more generic
strategies.

Basic loop optimizations


There are three important techniques for optimizing loops:
- code motion
- induction variable elimination
- strength reduction

Code motion lets us move unnecessary code out of a loop. If a computation’s result
does not depend on operations performed in the loop body, then we can safely move it out of the
loop. Code motion opportunities can arise because programmers may find some computations
clearer and more concise when put in the loop body, even though they are not strictly dependent
on the loop iterations.
(Explain with example)
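
A minimal sketch of code motion, using hypothetical function names: the product x * y does not depend on the loop variable, so it can be computed once before the loop.

```c
#include <stddef.h>

/* Before: x * y never changes inside the loop but is recomputed each time. */
void fill_before(int *a, size_t n, int x, int y) {
    for (size_t i = 0; i < n; i++)
        a[i] = x * y + (int)i;
}

/* After code motion: the invariant product is computed once, outside. */
void fill_after(int *a, size_t n, int x, int y) {
    int xy = x * y;
    for (size_t i = 0; i < n; i++)
        a[i] = xy + (int)i;
}
```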

An induction variable is a variable whose value is derived from the loop iteration
variable’s value. The compiler often introduces induction variables to help it implement the loop.
Once the loop is properly transformed, we may be able to eliminate some variables and apply
strength reduction to others. Strength reduction replaces an expensive operation, such as a
multiplication, with a cheaper one, such as an addition.
A nested loop is a good example of the use of induction variables. Here is a simple
nested loop:

The compiler uses induction variables to help it address the arrays. Let us rewrite
the loop in C using induction variables and pointers.
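
The listings for this example are not reproduced in the text, so the sketch below is an assumed illustration of the kind of loop described, not the original code. The array names z and b, the dimensions ROWS and COLS, and the variable zbinduct are all hypothetical. The three functions show the indexed loop, the same loop with the induction variable made explicit, and the result after strength reduction and induction variable elimination.

```c
#define ROWS 3   /* assumed dimensions */
#define COLS 4

/* Simple nested loop: copy b into z with two-dimensional indexing. */
void copy_indexed(int z[ROWS][COLS], int b[ROWS][COLS]) {
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            z[i][j] = b[i][j];
}

/* Same loop with the address arithmetic made explicit. zbinduct is an
   induction variable equal to i * COLS + j, the offset of element [i][j]. */
void copy_induction(int z[ROWS][COLS], int b[ROWS][COLS]) {
    int *zptr = &z[0][0];
    int *bptr = &b[0][0];
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++) {
            int zbinduct = i * COLS + j;
            *(zptr + zbinduct) = *(bptr + zbinduct);
        }
}

/* After strength reduction and induction variable elimination: the
   multiplication becomes an increment, and i and j are no longer needed. */
void copy_reduced(int z[ROWS][COLS], int b[ROWS][COLS]) {
    int *zptr = &z[0][0];
    int *bptr = &b[0][0];
    for (int zbinduct = 0; zbinduct < ROWS * COLS; zbinduct++)
        *(zptr + zbinduct) = *(bptr + zbinduct);
}
```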

Cache-oriented loop optimizations


A loop nest is a set of loops, one inside the other. Loop nests occur when we process arrays. A
large body of techniques has been developed for optimizing loop nests.

Data realignment and Array padding


Suppose we want to optimize the cache behavior of code that processes two two-dimensional
arrays, a[ ] and b[ ].

Assume that the a and b arrays are sized with M = 265 and N = 4, and that the cache is a 256-line,
four-way set-associative cache with four words per line. The starting location for a[ ] is 1024 and
the starting location for b[ ] is 4099.

Although a[0][0] and b[0][0] do not map to the same word in the cache, they do map to the same
block.
Once the a[0][1] access brings that line into the cache, it remains there for the a[0][2] and a[0][3]
accesses because the b[ ] accesses are now on the next line. However, the scenario repeats itself at
a[1][0] and every four iterations of the loop. One way to eliminate the cache conflicts is to
move one of the arrays. We do not have to move it far. If we move b’s start to 4100, we
eliminate the cache conflicts.
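
The mapping described above can be checked with a short sketch. It assumes that the addresses are word addresses and that a line is selected by taking the line number modulo 256, which is how the reasoning in the text treats the cache; the function names are hypothetical.

```c
/* Assumed cache geometry from the text: 256 lines, four words per line.
   Addresses are treated as word addresses. */
#define WORDS_PER_LINE 4
#define NUM_LINES      256

/* Cache line that a word address maps to. */
unsigned cache_line(unsigned addr) {
    return (addr / WORDS_PER_LINE) % NUM_LINES;
}

/* Position of the word within its line. */
unsigned word_in_line(unsigned addr) {
    return addr % WORDS_PER_LINE;
}
```

Under these assumptions, address 1024 (a[0][0]) maps to word 0 of line 0 and address 4099 (b[0][0]) maps to word 3 of line 0, so the two elements share a block without sharing a word. Address 4100 maps to line 1, which is why moving b by a single word separates the two arrays.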
However, that fix will not work in more complex situations. Moving one array may only
introduce cache conflicts with another array. In such cases, we can use another technique called
padding. If we extend each of the rows of the arrays to have four elements rather than three, with
the padding word placed at the beginning of the row, we eliminate the cache conflicts.
In this case, b[0][0] is located at 4100 by the padding. Although padding wastes memory, it
substantially improves memory performance. In complex situations with multiple arrays and
sophisticated access patterns, we have to use a combination of techniques, relocating arrays and
padding them to be able to minimize cache conflicts.
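
Padding can be sketched as follows, assuming rows of three data elements extended to four words with the padding word placed first, as described above. The macro names, the row count, and the accessor functions are hypothetical.

```c
#define PAD_ROWS  4                  /* assumed number of rows */
#define DATA_COLS 3                  /* elements actually used per row */
#define PAD       1                  /* one unused word at the start of each row */
#define ROW_WORDS (PAD + DATA_COLS)  /* padded row length: four words */

static int b[PAD_ROWS][ROW_WORDS];

/* Pointer to logical element (row, col), skipping the padding word. */
int *b_elem(int row, int col) {
    return &b[row][PAD + col];
}

/* Word offset of logical element (row, col) from the start of the array. */
int b_offset(int row, int col) {
    return row * ROW_WORDS + PAD + col;
}
```

Logical element (0, 0) sits at offset 1 from the start of the storage, so if the storage begins at 4099 the first data element lands at 4100, which agrees with the address given in the text.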

Loop tiling breaks up a loop into a set of nested loops, with each inner loop performing the
operations on a subset of the data. Loop tiling changes the order in which array elements are
accessed, thereby allowing us to better control the behavior of the cache during loop execution.
The next example illustrates the use of loop tiling.
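
The example itself is not reproduced in the text, so the following sketch stands in for it under stated assumptions: a matrix-vector product with hypothetical dimension TN and tile size TILE. The tiled version walks the columns one tile at a time, so the same few elements of x are reused across all rows before the next tile is touched.

```c
#define TN   8   /* matrix dimension (assumed) */
#define TILE 4   /* tile size (assumed); chosen so a tile fits in the cache */

/* Untiled loop nest: each row sweeps across all of x. */
void matvec_plain(int a[TN][TN], const int *x, int *y) {
    for (int i = 0; i < TN; i++) {
        y[i] = 0;
        for (int j = 0; j < TN; j++)
            y[i] += a[i][j] * x[j];
    }
}

/* Tiled loop nest: the outer loop picks a tile of columns, and the inner
   loops process every row for that tile before moving on. */
void matvec_tiled(int a[TN][TN], const int *x, int *y) {
    for (int i = 0; i < TN; i++)
        y[i] = 0;
    for (int jj = 0; jj < TN; jj += TILE)
        for (int i = 0; i < TN; i++)
            for (int j = jj; j < jj + TILE && j < TN; j++)
                y[i] += a[i][j] * x[j];
}
```

Both versions perform the same additions and multiplications; only the order of the array accesses changes.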
