
19 Code Optimization 17-02-2025

The document discusses various methodologies for program optimization, including expression simplification, dead code elimination, procedure inlining, and loop transformations. It explains techniques such as loop unrolling, loop fusion, loop fission, and loop tiling, highlighting their benefits and potential drawbacks. The goal of these optimizations is to improve program performance by reducing execution time and resource usage.


Program Optimization

Methodologies for program optimization
Expression simplification
Dead Code Elimination
Procedure Inlining
Loop transformation
Program Optimization – Expression Simplification

Expression simplification is a useful area for machine-independent transformations. We can use the laws of algebra to simplify expressions. Consider the following expression:

a*b + a*c

Using the distributive law, we can rewrite the expression as a*(b + c).

VIT University
Contd..
Since the new expression has only two operations rather than three for the original form, it is almost certainly cheaper: it is both faster and smaller.

a*b + a*c = a*(b + c)
Program Optimization – Expression Simplification

We can also use the laws of arithmetic to further simplify expressions on constants. Consider the following C statement:

for (i = 0; i < 8 + 1; i++)
Contd..
We can simplify 8 + 1 to 9 at compile time; there is no need to perform that arithmetic while the program is executing. Why would a program ever contain expressions that evaluate to constants? Using named constants rather than literal numbers is good programming practice, and it often leads to constant expressions.
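As a small sketch of how this arises in practice (the constant name and function below are hypothetical), a named constant used in a loop bound produces a constant expression that the compiler folds at compile time:

```c
#include <assert.h>

/* Hypothetical named constant: better practice than a bare 8. */
#define BUF_SIZE 8

/* The loop bound "BUF_SIZE + 1" is a constant expression; the
   compiler folds it to 9, so no arithmetic happens at run time. */
int sum_first_nine(const int *buf) {
    int total = 0;
    for (int i = 0; i < BUF_SIZE + 1; i++) {
        total += buf[i];
    }
    return total;
}
```

The source code stays readable and easy to change (edit one #define), while the generated code is as direct as if the programmer had written 9.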
Program Optimization
Methodologies for program optimization
Expression simplification
Dead Code Elimination
Procedure Inlining
Loop transformation
Program Optimization – Dead Code Elimination

Code that will never be executed can be safely removed from the program. The general problem of identifying code that will never be executed is difficult, but there are some important special cases where it can be done.
Programmer-generated Dead Code

Programmers will intentionally introduce dead code in certain situations. Consider this C code fragment:

#define DEBUG 0
...
if (DEBUG) print_debug_stuff();

In this case, the print_debug_stuff() function is never executed, but the code allows the programmer to override the preprocessor variable definition (perhaps with a compile-time flag) to enable the debugging code.

This case is easy to analyze because the condition is the constant 0, which C uses for the false condition.
Program Optimization – Dead Code Elimination
Since there is no else clause in the if statement, the compiler can eliminate the if statement entirely, rewriting the CDFG to provide a direct edge between the statements before and after the if.

Compiler-generated Dead Code

Some dead code may be introduced by the compiler itself. For example, certain optimizations introduce copy statements that copy one variable to another. If uses of the first variable can be replaced by references to the second one, then the copy statement becomes dead code that can be eliminated.
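A minimal sketch of this (the variable and function names are hypothetical): a copy t = x becomes dead once every use of t has been replaced by x through copy propagation:

```c
#include <assert.h>

/* Before copy propagation: a copy statement t = x was introduced. */
int before_propagation(int x) {
    int t = x;          /* copy statement */
    return t * t;       /* uses of t */
}

/* After replacing the uses of t with x, the copy "t = x" has no
   remaining uses: it is dead code and can be eliminated. */
int after_propagation(int x) {
    return x * x;
}
```

Both versions compute the same result; the second simply no longer carries the now-useless copy.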
Program Optimization
Methodologies for program optimization
Expression simplification
Dead Code Elimination
Procedure Inlining
Loop transformation
Program Optimization – Procedure Inlining

Inlining is a technique in C++


that allows the compiler to
replace a function call with the
body of the function itself. This
can improve the performance
of the program by eliminating
the overhead of function calls.

There are a few things to keep in mind when using inline functions:
Inline functions should be small and simple. The compiler is less likely to inline a large or complex function.
Inline functions should be used sparingly. Inlining too many functions can actually make the program slower.
Inline functions should not be used for recursive functions; the compiler cannot fully inline a recursive call.
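A minimal C sketch of these points (the function names are hypothetical): square is small and simple, so it is a good candidate; static inline merely suggests inlining, and the compiler makes the final decision:

```c
#include <assert.h>

/* Small, simple function: a good inlining candidate. */
static inline int square(int x) {
    return x * x;
}

int sum_of_squares(int a, int b) {
    /* After inlining, each call is replaced by its body (a*a and b*b),
       eliminating the call/return overhead. */
    return square(a) + square(b);
}
```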
Program Optimization
Methodologies for program optimization
Expression simplification
Dead Code Elimination
Procedure Inlining
Loop transformation

Loop Unrolling
Loop fusion
Loop fission
Loop Tiling
Program Optimization – Loop Transformations

Loops are important program structure

Although they are compactly described


in the source code, they often use a
large fraction of the computation time

A simple but useful transformation is


known as loop unrolling.

Loop unrolling is important


because it helps expose
parallelism that can be used by
later stages of the compiler
VIT University
Loop unrolling is important because it
helps expose parallelism that can be
used by later stages of the compiler

A simple loop in C follows:

for (i = 0; i < N; i++) {
    a[i] = b[i]*c[i];
}

This loop is executed a fixed number of times, namely, N.
Contd..
Loop Overheads

A straightforward implementation of the loop would create and initialize the loop variable i, update its value on every iteration, and test it to see whether to exit the loop. However, since the loop is executed a fixed number of times, we can generate more direct code.
Program Optimization – Loop Unrolling

If we let N = 4, then we can replace the loop above with the following code:

a[0] = b[0]*c[0];
a[1] = b[1]*c[1];
a[2] = b[2]*c[2];
a[3] = b[3]*c[3];

This unrolled code has no loop overhead at all: no iteration variable and no tests. But the unrolled loop has the same problems as an inlined procedure: it may interfere with the cache, and it expands the amount of code required.
Partial Loop Unrolling
We do not, of course, have to fully unroll loops. Rather than unroll the above loop four times, we could unroll it twice. The following code results:

for (i = 0; i < 2; i++) {
    a[i*2] = b[i*2]*c[i*2];
    a[i*2 + 1] = b[i*2 + 1]*c[i*2 + 1];
}

In this case, since all operations in the two lines of the loop body are independent, later stages of the compiler may be able to generate code that allows them to be executed efficiently on the CPU's pipeline.
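When N is not a multiple of the unroll factor, a remainder loop handles the leftover iterations. A sketch under that assumption (the function name is hypothetical; a, b, c are the arrays from the example):

```c
/* Loop unrolled by a factor of 2, with a remainder loop for odd n. */
void multiply_unrolled(int *a, const int *b, const int *c, int n) {
    int i;
    for (i = 0; i + 1 < n; i += 2) {   /* main unrolled body */
        a[i]     = b[i]     * c[i];
        a[i + 1] = b[i + 1] * c[i + 1];
    }
    for (; i < n; i++) {               /* leftover iteration, if any */
        a[i] = b[i] * c[i];
    }
}
```

The main loop runs the unrolled body while at least two iterations remain; the second loop finishes at most one leftover iteration.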
Program Optimization – Loop Fusion

Loop fusion combines two or more loops into a single loop. For this transformation to be legal, two conditions must be satisfied. First, the loops must iterate over the same values. Second, the loop bodies must not have dependencies that would be violated if they are executed together; for example, if the second loop's ith iteration depends on the results of the (i+1)th iteration of the first loop, the two loops cannot be combined.

Loop fusion Example

The two adjacent loops in the code fragment below can be fused into one loop.

Before fusion:

for (i = 0; i < 300; i++)
    a[i] = a[i] + 3;
for (i = 0; i < 300; i++)
    b[i] = b[i] + 4;

After fusion:

for (i = 0; i < 300; i++) {
    a[i] = a[i] + 3;
    b[i] = b[i] + 4;
}
Loop fusion: cautions

Notes:
Loop fusion is not commonly supported by C compilers. Although loop fusion reduces loop overhead, it does not always improve run-time performance, and it may even reduce it. For example, the memory architecture may provide better performance if two arrays are initialized in separate loops rather than both in one loop.
Loop fission

Loop distribution (loop fission) is the opposite of loop fusion: decomposing a single loop into multiple loops.

Before fission:

int i, a[100], b[100];
for (i = 0; i < 100; i++) {
    a[i] = 1;
    b[i] = 2;
}

is equivalent to, after fission:

int i, a[100], b[100];
for (i = 0; i < 100; i++) {
    a[i] = 1;
}
for (i = 0; i < 100; i++) {
    b[i] = 2;
}
Loop tiling

Basic Idea:
1. Loop tiling divides the processing of data into smaller segments called tiles or blocks.
2. These smaller blocks fit more readily into the data cache.

Consider the following loop nest:

for (int i = 0; i < n; i++) {
    for (int j = 0; j < m; j++) {
        c[i][j] = a[i] * b[j];
    }
}

If you look more closely, the whole array b needs to be fetched from memory n times. This is not a problem if the size of the array b is small, since then the values of b would be brought from main memory into the data cache once and reused every time after that. The problem happens when the array b is large. In that case, the CPU would need to bring the values of the array b from main memory n times, since b does not fit in the data cache. The solution to this problem is loop tiling: instead of running the inner loop over j from 0 to m and then increasing the variable i, we pick a constant called TILE_SIZE and run the loop according to this pattern:
Explanation of Example (Loop tiling)
In this example, the entire array b needs to be fetched from memory n times. If b is large, repeated fetching from memory becomes inefficient. Instead of running the inner loop over j from 0 to m, we introduce a constant called TILE_SIZE.
After Loop tiling
for (int jj = 0; jj < m; jj += TILE_SIZE) {
    for (int i = 0; i < n; i++) {
        for (int j = jj; j < MIN(jj + TILE_SIZE, m); j++) {
            c[i][j] = a[i] * b[j];
        }
    }
}
How to perform loop tiling
By doing this, we reuse the part of array b that is already in the cache. The values of array a are read m/TILE_SIZE times from memory.

Further Optimization:
If the size of array a is also large, we can apply loop tiling to the loop over i as well. This results in better memory subsystem usage.

Choosing Tile Size:
Experimentally determine the optimal TILE_SIZE. Start with a value (e.g., 16) and adjust until the best performance is achieved.
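Putting the pieces together, a self-contained sketch of the tiled loop (the function name, the small sizes, and the MIN macro are chosen here for illustration; a real TILE_SIZE would be tuned as described above):

```c
#define N 4
#define M 10
#define TILE_SIZE 4
#define MIN(x, y) ((x) < (y) ? (x) : (y))

/* Tiled outer product: c[i][j] = a[i] * b[j]. The jj loop walks over
   tiles of b; every row of c reuses the current tile of b while it is
   still in the data cache. MIN handles a final partial tile. */
void outer_product_tiled(int c[N][M], const int a[N], const int b[M]) {
    for (int jj = 0; jj < M; jj += TILE_SIZE) {
        for (int i = 0; i < N; i++) {
            for (int j = jj; j < MIN(jj + TILE_SIZE, M); j++) {
                c[i][j] = a[i] * b[j];
            }
        }
    }
}
```

The tiled version computes exactly the same c as the original loop nest; only the order in which the (i, j) pairs are visited changes.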
When to Apply Loop Tiling?

Scenario 1: Iterating over the same dataset several times (as shown in the example).
Scenario 2: When data sets are large and cache efficiency matters.
