Compiler Optimizations1
– “It is easy to invent programs that will benefit from any number of
repetitions of a sequence of optimizing transformations. While such
examples can be constructed, it is important to note that they occur very
rarely in practice. It is usually sufficient to apply the transformations that
make up an optimizer once, or at most twice to get all or almost all the
benefit one is likely to derive from them.”
• The second sentence of this quotation, together with the statements on the
previous slide, explains why compilers are implemented the way they are.
I. Assignment statement optimizations
1. Constant folding
2. Scalar replacement of aggregates
3. Algebraic simplification and Reassociation
4. Common subexpression elimination
5. Copy propagation
• Example
i = 320 * 200 * 32
main()
{
    printf("%s \n", "red");
}
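As a rough illustration of constant folding on the example i = 320 * 200 * 32, the sketch below folds integer literals in a Python syntax tree. The names ConstantFolder and fold_constants are made up for this illustration, and only +, - and * on integer literals are handled; a real compiler would also have to respect the overflow rules of the target machine.

```python
import ast
import operator

# Operators this sketch knows how to fold (integer arithmetic only).
_FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class ConstantFolder(ast.NodeTransformer):
    """Replace binary operations on two integer literals with their value."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first (bottom-up)
        fold = _FOLDABLE.get(type(node.op))
        left, right = node.left, node.right
        if (fold is not None
                and isinstance(left, ast.Constant) and type(left.value) is int
                and isinstance(right, ast.Constant) and type(right.value) is int):
            folded = ast.Constant(value=fold(left.value, right.value))
            return ast.copy_location(folded, node)
        return node


def fold_constants(source):
    """Return the source text with foldable integer expressions evaluated."""
    tree = ConstantFolder().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


print(fold_constants("i = 320 * 200 * 32"))
```

Expressions that contain a variable are left alone except for their constant subtrees, so j = k * (2 + 3) becomes j = k * 5.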
[Figure: block diagram of the SPL system. Boxes: DSP Transform, Formula Generator, SPL Program, SPL Compiler, Search Engine, C/FORTRAN Programs, Performance Evaluation, Target machine, DSP Library. Annotation: scalarization is carried out before the program is fed to the FORTRAN compiler.]
[Figure: Basic Optimizations (FFT, N=25, SPARC, f77 -fast -O5)]
[Figure: Basic Optimizations (FFT, N=25, PII, g77 -O6 -malign-double)]
[Figure: Basic Optimizations (FFT, N=25, MIPS, f77 -O3)]
Algebraic simplification and Reassociation
• For integers:
– Expressions simplification
• i + 0 = 0 + i = i - 0 = i
• i ^ 2 = i * i (also strength reduction)
• i*7 can be done by t := i shl 3; t := t - i (similarly, i*5 by t := i shl 2; t := t + i)
– Associativity and distributivity can be applied to improve parallelism (reduce the
height of expression trees).
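• Algebraic simplifications for floating point operations are seldom applied.
– The reason is that floating point numbers do not have the same algebraic
properties as real numbers. For example, in the code
eps := 1.0
while eps+1.0 > 1.0
oldeps := eps
eps := 0.5 * eps
simplifying the test eps+1.0 > 1.0 to eps > 0.0 changes the result: the original
loop stops as soon as adding eps to 1.0 no longer changes it, while the simplified
loop keeps halving eps until it underflows to zero.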
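The effect can be observed directly. The sketch below runs the eps loop twice, once with the test as written and once with the algebraically "simplified" test eps > 0.0. The function name halve_until and its parameter are made up for this illustration, and the values in the comments assume IEEE 754 double precision, which is what Python floats use on common platforms.

```python
def halve_until(test):
    """Run the eps loop with the given loop test and return the final oldeps."""
    eps = 1.0
    oldeps = eps
    while test(eps):
        oldeps = eps
        eps = 0.5 * eps
    return oldeps


# Test as written in the source program.
as_written = halve_until(lambda eps: eps + 1.0 > 1.0)

# Test after the (invalid) simplification eps + 1.0 > 1.0  ==>  eps > 0.0.
simplified = halve_until(lambda eps: eps > 0.0)

print(as_written)   # 2**-52 with IEEE doubles
print(simplified)   # 2**-1074, the smallest positive subnormal double
```

The two results differ by more than 300 orders of magnitude, which is why a compiler cannot treat the two tests as interchangeable.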
[Figure: two expression trees of + nodes over the operands a, b, c, d, e, showing how reassociation reduces the height of the tree.]
• There are local (to the basic block), global, and interprocedural
versions of CSE.
Copy propagation
• Eliminates unnecessary copy operations.
• For example:
x = y
<other instructions>
t = x + 1
is replaced (assuming that neither x nor y is reassigned in …, and that x is not used afterwards) with
<other instructions>
t = y + 1
• Copy propagation is useful after common subexpression elimination. For example,
x = a+b
…
y = a+b
• is replaced by CSE with the following code
t = a+b
x = t
…
z = x
y = t
• Here copy propagation replaces z = x by z = t, after which x = t can be eliminated if x has no other uses.
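A minimal sketch of copy propagation inside a single basic block follows. The instruction encoding (a destination plus a tuple of tokens) and the name copy_propagate are assumptions made for this illustration; the pass only rewrites uses and leaves the removal of dead copies to a separate dead-code elimination step.

```python
def copy_propagate(block):
    """Propagate copies through a basic block.

    block is a list of (dest, expr) pairs, where expr is a tuple of tokens
    such as ("a", "+", "b"). An instruction is a copy when expr is a single
    variable name.
    """
    copies = {}  # dest -> source, for copies that are still valid here
    result = []
    for dest, expr in block:
        # Rewrite every operand that is currently known to be a copy.
        expr = tuple(copies.get(tok, tok) for tok in expr)
        # Assigning to dest invalidates every copy that mentions dest.
        copies = {d: s for d, s in copies.items() if d != dest and s != dest}
        if len(expr) == 1 and expr[0].isidentifier() and expr[0] != dest:
            copies[dest] = expr[0]
        result.append((dest, expr))
    return result


after_cse = [
    ("t", ("a", "+", "b")),
    ("x", ("t",)),
    ("z", ("x",)),
    ("y", ("t",)),
]
for dest, expr in copy_propagate(after_cse):
    print(dest, "=", " ".join(expr))
```

In the output z is assigned from t directly, so x = t no longer has any use inside the block.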
Example
Source (fragment):
pp1()
{
float a[100][100];

Generated SPARC code (fragment):
.L95:
! 8 a[i][j]=0;
sethi %hi(.L_cseg0),%o0
ld [%o0+%lo(.L_cseg0)],%f2
sethi 39,%o0
integer a(100)
t1 = 202
do i = 1,100
   t1 = t1 - 2
   a(i) = t1
end do
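The Fortran fragment above keeps t1 in step with the loop index. Assuming it was derived from a loop that assigns 202 - 2*i to a(i) (the original loop is not shown here, so this is an assumption), the following sketch checks that both forms fill the array identically. The function names are hypothetical.

```python
def fill_direct(n=100):
    """Assumed original form: a(i) = 202 - 2*i for i = 1..n."""
    return [202 - 2 * i for i in range(1, n + 1)]


def fill_reduced(n=100):
    """Form shown in the fragment: t1 is decremented by 2 on every trip."""
    a = []
    t1 = 202
    for _ in range(1, n + 1):
        t1 = t1 - 2
        a.append(t1)
    return a


print(fill_direct() == fill_reduced())
```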
Bernstein’s conditions and induction variables
Strength reduction
• From Allen, Cocke, and Kennedy, “Reduction of Operator Strength,” in Muchnick and Jones,
“Program Flow Analysis,” Prentice-Hall, 1981.
• In real compilers, replacing multiplication by addition is probably the only strength reduction performed.
• Candidates for strength reduction
1. Multiplication by a constant
loop
n=i*a
...
i=i+b
– after strength reduction
loop
n=t1
...
i=i+b
t1=t1+a*b
– after loop invariant removal
c = a * b
t1 = i*a
loop
n=t1
...
i=i+b
t1=t1+c
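The final form of this first case can be checked against the original loop. In the sketch below the loop body is reduced to recording n, and the trip count is passed in as a parameter; the function names and the trips parameter are assumptions of the sketch, not part of the slide.

```python
def mul_direct(i, a, b, trips):
    """Original loop: n = i*a on every trip."""
    values = []
    for _ in range(trips):
        values.append(i * a)
        i = i + b
    return values


def mul_reduced(i, a, b, trips):
    """After strength reduction and loop invariant removal."""
    c = a * b        # loop invariant increment
    t1 = i * a       # initialization before the loop
    values = []
    for _ in range(trips):
        values.append(t1)
        i = i + b
        t1 = t1 + c  # one addition replaces the multiplication
    return values


print(mul_direct(3, 5, 2, 4), mul_reduced(3, 5, 2, 4))
```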
Strength reduction
2. Multiplication by a constant plus a term
loop
n=i*a+c
...
i=i+b
– after strength reduction
loop
n=t1
...
i=i+b
t1=t1+a*b
– Notice that the update to t1 does not change by the addition of
the constant. However, the initialization assignment before the
loop should change.
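The changed initialization can be made explicit. In this sketch, which uses hypothetical function names and a trips parameter for the trip count, t1 starts at i*a + c while the per-trip increment stays a*b.

```python
def affine_direct(i, a, b, c, trips):
    """Original loop: n = i*a + c on every trip."""
    values = []
    for _ in range(trips):
        values.append(i * a + c)
        i = i + b
    return values


def affine_reduced(i, a, b, c, trips):
    """Strength-reduced loop: only the initialization depends on c."""
    t1 = i * a + c   # initialization now includes the constant term
    step = a * b     # same increment as in the case without c
    values = []
    for _ in range(trips):
        values.append(t1)
        i = i + b
        t1 = t1 + step
    return values


print(affine_direct(1, 4, 3, 10, 5) == affine_reduced(1, 4, 3, 10, 5))
```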
Strength reduction
3. Two induction variables, each multiplied by a constant, and added
loop
n=i*a+j*b
...
i=i+c
...
j=j+d
– after strength reduction
loop
n=t1
...
i=i+c
t1=t1+a*c
j=j+d
t1=t1+b*d
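A sketch of this case with the initialization written out and the invariant products a*c and b*d hoisted out of the loop. The function names, the hoisted temporaries ac and bd, and the trips parameter are assumptions made for the illustration.

```python
def two_iv_direct(i, j, a, b, c, d, trips):
    """Original loop: n = i*a + j*b on every trip."""
    values = []
    for _ in range(trips):
        values.append(i * a + j * b)
        i = i + c
        j = j + d
    return values


def two_iv_reduced(i, j, a, b, c, d, trips):
    """Strength-reduced loop: t1 is corrected after each induction variable update."""
    t1 = i * a + j * b
    ac = a * c       # loop invariant
    bd = b * d       # loop invariant
    values = []
    for _ in range(trips):
        values.append(t1)
        i = i + c
        t1 = t1 + ac
        j = j + d
        t1 = t1 + bd
    return values


print(two_iv_direct(1, 2, 3, 4, 5, 6, 4) == two_iv_reduced(1, 2, 3, 4, 5, 6, 4))
```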
Strength reduction
4. Multiplication of one induction variable by another
loop
n=i*j
...
i=i+a
...
j=j+b
– After strength reduction of i*j
loop
n=t1
...
--------- t1=i*j
i=i+a
--------- new t1 should be (i+a)*j=t1+a*j
t1=t1+a*j
...
j=j+b
-------- new t1 should be i*(j+b)=t1+b*i
t1=t1+b*i
Strength reduction
• After strength reduction of a*j
loop
n=t1
...
i=i+a
t1=t1+t2
...
j=j+b
t1=t1+b*i
t2=t2+a*b
• b*i is handled similarly.
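Carrying both reductions through gives a loop without multiplications in its body. In the sketch below t2 tracks a*j as above, and t3 is a hypothetical temporary introduced here to track b*i in the same way; the function names and the trips parameter are also assumptions of the sketch.

```python
def product_direct(i, j, a, b, trips):
    """Original loop: n = i*j on every trip."""
    values = []
    for _ in range(trips):
        values.append(i * j)
        i = i + a
        j = j + b
    return values


def product_reduced(i, j, a, b, trips):
    """Fully strength-reduced loop: t1 = i*j, t2 = a*j, t3 = b*i."""
    t1 = i * j
    t2 = a * j
    t3 = b * i
    ab = a * b           # loop invariant
    values = []
    for _ in range(trips):
        values.append(t1)
        i = i + a
        t1 = t1 + t2     # (i+a)*j = i*j + a*j
        t3 = t3 + ab     # b*(i+a) = b*i + a*b
        j = j + b
        t1 = t1 + t3     # i*(j+b) = i*j + b*i, with the updated i
        t2 = t2 + ab     # a*(j+b) = a*j + a*b
    return values


print(product_direct(2, 3, 4, 5, 4) == product_reduced(2, 3, 4, 5, 4))
```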
Strength reduction
5. Multiplication of an induction variable by itself
loop
n=i*i
...
i=i+a
– After strength reduction
loop
n=t1
...
-------- new t1 should be (i+a)*(i+a)=t1+2*a*i+a*a, using i before its update
t1=t1+2*a*i+a*a
i=i+a
– Now strength reduce 2*a*i+a*a
loop
n=t1
...
i=i+a
t1=t1+t2
-------- new t2 should be 2*a*(i+a)+a*a=t2+2*a*a
t2=t2+2*a*a
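The two-level reduction can be verified numerically. The sketch assumes the initializations t1 = i*i and t2 = 2*a*i + a*a before the loop, which the slide leaves implicit; function names and the trips parameter are hypothetical.

```python
def square_direct(i, a, trips):
    """Original loop: n = i*i on every trip."""
    values = []
    for _ in range(trips):
        values.append(i * i)
        i = i + a
    return values


def square_reduced(i, a, trips):
    """Reduced loop: t1 = i*i and t2 = 2*a*i + a*a are kept up to date by additions."""
    t1 = i * i
    t2 = 2 * a * i + a * a
    step = 2 * a * a     # loop invariant
    values = []
    for _ in range(trips):
        values.append(t1)
        i = i + a
        t1 = t1 + t2     # (i+a)^2 = i^2 + (2*a*i + a*a), with the old i
        t2 = t2 + step   # 2*a*(i+a) + a*a = t2 + 2*a*a
    return values


print(square_direct(3, 2, 5), square_reduced(3, 2, 5))
```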
Strength reduction
6. Integer division
loop
n=i/a
...
i=i+b
– After strength reduction
loop
n=t1
...
i=i+b
t2=t2+(b mod a)
if t2 >= a then
t1++
t2=t2-a
t1=t1+b/a
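The division case keeps the quotient in t1 and the remainder in t2. The sketch below assumes i >= 0, b >= 0 and a > 0, so that truncating and flooring division agree, and initializes t1 = i / a and t2 = i mod a before the loop; these assumptions, the function names and the trips parameter are not stated on the slide.

```python
def div_direct(i, a, b, trips):
    """Original loop: n = i/a (integer division) on every trip."""
    values = []
    for _ in range(trips):
        values.append(i // a)
        i = i + b
    return values


def div_reduced(i, a, b, trips):
    """Reduced loop: t1 holds i/a and t2 holds i mod a."""
    t1 = i // a
    t2 = i % a
    q = b // a           # loop invariant quotient of the step
    r = b % a            # loop invariant remainder of the step
    values = []
    for _ in range(trips):
        values.append(t1)
        i = i + b
        t2 = t2 + r
        if t2 >= a:      # at most one carry, because t2 < a and r < a
            t1 = t1 + 1
            t2 = t2 - a
        t1 = t1 + q
    return values


print(div_direct(7, 3, 5, 6), div_reduced(7, 3, 5, 6))
```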
Strength reduction
8. Exponentiation
loop
x=a^i
...
i=i+b
– After strength reduction
loop
x=t1
...
i=i+b
t1=t1*(a^b)
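For exponentiation the repeated power becomes a single multiplication per trip. The sketch uses integers so that both forms agree exactly, and assumes the initialization t1 = a^i before the loop; function names and the trips parameter are hypothetical.

```python
def power_direct(a, i, b, trips):
    """Original loop: x = a^i on every trip."""
    values = []
    for _ in range(trips):
        values.append(a ** i)
        i = i + b
    return values


def power_reduced(a, i, b, trips):
    """Reduced loop: t1 is multiplied by the loop invariant a^b."""
    t1 = a ** i
    factor = a ** b      # loop invariant
    values = []
    for _ in range(trips):
        values.append(t1)
        i = i + b
        t1 = t1 * factor
    return values


print(power_direct(3, 0, 2, 4), power_reduced(3, 0, 2, 4))
```

With floating point operands the two forms would generally differ in the last bits, because the rounding errors of the repeated multiplications accumulate.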
Strength reduction
9. Trigonometric functions
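loop
y=sin(x)
...
x=x+Δx
– After strength reduction (tsinx = sin(x), tcosx = cos(x), tsinΔx = sin(Δx) and
tcosΔx = cos(Δx) are initialized before the loop)
loop
y=tsinx
...
x=x+Δx
t=tsinx*tcosΔx+tcosx*tsinΔx
tcosx=tcosx*tcosΔx-tsinx*tsinΔx
tsinx=t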
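A numeric check of the trigonometric case, based on the angle addition identities sin(x+Δx) = sin x cos Δx + cos x sin Δx and cos(x+Δx) = cos x cos Δx - sin x sin Δx. The names tsindx and tcosdx stand for the sine and cosine of the step; they, the function names and the trips parameter are assumptions of this sketch.

```python
import math


def sines_direct(x, dx, trips):
    """Original loop: y = sin(x) on every trip."""
    values = []
    for _ in range(trips):
        values.append(math.sin(x))
        x = x + dx
    return values


def sines_reduced(x, dx, trips):
    """Reduced loop: sin(x) and cos(x) are advanced by the addition formulas."""
    tsinx, tcosx = math.sin(x), math.cos(x)
    tsindx, tcosdx = math.sin(dx), math.cos(dx)   # loop invariants
    values = []
    for _ in range(trips):
        values.append(tsinx)
        x = x + dx
        # Both updates must read the old tsinx and tcosx, hence the tuple assignment.
        tsinx, tcosx = (tsinx * tcosdx + tcosx * tsindx,
                        tcosx * tcosdx - tsinx * tsindx)
    return values


worst = max(abs(u - v) for u, v in zip(sines_direct(0.3, 0.01, 50),
                                        sines_reduced(0.3, 0.01, 50)))
print(worst)
```

The two sequences agree only up to rounding error, which grows slowly with the trip count; this is one reason such a transformation is rarely applied automatically.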
Procedure integration
Register allocation
Instruction scheduling
[Figure: flowchart with the statements a=b+c, b=c*2, a=a+1 and a test c>0 with Y and N branches.]
Straightening
If simplification
Loop inversion
• Transforms a while loop into a repeat loop.
– Repeat loops have only a conditional branch at the end.
– While loops have a conditional branch at the beginning and an
unconditional branch at the end.
– For the conversion, the compiler needs to prove that the loop is
entered at least once.
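The shape of the transformation can be sketched in Python, which has no repeat loop, so the inverted form is emulated with while True and a test at the bottom. The explicit guard in front of the inverted loop is a common alternative when the compiler cannot prove that the loop is entered at least once; it is an addition of this sketch, as are the function names.

```python
def sum_while(n):
    """While form: test at the top, unconditional jump back at the bottom."""
    i, total = 0, 0
    while i < n:
        total += i
        i += 1
    return total


def sum_inverted(n):
    """Inverted form: one guard before the loop, then a bottom-tested loop."""
    i, total = 0, 0
    if i < n:                # guard; removable if the first test is known to succeed
        while True:          # repeat
            total += i
            i += 1
            if not (i < n):  # until the loop condition fails
                break
    return total


print(sum_while(5), sum_inverted(5))
```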
Unswitching
Cache Optimizations
• Usually require “deep” transformations of the program.
• Most studied are those transformations related to loops:
– Loop tiling (blocking)
– Loop fission/fusion
– Loop interchange
• Although these are well understood, they are seldom
implemented in real compilers
• Other transformations that have been studied include the
change of data structures to increase locality.
• More will be said about this when we discuss locality
optimizations.
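As a small illustration of loop tiling (blocking), the sketch below transposes a matrix with a plain loop nest and with a tiled one that visits the same index pairs in tile-sized blocks. The function names and the tile parameter are made up for this sketch; Python lists do not expose the cache behaviour, so the code only shows the shape of the transformation and that it preserves the result.

```python
def transpose_plain(a):
    """Straightforward loop nest: rows of a, then columns."""
    n, m = len(a), len(a[0])
    out = [[0] * n for _ in range(m)]
    for i in range(n):
        for j in range(m):
            out[j][i] = a[i][j]
    return out


def transpose_tiled(a, tile=4):
    """Tiled loop nest: the (i, j) space is traversed in tile x tile blocks."""
    n, m = len(a), len(a[0])
    out = [[0] * n for _ in range(m)]
    for ii in range(0, n, tile):
        for jj in range(0, m, tile):
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, m)):
                    out[j][i] = a[i][j]
    return out


matrix = [[r * 7 + c for c in range(7)] for r in range(5)]
print(transpose_plain(matrix) == transpose_tiled(matrix, tile=3))
```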
VII. Vectorization and parallelization
Alliant FX/80
See R. Eigenmann, J. Hoeflinger, and D. Padua. On the Automatic Parallelization
of the Perfect Benchmarks. IEEE TPDS, Jan. 1998.
Vectorization
Locality Enhancement
See J. Xiong, J. Johnson, and D. Padua. SPL: A Language and Compiler for DSP
Algorithms. PLDI 2001.
• http://www.coyotegulch.com/products/acovea/index.html