HPC Unit 5 A
6.172
Performance Engineering of Software Systems
LECTURE 2
Bentley Rules for
Optimizing Work
Julian Shun
© 2008–2018 by the MIT 6.172 Lecturers
Work
Definition.
The work of a program (on a given input) is
the sum total of all the operations executed
by the program.
Optimizing Work
● Algorithm design can produce dramatic
reductions in the amount of work it takes to
solve a problem, as when a Θ(n lg n)-time sort
replaces a Θ(n²)-time sort.
● Reducing the work of a program does not automatically
reduce its running time, however, due
to the complex nature of computer hardware:
• instruction-level parallelism (ILP),
• caching,
• vectorization,
• speculation and branch prediction,
• etc.
● Nevertheless, reducing the work serves as a good
heuristic for reducing overall running time.
“BENTLEY”
OPTIMIZATION RULES
New “Bentley” Rules
● Most of Bentley’s original rules dealt with work, but
some dealt with the vagaries of computer
architecture three and a half decades ago.
● We have created a new set of Bentley rules dealing
only with work.
● We shall discuss architecture-dependent
optimizations in subsequent lectures.
New Bentley Rules
Data structures
● Packing and encoding
● Augmentation
● Precomputation
● Compile-time initialization
● Caching
● Lazy evaluation
● Sparsity
Logic
● Constant folding and propagation
● Common-subexpression elimination
● Algebraic identities
● Short-circuiting
● Ordering tests
● Creating a fast path
● Combining tests
Loops
● Hoisting
● Sentinels
● Loop unrolling
● Loop fusion
● Eliminating wasted iterations
Functions
● Inlining
● Tail-recursion elimination
● Coarsening recursion
DATA STRUCTURES
Packing and Encoding
The idea of packing is to store more than one data
value in a machine word. The related idea of
encoding is to convert data values into a
representation requiring fewer bits.
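As a rough sketch of both ideas, the block below encodes a calendar date as three small integers and packs them into a single machine word using C bit fields. The type name `packed_date_t`, the field widths, and the helper `make_date` are illustrative assumptions, not taken from the slides.

```c
// Hypothetical packed date: 13 + 4 + 5 = 22 bits, so all three
// fields share one unsigned int on typical compilers.
typedef struct {
  unsigned int year  : 13;  // 0..8191
  unsigned int month : 4;   // 1..12
  unsigned int day   : 5;   // 1..31
} packed_date_t;

// Illustrative helper: build a packed date from ordinary integers.
static packed_date_t make_date(unsigned int year, unsigned int month,
                               unsigned int day) {
  packed_date_t d;
  d.year = year;
  d.month = month;
  d.day = day;
  return d;
}
```

Reading a field costs a few shift-and-mask instructions, which is the work traded for the smaller footprint.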
int choose(int n, int k) {
  if (n < k) return 0;
  if (n == 0) return 1;
  if (k == 0) return 1;
  return choose(n-1, k-1) + choose(n-1, k);
}
Precomputing Pascal
#define CHOOSE_SIZE 100
int choose[CHOOSE_SIZE][CHOOSE_SIZE];

void init_choose() {
  for (int n = 0; n < CHOOSE_SIZE; ++n) {
    choose[n][0] = 1;
    choose[n][n] = 1;
  }
  for (int n = 1; n < CHOOSE_SIZE; ++n) {
    choose[0][n] = 0;
    for (int k = 1; k < n; ++k) {
      choose[n][k] = choose[n-1][k-1] + choose[n-1][k];
      choose[k][n] = 0;
    }
  }
}
Example
int choose[10][10] = {
  {  1,   0,   0,   0,   0,   0,   0,   0,   0,   0, },
  {  1,   1,   0,   0,   0,   0,   0,   0,   0,   0, },
  {  1,   2,   1,   0,   0,   0,   0,   0,   0,   0, },
  {  1,   3,   3,   1,   0,   0,   0,   0,   0,   0, },
  {  1,   4,   6,   4,   1,   0,   0,   0,   0,   0, },
  {  1,   5,  10,  10,   5,   1,   0,   0,   0,   0, },
  {  1,   6,  15,  20,  15,   6,   1,   0,   0,   0, },
  {  1,   7,  21,  35,  35,  21,   7,   1,   0,   0, },
  {  1,   8,  28,  56,  70,  56,  28,   8,   1,   0, },
  {  1,   9,  36,  84, 126, 126,  84,  36,   9,   1, },
};
Compile-Time Initialization (2)
Idea: Create large static tables by metaprogramming.
#include <stdio.h>

// Prints C source text for the table; choose and init_choose() are defined earlier.
int main(int argc, const char *argv[]) {
  init_choose();
  printf("int choose[10][10] = {\n");
  for (int a = 0; a < 10; ++a) {
    printf("  {");
    for (int b = 0; b < 10; ++b) {
      printf("%3d, ", choose[a][b]);
    }
    printf("},\n");
  }
  printf("};\n");
}
Caching
The idea of caching is to store results that have been
accessed recently so that the program need not
compute them again.
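A minimal sketch of the idea, assuming a one-entry cache in front of a deliberately slow function. The names `divisor_sum`, `divisor_sum_slow`, and `slow_calls` are hypothetical and exist only for illustration; the counter is there so the effect of the cache can be observed.

```c
// Number of times the slow path ran (kept only to make the cache observable).
static unsigned long slow_calls = 0;

// Hypothetical expensive computation: sum of all divisors of n by trial division.
static unsigned long divisor_sum_slow(unsigned long n) {
  ++slow_calls;
  unsigned long sum = 0;
  for (unsigned long d = 1; d <= n; ++d) {
    if (n % d == 0) sum += d;
  }
  return sum;
}

// One-entry cache holding the most recent argument and its result.
static int cache_valid = 0;
static unsigned long cached_n = 0;
static unsigned long cached_result = 0;

unsigned long divisor_sum(unsigned long n) {
  if (cache_valid && n == cached_n) {
    return cached_result;  // hit: reuse the stored result
  }
  cached_result = divisor_sum_slow(n);  // miss: compute and remember
  cached_n = n;
  cache_valid = 1;
  return cached_result;
}
```

A cache like this pays off only when the same arguments recur often enough that the saved work outweighs the cost of the extra comparison on every call.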
Sparse matrix example (n = 6, nnz = 14):

       0  1  2  3  4  5
  0    3  0  0  0  1  0
  1    0  4  1  0  5  9
  2    0  0  0  2  0  6
  3    5  0  0  3  0  0
  4    0  0  0  0  5  0
  5    0  0  0  8  9  7
void spmv(sparse_matrix_t *A, double *x, double *y) {
  for (int i = 0; i < A->n; i++) {
    y[i] = 0;
    for (int k = A->rows[i]; k < A->rows[i+1]; k++) {
      int j = A->cols[k];
      y[i] += A->vals[k] * x[j];
    }
  }
}
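The routine above uses a `sparse_matrix_t` type whose definition does not appear here. The sketch below assumes a compressed sparse row (CSR) layout with field names inferred from the accesses `A->n`, `A->rows`, `A->cols`, and `A->vals`; the `example_*` arrays are a hypothetical encoding of the 6 × 6 example matrix. The `const` qualifiers are an addition for the sketch.

```c
// Assumed CSR layout; field names follow the accesses made by spmv.
typedef struct {
  int n, nnz;
  const int *rows;     // length n + 1: rows[i] is the index of row i's first nonzero
  const int *cols;     // length nnz: column index of each stored nonzero
  const double *vals;  // length nnz: value of each stored nonzero
} sparse_matrix_t;

// Sparse matrix-vector product y = A * x, touching only the nnz stored entries.
void spmv(const sparse_matrix_t *A, const double *x, double *y) {
  for (int i = 0; i < A->n; i++) {
    y[i] = 0;
    for (int k = A->rows[i]; k < A->rows[i + 1]; k++) {
      int j = A->cols[k];
      y[i] += A->vals[k] * x[j];
    }
  }
}

// CSR arrays for the 6 x 6 example (n = 6, nnz = 14).
static const int example_rows[7] = {0, 2, 6, 8, 10, 11, 14};
static const int example_cols[14] = {0, 4, 1, 2, 4, 5, 3, 5, 0, 3, 4, 3, 4, 5};
static const double example_vals[14] = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7};
static const sparse_matrix_t example = {6, 14, example_rows, example_cols,
                                        example_vals};
```

The inner loop runs nnz times in total rather than n² times, which is where the work is saved.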
Vertex IDs   0  1  2  3  4
Offsets      0  2  5  5  6  7
Edges        1  3  2  3  4  2  2
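One way to read the two arrays, sketched under the assumption that the graph has five vertices, that Offsets holds six entries (0 2 5 5 6 7), and that Edges holds the seven neighbor IDs those offsets index (1 3 2 3 4 2 2). The names `graph_offsets`, `graph_edges`, `out_degree`, and `has_edge` are hypothetical.

```c
// Assumed contents of the Offsets and Edges arrays for a 5-vertex graph.
static const int graph_offsets[6] = {0, 2, 5, 5, 6, 7};
static const int graph_edges[7] = {1, 3, 2, 3, 4, 2, 2};

// Out-degree of vertex v: the length of its slice of the Edges array.
int out_degree(int v) {
  return graph_offsets[v + 1] - graph_offsets[v];
}

// Return 1 if there is an edge u -> v, scanning only u's own neighbors.
int has_edge(int u, int v) {
  for (int k = graph_offsets[u]; k < graph_offsets[u + 1]; ++k) {
    if (graph_edges[k] == v) return 1;
  }
  return 0;
}
```

Under this reading, vertex 2 has an empty slice (both of its offsets are 5), so it has no outgoing edges.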
LOGIC
Constant Folding and Propagation
The idea of constant folding and propagation is to
evaluate constant expressions and substitute the
result into further expressions, all during compilation.
#include <math.h>

void orrery() {
  const double radius = 6371000.0;
  const double diameter = 2 * radius;
  const double circumference = M_PI * diameter;
  const double cross_area = M_PI * radius * radius;
  const double surface_area = circumference * diameter;
  const double volume = 4 * M_PI * radius * radius * radius / 3;
  // ...
}
Common-Subexpression Elimination
The idea of common-subexpression elimination is to
avoid computing the same expression multiple times
by evaluating the expression once and storing the
result for later use.
Before:
a = b + c;
b = a - d;
c = b + c;
d = a - d;

After:
a = b + c;
b = a - d;
c = b + c;
d = b;
Algebraic Identities
The idea of exploiting algebraic identities is to replace
expensive algebraic expressions with algebraic
equivalents that require less work.
#include <stdbool.h>
#include <math.h>

typedef struct {
  double x;  // x-coordinate
  double y;  // y-coordinate
  double z;  // z-coordinate
  double r;  // radius of ball
} ball_t;

double square(double x) {
  return x*x;
}

bool collides(ball_t *b1, ball_t *b2) {
  double d = sqrt(square(b1->x - b2->x)
                + square(b1->y - b2->y)
                + square(b1->z - b2->z));
  return d <= b1->r + b2->r;
}
$\sqrt{u} \le v$ exactly when $u \le v^2$ (for $u \ge 0$ and $v \ge 0$).

bool collides(ball_t *b1, ball_t *b2) {
  double dsquared = square(b1->x - b2->x)
                  + square(b1->y - b2->y)
                  + square(b1->z - b2->z);
  return dsquared <= square(b1->r + b2->r);
}
Short-Circuiting
When performing a series of tests, the idea of short-
circuiting is to stop evaluating as soon as you know
the answer.
Original:
#include <stdbool.h>
// All elements of A are nonnegative
bool sum_exceeds(int *A, int n, int limit) {
  int sum = 0;
  for (int i = 0; i < n; i++) {
    sum += A[i];
  }
  return sum > limit;
}

With short-circuiting:
#include <stdbool.h>
// All elements of A are nonnegative
bool sum_exceeds(int *A, int n, int limit) {
  int sum = 0;
  for (int i = 0; i < n; i++) {
    sum += A[i];
    if (sum > limit) {
      return true;
    }
  }
  return false;
}

Note that && and || are short-circuiting logical operators, and & and | are not.
Ordering Tests
Consider code that executes a sequence of logical
tests. The idea of ordering tests is to perform those
that are more often “successful” — a particular
alternative is selected by the test — before tests that
are rarely successful. Similarly, inexpensive tests
should precede expensive ones.
Original:
#include <stdbool.h>
bool is_whitespace(char c) {
  if (c == '\r' || c == '\t' || c == ' ' || c == '\n') {
    return true;
  }
  return false;
}

With the tests reordered:
#include <stdbool.h>
bool is_whitespace(char c) {
  if (c == ' ' || c == '\n' || c == '\t' || c == '\r') {
    return true;
  }
  return false;
}
Creating a Fast Path
#include <stdbool.h>
#include <math.h>

typedef struct {
  double x;  // x-coordinate
  double y;  // y-coordinate
  double z;  // z-coordinate
  double r;  // radius of ball
} ball_t;

double square(double x) {
  return x*x;
}

bool collides(ball_t *b1, ball_t *b2) {
  double dsquared = square(b1->x - b2->x)
                  + square(b1->y - b2->y)
                  + square(b1->z - b2->z);
  return dsquared <= square(b1->r + b2->r);
}
With a fast path:
bool collides(ball_t *b1, ball_t *b2) {
  // Fast path: if the balls are farther apart than the sum of their radii
  // along any one axis, they cannot collide. fabs (from <math.h>) is used
  // because abs operates on int in C.
  if ((fabs(b1->x - b2->x) > (b1->r + b2->r)) ||
      (fabs(b1->y - b2->y) > (b1->r + b2->r)) ||
      (fabs(b1->z - b2->z) > (b1->r + b2->r)))
    return false;
  double dsquared = square(b1->x - b2->x)
                  + square(b1->y - b2->y)
                  + square(b1->z - b2->z);
  return dsquared <= square(b1->r + b2->r);
}
Combining Tests
The idea of combining tests is to replace a sequence
of tests with one test or switch.
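As a sketch of this rule, the function below computes a one-bit full adder by packing its three 0/1 inputs into a single integer and dispatching on it with one `switch`, instead of a cascade of nested `if` tests. The function name `full_add` and its signature are assumptions made for illustration.

```c
// Full adder on three 0/1 inputs: one switch replaces a cascade of if tests.
void full_add(int a, int b, int c, int *sum, int *carry) {
  // Combine the three tests into one 3-bit value in the range 0..7.
  int test = ((a == 1) << 2) | ((b == 1) << 1) | (c == 1);
  switch (test) {
    case 0:                  // no inputs set
      *sum = 0; *carry = 0;
      break;
    case 1: case 2: case 4:  // exactly one input set
      *sum = 1; *carry = 0;
      break;
    case 3: case 5: case 6:  // exactly two inputs set
      *sum = 0; *carry = 1;
      break;
    case 7:                  // all three inputs set
      *sum = 1; *carry = 1;
      break;
  }
}
```

Compilers commonly lower a dense `switch` like this to a jump table or a short comparison sequence, though whether that beats the nested tests depends on the target.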
LOOPS
Hoisting
The goal of hoisting — also called loop-invariant code
motion — is to avoid recomputing loop-invariant code
each time through the body of a loop.
#include <math.h>
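A minimal sketch of hoisting, assuming a hypothetical helper `power` that stands in for any computation whose value does not change across iterations. The names `scale_naive` and `scale_hoisted` are illustrative as well.

```c
#include <stddef.h>

// Hypothetical loop-invariant computation: base raised to a small integer power.
static double power(double base, int exponent) {
  double result = 1.0;
  for (int i = 0; i < exponent; ++i) {
    result *= base;
  }
  return result;
}

// Before: power(2.0, 10) is recomputed on every iteration.
void scale_naive(const double *X, double *Y, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    Y[i] = X[i] * power(2.0, 10);
  }
}

// After: the invariant factor is computed once, outside the loop.
void scale_hoisted(const double *X, double *Y, size_t n) {
  const double factor = power(2.0, 10);
  for (size_t i = 0; i < n; ++i) {
    Y[i] = X[i] * factor;
  }
}
```

An optimizing compiler may perform this transformation itself when it can prove the call has no side effects; hoisting by hand removes the dependence on that analysis.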
Sentinels
Sentinels are special dummy values placed in a data
structure to simplify the logic of boundary conditions,
and in particular, the handling of loop-exit tests.
Original:
#include <stdint.h>
#include <stdbool.h>

bool overflow(int64_t *A, size_t n) {
  // All elements of A are nonnegative
  int64_t sum = 0;
  for ( size_t i = 0; i < n; ++i ) {
    sum += A[i];
    if ( sum < A[i] ) return true;
  }
  return false;
}

With sentinels:
#include <stdint.h>
#include <stdbool.h>

// Assumes that A[n] and A[n+1] exist and
// can be clobbered
bool overflow(int64_t *A, size_t n) {
  // All elements of A are nonnegative
  A[n] = INT64_MAX;
  A[n+1] = 1;  // or any positive number
  size_t i = 0;
  int64_t sum = A[0];
  while ( sum >= A[i] ) {
    sum += A[++i];
  }
  if (i < n) return true;
  return false;
}
Loop Unrolling
Loop unrolling attempts to save work by combining
several consecutive iterations of a loop into a single
iteration, thereby reducing the total number of
iterations of the loop and, consequently, the number
of times that the instructions that control the loop
must be executed.
Full Loop Unrolling
Original:
int sum = 0;
for (int i = 0; i < 10; i++) {
  sum += A[i];
}

Fully unrolled:
int sum = 0;
sum += A[0];
sum += A[1];
sum += A[2];
sum += A[3];
sum += A[4];
sum += A[5];
sum += A[6];
sum += A[7];
sum += A[8];
sum += A[9];
Partial Loop Unrolling
Original:
int sum = 0;
for (int i = 0; i < n; ++i) {
  sum += A[i];
}

Partially unrolled:
int sum = 0;
int j;
for (j = 0; j < n-3; j += 4) {
  sum += A[j];
  sum += A[j+1];
  sum += A[j+2];
  sum += A[j+3];
}
for (int i = j; i < n; ++i) {
  sum += A[i];
}
Eliminating Wasted Iterations
The idea of eliminating wasted iterations is to modify
loop bounds to avoid executing loop iterations over
essentially empty loop bodies.
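A sketch of the rule using in-place matrix transposition, where the first version visits every index pair but does work for only about half of them. The dimension `DIM` and the function names are hypothetical choices for illustration.

```c
#define DIM 4  // hypothetical matrix dimension

// Before: every (i, j) pair is visited, but only pairs with i > j do any work.
void transpose_naive(double A[DIM][DIM]) {
  for (int i = 0; i < DIM; ++i) {
    for (int j = 0; j < DIM; ++j) {
      if (i > j) {
        double temp = A[i][j];
        A[i][j] = A[j][i];
        A[j][i] = temp;
      }
    }
  }
}

// After: the loop bounds guarantee j < i, so no iteration is wasted on the test.
void transpose(double A[DIM][DIM]) {
  for (int i = 1; i < DIM; ++i) {
    for (int j = 0; j < i; ++j) {
      double temp = A[i][j];
      A[i][j] = A[j][i];
      A[j][i] = temp;
    }
  }
}
```

The second version runs the inner body DIM(DIM-1)/2 times instead of DIM² times and drops the comparison entirely.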
FUNCTIONS
Inlining
The idea of inlining is to avoid the overhead of a
function call by replacing a call to the function with
the body of the function itself.
double square(double x) {
  return x*x;
}
Inlining (2)
Original:
double square(double x) {
  return x*x;
}

double sum_of_squares(double *A, int n) {
  double sum = 0.0;
  for (int i = 0; i < n; ++i) {
    sum += square(A[i]);
  }
  return sum;
}

With square declared static inline:
static inline double square(double x) {
  return x*x;
}

double sum_of_squares(double *A, int n) {
  double sum = 0.0;
  for (int i = 0; i < n; ++i) {
    sum += square(A[i]);
  }
  return sum;
}

Inlined functions can be just as efficient as macros, and they are better structured.
Coarsening Recursion
The idea of coarsening recursion is to increase the
size of the base case and handle it with more efficient
code that avoids function-call overhead.
Original:
void quicksort(int *A, int n) {
  while (n > 1) {
    int r = partition(A, n);
    quicksort (A, r);
    A += r + 1;
    n -= r + 1;
  }
}

With a coarsened base case:
#define THRESHOLD 10
void quicksort(int *A, int n) {
  while (n > THRESHOLD) {
    int r = partition(A, n);
    quicksort (A, r);
    A += r + 1;
    n -= r + 1;
  }
  // insertion sort for small arrays
  for (int j = 1; j < n; ++j) {
    int key = A[j];
    int i = j - 1;
    while (i >= 0 && A[i] > key) {
      A[i+1] = A[i];
      --i;
    }
    A[i+1] = key;
  }
}
SUMMARY
New Bentley Rules
Data structures
● Packing and encoding
● Augmentation
● Precomputation
● Compile-time initialization
● Caching
● Lazy evaluation
● Sparsity
Logic
● Constant folding and propagation
● Common-subexpression elimination
● Algebraic identities
● Short-circuiting
● Ordering tests
● Creating a fast path
● Combining tests
Loops
● Hoisting
● Sentinels
● Loop unrolling
● Loop fusion
● Eliminating wasted iterations
Functions
● Inlining
● Tail-recursion elimination
● Coarsening recursion
Closing Advice
● Avoid premature optimization. First get correct
working code. Then optimize, preserving
correctness by regression testing.
● Reducing the work of a program does not
necessarily decrease its running time, but it is a
good heuristic.
● The compiler automates many low-level
optimizations.
● To tell if the compiler is actually performing a
particular optimization, look at the assembly code.
MIT OpenCourseWare
https://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.