
6.172
Performance Engineering of Software Systems

LECTURE 2
Bentley Rules for Optimizing Work
Julian Shun

© 2008–2018 by the MIT 6.172 Lecturers
Work

Definition.
The work of a program (on a given input) is
the sum total of all the operations executed
by the program.

Image used under CC0 by openclipart.
Optimizing Work
● Algorithm design can produce dramatic
reductions in the amount of work it takes to
solve a problem, as when a Θ(n lg n)-time sort
replaces a Θ(n²)-time sort.
● Reducing the work of a program does not auto-
matically reduce its running time, however, due
to the complex nature of computer hardware:
• instruction-level parallelism (ILP),
• caching,
• vectorization,
• speculation and branch prediction,
• etc.
● Nevertheless, reducing the work serves as a good
heuristic for reducing overall running time.
“BENTLEY”
OPTIMIZATION RULES

New “Bentley” Rules
● Most of Bentley’s original rules dealt with work, but
some dealt with the vagaries of computer
architecture three and a half decades ago.
● We have created a new set of Bentley rules dealing
only with work.
● We shall discuss architecture-dependent
optimizations in subsequent lectures.

New Bentley Rules
Data structures
● Packing and encoding
● Augmentation
● Precomputation
● Compile-time initialization
● Caching
● Lazy evaluation
● Sparsity

Logic
● Constant folding and propagation
● Common-subexpression elimination
● Algebraic identities
● Short-circuiting
● Ordering tests
● Creating a fast path
● Combining tests

Loops
● Hoisting
● Sentinels
● Loop unrolling
● Loop fusion
● Eliminating wasted iterations

Functions
● Inlining
● Tail-recursion elimination
● Coarsening recursion


DATA STRUCTURES

Packing and Encoding
The idea of packing is to store more than one data
value in a machine word. The related idea of
encoding is to convert data values into a
representation requiring fewer bits.

Example: Encoding dates


● The string “September 11, 2018” can be stored in 18
bytes — more than two double (64-bit) words — which
must be moved whenever a date is manipulated.
● Assuming that we only store years between 4096
B.C.E. and 4096 C.E., there are about 365.25 × 8192
≈ 3 M dates, which can be encoded in ⌈lg(3×10⁶)⌉ =
22 bits, easily fitting in a single (32-bit) word.
● But determining the month of a date takes more
work than with the string representation.
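As a concrete sketch of such a sequential encoding (the ranges and layout here are our own illustration, not the lecture's exact scheme), a date maps to one small integer, and extracting a field back out takes arithmetic:

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical sequential encoding: years in [-4096, 4095],
// months 1..12, days 1..31.  That is 8192 * 12 * 31 ≈ 3.0 M codes,
// which fits in 22 bits (2^22 ≈ 4.2 M).
uint32_t encode_date(int year, int month, int day) {
  return ((uint32_t)(year + 4096) * 12 + (month - 1)) * 31
         + (uint32_t)(day - 1);
}

// Decoding the month requires divisions, unlike the string form.
int month_of(uint32_t code) {
  return (int)(code / 31 % 12) + 1;
}
```

Note how the compactness is paid for at extraction time, which is exactly the trade-off the bullet above describes.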
Packing and Encoding (2)
Example: Packing dates
● Instead, let us pack the three fields into a word:

typedef struct {
  int year: 13;
  int month: 4;
  int day: 5;
} date_t;

● This packed representation still only takes 22 bits,
but the individual fields can be extracted much
more quickly than if we had encoded the 3 M dates
as sequential integers.
● Sometimes unpacking and decoding are the
optimization, depending on whether more work is
involved moving the data or operating on it.
Augmentation
The idea of data-structure augmentation is to add
information to a data structure to make common
operations do less work.
Example: Appending singly linked lists
● Appending one list to another requires walking
the length of the first list to set its null pointer to
the start of the second.
● Augmenting the list with a tail pointer allows
appending to operate in constant time.
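A minimal sketch of the augmented list (the `list_t` type carrying both head and tail pointers is our own naming); the append becomes a constant-time pointer splice:

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
  int value;
  struct node *next;
} node_t;

// Augmented list: a tail pointer is kept alongside the head.
typedef struct {
  node_t *head;
  node_t *tail;
} list_t;

// Append list b onto list a in O(1), with no walk to a's end.
void list_append(list_t *a, list_t *b) {
  if (b->head == NULL) return;    // nothing to append
  if (a->head == NULL) {
    *a = *b;                      // a was empty; take b wholesale
  } else {
    a->tail->next = b->head;      // splice b after a's last node
    a->tail = b->tail;
  }
  b->head = b->tail = NULL;       // b is now empty
}
```

The cost is that every operation which changes the end of the list must also maintain the tail pointer, a typical price of data-structure augmentation.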
Precomputation
The idea of precomputation is to perform calculations
in advance so as to avoid doing them at “mission-
critical” times.
Example: Binomial coefficients
(n choose k) = n! / (k! (n − k)!)

Computing the “choose” function by implementing
this formula can be expensive (lots of multiplications),
and watch out for integer overflow for even modest
values of n and k.
Idea: Precompute the table of coefficients when
initializing, and perform table look-up at runtime.
Pascal’s Triangle
(n choose k) = n! / (k! (n − k)!)

     k=0   1   2   3   4   5   6   7   8
n=0    1   0   0   0   0   0   0   0   0
n=1    1   1   0   0   0   0   0   0   0
n=2    1   2   1   0   0   0   0   0   0
n=3    1   3   3   1   0   0   0   0   0
n=4    1   4   6   4   1   0   0   0   0
n=5    1   5  10  10   5   1   0   0   0
n=6    1   6  15  20  15   6   1   0   0
n=7    1   7  21  35  35  21   7   1   0
n=8    1   8  28  56  70  56  28   8   1

int choose(int n, int k) {
  if (n < k) return 0;
  if (n == 0) return 1;
  if (k == 0) return 1;
  return choose(n-1, k-1) + choose(n-1, k);
}
Precomputing Pascal
#define CHOOSE_SIZE 100
int choose[CHOOSE_SIZE][CHOOSE_SIZE];

void init_choose() {
  for (int n = 0; n < CHOOSE_SIZE; ++n) {
    choose[n][0] = 1;
    choose[n][n] = 1;
  }
  for (int n = 1; n < CHOOSE_SIZE; ++n) {
    choose[0][n] = 0;
    for (int k = 1; k < n; ++k) {
      choose[n][k] = choose[n-1][k-1] + choose[n-1][k];
      choose[k][n] = 0;
    }
  }
}

Now, whenever we need a binomial coefficient (less
than 100), we can simply index the choose array.
Compile-Time Initialization
The idea of compile-time initialization is to store the
values of constants during compilation, saving work at
execution time.

Example
int choose[10][10] = {
  { 1, 0,  0,  0,   0,   0,  0,  0, 0, 0 },
  { 1, 1,  0,  0,   0,   0,  0,  0, 0, 0 },
  { 1, 2,  1,  0,   0,   0,  0,  0, 0, 0 },
  { 1, 3,  3,  1,   0,   0,  0,  0, 0, 0 },
  { 1, 4,  6,  4,   1,   0,  0,  0, 0, 0 },
  { 1, 5, 10, 10,   5,   1,  0,  0, 0, 0 },
  { 1, 6, 15, 20,  15,   6,  1,  0, 0, 0 },
  { 1, 7, 21, 35,  35,  21,  7,  1, 0, 0 },
  { 1, 8, 28, 56,  70,  56, 28,  8, 1, 0 },
  { 1, 9, 36, 84, 126, 126, 84, 36, 9, 1 },
};

Compile-Time Initialization (2)
Idea: Create large static tables by metaprogramming.

int main(int argc, const char *argv[]) {
  init_choose();
  printf("int choose[10][10] = {\n");
  for (int a = 0; a < 10; ++a) {
    printf("  {");
    for (int b = 0; b < 10; ++b) {
      printf("%3d, ", choose[a][b]);
    }
    printf("},\n");
  }
  printf("};\n");
}

Caching
The idea of caching is to store results that have been
accessed recently so that the program need not
compute them again.

inline double hypotenuse(double A, double B) {
  return sqrt(A*A + B*B);
}

double cached_A = 0.0;
double cached_B = 0.0;
double cached_h = 0.0;

inline double hypotenuse(double A, double B) {
  if (A == cached_A && B == cached_B) {
    return cached_h;
  }
  cached_A = A;
  cached_B = B;
  cached_h = sqrt(A*A + B*B);
  return cached_h;
}

The cached version is about 30% faster if the cache
is hit 2/3 of the time.
Sparsity
The idea of exploiting sparsity is to avoid storing and
computing on zeroes. “The fastest way to compute is
not to compute at all.”
Example: Matrix-vector multiplication
⎛ 3 0 0 0 1 0 ⎞⎛ 1 ⎞
⎜ ⎟⎜ ⎟
⎜ 0 4 1 0 5 9 ⎟⎜ 4 ⎟
⎜ 0 0 0 2 0 6 ⎟⎜ 2 ⎟
y = ⎜ ⎟⎜ ⎟
5 0 0 3 0 0 8
⎜ ⎟⎜ ⎟
⎜⎜ 5 0 0 0 8 0 ⎟⎟ ⎜⎜ 5 ⎟⎟
⎝ 0 0 0 9 7 0 ⎠⎝ 7 ⎠

Dense matrix-vector multiplication performs n² = 36
scalar multiplies, but only 14 entries are nonzero.
Sparsity (2)
Compressed Sparse Row (CSR)
index   0  1  2  3  4  5  6  7  8  9 10 11 12 13
rows:   0  2  6  8 10 12 14
cols:   0  4  1  2  4  5  3  5  0  3  0  4  3  4
vals:   3  1  4  1  5  9  2  6  5  3  5  8  9  7

(The rows array holds per-row offsets into cols and
vals; for the matrix above, n = 6 and nnz = 14.)

Storage is O(n + nnz) instead of O(n²).
Sparsity (3)
CSR matrix-vector multiplication
typedef struct {
  int n, nnz;
  int *rows;     // length n+1
  int *cols;     // length nnz
  double *vals;  // length nnz
} sparse_matrix_t;

void spmv(sparse_matrix_t *A, double *x, double *y) {
  for (int i = 0; i < A->n; i++) {
    y[i] = 0;
    for (int k = A->rows[i]; k < A->rows[i+1]; k++) {
      int j = A->cols[k];
      y[i] += A->vals[k] * x[j];
    }
  }
}

Number of scalar multiplications = nnz, which is
potentially much less than n².
Sparsity (4)
Storing a static sparse graph

Vertex IDs  0  1  2  3  4
Offsets     0  2  5  5  6  7

Edges       1  3  2  3  4  2  2

● Can run many graph algorithms efficiently on this
representation, e.g., breadth-first search, PageRank
● Can store edge weights with an additional array or
interleaved with Edges
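For instance, with the Offsets/Edges arrays above, the neighbors of vertex v occupy Edges[Offsets[v] .. Offsets[v+1]−1], so a degree query or a neighbor scan needs no search. A small sketch (the lowercase array names are ours):

```c
#include <assert.h>

// Offsets and Edges for the 5-vertex graph above.
static const int offsets[] = {0, 2, 5, 5, 6, 7};
static const int edges[]   = {1, 3, 2, 3, 4, 2, 2};

// Out-degree of v is just the difference of adjacent offsets.
int degree(int v) {
  return offsets[v + 1] - offsets[v];
}

// Count vertices with an edge to target by scanning each
// vertex's contiguous neighbor range.
int in_degree(int target, int n) {
  int count = 0;
  for (int v = 0; v < n; ++v) {
    for (int k = offsets[v]; k < offsets[v + 1]; ++k) {
      if (edges[k] == target) ++count;
    }
  }
  return count;
}
```

This contiguous layout is also what makes breadth-first search cache-friendly on this representation: each frontier vertex touches one dense slice of Edges.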
LOGIC

Constant Folding and Propagation
The idea of constant folding and propagation is to
evaluate constant expressions and substitute the
result into further expressions, all during compilation.
#include <math.h>

void orrery() {
  const double radius = 6371000.0;
  const double diameter = 2 * radius;
  const double circumference = M_PI * diameter;
  const double cross_area = M_PI * radius * radius;
  const double surface_area = circumference * diameter;
  const double volume = 4 * M_PI * radius * radius * radius / 3;
  // ...
}

With a sufficiently high optimization level, all the
expressions are evaluated at compile-time.

Common-Subexpression Elimination
The idea of common-subexpression elimination is to
avoid computing the same expression multiple times
by evaluating the expression once and storing the
result for later use.

a = b + c;        a = b + c;
b = a - d;        b = a - d;
c = b + c;        c = b + c;
d = a - d;        d = b;

The third line cannot be replaced by c = a, because
the value of b changes in the second line.

Algebraic Identities
The idea of exploiting algebraic identities is to replace
expensive algebraic expressions with algebraic
equivalents that require less work.
#include <stdbool.h>
#include <math.h>

typedef struct {
  double x; // x-coordinate
  double y; // y-coordinate
  double z; // z-coordinate
  double r; // radius of ball
} ball_t;

double square(double x) {
  return x*x;
}

bool collides(ball_t *b1, ball_t *b2) {
  double d = sqrt(square(b1->x - b2->x)
               + square(b1->y - b2->y)
               + square(b1->z - b2->z));
  return d <= b1->r + b2->r;
}
Algebraic Identities (2)
Comparing squared distances avoids the expensive
square root entirely:

bool collides(ball_t *b1, ball_t *b2) {
  double dsquared = square(b1->x - b2->x)
                  + square(b1->y - b2->y)
                  + square(b1->z - b2->z);
  return dsquared <= square(b1->r + b2->r);
}

For nonnegative u and v, we have √u ≤ v exactly when
u ≤ v².
Short-Circuiting
When performing a series of tests, the idea of short-
circuiting is to stop evaluating as soon as you know
the answer.
#include <stdbool.h>
// All elements of A are nonnegative
bool sum_exceeds(int *A, int n, int limit) {
  int sum = 0;
  for (int i = 0; i < n; i++) {
    sum += A[i];
  }
  return sum > limit;
}

#include <stdbool.h>
// All elements of A are nonnegative
bool sum_exceeds(int *A, int n, int limit) {
  int sum = 0;
  for (int i = 0; i < n; i++) {
    sum += A[i];
    if (sum > limit) {
      return true;
    }
  }
  return false;
}

Note that && and || are short-circuiting logical
operators, whereas & and | are not.
Ordering Tests
Consider code that executes a sequence of logical
tests. The idea of ordering tests is to perform those
that are more often “successful” — a particular
alternative is selected by the test — before tests that
are rarely successful. Similarly, inexpensive tests
should precede expensive ones.

#include <stdbool.h>
bool is_whitespace(char c) {
  if (c == '\r' || c == '\t' || c == ' ' || c == '\n') {
    return true;
  }
  return false;
}

#include <stdbool.h>
bool is_whitespace(char c) {
  if (c == ' ' || c == '\n' || c == '\t' || c == '\r') {
    return true;
  }
  return false;
}
Creating a Fast Path
#include <stdbool.h>
#include <math.h>

typedef struct {
  double x; // x-coordinate
  double y; // y-coordinate
  double z; // z-coordinate
  double r; // radius of ball
} ball_t;

double square(double x) {
  return x*x;
}

bool collides(ball_t *b1, ball_t *b2) {
  double dsquared = square(b1->x - b2->x)
                  + square(b1->y - b2->y)
                  + square(b1->z - b2->z);
  return dsquared <= square(b1->r + b2->r);
}

Creating a Fast Path (2)
#include <stdbool.h>
#include <math.h>

typedef struct {
  double x; // x-coordinate
  double y; // y-coordinate
  double z; // z-coordinate
  double r; // radius of ball
} ball_t;

double square(double x) {
  return x*x;
}

bool collides(ball_t *b1, ball_t *b2) {
  if ((fabs(b1->x - b2->x) > (b1->r + b2->r)) ||
      (fabs(b1->y - b2->y) > (b1->r + b2->r)) ||
      (fabs(b1->z - b2->z) > (b1->r + b2->r))) {
    return false;
  }
  double dsquared = square(b1->x - b2->x)
                  + square(b1->y - b2->y)
                  + square(b1->z - b2->z);
  return dsquared <= square(b1->r + b2->r);
}

Combining Tests
The idea of combining tests is to replace a sequence
of tests with one test or switch.

Full adder truth table:

a b c | carry sum
0 0 0 |   0    0
0 0 1 |   0    1
0 1 0 |   0    1
0 1 1 |   1    0
1 0 0 |   0    1
1 0 1 |   1    0
1 1 0 |   1    0
1 1 1 |   1    1

void full_add(int a, int b, int c, int *sum, int *carry) {
  if (a == 0) {
    if (b == 0) {
      if (c == 0) {
        *sum = 0;
        *carry = 0;
      } else {
        *sum = 1;
        *carry = 0;
      }
    } else {
      if (c == 0) {
        *sum = 1;
        *carry = 0;
      } else {
        *sum = 0;
        *carry = 1;
      }
    }
  } else {
    if (b == 0) {
      if (c == 0) {
        *sum = 1;
        *carry = 0;
      } else {
        *sum = 0;
        *carry = 1;
      }
    } else {
      if (c == 0) {
        *sum = 0;
        *carry = 1;
      } else {
        *sum = 1;
        *carry = 1;
      }
    }
  }
}
Combining Tests (2)
The idea of combining tests is to replace a sequence
of tests with one test or switch.

void full_add(int a, int b, int c, int *sum, int *carry) {
  int test = ((a == 1) << 2)
           | ((b == 1) << 1)
           | (c == 1);
  switch (test) {
    case 0:
      *sum = 0; *carry = 0;
      break;
    case 1:
      *sum = 1; *carry = 0;
      break;
    case 2:
      *sum = 1; *carry = 0;
      break;
    case 3:
      *sum = 0; *carry = 1;
      break;
    case 4:
      *sum = 1; *carry = 0;
      break;
    case 5:
      *sum = 0; *carry = 1;
      break;
    case 6:
      *sum = 0; *carry = 1;
      break;
    case 7:
      *sum = 1; *carry = 1;
      break;
  }
}

For this example, table look-up is even better!
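The table look-up variant alluded to above can be sketched like this (the table encoding is our own illustration), indexing constant arrays by the 3-bit value (a<<2)|(b<<1)|c:

```c
#include <assert.h>

// Sum and carry for each of the 8 input combinations abc,
// read straight off the truth table.
static const int sum_table[8]   = {0, 1, 1, 0, 1, 0, 0, 1};
static const int carry_table[8] = {0, 0, 0, 1, 0, 1, 1, 1};

void full_add(int a, int b, int c, int *sum, int *carry) {
  int idx = (a << 2) | (b << 1) | c;  // pack the three input bits
  *sum = sum_table[idx];
  *carry = carry_table[idx];
}
```

All tests collapse into one index computation and two loads, with no branches at all.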
LOOPS

Hoisting
The goal of hoisting — also called loop-invariant code
motion — is to avoid recomputing loop-invariant code
each time through the body of a loop.

#include <math.h>

void scale(double *X, double *Y, int N) {
  for (int i = 0; i < N; i++) {
    Y[i] = X[i] * exp(sqrt(M_PI/2));
  }
}

#include <math.h>

void scale(double *X, double *Y, int N) {
  double factor = exp(sqrt(M_PI/2));
  for (int i = 0; i < N; i++) {
    Y[i] = X[i] * factor;
  }
}

Sentinels
Sentinels are special dummy values placed in a data
structure to simplify the logic of boundary conditions,
and in particular, the handling of loop-exit tests.
#include <stdint.h>
#include <stdbool.h>

// All elements of A are nonnegative
bool overflow(int64_t *A, size_t n) {
  int64_t sum = 0;
  for (size_t i = 0; i < n; ++i) {
    sum += A[i];
    if (sum < A[i]) return true;
  }
  return false;
}

// Assumes that A[n] and A[n+1] exist and
// can be clobbered
bool overflow(int64_t *A, size_t n) {
  // All elements of A are nonnegative
  A[n] = INT64_MAX;
  A[n+1] = 1;  // or any positive number
  size_t i = 0;
  int64_t sum = A[0];
  while (sum >= A[i]) {
    sum += A[++i];
  }
  if (i < n) return true;
  return false;
}
Loop Unrolling
Loop unrolling attempts to save work by combining
several consecutive iterations of a loop into a single
iteration, thereby reducing the total number of
iterations of the loop and, consequently, the number
of times that the instructions that control the loop
must be executed.

● Full loop unrolling: All iterations are unrolled.
● Partial loop unrolling: Several, but not all, of the
iterations are unrolled.

Full Loop Unrolling
int sum = 0;
for (int i = 0; i < 10; i++) {
  sum += A[i];
}

int sum = 0;
sum += A[0];
sum += A[1];
sum += A[2];
sum += A[3];
sum += A[4];
sum += A[5];
sum += A[6];
sum += A[7];
sum += A[8];
sum += A[9];

Partial Loop Unrolling
int sum = 0;
for (int i = 0; i < n; ++i) {
  sum += A[i];
}

int sum = 0;
int j;
for (j = 0; j < n-3; j += 4) {
  sum += A[j];
  sum += A[j+1];
  sum += A[j+2];
  sum += A[j+3];
}
for (int i = j; i < n; ++i) {
  sum += A[i];
}

Benefits of loop unrolling
● Lower number of instructions in loop control code
● Enables more compiler optimizations

Unrolling too much can cause poor use of the
instruction cache.
Loop Fusion
The idea of loop fusion — also called jamming — is to
combine multiple loops over the same index range
into a single loop body, thereby saving the overhead
of loop control.

for (int i = 0; i < n; ++i) {
  C[i] = (A[i] <= B[i]) ? A[i] : B[i];
}

for (int i = 0; i < n; ++i) {
  D[i] = (A[i] <= B[i]) ? B[i] : A[i];
}

for (int i = 0; i < n; ++i) {
  C[i] = (A[i] <= B[i]) ? A[i] : B[i];
  D[i] = (A[i] <= B[i]) ? B[i] : A[i];
}

Eliminating Wasted Iterations
The idea of eliminating wasted iterations is to modify
loop bounds to avoid executing loop iterations over
essentially empty loop bodies.

for (int i = 0; i < n; ++i) {
  for (int j = 0; j < n; ++j) {
    if (i > j) {
      int temp = A[i][j];
      A[i][j] = A[j][i];
      A[j][i] = temp;
    }
  }
}

for (int i = 1; i < n; ++i) {
  for (int j = 0; j < i; ++j) {
    int temp = A[i][j];
    A[i][j] = A[j][i];
    A[j][i] = temp;
  }
}

FUNCTIONS

Inlining
The idea of inlining is to avoid the overhead of a
function call by replacing a call to the function with
the body of the function itself.
double square(double x) {
  return x*x;
}

double sum_of_squares(double *A, int n) {
  double sum = 0.0;
  for (int i = 0; i < n; ++i) {
    sum += square(A[i]);
  }
  return sum;
}

double sum_of_squares(double *A, int n) {
  double sum = 0.0;
  for (int i = 0; i < n; ++i) {
    double temp = A[i];
    sum += temp*temp;
  }
  return sum;
}

Inlining (2)
The idea of inlining is to avoid the overhead of a
function call by replacing a call to the function with
the body of the function itself.
static inline double square(double x) {
  return x*x;
}

double sum_of_squares(double *A, int n) {
  double sum = 0.0;
  for (int i = 0; i < n; ++i) {
    sum += square(A[i]);
  }
  return sum;
}

Declaring the function static inline encourages the
compiler to inline it for us. Inlined functions can be
just as efficient as macros, and they are better
structured.
Tail-Recursion Elimination
The idea of tail-recursion elimination is to replace a
recursive call that occurs as the last step of a function
with a branch, saving function-call overhead.

void quicksort(int *A, int n) {
  if (n > 1) {
    int r = partition(A, n);
    quicksort(A, r);
    quicksort(A + r + 1, n - r - 1);
  }
}

void quicksort(int *A, int n) {
  while (n > 1) {
    int r = partition(A, n);
    quicksort(A, r);
    A += r + 1;
    n -= r + 1;
  }
}

Coarsening Recursion
The idea of coarsening recursion is to increase the
size of the base case and handle it with more efficient
code that avoids function-call overhead.
void quicksort(int *A, int n) {
  while (n > 1) {
    int r = partition(A, n);
    quicksort(A, r);
    A += r + 1;
    n -= r + 1;
  }
}

#define THRESHOLD 8

void quicksort(int *A, int n) {
  while (n > THRESHOLD) {
    int r = partition(A, n);
    quicksort(A, r);
    A += r + 1;
    n -= r + 1;
  }
  // insertion sort for small arrays
  for (int j = 1; j < n; ++j) {
    int key = A[j];
    int i = j - 1;
    while (i >= 0 && A[i] > key) {
      A[i+1] = A[i];
      --i;
    }
    A[i+1] = key;
  }
}
SUMMARY

New Bentley Rules
Data structures
● Packing and encoding
● Augmentation
● Precomputation
● Compile-time initialization
● Caching
● Lazy evaluation
● Sparsity

Logic
● Constant folding and propagation
● Common-subexpression elimination
● Algebraic identities
● Short-circuiting
● Ordering tests
● Creating a fast path
● Combining tests

Loops
● Hoisting
● Sentinels
● Loop unrolling
● Loop fusion
● Eliminating wasted iterations

Functions
● Inlining
● Tail-recursion elimination
● Coarsening recursion

Closing Advice
● Avoid premature optimization. First get correct
working code. Then optimize, preserving
correctness by regression testing.
● Reducing the work of a program does not
necessarily decrease its running time, but it is a
good heuristic.
● The compiler automates many low-level
optimizations.
● To tell if the compiler is actually performing a
particular optimization, look at the assembly code.

● If you find interesting examples of work
optimization, please let us know!

MIT OpenCourseWare
https://ocw.mit.edu

6.172 Performance Engineering of Software Systems


Fall 2018

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.
