Instruction selection

__NOTOC__
{{Refimprove|date=October 2013}}
In [[computer science]], ''instruction selection'' is the stage of a [[compiler]] backend that transforms its middle-level [[intermediate representation]] (IR) into a low-level IR. In a typical compiler, instruction selection precedes both [[instruction scheduling]] and [[register allocation]]; hence its output IR has an infinite set of pseudo-registers (often known as ''temporaries'') and may still be – and typically is – subject to [[peephole optimization]]. Otherwise, it closely resembles the target [[machine code]], [[bytecode]], or [[assembly language]].

For example, for the following sequence of middle-level IR code
<pre>
t1 = a
t2 = b
t3 = t1 + t2
a = t3
b = t1
</pre>


a good instruction sequence for the [[X86|x86 architecture]] is


<syntaxhighlight lang="asm">
MOV EAX, a
XCHG EAX, b
ADD a, EAX
</syntaxhighlight>


For a comprehensive survey on instruction selection, see the literature review and book by Hjort Blindell.
<ref name = "hjort-blindell-report">
{{cite report
| last = Blindell
| first = Gabriel S. Hjort
| title = Survey on Instruction Selection: An Extensive and Modern Literature Review
| year = 2013
| arxiv = 1306.4898
| isbn = 978-91-7501-898-0
}}</ref>
<ref name = "hjort-blindell-book">
{{cite book
| last = Blindell
| first = Gabriel S. Hjort
| title = Instruction Selection: Principles, Methods, & Applications
| url = https://www.springer.com/us/book/9783319340173
| publisher = Springer
| doi = 10.1007/978-3-319-34019-7
| year = 2016
| isbn = 978-3-319-34017-3
| s2cid = 13390131
}}</ref>


== Macro expansion ==
The simplest approach to instruction selection is known as ''macro expansion''<ref>{{Cite journal|last=Brown|first=P.|year=1969|title=A Survey of Macro Processors|journal=Annual Review in Automatic Programming|volume=6|issue=2|pages=37–88|issn=0066-4138|doi=10.1016/0066-4138(69)90001-9}}</ref> or ''interpretative code generation''.<ref>{{Cite journal|last=Cattell|first=R. G. G.|year=1979|title=A Survey and Critique of Some Models of Code Generation|url=https://apps.dtic.mil/dtic/tr/fulltext/u2/a056027.pdf|archive-url=https://web.archive.org/web/20190523223442/https://apps.dtic.mil/dtic/tr/fulltext/u2/a056027.pdf|url-status=live|archive-date=May 23, 2019|journal=School of Computer Science, Carnegie Mellon University|type=Technical report}}</ref><ref>{{Cite journal|last1=Ganapathi|first1=M.|last2=Fischer|first2=C. N.|last3=Hennessy|first3=J. L.|year=1982|title=Retargetable Compiler Code Generation|journal=Computing Surveys|volume=14|issue=4|pages=573–592|issn=0360-0300|doi=10.1145/356893.356897|s2cid=2361347}}</ref><ref>{{Cite book|title=Code Generator Writing Systems|last=Lunell|first=H.|publisher=Linköping University|year=1983|location=Linköping, Sweden|type=Doctoral thesis}}</ref> A macro-expanding instruction selector operates by matching ''templates'' over the middle-level IR. Upon a match the corresponding ''macro'' is executed, using the matched portion of the IR as input, which emits the appropriate target instructions. Macro expansion can be done either directly on the textual representation of the middle-level IR,<ref>{{Cite journal|last1=Ammann|first1=U.|last2=Nori|first2=K. V.|last3=Jensen|first3=K.|last4=Nägeli|first4=H.|year=1974|title=The PASCAL (P) Compiler Implementation Notes|journal=Instituts für Informatik|type=Technical report}}</ref><ref>{{Cite journal|last1=Orgass|first1=R. J.|last2=Waite|first2=W. M.|year=1969|title=A Base for a Mobile Programming System|journal=Communications of the ACM|volume=12|issue=9|pages=507–510|doi=10.1145/363219.363226|s2cid=8164996|doi-access=free}}</ref> or the IR can first be transformed into a graphical representation which is then traversed depth-first.<ref>{{Cite book|title=Generating Machine Code for High-Level Programming Languages|last=Wilcox|first=T. R.|publisher=Cornell University|year=1971|location=Ithaca, New York, USA|type=Doctoral thesis}}</ref> In the latter, a template matches one or more adjacent nodes in the graph.
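
As an illustration, the following sketch expands the three-address IR of the example above, one instruction at a time. The tuple encoding of the IR, the two templates, and the x86-like output syntax are assumptions made for the example rather than the conventions of any particular compiler.

<syntaxhighlight lang="python">
# Minimal sketch of a macro-expanding instruction selector: each
# middle-level IR instruction is matched against a template and
# expanded in isolation by the corresponding macro.

def expand_copy(dst, src):
    # template: dst = src
    return [f"MOV {dst}, {src}"]

def expand_add(dst, lhs, rhs):
    # template: dst = lhs + rhs
    return [f"MOV {dst}, {lhs}", f"ADD {dst}, {rhs}"]

MACROS = {"copy": expand_copy, "add": expand_add}

def select(ir):
    out = []
    for op, *args in ir:          # one template match per IR instruction
        out.extend(MACROS[op](*args))
    return out

ir = [("copy", "t1", "a"),
      ("copy", "t2", "b"),
      ("add",  "t3", "t1", "t2"),
      ("copy", "a", "t3"),
      ("copy", "b", "t1")]

for insn in select(ir):
    print(insn)
</syntaxhighlight>

Because each IR instruction is expanded in isolation, the sketch emits six instructions where the hand-picked sequence above needs three, which is exactly the weakness addressed below.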
<!-- unclear:
There are other uses for instruction selection, such as [[strength reduction]] (a tiling with tiles that cover multiplication of powers of two) and [[algebraic analysis]] (tiles that recognize fixed patterns of algebra).
-->


Unless the target machine is very simple, macro expansion in isolation typically generates inefficient code. To mitigate this limitation, compilers that apply this approach typically combine it with [[peephole optimization]] to replace combinations of simple instructions with more complex equivalents that increase performance and reduce code size. This is known as the ''Davidson-Fraser approach'' and is currently applied in [[GNU Compiler Collection|GCC]].<ref>{{Cite journal|last1=Davidson|first1=J. W.|last2=Fraser|first2=C. W.|year=1984|title=Code Selection Through Object Code Optimization|journal=ACM Transactions on Programming Languages and Systems|volume=6|issue=4|pages=505–526|issn=0164-0925|doi=10.1145/1780.1783|citeseerx=10.1.1.76.3796|s2cid=10315537}}</ref>
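
The peephole half of this approach can be sketched in the same illustrative tuple notation. The single rewrite rule below is a hypothetical example, and a real pass would also have to verify that the temporary <code>t1</code> is dead after the window.

<syntaxhighlight lang="python">
# Minimal sketch of a peephole pass in the Davidson-Fraser style: a
# sliding window replaces a combination of simple instructions with a
# single more complex equivalent.

def peephole(insns):
    out, i = [], 0
    while i < len(insns):
        w = insns[i:i + 3]
        # Rule: MOV t, x / ADD t, y / MOV x, t  =>  ADD x, y
        if (len(w) == 3
                and w[0][0] == "MOV" and w[1][0] == "ADD" and w[2][0] == "MOV"
                and w[0][1] == w[1][1] == w[2][2]   # same temporary t
                and w[0][2] == w[2][1]):            # loads and stores the same x
            out.append(("ADD", w[0][2], w[1][2]))
            i += 3
        else:
            out.append(insns[i])
            i += 1
    return out

# a = a + b, expanded naively, collapses to one memory-destination ADD
naive = [("MOV", "t1", "a"), ("ADD", "t1", "b"), ("MOV", "a", "t1")]
print(peephole(naive))  # [('ADD', 'a', 'b')]
</syntaxhighlight>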


== Graph covering ==


Another approach is to first transform the middle-level IR into a [[Graph (discrete mathematics)|graph]] and then [[Covering graph|cover the graph]] using ''patterns''. A pattern is a template that matches a portion of the graph and can be implemented with a single instruction provided by the target machine. The goal is to cover the graph such that the total cost of the selected patterns is minimized, where the cost typically represents the number of cycles it takes to execute the instruction. For tree-shaped graphs, the least-cost cover can be found in linear time using [[dynamic programming]],<ref>{{Cite journal|last1=Aho|first1=A. V.|last2=Ganapathi|first2=M.|last3=Tjiang|first3=S. W. K.|year=1989|title=Code Generation Using Tree Matching and Dynamic Programming|journal=ACM Transactions on Programming Languages and Systems|volume=11|issue=4|pages=491–516|doi=10.1145/69558.75700|citeseerx=10.1.1.456.9102|s2cid=1165995}}</ref> but for [[Directed acyclic graph|DAG]]s and full-fledged graphs the problem becomes NP-complete and thus is most often solved using either [[greedy algorithm]]s or methods from combinatorial optimization.<ref>{{Cite book|last1=Wilson|first1=T.|last2=Grewal|first2=G.|last3=Halley|first3=B.|last4=Banerji|first4=D.|title=Proceedings of 7th International Symposium on High-Level Synthesis |chapter=An integrated approach to retargetable code generation |year=1994|pages=70–75|doi=10.1109/ISHLS.1994.302339|citeseerx=10.1.1.521.8288|isbn=978-0-8186-5785-6|s2cid=14384424}}</ref>
<ref>{{Cite book |doi=10.1145/309847.310076|citeseerx=10.1.1.331.390|isbn=978-1581331097|chapter=Constraint driven code selection for fixed-point DSPS|title=Proceedings of the 36th ACM/IEEE conference on Design automation conference - DAC '99|pages=817–822|year=1999|last1=Bashford|first1=Steven|last2=Leupers|first2=Rainer|s2cid=5513238}}</ref>
<ref>{{Cite journal|last1=Floch|first1=A.|last2=Wolinski|first2=C.|last3=Kuchcinski|first3=K.|year=2010|title=Combined Scheduling and Instruction Selection for Processors with Reconfigurable Cell Fabric|journal=Proceedings of the 21st International Conference on Application-Specific Architectures and Processors (ASAP'10)|pages=167–174}}</ref>
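
For the tree case, the dynamic-programming pass can be sketched as follows, in the spirit of the tree-matching approach cited above. The pattern set, the costs, and the <code>Node</code> representation are illustrative assumptions; practical systems generate such matchers from a machine description.

<syntaxhighlight lang="python">
# Minimal sketch of least-cost tree covering by bottom-up dynamic
# programming. Patterns are small shape trees in which "X" marks a
# wildcard leaf that must itself be covered by further patterns.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    op: str
    kids: list = field(default_factory=list)
    cost: float = float("inf")   # best cost of covering this subtree
    best: Optional[str] = None   # winning pattern at this node

PATTERNS = [                     # (name, shape, cost) -- illustrative only
    ("reg",  ("reg",), 0),
    ("add",  ("+", "X", "X"), 1),
    ("mul",  ("*", "X", "X"), 3),
    ("madd", ("+", ("*", "X", "X"), "X"), 2),   # fused multiply-add
]

def match(shape, node, leaves):
    """Structurally match shape at node, collecting the wildcard leaves."""
    if shape == "X":
        leaves.append(node)
        return True
    op, *kids = shape
    if node.op != op or len(kids) != len(node.kids):
        return False
    return all(match(s, k, leaves) for s, k in zip(kids, node.kids))

def cover(node):
    """Each node is processed once, so the pass runs in linear time."""
    for k in node.kids:
        cover(k)
    for name, shape, cost in PATTERNS:
        leaves = []
        if match(shape, node, leaves):
            total = cost + sum(l.cost for l in leaves)
            if total < node.cost:
                node.cost, node.best = total, name
    return node

# a*b + c: the fused pattern (cost 2) beats mul + add (cost 3 + 1)
tree = cover(Node("+", [Node("*", [Node("reg"), Node("reg")]),
                        Node("reg")]))
print(tree.best, tree.cost)      # madd 2
</syntaxhighlight>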


==References==
{{reflist}}


==External links==
* [http://markhobley.yi.org/programming/generations.html Alternative ways of supporting different generations of computer]
* [http://markhobley.yi.org/programming/generations.html Alternative ways of supporting different generations of computer]{{Dead link|date=January 2020 |bot=InternetArchiveBot |fix-attempted=yes }}


{{Compiler optimizations}}


[[Category:Compiler optimizations]]

{{Compu-stub}}
