Practical Aggregation of Semantical Program Properties
Figure 1: Typical machine learning scenario to predict “good” optimizations for programs. During a training phase (from left to
right) a predictive model is built to correlate complex dependencies between program structure and candidate optimizations. In the
prediction phase (from right to left), features of a new program are passed to the learned model and used to predict combinations of
optimizations.
great potential but require a large number of compilations and executions as training examples. Moreover, although program features are one of the key components of any machine learning approach, little attention has been devoted so far to ways of extracting them from program semantics.

2. FEATURE EXTRACTION

We may consider a program as being characterized by a number of entities. Some of these entities are a direct mapping of similar entities defined by the specific programming language, while others are generated during compilation. These entities include (a sketch of their relational encoding follows the list):

• loop hierarchy;
• control dependence graph;
• dominator tree;
• data dependence graph;
• liveness information;
• availability information;
• anticipability information;
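In the relational (Datalog) encoding used throughout this paper, each such entity or analysis result becomes a relation exported as facts. A minimal sketch of what such facts might look like (the predicate names dom and live_in are ours, chosen for illustration; bb_edge reappears in Section 2.5):

    bb_edge(bb1, bb2).     % control flow graph: edge from bb1 to bb2
    dom(bb1, bb2).         % dominator tree: bb1 dominates bb2
    live_in(bb2, var_x).   % liveness information: var_x is live on entry to bb2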
2.3 Automatic Inference of New Relations

Given a set of basic relations (such as those listed in Section 2.1), further useful relations can be inferred, including very complex ones. For example, Whaley and Lam [33] were able to perform interprocedural context-sensitive alias analysis using Datalog inference. Although, as a general rule, it is impractical to infer very complex relations automatically, it is still useful to infer new relations easily with Datalog, albeit of limited complexity.

The main operation we use for relation inference is the joining of two relations: given two relations r ⊆ E1 × · · · × Ek and p ⊆ F1 × · · · × Fl such that some of the Es are identical to some of the Fs, we select a nonempty subset I of pairs of identical entities and essentially concatenate the two relations, with the common entities (in I) appearing only once. The simplest way to explain this is through a Datalog example. Suppose the two relations are r ⊆ E1 × E2 and p ⊆ E2 × E3, sharing the entity set E2; their join q ⊆ E1 × E2 × E3 is then expressed by the Datalog rule q(X,Y,Z) :- r(X,Y), p(Y,Z).

¹We focus on C because our work is implemented in the context of GCC, which is written in C.

2.5 Extracting Features from Relations

A machine learning tool requires a quantitative measurement of the program, provided by a vector of numerical features. In this section we present several techniques for deriving numerical features from a relational representation of the program.

We consider first the case of entities having numerical values. These values may need to be aggregated into their sum, average, variance, max, min, etc., and in this way produce numerical features for the relation. For example, given the relation

    count = {(b, n) | b is a basic block whose estimated number of executions is n},
we may want to compute numerical features such as the maximal number of estimated executions of a basic block, or the average number of estimated executions of a basic block.
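As an illustration, a minimal Prolog sketch of these two aggregations (ours; it assumes the count relation is stored as count(Block, N) facts, with made-up example values, and uses SWI-Prolog's list utilities):

    % Assumed representation: count(BasicBlock, EstimatedExecutions).
    count(bb1, 12).
    count(bb2, 240).
    count(bb3, 36).

    % Maximal estimated number of executions of a basic block.
    max_executions(Max) :-
        findall(N, count(_, N), Ns),
        max_list(Ns, Max).

    % Average estimated number of executions of a basic block.
    avg_executions(Avg) :-
        findall(N, count(_, N), Ns),
        sum_list(Ns, Sum),
        length(Ns, Len),
        Avg is Sum / Len.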
We focus now on the case of entities having categorical values (i.e., symbols). Most of the entities important for the compilation process belong to this class. Typically, numerical features describing relations over such entities provide information on basic structural aspects of the relation, such as the number of tuples in the relation, the maximum out-degree of nodes in a tree relation, etc. We show how to extract several typical types of numerical features by applying the standard selection and projection operations, together with the num operator, defined as returning the number of tuples in a relation.

First we note that applying num to a relation already provides a numerical feature which is often of interest. This is particularly so in the case of unary relations (e.g., number of basic blocks) but may also be the case for higher-arity relations (e.g., number of edges in the control flow graph). Also, applying num to the projection of relation r on dimension i, yielding the unary relation ri = {e | ∃t ∈ r such that t has e at position i}, often provides an interesting numerical feature. For example, consider the relation

    st_in_block = {(i, b) | i is a store instruction in basic block b}.

Then num(st_in_block1) is the number of stores in all basic blocks, while num(st_in_block2) is the number of basic blocks containing store instructions.
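To make this concrete, a small Prolog sketch (ours; the facts and predicate names are illustrative assumptions):

    % Assumed representation: st_in_block(Instruction, BasicBlock).
    st_in_block(i1, bb1).
    st_in_block(i2, bb1).
    st_in_block(i3, bb2).

    % num(st_in_block1): number of store instructions over all basic blocks.
    num_stores(N) :-
        setof(I, B^st_in_block(I, B), Is),
        length(Is, N).

    % num(st_in_block2): number of basic blocks containing store instructions.
    num_blocks_with_stores(N) :-
        setof(B, I^st_in_block(I, B), Bs),
        length(Bs, N).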
We consider now the case of a binary relation r ⊆ E1 × E2. For every element e ∈ Ei, 1 ≤ i ≤ 2, we consider the selection induced by this element, i.e., the relation ri(e) defined as the set of pairs in r that contain e at position i. By associating with e the value of num(ri(e)) we define a new relation in Ei × N. For this relation, numerical features can be derived by aggregating the numerical values in the second position.

For example, consider again the relation st_in_block. For a given basic block b, the value num(st_in_block2(b)) is the number of store instructions in basic block b. Thus the relation consisting of all pairs (b, num(st_in_block2(b))) associates each block with the number of store instructions it contains. By aggregating these counts we may obtain numerical features such as the average number of stores in a basic block.
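Continuing the sketch above, the relation of pairs (b, num(st_in_block2(b))) and its aggregation into an average could be written as:

    % Selection by basic block B: the pair (B, num(st_in_block2(B))).
    stores_in_block(B, N) :-
        setof(I, st_in_block(I, B), Is),
        length(Is, N).

    % Aggregating the counts: average number of stores per basic block
    % (over the blocks that contain at least one store).
    avg_stores_per_block(Avg) :-
        findall(N, stores_in_block(_, N), Ns),
        sum_list(Ns, Sum),
        length(Ns, Len),
        Avg is Sum / Len.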
For the general case of a relation r of arity k ≥ 2, we may derive a number of binary relations by considering the projection of r on any two dimensions i, j, i ≠ j. For each such binary relation we derive new features by the above technique. Furthermore, for a relation r ⊆ E1 × · · · × Ek we can also consider any two disjoint subsets I and J of the index set {1, . . . , k}. The projection of r on the dimensions in I and J may be seen as a binary relation over the sets S1 = Ei1 × · · · × Eip and S2 = Ej1 × · · · × Ejq, where I = {i1, . . . , ip} and J = {j1, . . . , jq}. Again, for this binary relation new numerical features may be derived.

The techniques described above for deriving numerical features from relations can be automated. We implemented the extraction of numerical features from the Datalog-derived representation of the program in Prolog, as the required aggregation operations are not supported in Datalog.

Numerical features are also provided by the number of occurrences of structural patterns in a graph. For instance, the control flow graph (CFG) may be considered as a relation over B × B, where B is the set of basic blocks. New relations over B × B may be induced from this relation by taking into account the way in which two basic blocks are connected. For example, we may consider blocks connected via an if-then or an if-then-else pattern in the CFG. The following Datalog rules provide possible definitions for these two relations. (In this example the relation bb_edge specifies whether two basic blocks are connected by an edge in the CFG.)

    bb_ifthen(B1,B3) :-
        bb_edge(B1,B3), bb_edge(B1,B2), bb_edge(B2,B3).

    bb_ifthen_else(B1,B4) :-
        bb_edge(B1,B2), bb_edge(B1,B3),
        bb_edge(B2,B4), bb_edge(B3,B4).

These new relations may in turn induce new relations over basic blocks connected via nested if-then or if-then-else patterns. The following Datalog rule provides a possible definition for a relation having as elements pairs of basic blocks connected via a direct edge and a nested if-then pattern (an if-then pattern in which the then alternative is itself an if-then pattern).

    bb_ifthen_n(B1,B4) :-
        bb_edge(B1,B4), bb_edge(B1,B2),
        bb_ifthen(B2,B3), bb_edge(B3,B4).

In a similar way we may derive relations describing patterns in any graph structure computed during the compilation. These patterns can be described easily by Datalog rules. The semantics of the graph structure being analyzed provide guidance in selecting the patterns to consider. Additional knowledge about the code may help further trim the pattern space. For instance, knowing that for C programs without switch statements every node has at most two successors in the CFG could limit the number of possible patterns we look for.

Other patterns in graphs, such as cycles, may be considered as well. For the CFG, the loop structure may be extracted either from relevant data structures of the compiler, if available, or by computing simple patterns directly from the CFG, such as single-basic-block loops or innermost loops with a simple structure (e.g., containing a single if-then pattern inside the loop body).
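Hedged Datalog sketches of such simple loop patterns (ours, built on the bb_edge and bb_ifthen relations defined above):

    % A single-basic-block loop: a block with an edge to itself.
    bb_selfloop(B) :- bb_edge(B, B).

    % A simple loop whose body is a single if-then pattern: the header H
    % starts an if-then that joins at J, and J branches back to H.
    bb_ifthen_loop(H) :- bb_ifthen(H, J), bb_edge(J, H).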
Finally we note that every binary relation r ⊆ E × F can be viewed as a bipartite graph in which the partite sets correspond to E and F. For example, the def-use relation over operand pairs induces a bipartite graph in which one of the partite sets consists of the defs and the other consists of the uses. This allows us to apply the techniques presented in this section to any binary relation. For instance, let r denote the def-use relation; then the web relation below defines a web pattern in the bipartite graph corresponding to the def-use relation.
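A possible formulation of this web relation in Datalog (a sketch of ours, not necessarily the original rule; it assumes def_use(D,U) facts for the def-use relation, and relates all def-use pairs lying in the same connected component of the bipartite graph, i.e., in the same web):

    web(D, U) :- def_use(D, U).
    web(D, U) :- web(D, U1), def_use(D1, U1), def_use(D1, U).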
[Figure 2: bar chart of speedups (y-axis from 1.0 to 1.4) for each benchmark on three platforms: AMD, IA32, IA64.]

Figure 2: Speedups obtained using iterative search on 3 platforms (500 random combinations of optimizations with 50% probability to select each optimization).
[Figure 3: bar chart of speedups (y-axis from 0.8 to 1.4) for each benchmark and their average; two series: iterative compilation, and predicted optimization passes using static feature extraction and a nearest-neighbor classifier.]

Figure 3: Speedups when predicting best optimizations based on program features in comparison with the achievable speedups after iterative compilation based on 500 runs per benchmark (ARC processor).
4. The expert aims to use this knowledge base to predict how to select the best optimizations when running the same benchmarks, but on the embedded ARC target.

5. In the process, her first experiments are disappointing: the predictions achieved by the model only reach a fraction of the performance of the best combination of optimizations available in the search space.

6. The expert identifies the source of the problem using standard statistical metrics [19]. It may come from a model overfit due to a limited number of features, or from a lack of effective correlations between these features and the semantical properties that actually impact performance on the ARC platform.

7. The expert designs and implements new program feature extractors, leveraging her understanding of the optimization process and of the performance anomalies involved.

8. She incrementally adds these features into the training set, until the predictive model shows relevant results.

9. To finalize the tuning, and to improve compilation and training time, she performs principal component analysis (PCA) to narrow down the set of features that really matter on her platform of interest.

As outlined in the use case scenario, the training of the machine learning model has been performed on all benchmarks and all the platforms except ARC, which we used as a test platform for optimization predictions.

To illustrate this scenario in practice, we applied 500 random combinations of 88 compiler optimizations that are known to influence performance, each with a 50% probability of being selected, and ran each program variant 5 times. To make the adaptive optimization fully transparent, we directly invoke optimization passes inside a modified GCC pass manager. Figure 2 shows speedups over the best GCC optimization level -O3 for all programs and all architectures. It confirms the previous findings about iterative compilation [10, 1, 27, 17]: it is possible to considerably improve performance over default compiler settings, which are tuned to perform well on average across all programs and platforms.
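The random sampling of combinations described above can be sketched in a few lines of Prolog (ours; the GCC flag names are merely examples, and maybe/0 from SWI-Prolog's library(random) succeeds with probability 0.5):

    :- use_module(library(random)).

    % Select each optimization independently with probability 0.5.
    random_combination([], []).
    random_combination([F|Fs], Out) :-
        random_combination(Fs, Rest),
        (   maybe
        ->  Out = [F|Rest]
        ;   Out = Rest
        ).

    % Example query:
    % ?- random_combination(['-funroll-loops', '-ftree-vectorize', '-fgcse'], C).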
Feature #   Description
ft1 Number of basic blocks in the method
ft2 Number of basic blocks with a single successor
ft3 Number of basic blocks with two successors
ft4 Number of basic blocks with more than two successors
ft5 Number of basic blocks with a single predecessor
ft6 Number of basic blocks with two predecessors
ft7 Number of basic blocks with more than two predecessors
ft8 Number of basic blocks with a single predecessor and a single successor
ft9 Number of basic blocks with a single predecessor and two successors
ft10 Number of basic blocks with two predecessors and one successor
ft11 Number of basic blocks with two successors and two predecessors
ft12 Number of basic blocks with more than two successors and more than two predecessors
ft13 Number of basic blocks with number of instructions less than 15
ft14 Number of basic blocks with number of instructions in the interval [15, 500]
ft15 Number of basic blocks with number of instructions greater than 500
ft16 Number of edges in the control flow graph
ft17 Number of critical edges in the control flow graph
ft18 Number of abnormal edges in the control flow graph
ft19 Number of direct calls in the method
ft20 Number of conditional branches in the method
ft21 Number of assignment instructions in the method
ft22 Number of unconditional branches in the method
ft23 Number of binary integer operations in the method
ft24 Number of binary floating point operations in the method
ft25 Number of instructions in the method
ft26 Average number of instructions in basic blocks
ft27 Average number of phi-nodes at the beginning of a basic block
ft28 Average number of arguments of a phi-node
ft29 Number of basic blocks with no phi nodes
ft30 Number of basic blocks with phi nodes in the interval [0, 3]
ft31 Number of basic blocks with more than 3 phi nodes
ft32 Number of basic blocks where the total number of arguments for all phi-nodes is greater than 5
ft33 Number of basic blocks where the total number of arguments for all phi-nodes is in the interval [1, 5]
ft34 Number of switch instructions in the method
ft35 Number of unary operations in the method
ft36 Number of instructions that do pointer arithmetic in the method
ft37 Number of indirect references via pointers ("*" in C)
ft38 Number of times the address of a variable is taken ("&" in C)
ft39 Number of times the address of a function is taken ("&" in C)
ft40 Number of indirect calls (i.e. done via pointers) in the method
ft41 Number of assignment instructions with the left operand an integer constant in the method
ft42 Number of binary operations with one of the operands an integer constant in the method
ft43 Number of calls with pointers as arguments
ft44 Number of calls with more than 4 arguments
ft45 Number of calls that return a pointer
ft46 Number of calls that return an integer
ft47 Number of occurrences of integer constant zero
ft48 Number of occurrences of 32-bit integer constants
ft49 Number of occurrences of integer constant one
ft50 Number of occurrences of 64-bit integer constants
ft51 Number of references of local variables in the method
ft52 Number of references (def/use) of static/extern variables in the method
ft53 Number of local variables referred in the method
ft54 Number of static/extern variables referred in the method
ft55 Number of local variables that are pointers in the method
ft56 Number of static/extern variables that are pointers in the method
Table 1: List of program features produced using our technique to predict good optimizations.
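As an illustration of how such features connect to the relational representation, a feature like ft3 in Table 1 can be derived from the bb_edge relation of Section 2 with a few lines of Prolog (a sketch; succ_count and the example facts are ours):

    % Hypothetical CFG facts: bb_edge(From, To).
    bb_edge(bb1, bb2). bb_edge(bb1, bb3).
    bb_edge(bb2, bb4). bb_edge(bb3, bb4).

    % Number of distinct successors of a basic block.
    succ_count(B, N) :-
        setof(S, bb_edge(B, S), Ss),
        length(Ss, N).

    % ft3: number of basic blocks with exactly two successors.
    ft3(Count) :-
        findall(B, succ_count(B, 2), Bs),
        length(Bs, Count).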
In order to help end-users and researchers reproduce results and optimize their programs, we made experimental data publicly available in the Collective Optimization Database at [9]. Note that the same combination of optimizations found for one benchmark, for example susan_corners on AMD, does not improve the execution time of the bitcount benchmark, and even degrades the execution time of jpeg_c by 10% on the same architecture. This is of course a clear signal that program features are key to the success of any machine learning compiler; it does not, however, diminish the importance of architecture features and data-set features.

Though obtaining strong speedups, the iterative compilation process is very time-consuming and impractical in production. We use predictive modeling techniques similar to [25, 30, 2, 6] to be able to characterize similarities between programs and optimizations, and to predict good optimizations for a yet unseen program based on this knowledge. To validate our results, we decided to use a state-of-the-art predictive model described in [2]. This model predicts optimizations for a given program based on a nearest-neighbor static feature classifier, suggesting optimizations from the similarity of programs. We use a different training set on the embedded system platform ARC, and the traditional leave-one-out validation where the evaluated benchmark is removed from the training set, to avoid strong biasing by the same optimizations from the same benchmark.
When a new program is compiled, its features are first extracted using our tool; they are then compared with the features of all other programs using a nearest-neighbor classifier, as described in [5]. The program is recompiled with the combination of optimizations of the most similar program encountered so far.
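A minimal sketch of this nearest-neighbor selection in Prolog (ours; the feature_vector/2 facts, their values, and the squared-Euclidean metric are illustrative assumptions, not the actual classifier of [5]):

    % Assumed representation: feature_vector(Program, [ft1, ft2, ...]).
    feature_vector(susan_c,  [12, 7, 3]).
    feature_vector(bitcount, [30, 18, 9]).

    % Squared Euclidean distance between two feature vectors.
    sq_distance([], [], 0).
    sq_distance([X|Xs], [Y|Ys], D) :-
        sq_distance(Xs, Ys, D0),
        D is D0 + (X - Y) * (X - Y).

    % Most similar training program for a new feature vector.
    nearest(NewFeatures, Best) :-
        findall(D-P,
                (feature_vector(P, F), sq_distance(NewFeatures, F, D)),
                Pairs),
        keysort(Pairs, [_-Best|_]).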
As outlined in the use case scenario, we iterated on this baseline method while gradually adding more and more features. We eventually reached an 11% average performance improvement across all benchmarks, out of the 15% available when picking the optimal points in the search space; see Figure 3. Adding more features did not bring us more performance on average across the benchmarks. The list of the 56 important features identified in this iterative process, which are able to capture complex dependencies between program structure and a combination of multiple optimizations, is presented in Table 1. Though we did not reach the best performance achieved with iterative compilation, we showed that our technique for automatic feature extraction can already be used effectively for machine learning, to enable optimization knowledge reuse and automatically improve program execution time. The simplicity and expressiveness of the feature extractor is one key contribution of our approach: a few lines of Prolog code for each new feature, building on a finite set of pretty-printers from GCC's internal data structures into Datalog entities.

Our results pave the way for a more systematic study of the quality and importance of individual program features, a necessary step towards automatic feature selection and the construction of robust predictive models for compiler optimizations.

Our main contribution is to construct program features by aggregation and filtering of a large amount of semantical properties. But comparison with other predictive techniques is a relevant question in itself, related to the selection of the features and of the machine learning classifier or predictor. Our work is intended to ease such comparisons, replicating the work of others into a single machine learning optimization platform.

5. CONCLUSION

Though the combination of iterative compilation and machine learning has been studied for more than a decade and has shown great potential for program optimization, there are surprisingly few research results on the problem of selecting good quality program features. This problem is relevant for effective optimization knowledge reuse, to speed up the search for good optimizations, to build predictive models for compilation heuristics, to select optimization passes and their ordering, to build and tune analytical performance models, and more.

Up to now, compiler experts had to manually construct and implement the feature extractors that best suit their purpose. Without a systematic way to construct features and evaluate their merits, this task remains a tedious trial-and-error process relying on what the experts believe they understand about the impact of optimization passes. In a modern compiler like GCC, more than 200 passes compete in a dreadful interplay of tradeoffs and assumptions about the program and the target architecture (itself very complex and rather unpredictable). The global impact of these heuristics can be very far from optimal, even on a major target of the compiler such as the x86 ISA and its most popular microarchitectural instances. But what about embedded targets, which attract less attention from expert developers and cannot afford large in-house compiler groups? What about design-space exploration of the ISA, microarchitecture and compiler?

So far, a limited set of largely syntactical features has been devised to prove that optimization knowledge can be reused and derived automatically from feedback-directed optimization. However, machine learning is only able to recover correlations (hence optimization knowledge) from the information it is fed with: it is critical to select topical program features for a given optimization problem. To our knowledge, this is the first attempt to propose a practical and general method for systematically generating numerical features from a program, and to implement it in a production compiler. This method does not put any restriction on how to logically and algebraically aggregate semantical properties into numerical features, offering a virtually exhaustive coverage of the statistically relevant information that can be derived from a program.

This method has been implemented in GCC and applied to a number of general-purpose and embedded benchmarks. We illustrate our method on the difficult problem of selecting the optimal setting of compiler optimizations for improving the performance of an application, and demonstrate its practicality by achieving 74% of the available speedup obtained through iterative compilation on a wide range of benchmarks and 4 different general-purpose and embedded architectures. We believe this work is an important step towards generalizing machine learning techniques to tackle the complexity of present and future computing systems. The feature extractor presented in this paper is now available for download within MILEPOST GCC at [14], while experimental data is available at [9] to help researchers reproduce and extend this work.

6. ACKNOWLEDGMENTS

This work was partly supported by the European Commission through the FP6 project MILEPOST (id. 035307) and by the HiPEAC Network of Excellence.

7. REFERENCES

[1] ACOVEA: Using Natural Selection to Investigate Software Complexities. http://www.coyotegulch.com/products/acovea.
[2] F. Agakov, E. Bonilla, J. Cavazos, B. Franke, G. Fursin, M.F.P. O'Boyle, J. Thomson, M. Toussaint, and C.K.I. Williams. Using machine learning to focus iterative optimization. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), 2006.
[3] A.V. Aho, M.S. Lam, R. Sethi, and J.D. Ullman. Compilers: Principles, Techniques and Tools. Addison-Wesley, 2nd edition, 2007.
[4] F. Bodin, T. Kisuki, P.M.W. Knijnenburg, M.F.P. O'Boyle, and E. Rohou. Iterative compilation in a non-linear optimisation space. In Proceedings of the Workshop on Profile and Feedback Directed Compilation, 1998.
[5] E.V. Bonilla, C.K.I. Williams, F.V. Agakov, J. Cavazos, J. Thomson, and M.F.P. O'Boyle. Predictive search distributions. In W.W. Cohen and A. Moore, editors, Proceedings of the 23rd International Conference on Machine Learning, pages 121–128, New York, NY, USA, 2006. ACM.
[6] J. Cavazos, G. Fursin, F. Agakov, E. Bonilla, M. O'Boyle, and O. Temam. Rapidly selecting good compiler optimizations using performance counters. In Proceedings of the 5th Annual International Symposium on Code Generation and Optimization (CGO), March 2007.
[7] K. Cooper, A. Grosul, T. Harvey, S. Reeves, D. Subramanian, L. Torczon, and T. Waterman. ACME: adaptive compilation made efficient. In Proceedings of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), 2005.
[8] K.D. Cooper, P.J. Schielke, and D. Subramanian. Optimizing for reduced code space using genetic algorithms. In Proceedings of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), pages 1–9, 1999.
[9] Collective Tuning Infrastructure: automating and accelerating development and optimization of computing systems. http://cTuning.org.
[10] ESTO: Expert System for Tuning Optimizations. http://www.haifa.ibm.com/projects/systems/cot/esto.
[11] G. Fursin, C. Miranda, O. Temam, M. Namolaru, E. Yom-Tov, A. Zaks, B. Mendelson, P. Barnard, E. Ashton, E. Courtois, F. Bodin, E. Bonilla, J. Thomson, H. Leather, C. Williams, and M. O'Boyle. MILEPOST GCC: machine learning based research compiler. In Proceedings of the GCC Developers' Summit, June 2008.
[12] G. Fursin and O. Temam. Collective optimization. In Proceedings of the International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2009), January 2009.
[13] GCC: GNU Compiler Collection. http://gcc.gnu.org.
[14] MILEPOST GCC: Collaborative development website. http://cTuning.org/milepost-gcc.
[15] M.R. Guthaus, J.S. Ringenberg, D. Ernst, T.M. Austin, T. Mudge, and R.B. Brown. MiBench: A free, commercially representative embedded benchmark suite. In Proceedings of the IEEE 4th Annual Workshop on Workload Characterization, Austin, TX, December 2001.
[16] K. Heydemann and F. Bodin. Iterative compilation for two antagonistic criteria: Application to code size and performance. In Proceedings of the 4th Workshop on Optimizations for DSP and Embedded Systems, colocated with CGO, 2006.
[17] K. Hoste and L. Eeckhout. COLE: Compiler optimization level exploration. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), 2008.
[18] S.-H. Hung, C.-H. Tu, H.-S. Lin, and C.-M. Chen. An automatic compiler optimizations selection framework for embedded applications. In International Conference on Embedded Software and Systems (ICESS'09), pages 381–387, 2009.
[19] R. Jain. The Art of Computer Systems Performance Analysis. John Wiley and Sons, 1991.
[20] P. Kulkarni, W. Zhao, H. Moon, K. Cho, D. Whalley, J. Davidson, M. Bailey, Y. Paek, and K. Gallivan. Finding effective optimization phase sequences. In Proceedings of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), pages 12–23, 2003.
[21] L. Dehaspe and H. Toivonen. Discovery of frequent Datalog patterns. Data Mining and Knowledge Discovery, pages 7–36, 1999.
[22] H. Leather, E. Yom-Tov, M. Namolaru, and A. Freund. Automatic feature generation for setting compiler heuristics. In 2nd Workshop on Statistical and Machine Learning Approaches Applied to Architectures and Compilation (SMART'08), colocated with the HiPEAC'08 conference, 2008.
[23] S. MacLane. Categories for the Working Mathematician, volume 5 of Graduate Texts in Mathematics. Springer Verlag, Berlin, 1971.
[24] M. Frigo and S. Johnson. FFTW: An adaptive software architecture for the FFT. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 3, pages 1381–1384, Seattle, WA, May 1998.
[25] A. Monsifrot, F. Bodin, and R. Quiniou. A machine learning approach to automatic production of compiler heuristics. In Proceedings of the International Conference on Artificial Intelligence: Methodology, Systems, Applications, LNCS 2443, pages 41–50, 2002.
[26] S.S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann, 1997.
[27] Z. Pan and R. Eigenmann. Fast and effective orchestration of compiler optimizations for automatic performance tuning. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), pages 319–332, 2006.
[28] D. Parello, O. Temam, A. Cohen, and J.-M. Verdun. Towards a systematic, pragmatic and architecture-aware program optimization process for complex processors. In Proceedings of the ACM/IEEE Conference on Supercomputing (SC'04), page 15, Washington, DC, 2004.
[29] B. Singer and M. Veloso. Learning to predict performance from formula modeling and training data. In Proceedings of the Conference on Machine Learning, 2000.
[30] M. Stephenson and S. Amarasinghe. Predicting unroll factors using supervised classification. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), pages 123–134, 2005.
[31] S. Triantafyllis, M. Vachharajani, N. Vachharajani, and D.I. August. Compiler optimization-space exploration. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), pages 204–215, 2003.
[32] J.D. Ullman. Principles of Database and Knowledge-Base Systems, volume 1. Computer Science Press, 1988.
[33] J. Whaley and M.S. Lam. Cloning based context sensitive pointer alias analysis using binary decision diagrams. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2004.
[34] R. Whaley and J. Dongarra. Automatically tuned linear algebra software. In Proceedings of the Conference on High Performance Networking and Computing, 1998.
[35] D. Whitfield and M.L. Soffa. An approach to ordering optimizing transformations. In Proceedings of the ACM Symposium on Principles and Practice of Parallel Programming (PPoPP'90), pages 137–146, Seattle, Washington, 1990.