
MLIR: Scaling Compiler Infrastructure for Domain Specific Computation

Chris Lattner (Google, USA*), Mehdi Amini (Google, USA), Uday Bondhugula (Indian Institute of Science, India†), Albert Cohen (Google, France), Andy Davis (Google, USA), Jacques Pienaar (Google, USA), River Riddle (Google, USA), Tatiana Shpeisman (Google, USA), Nicolas Vasilache (Google, USA), Oleksandr Zinenko (Google, France)

* With SiFive at the time of publication. † Visiting researcher at Google at the time of this work.

2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). DOI: 10.1109/CGO51591.2021.9370308
Abstract—This work presents MLIR, a novel approach to building reusable and extensible compiler infrastructure. MLIR addresses software fragmentation, compilation for heterogeneous hardware, significantly reducing the cost of building domain-specific compilers, and connecting existing compilers together. MLIR facilitates the design and implementation of code generators, translators and optimizers at different levels of abstraction and across application domains, hardware targets and execution environments. The contribution of this work includes (1) discussion of MLIR as a research artifact, built for extension and evolution, while identifying the challenges and opportunities posed by this novel design, semantics, optimization specification, system, and engineering; (2) evaluation of MLIR as a generalized infrastructure that reduces the cost of building compilers—describing diverse use-cases to show research and educational opportunities for future programming languages, compilers, execution environments, and computer architecture. The paper also presents the rationale for MLIR, its original design principles, structures and semantics.

I. INTRODUCTION

Compiler design is a mature field with applications to code generation, static analysis, and more. The field has seen the development of a number of mature technology platforms which have enabled massive reuse, including systems like the LLVM compiler infrastructure [1], the Java Virtual Machine (JVM) [2], and many others. A common characteristic of these popular systems is their "one size fits all" approach—a single abstraction level to interface with the system: the LLVM Intermediate Representation (IR) is roughly "C with vectors", and the JVM provides an "object-oriented type system with a garbage collector" abstraction. This "one size fits all" approach is incredibly valuable—and in practice, the mapping to these domains from ubiquitous source languages (C/C++ and Java respectively) is straightforward.

At the same time, many problems are better modeled at a higher- or lower-level abstraction; e.g., source-level analysis of C++ code is very difficult on LLVM IR. We observe that many languages (including e.g. Swift, Rust, Julia, Fortran) develop their own IR in order to solve domain-specific problems, like language/library-specific optimizations, flow-sensitive type checking (e.g. for linear types), and to improve the implementation of the lowering process. Similarly, machine learning systems typically use "ML graphs" as a domain-specific abstraction in the same way.

While the development of domain-specific IRs is a well-studied art, their engineering and implementation cost remains high. The quality of the infrastructure is not always a first priority (or easy to justify) for implementers of these systems. Consequently, this can lead to lower quality compiler systems, including user-visible problems like slow compile times, buggy implementations, suboptimal diagnostic quality, poor debugging experience for optimized code, etc.

The MLIR project (https://mlir.llvm.org) aims to directly tackle these programming language design and implementation challenges—by making it cheap to define and introduce new abstraction levels, and provide "in the box" infrastructure to solve common compiler engineering problems. MLIR does this by (1) standardizing the Static Single Assignment (SSA)-based IR data structures, (2) providing a declarative system for defining IR dialects, and (3) providing a wide range of common infrastructure including documentation, parsing and printing logic, location tracking, multithreaded compilation support, pass management, etc.

This paper further presents the overarching principles underlying the design and implementation of MLIR. We will explore the essential design points of the system and how they relate to the overarching principles, sharing our experience applying MLIR to a number of compilation problems.

A. Contributions

Most of the MLIR system is built out of well-known concepts and algorithms.
Yet the objectives and design are sufficiently novel that studying them offers vast opportunities for research, and even more so within the boundaries of the following overarching principles:

Parsimony: Apply Occam's razor to builtin semantics, concepts, and programming interface. Harness both intrinsic and incidental complexity by abstracting properties of operations and types. Specify invariants once, but verify correctness throughout. Query properties in the context of a given compilation pass. With very little builtin, this opens the door to extensibility and customization.

Traceability: Retain rather than recover information. Declare rules and properties to enable transformation, rather than stepwise imperative specification. Extensibility comes with generic means to trace information, enforced by extensive verification. Composable abstractions stem from "glassboxing" their properties and separating their roles—type, control, data flow, etc.

Progressivity: Premature lowering is the root of all evil. Beyond representation layers, allow multiple transformation paths that lower individual regions on demand. Together with abstraction-independent principles and interfaces, this enables reuse across multiple domains.

While these principles are well established, one of them is often implemented at the expense of another; e.g., layering in network and operating system stacks aligns with the progressivity principle but breaks parsimony. This has also been the case in compilers with multiple layers of IR. Also, following these principles may hurt expressiveness and effectiveness; e.g., traceability in safety-critical and secure systems involves limiting optimizations and their aggressivity.

In a nutshell, we identify design and engineering principles for compiler construction to thrive in a narrow middle that supports an open semantics ecosystem. We discovered that complexity can be tamed without restricting expressivity, allowing for fast IR design exploration and consolidation across domains, both of which are severely lacking in production systems.

The contributions of this paper are: (1) positioning the problem of building scalable and modular compiler systems in terms of proven design and engineering principles; (2) a description of a novel compiler infrastructure that follows these principles, with important industrial and research applications; (3) exploration of selected applications to diverse domains, illustrating the generality of the approach and sharing experience developing systems that build on the MLIR infrastructure.

B. Where Did MLIR Come From?

Work on MLIR began with a realization that modern machine learning frameworks are composed of many different compilers, graph technologies, and runtime systems (see Figure 1)—which did not share a common infrastructure or design principles. This manifested in multiple user-visible ways, including poor error messages, failures in edge cases, unpredictable performance, and difficulty generalizing the stack to support new hardware.

[Fig. 1. TensorFlow execution spanning different frameworks: a TensorFlow graph may execute through XLA HLO (feeding LLVM IR and TPU IR), TensorRT, nGraph, Core ML, or TensorFlow Lite (feeding NNAPI and many others), among several others.]

We soon realized that the compiler industry as a whole has a similar problem: existing systems like LLVM are very successful at unifying and integrating work across a range of different languages, but high-level languages often end up building their own high-level IR and reinventing the same kind of technology for higher levels of abstraction (see Figure 2). At the same time, the LLVM community struggled with the representation of parallel constructs, and how to share front-end lowering infrastructure (e.g. for C calling conventions, or cross-language features like OpenMP), with no satisfactory solutions.

[Fig. 2. Compilation pipeline with mid-level language IRs: C, C++, ObjC, CUDA and OpenCL lower through the Clang AST, Java & JVM languages through Java bytecode, while Swift, Rust and Julia lower through their own ASTs and mid-level IRs (SIL IR, MIR IR, Julia IR), all converging on LLVM IR.]

Faced with this challenge, given we could not afford to implement N improved compiler instances, we decided to go for a more general solution: investing in a high quality infrastructure which would benefit multiple domains, progressively upgrading existing systems, making it easier to tackle pressing problems like heterogeneous compilation for specialized accelerators, and providing new research opportunities. Now that we have gathered a significant amount of experience building and deploying MLIR-based systems, we are able to look back on its rationale and design and discuss why this direction was pursued.

II. DESIGN PRINCIPLES

Let us now explore the requirements that guided the design of MLIR and their relation with the overarching principles.

Little Builtin, Everything Customizable [Parsimony]: The system is based on a minimal number of fundamental concepts, leaving most of the intermediate representation fully customizable. A handful of abstractions—types, operations and attributes, which are the most common in IRs—should be used to express everything else, allowing fewer and more consistent abstractions that are easy to comprehend, extend and adopt. Broadly, customizability ensures the system can adapt to changing requirements and is more likely to be applicable to future problems. In that sense, we ought to build an IR as a rich infrastructure with reusable components and programming abstractions supporting the syntax and semantics of its intermediate language.
A success criterion for customization is the possibility to express a diverse set of abstractions, including machine learning graphs, ASTs, mathematical abstractions such as polyhedral, Control Flow Graphs (CFGs), and instruction-level IRs such as LLVM IR, all without hard-coding concepts from these abstractions into the system.

Certainly, customizability creates a risk of internal fragmentation due to poorly compatible abstractions. While there is unlikely to be a purely technical solution, the system should encourage one to design reusable abstractions and assume they will be used outside of their initial scope.

SSA and Regions [Parsimony]: The Static Single Assignment (SSA) form [3] is a widely used representation in compiler IRs. It provides numerous advantages, including making dataflow analysis simple and sparse, is widely understood by the compiler community for its relation with continuation-passing style, and is established in major frameworks. As a result, the IR enforces the value-based semantics of SSA, its referential transparency and algorithmic efficiency, all considered essential to a modern compiler infrastructure. However, while many existing IRs use a flat, linearized CFG, representing higher-level abstractions pushes for introducing nested regions as a first-class concept in the IR. This goes beyond the traditional region formation to lift higher-level abstractions (e.g., loop trees), speeding up the compilation process or extracting instruction or SIMD parallelism [4], [5], [6]. To support heterogeneous compilation, the system has to support the expression of structured control flow, concurrency constructs, closures in source languages, and many other purposes. One specific challenge is to make CFG-based analyses and transformations compose over nested regions.

In doing so, we agree to sacrifice the normalization, and sometimes the canonicalization properties of LLVM. Being able to lower a variety of data and control structures into a smaller collection of normalized representations is key to keeping compiler complexity under control. The canonical loop structure, with its pre-header, header, latch and body, is a prototypical case of a linearized control flow representation of a variety of loop constructs in front-end languages. We aim at offering users a choice: depending on the compilation algorithm of interest, or the pass in the compilation flow, nested loops may be captured as nested regions, or as linearized control flow. By offering such a choice, we depart from the normalization-only orientation of LLVM while retaining the ability to deal with higher-level abstractions when it matters. In turn, leveraging such choices raises questions about how to control the normalization of abstractions, which is the purpose of the next paragraph.
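To make the choice concrete, the sketch below captures the same loop first as a nested region and then as linearized control flow. The region form uses the affine dialect of Figure 3; the CFG form reuses the hypothetical "d" dialect of Figure 4, so the branch and arithmetic Op names are illustrative rather than part of an actual dialect, and the constants %c0 and %c1 are assumed to be defined earlier.

// Loop captured as a first-class nested region.
affine.for %i = 0 to %N {
  "d.work"(%i) : (index) -> ()
}

// The same loop linearized into a CFG of blocks; the loop structure is
// now implicit in the branches and must be rediscovered by analyses.
  "d.br"() [^header(%c0 : index)] : () -> ()
^header(%i: index):
  %cond = "d.less_than"(%i, %N) : (index, index) -> i1
  "d.cond_br"(%cond) [^body, ^exit] : (i1) -> ()
^body:
  "d.work"(%i) : (index) -> ()
  %next = "d.add"(%i, %c1) : (index, index) -> index
  "d.br"() [^header(%next : index)] : () -> ()
^exit:
  "d.return"() : () -> ()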
Maintain Higher-Level Semantics [Progressivity]: The system needs to retain the information and structure that are required for analysis or optimizing performance. Attempts to recover abstract semantics once lowered are fragile, and shoehorning this information in at a low level is often invasive (e.g., all passes need to be revisited in the case of using debug information to record structure). Instead, the system should maintain the structure of computations and progressively lower to the hardware abstraction. The loss of structure is then conscious and happens only where the structure is no longer needed to match the underlying execution model. For example, the system should preserve structured control flow such as loop structure throughout the relevant transformations; removing this structure, i.e. lowering to a CFG, essentially means that no further transformations will be performed that exploit the structure. The state of the art in modeling parallel computing constructs in a production compiler highlights how difficult the task may be in general [7], [8].

As a corollary, mixing different levels of abstraction and different concepts in the same IR is key to allowing a part of the representation to remain in higher-level abstraction while another part is lowered. This would enable, for instance, a compiler for a custom accelerator to reuse some higher-level structure and abstractions defined by the system alongside primitive scalar/vector instructions specific to the accelerator.
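For instance, a partially lowered function might keep the affine loop structure while its body already uses accelerator-specific instructions. The sketch below assumes a hypothetical "accel" dialect (MLIR prescribes no such Ops) and values %buf and %acc defined earlier:

affine.for %i = 0 to %N {
  // The loop remains a high-level, analyzable structure...
  %v = "accel.vload"(%buf, %i) : (memref<?xf32>, index) -> vector<8xf32>
  // ...while the body is already expressed with primitive
  // accelerator-specific vector instructions.
  %r = "accel.vfma"(%v, %v, %acc)
    : (vector<8xf32>, vector<8xf32>, vector<8xf32>) -> vector<8xf32>
  "accel.vstore"(%r, %buf, %i) : (vector<8xf32>, memref<?xf32>, index) -> ()
}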
Another corollary is that the system should support progressive lowering, from the higher-level representation down to the lowest-level, performed in small steps along multiple abstractions. The need for multiple levels of abstraction stems from the variety of platforms and programming models a compiler infrastructure has to support.

Previous compilers have been introducing multiple fixed levels of abstraction in their pipeline—e.g. the Open64 WHIRL representation [9] has five levels, as does the Clang compiler which lowers from ASTs to LLVM IR, to SelectionDAG, to MachineInstr, and to MCInst. More flexible designs are required to support extensibility. This has deep implications on the phase ordering of transformations. As compiler experts started implementing more and more transformation passes, complex interactions between these passes started appearing. It was shown early on that combining optimization passes allows the compiler to discover more facts about the program. One of the first illustrations of the benefits of combining passes was to mix constant propagation, value numbering and unreachable code elimination [10].

Declaration and Validation [Parsimony and Traceability]: Defining representation modifiers should be as simple as introducing new abstractions; a compiler infrastructure is only as good as the transformations it supports. Common transformations should be implementable as rewrite rules expressed declaratively, in a machine-analyzable format to reason about properties of the rewrites such as complexity and completion. Rewriting systems have been studied extensively for their soundness and efficiency, and applied to numerous compilation problems, from type systems to instruction selection. Since we aim for unprecedented extensibility and incremental lowering capabilities, this opens numerous avenues for modeling program transformations as rewrite systems. It also raises interesting questions about how to represent the rewrite rules and strategies, and how to build machine descriptions capable of steering rewriting strategies through multiple levels of abstraction. The system needs to address these questions while preserving extensibility and enforcing monotonic and reproducible behavior.
The openness of the ecosystem also calls for an extensive validation mechanism. While verification and testing are useful to detect compiler bugs and to capture IR invariants, the need for robust validation methodologies and tools is amplified in an extensible system. The mechanism should aim to make this easy to define and as declarative as practical, providing a single source of truth. A long-term goal would be to reproduce the successes of translation validation [11], [12], [13], [14] and modern approaches to compiler testing [15]. Both are currently open problems in the context of an extensible compiler.

Source Location Tracking [Traceability]: The provenance of an operation—including its original location and applied transformations—should be easily traceable within the system. This intends to address the lack-of-transparency problem, common to complex compilation systems, where it is virtually impossible to understand how the final representation was constructed from the original one.

This is particularly problematic when compiling safety-critical and sensitive applications, where tracing lowering and optimization steps is an essential component of software certification procedures [16]. When operating on secure code such as cryptographic protocols or algorithms operating on privacy-sensitive data, the compiler often faces seemingly redundant or cumbersome computations that embed a security or privacy property not fully captured by the functional semantics of the source program: this code may prevent the exposure of side channels or harden the code against cyber or fault attacks. Optimizations may alter or completely invalidate such protections [17]; this lack of transparency is known as WYSINWYX [18] in secure compilation. One indirect goal of accurately propagating high-level information to the lower levels is to help support secure and traceable compilation.

III. IR DESIGN

Our main contribution is to present an IR that follows the principles defined in the previous section. This is what MLIR does, and we review its main design points in this section. MLIR has a generic textual representation (example in Figure 3) that supports MLIR's extensibility and fully reflects the in-memory representation, which is paramount for traceability, manual IR validation and testing. Extensibility comes with the burden of verbosity, which can be compensated by the custom syntax that MLIR supports; for example, Figure 7 illustrates the user-defined syntax for Figure 3.

// Attribute aliases can be forward-declared.
#map1 = (d0, d1) -> (d0 + d1)
#map3 = ()[s0] -> (s0)

// Ops may have regions attached.
"affine.for"(%arg0) ({
// Regions consist of a CFG of blocks with arguments.
^bb0(%arg4: index):
  // Blocks are lists of operations.
  "affine.for"(%arg0) ({
  ^bb0(%arg5: index):
    // Ops use and define typed values, which obey SSA.
    %0 = "affine.load"(%arg1, %arg4) {map = (d0) -> (d0)}
      : (memref<?xf32>, index) -> f32
    %1 = "affine.load"(%arg2, %arg5) {map = (d0) -> (d0)}
      : (memref<?xf32>, index) -> f32
    %2 = "std.mulf"(%0, %1) : (f32, f32) -> f32
    %3 = "affine.load"(%arg3, %arg4, %arg5) {map = #map1}
      : (memref<?xf32>, index, index) -> f32
    %4 = "std.addf"(%3, %2) : (f32, f32) -> f32
    "affine.store"(%4, %arg3, %arg4, %arg5) {map = #map1}
      : (f32, memref<?xf32>, index, index) -> ()
    // Blocks end with a terminator Op.
    "affine.terminator"() : () -> ()
  // Ops have a list of attributes.
  }) {lower_bound = () -> (0), step = 1 : index, upper_bound = #map3}
    : (index) -> ()
  "affine.terminator"() : () -> ()
}) {lower_bound = () -> (0), step = 1 : index, upper_bound = #map3}
  : (index) -> ()

Fig. 3. MLIR generic representation for polynomial multiplication using the affine and std dialects. The same IR is displayed with the custom syntax in Figure 7.

Operations: The unit of semantics in MLIR is an "operation", referred to as Op. Everything from "instruction" to "function" to "module" is modeled as an Op in this system. MLIR does not have a fixed set of Ops, but allows (and encourages) user-defined extensions, according to the parsimony and "everything customizable" principles. The infrastructure provides a declarative syntax for defining Ops based on TableGen [19], as illustrated in Figure 5. (Alternatives have been proposed, aiming for higher productivity, soundness guarantees, and better interoperability with high-level languages; this is still a subject of active design discussions.)

Ops (see Figure 4) have a unique opcode, a string identifying the operation and its dialect. Ops take and produce zero or more values, called operands and results respectively, and these are maintained in SSA form. Values represent data at runtime and have a Type that encodes the compile-time knowledge about the data. In addition to an opcode, operands and results, Ops may also have Attributes, Regions, Successor Blocks, and Location Information. Figure 3 illustrates values and Ops: %-identifiers are (packs of) named values, with ":" specifying the number in a pack if more than one, and "#" a particular value. In the generic textual representation, operation names are quoted string literals followed by operands in parentheses.

%results:2 = "d.operation"(%arg0, %arg1) ({
  // Regions belong to Ops and can have multiple blocks.
  ^block(%argument: !d.type):
    // Ops have function types (expressing mapping).
    %value = "nested.operation"() ({
      // Ops can contain nested regions.
      "d.op"() : () -> ()
    }) : () -> (!d.other_type)
    "consume.value"(%value) : (!d.other_type) -> ()
  ^other_block:
    "d.terminator"() [^block(%argument : !d.type)] : () -> ()
})
// Ops can have a list of attributes.
{attribute="value" : !d.type} : () -> (!d.type, !d.other_type)

Fig. 4. Operation (Op) is the main entity in MLIR; operations contain a list of regions, regions contain a list of blocks, and blocks contain a list of Ops, enabling recursive structures.

Compiler passes treat unknown Ops conservatively, and MLIR has rich support for describing the semantics of Ops to passes through traits and interfaces, as described in Section V-A. Op implementations have verifiers that enforce the Op invariants and participate in overall IR validation.

Attributes: MLIR attributes contain compile-time information about operations, other than the opcode. Attributes are typed (e.g., integer, string), and each Op instance has an open key-value dictionary from string names to attribute values. In the generic syntax, attributes are found in a brace-enclosed comma-separated list of pairs.
// An Op is a TableGen definition that inherits the "Op" class parameterized
control across regions. In this example, the body is executed
// with the Op name repeatedly until the upper bound is reached.
def LeakyReluOp: Op<"leaky_relu",
// and a list of traits used for verification and optimization. The body of each region is a list of blocks, and each block
[NoSideEffect, SameOperandsAndResultType]> {
// The body of the definition contains named fields for a one-line
ends with a terminator operation, that may have successor
// documentation summary for the Op. blocks to which the control flow may be transferred. Each
let summary = "Leaky Relu operator";
terminator (e.g. “switch”, “conditional branch” or “unwind”)
// The Op can also a full-text description that can be used to generate
// documentation for the dialect.
defines its own semantics. It may chose to transfer the control
let description = [{ flow to another block in the same region, or return it to the Op
Element-wise Leaky ReLU operator
x -> x >= 0 ? x : (alpha * x) enclosing the region. The graph of successors defines a CFG,
}];
allowing standard SSA-based control flow within a region.
// Op can have a list of named arguments, which include typed operands Instead of using φ nodes, MLIR uses a functional form of
// and attributes.
let arguments = (ins AnyTensor:$input, F32Attr:$alpha); SSA [20] where terminators pass values into block arguments
// And a list of named and typed outputs.
defined by the successor block. Each block has a (potentially
let results = (outs AnyTensor:$output); empty) list of typed block arguments, which are regular values
}
and obey SSA. The semantics of terminator Ops defines what
values the arguments of the block will take after the control
Fig. 5. Operation Definition Syntax (ODS) provides a concise way of defining
new Ops in MLIR. Here, one defines the LeakyRelu Op taking a tensor
is transferred. For the first (entry) block of the region, the
and a floating-point value, and returning a tensor of the same type as the input values are defined by the semantics of the enclosing Op. For
one. example, affine.for uses the entry block argument %arg4
as loop induction variable. Finally, this explicit graph design
and the extensibility of Ops is reminiscent of the sea-of-nodes
bounds of a loop that are known to be constant affine forms: representation [21]: this connection is intentional and has been
{lower_bound = () -> (0), step = 1 : index, a major influence for the selection of MLIR’s flavor of SSA.
upper_bound = #map3} where lower_bound is an Value Dominance and Visibility: Ops can only use values
example of an attribute name. The () -> (0) notation is that are in scope, i.e. visible according to SSA dominance,
used for inline affine forms, in this case producing an affine nesting, and semantic restrictions imposed by enclosing opera-
function producing a constant 0 value. The #map3 notation tions. Values are visible within a CFG if they obey standard
is used for attribute aliases, which allow associate attribute SSA dominance relationships, where control is guaranteed to
values with a label upfront. pass through a definition before reaching a use.
Attributes derive their meaning either from the Op semantics Region-based visibility is defined based on simple nesting of
or from the dialect (Section III) they are associated with. As regions: if the operand to an Op is outside the current region,
with opcodes, there is no fixed set of attributes. Attributes may then it must be defined lexically above and outside the region
reference foreign data structures, which is useful for integrating of the use. This is what allows Ops within an affine.for
with existing systems, e.g., the contents of (known at compile operation to use values defined in outer scopes.
time) data storage in an ML system. MLIR also allows operations to be defined as isolated from
Location Information: MLIR provides a compact represen- above, indicating that the operation is a scope barrier—e.g.
tation for location information, and encourages the processing the “std.func” Op defines a function, and it is not valid for
and propagation of this information throughout the system, operations within the function to refer to values defined outside
following the traceability principle. It can be used to keep the the function. In addition to providing useful semantic checking,
source program stack trace that produced an Op, to generate a module containing isolated-from-above Ops may be processed
debug information. It standardizes the way to emit diagnostics in parallel by an MLIR compiler since no use-def chains may
from the compiler, and is used by a wide range of testing tools. cross the isolation barriers. This is important for compilation
Location information is also extensible, allowing a compiler to utilize multicore machines.
to refer to existing location tracking systems, high-level All these design choices highlight the progressivity principle,
AST nodes, LLVM-style file-line-column address, DWARF while erring on the side of parsimony when a concept does
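The following sketch, reusing the hypothetical "d" dialect of Figure 4, contrasts the two visibility regimes; whether "d.func" is isolated from above is a property chosen by its definition:

%cst = "d.constant"() {value = 42 : i32} : () -> i32
"d.loop"() ({
  // Valid: region-based visibility allows using %cst, which is defined
  // lexically above and outside this region.
  "d.use"(%cst) : (i32) -> ()
  "d.terminator"() : () -> ()
}) : () -> ()
"d.func"() ({
  // Invalid if d.func is isolated from above: no use-def chain may cross
  // the isolation barrier, so this use of %cst is rejected by the verifier.
  "d.use"(%cst) : (i32) -> ()
  "d.terminator"() : () -> ()
}) : () -> ()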
Symbols and Symbol Tables: Ops can have a symbol table attached. This table is a standardized way of associating names, represented as strings, with IR objects, called symbols. The IR does not prescribe what symbols are used for, leaving it up to the Op definition. Symbols are most useful for named entities that need not obey SSA: they cannot be redefined within the same table, but they can be used prior to their definition. For example, global variables, functions or named modules can be represented as symbols. Without this mechanism, it would have been impossible to define, e.g., recursive functions
referring to themselves in their definition. Symbol tables can be nested if an Op with a symbol table attached has associated regions containing similar Ops. MLIR provides a mechanism to reference symbols from an Op, including nested symbols.
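As a sketch, a recursive function can refer to its own symbol @factorial before the definition is complete, which an SSA value could not do; the Ops follow the "std" dialect custom syntax of the time and are illustrative:

func @factorial(%n: i32) -> i32 {
  %c1 = constant 1 : i32
  %cond = cmpi "sle", %n, %c1 : i32
  cond_br %cond, ^base, ^rec
^base:
  return %c1 : i32
^rec:
  // The callee is referenced by symbol, not by SSA value.
  %n1 = subi %n, %c1 : i32
  %r = call @factorial(%n1) : (i32) -> i32
  %res = muli %n, %r : i32
  return %res : i32
}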
Dialects: MLIR manages extensibility using Dialects, which provide a logical grouping of Ops, attributes and types under a unique namespace. Dialects themselves do not introduce any new semantics but serve as a logical grouping mechanism that provides common Op functionality (e.g., constant folding behavior for all ops in the dialect). They organize the ecosystem of language- and domain-specific semantics while following the parsimony principle. The dialect namespace appears as a dot-separated prefix in the opcode; e.g., Figure 3 uses the affine and std dialects.

The separation of Ops, types and attributes into dialects is conceptual and is akin to designing a set of modular libraries. For example, a dialect can contain Ops and types for operating on hardware vectors (e.g., shuffle, insert/extract element, mask), and another dialect can contain Ops and types for operating on algebraic vectors (e.g. absolute value, dot product, etc.). Whether both dialects use the same vector type and where this type belongs are design decisions left to the MLIR user.

While it is possible to put all Ops, types and attributes in a single dialect, it would quickly become unmanageable due to the large number of simultaneously present concepts and name conflicts, amongst other issues. Although each Op, type and attribute belongs to exactly one dialect, MLIR explicitly supports a mix of dialects to enable progressive lowering. Ops from different dialects can coexist at any level of the IR at any time, they can use types defined in different dialects, etc. Intermixing of dialects allows for greater reuse and extensibility, and provides flexibility that would otherwise require developers to resort to all kinds of non-composable workarounds.

Type System: Every value in MLIR has a type, which is specified in the Op that produces the value or in the block that defines the value as an argument. Types encode compile-time information about a value. The type system in MLIR is user-extensible, and may, for example, refer to existing foreign type systems. MLIR enforces strict type equality checking and does not provide type conversion rules. Ops list their inputs and result types using trailing function-like syntax. In Figure 3, std.load maps from the memory reference and index types to the type of the value it loads.

From the type theory point of view, MLIR only supports non-dependent types, including trivial, parametric, function, sum and product types. While it is possible to implement a dependent type system by combining Ops with symbols and user-defined types, such types will be opaque to the IR.

For convenience, MLIR provides a standardized set of commonly used types, including arbitrary precision integers, standard floating point types, and simple common containers—tuples, multi-dimensional vectors, and tensors. These types are merely a utility and their use is not required, illustrating parsimony.
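For illustration, a few of these builtin types as they might appear on the results of hypothetical Ops (the "d" dialect is again a placeholder):

%i = "d.get"() : () -> i17               // arbitrary precision integer
%f = "d.get"() : () -> f32               // standard floating point type
%v = "d.get"() : () -> vector<4x8xf32>   // multi-dimensional vector
%t = "d.get"() : () -> tensor<?x42xf32>  // tensor with one dynamic dimension
%p = "d.get"() : () -> tuple<i32, f64>   // tuple container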
Functions and Modules: Similarly to conventional IRs, MLIR is usually structured into functions and modules. However, these are not separate concepts in MLIR: they are implemented as Ops in the builtin dialect, again an illustration of parsimony in the design.

A module is an Op with a single region containing a single block, and terminated by a dummy Op that does not transfer the control flow. Like any block, its body contains a list of Ops, which may be functions, global variables, compiler metadata, or other top-level constructs. Modules may define a symbol in order to be referenced.

Similarly, a function is an Op with a single region that may contain zero (in case of a declaration) or more blocks. Built-in functions are compatible with the "call" and "return" operations of the "std" dialect, which transfer the control to and from the function, respectively. Other dialects are free to define their own function-like Ops.
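A minimal sketch of this structure, in the custom syntax of the time:

module {
  // A function is an Op holding a single region; its symbol @add can be
  // referenced by "call" Ops elsewhere in the module.
  func @add(%a: i32, %b: i32) -> i32 {
    %sum = addi %a, %b : i32
    return %sum : i32
  }
  func @caller(%x: i32) -> i32 {
    %r = call @add(%x, %x) : (i32, i32) -> i32
    return %r : i32
  }
}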

IV. EVALUATION: APPLICATIONS OF MLIR

MLIR is a system that aims to generalize and drive a wide range of compiler projects, so our primary evaluation metric is to show that it is being adopted and used for diverse projects. By doing so we acknowledge the software engineering nature of the problem and contributions. We provide a summary of community activity and describe a few use cases in more detail to highlight the generality and extensibility of MLIR and evaluate how well compiler and domain experts experience the design principles of the IR.

Today, MLIR is a growing open source project with a community spanning academia and industry (https://www.c4ml.org). For example, the first academic workshop about the use of MLIR in High-Performance Computing was attended by individuals from 16 universities and involved 4 national laboratories from 4 different countries (http://www.cs.utah.edu/~mhall/mlir4hpc). MLIR was also endorsed by 14 multinational companies, and at the 2019 LLVM Developer Meeting more than 100 industry developers attended a roundtable event about MLIR. Community adoption and participation is a proxy measure for usability and need. More than 26 dialects are in development in public or private, and 7 projects across different companies are replacing custom infrastructure with MLIR. We argue that this shows a real need for MLIR, as well as endorses its usability.

A. TensorFlow Graphs

While the other discussed representations are familiar to most compiler developers, one of the key use cases for MLIR is to support the development of machine learning frameworks. Their internal representation is often based on a data flow graph [22] with a dynamic execution semantics.

TensorFlow [23] is an example of such a framework. Its representation is a high-level dataflow computation where the nodes are computations which can be placed on various devices, including specific hardware accelerators.
MLIR is used in TensorFlow to model this internal representation and perform transformations for the use cases presented in Figure 1: from simple algebraic optimizations to retargeting graphs for parallel and distributed execution on data center clusters and asynchronous hardware acceleration, from lowering to a representation suitable for mobile deployment to generating efficient native code using domain-specific code generators like XLA [24]. The representation of a TensorFlow graph in MLIR is illustrated in Figure 6. It illustrates the modeling of asynchronous concurrency, where the dataflow graph is desynchronized via implicit futures and side-effecting Ops are serialized through explicit control signals (also following dataflow semantics). Despite the widely different abstractions—concurrency, asynchrony, delayed evaluation—MLIR offers the same infrastructure, analysis and transformation capabilities as for any other dialect or compiler pass. In particular, essential graph-level transformations implemented in Grappler (https://www.tensorflow.org/guide/graph_optimization) are expressible in MLIR for both TensorFlow models and low level LLVM IR: dead code/node elimination, constant folding, canonicalization, loop-invariant code motion, common subexpression/subgraph elimination, instruction/device-specific kernel selection, rematerialization, layout optimization; while other transformations may be domain-specific: optimizations for mixed precision, op fusion, shape arithmetic.

%0 = tf.graph (%arg0 : tensor<f32>, %arg1 : tensor<f32>,
               %arg2 : !tf.resource) {
  // Execution of these operations is asynchronous, the %control return value
  // can be used to impose extra runtime ordering, for example the assignment
  // to the variable %arg2 is ordered after the read explicitly below.
  %1, %control = tf.ReadVariableOp(%arg2)
    : (!tf.resource) -> (tensor<f32>, !tf.control)
  %2, %control_1 = tf.Add(%arg0, %1)
    : (tensor<f32>, tensor<f32>) -> (tensor<f32>, !tf.control)
  %control_2 = tf.AssignVariableOp(%arg2, %arg0, %control)
    : (!tf.resource, tensor<f32>) -> !tf.control
  %3, %control_3 = tf.Add(%2, %arg1)
    : (tensor<f32>, tensor<f32>) -> (tensor<f32>, !tf.control)
  tf.fetch %3, %control_2 : tensor<f32>, !tf.control
}

Fig. 6. SSA representation of a TensorFlow graph in MLIR.

B. Polyhedral Code Generation

One of the original motivations for MLIR was the exploration of polyhedral code generation for accelerators. The affine dialect is a simplified polyhedral representation that was designed to enable progressive lowering. While a full exploration of the design points here is out of scope for this paper, we illustrate aspects of the affine dialect to show the modeling power of MLIR and contrast the affine dialect with past representations [25], [26], [27], [28], [29].

1) Similarities: The MLIR affine dialect operates on a structured multi-dimensional type for all accesses to memory. In the default case, these structured types are injective: different indexings are guaranteed not to alias by construction, a common precondition for polyhedral dependence analyses.

Affine modeling is split in two parts. Attributes are used to model affine maps and integer sets at compile-time, and Ops are used to apply affine restrictions to the code. Namely, the affine.for Op is a "for" loop with bounds expressed as affine maps of values required to be invariant in a function. Thus loops have static control flow. Similarly, affine.if is a conditional restricted by affine integer sets. The bodies of loops and conditionals are regions that use affine.load and affine.store to restrict indexing to affine forms of surrounding loop iterators. This enables exact affine dependence analysis while avoiding the need to infer affine forms from a lossy lower-level representation.

// Affine loops are Ops with regions.
affine.for %arg0 = 0 to %N {
  // Only loop-invariant values, loop iterators, and affine functions of
  // those are allowed.
  affine.for %arg1 = 0 to %N {
    // Bodies of affine for loops obey SSA.
    %0 = affine.load %A[%arg0] : memref<? x f32>
    // Structured memory reference (memref) type can have
    // affine layout maps.
    %1 = affine.load %B[%arg1] : memref<? x f32, (d0)[s0] -> (d0 + s0)>
    %2 = mulf %0, %1 : f32
    // Affine load/store can have affine expressions as subscripts.
    %3 = affine.load %C[%arg0 + %arg1] : memref<? x f32>
    %4 = addf %3, %2 : f32
    affine.store %4, %C[%arg0 + %arg1] : memref<? x f32>
  }
}

Fig. 7. Affine dialect representation of polynomial multiplication C(i+j) += A(i) * B(j).

2) Differences with existing polyhedral frameworks: They are numerous:

(1) Rich types: the MLIR structured memory reference type contains a layout map connecting the index space of the buffer to the actual address space. This separation of concerns makes loop and data transformations compose better: changes to data layout do not affect the code and do not pollute dependence analysis. Such mixes of transformations have been explored previously [30] but are uncommon.

(2) Mix of abstractions: Bodies of affine loops in MLIR can be expressed with operations on typed SSA values. Therefore, all traditional compiler analyses and transformations remain applicable and can be interleaved with polyhedral transformations. On the contrary, polyhedral compilers often abstract such details away completely, making it challenging for a polyhedral compiler to manipulate, e.g., vector types.

(3) Smaller representation gap: One of the key features of the polyhedral model is its ability to represent the order of loop iterations in the type system. In this system, a large number of loop transformations compose directly and can be reasoned about using simple mathematical abstractions [26]. However, polyhedral transformations require raising into a representation often drastically different from the original [31], [32]. Furthermore, the conversion from transformed polyhedra to loops is computationally hard [33]. The MLIR-based representation maintains high-level loop structure around the lower-level representation, removing the need for raising.

(4) Compilation speed is a crucial goal for MLIR, as discussed in Section V-D, but has not been a focus of most existing polyhedral approaches. These rely heavily on algorithms with exponential complexity: on integer linear programming to derive loop orderings automatically, and on polyhedron scanning algorithms to convert the representation back to loops. The MLIR approach explicitly does not rely on polyhedron scanning since loops are preserved in the IR.
In addition, code generation may take place ahead-of-time, e.g., when producing generic code for dynamic shapes, or just-in-time when specializing tensor operations on static shapes. The latter puts stricter constraints on available resources, and both scenarios are important.

Experience with the affine dialect shows that first-class affine abstractions facilitate the design and implementation of domain-specific code generators, including the linalg dialect (https://mlir.llvm.org/docs/Dialects/Linalg) and declarative rewrite rules in RISE (https://rise-lang.org/mlir). These developments and the affine dialect itself represent important explorations that the MLIR design made practical.

C. Fortran IR (FIR)

The LLVM Fortran frontend "flang" is currently under major development, led by NVIDIA/PGI. Similar to Swift, Rust, and others, flang needs a specialized IR in order to support advanced transformations for high-performance Fortran codebases, and is using MLIR to support these Fortran-specific optimizations [34]. These high-level optimizations—advanced loop optimizations, array copy elimination, call specialization, devirtualization—would be hard to implement using only LLVM.

For example, FIR is able to model the Fortran virtual dispatch table as a first-class concept (see Figure 8).

// Dispatch table for type(u)
fir.dispatch_table @dtable_type_u {
  fir.dt_entry "method", @u_method
}

func @some_func() {
  %uv = fir.alloca !fir.type<u> : !fir.ref<!fir.type<u>>
  fir.dispatch "method"(%uv) : (!fir.ref<!fir.type<u>>) -> ()
  // ...
}

Fig. 8. FIR has first-class support for dynamic virtual function dispatch tables.

The ability to model the high-level semantics of the programming language in a structured IR is very powerful. For example, first-class modeling of the dispatch tables allows a robust devirtualization pass to be implemented. While this could have been implemented with a bespoke compiler IR, the use of MLIR allowed the flang developers to spend their engineering resources focused on the IR design for their domain instead of reimplementing basic infrastructure.

The choice of MLIR also unlocks the reusability of other dialects that are not specific to Fortran: a language-independent OpenMP dialect could be shared between Fortran and C language frontends. Similarly, targeting a heterogeneous platform using OpenACC becomes tractable within MLIR through the sharing and reuse of the GPU-oriented dialects and passes. This is straightforward thanks to MLIR being specifically designed to support a mix of composable dialects.

D. Domain-Specific Compilers

The applications above are within large workflows. But MLIR also helps building smaller domain-specific compilers. A reusable and modular infrastructure makes these specialized paths feasible and relatively cheap to build.

Optimizing MLIR Pattern Rewriting: MLIR has an extensible system for pattern rewrites. In addition to statically declared patterns, we had applications where the rewrite patterns needed to be dynamically extensible at runtime, allowing hardware vendors to add new lowerings in drivers. The solution was to express MLIR pattern rewrites as an MLIR dialect itself, allowing us to use MLIR infrastructure to build and optimize efficient Finite State Machine (FSM) matchers and rewriters on the fly. This work includes FSM optimizations seen in other systems, such as the LLVM SelectionDAG and GlobalISel instruction selection systems.

Lattice Regression Compiler: Lattice regression [35] is a machine learning technique renowned for fast evaluation times and interpretability. The predecessor of the compiler was implemented using C++ templates. This allowed for high-performance code with metaprogramming, but expressing general optimizations on the end-to-end models was not straightforward. This particular lattice regression system is used in applications with multiple millions of users and hence performance improvements are critical.

MLIR was used as the basis for a new compiler for this specialized area, which was driven by a specialized search approach—effectively resulting in a machine learning problem being solved during compilation. The resultant compiler was developed by investing a 3 person-month effort, and resulted in up to 8× performance improvement on a production model, while also improving transparency during compilation.

V. CONSEQUENCES OF THE MLIR DESIGN

The MLIR design facilitates the modeling of new language and compiler abstractions while reusing existing, generic ones. Effectively, the solution to many problems is to "add new ops, new types", possibly collected into "a new dialect". This is a significant design shift for compiler engineering. It produces new opportunities, challenges, and insights. This section explores a few of them.

A. Reusable Compiler Passes

The ability to represent multiple levels of abstraction in one IR incentivizes passes that operate across these levels. MLIR handles extensibility by inverting the common approach: since there are more Ops than passes, it is easier for Ops to know about passes. This also improves modularity, as the dialect-specific logic is implemented within the dialects instead of the core transformations. Since the passes rarely need to know all aspects of an Op, MLIR relies on the following mechanisms to implement generic passes.
Operation Traits: Many common "bread and butter" compiler passes, such as Dead Code or Common Subexpression Elimination, rely on simple properties like "is terminator" or "is commutative". We define such properties as Op Traits. An Op exhibits a trait unconditionally, e.g., a "standard branch" Op is always a terminator. For many passes, it is sufficient to know that an Op has a set of traits to operate on it, for example by swapping the operands or removing Ops with no side effects and no users.
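For instance, a canonicalization driven purely by the commutative trait can reorder operands without knowing anything else about the Op, and generic dead code elimination can delete any unused Op that declares no side effects; in this sketch, "d.pure" is a hypothetical side-effect-free Op:

// addi declares the commutative trait, so a generic pattern may move the
// constant operand to the right-hand side:
//   %0 = addi %c1, %x : i32    // before canonicalization
%0 = addi %x, %c1 : i32         // after canonicalization

// If %unused has no users, generic dead code elimination may delete the
// Op based on its declared lack of side effects alone.
%unused = "d.pure"() : () -> i32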
Traits can serve as verification hooks, allowing the verification logic to be shared across multiple Ops that have the trait. For example, the "isolated from above" trait verifies that no regions in the Op use values defined in the regions enclosing the Op. It allows for generic processing of functions, modules and other self-contained structures.

Interfaces: When the unconditional, static behavior is insufficiently expressive, the processing can be parameterized through interfaces, a concept borrowed from object-oriented programming. An interface defines a view into the behavior of an IR object that abstracts away unnecessary details. Unlike traits, interfaces are implemented by IR objects, using arbitrary C++ code that can produce different results for different objects. For example, the "call" Op implements a "call-like" interface, but different instances of the Op call different functions.

MLIR passes can be implemented in terms of interfaces, establishing a contract with any Op that opts into being processed by a pass. Continuing the call-like example, consider the MLIR inlining pass that works on TensorFlow graphs, Flang functions, closures in a functional language, etc. Such a pass needs to know: (1) whether it is valid to inline an operation into a given region, and (2) how to handle terminator operations that ended up in the middle of a block after inlining.

In order to query an Op about these properties, the pass defines a dedicated interface so that Ops may register their implementation with MLIR to benefit from inlining. The inlining pass will treat conservatively, i.e. ignore, any operation that does not implement the respective interface.

Constant folding is implemented through the same mechanism: each Op implements the "fold" interface by providing a function that may produce an attribute holding the value if the Op is foldable. More generic canonicalization can be implemented similarly: an interface populates the list of canonicalization patterns amenable to pattern rewriting. This design separates generic logic from Op-specific logic and puts the latter in the Op itself, reducing the well-known maintenance and complexity burden of "InstCombine", "PeepholeOptimizer" and the like in LLVM.

Interfaces can be implemented by dialects rather than specific Ops, which enables shared behavior or delegation to external logic, for example when constant folding TensorFlow Ops. Interfaces are also supported on types and attributes; for example, an addition operation may support any type that self-declares as "integer-like" with queryable signedness semantics.

B. Dialect-Specific Passes

Finally, it is valid and useful to define passes that are specific to particular dialects, which can be driven by the full semantics of operations in the dialect(s) they are designed for. These passes are just as useful in the MLIR system as they are in other compiler systems: for example, code generators that want to do custom scheduling of machine instructions based on particular machine constraints, or other tricks that do not fit into a broader framework. This is a simple and useful starting point for new transformations, where generalization isn't required.

C. Mixing Dialects Together

One of the most profound (but also most difficult to grok) aspects of MLIR is that it allows and encourages mixing operations from different dialects together into a single program. While certain cases of this are reasonably easy to understand (e.g. holding host and accelerator computation in the same module), the most interesting cases occur when dialects are directly mixed—because this enables an entire class of reuse that we have not seen in other systems.

Consider the affine dialect described in Section IV-B. The definition of affine control flow and affine mappings is independent of the semantics of the operations that are contained in affine regions. In our case, we combine the affine dialect with the "standard" dialect that represents simple arithmetic in a target-independent form like LLVM IR, and with multiple target-specific machine instruction dialects for internal accelerators. Others have combined it with abstractions from other problem domains.

The ability to reuse generic polyhedral transformations (using Op interfaces to get the semantics of operations in specific transformations) is a powerful (and exciting to us) way of factoring compiler infrastructure. Another example is that an OpenMP dialect could be used and reused across a wide variety of source-language IRs.

D. Parallel Compilation

An important aspect of MLIR is the possibility to use multicore machines to increase the compilation speed. In particular, the "isolated from above" trait (Section V-A) allows Ops such as functions to opt into the concurrent IR traversal mechanism supported by MLIR's pass manager. Indeed, this trait guarantees that SSA use-def chains cannot cross the region boundaries and can be processed in isolation. MLIR also does not feature whole-module use-def chains, but instead references global objects through symbol tables (Section III) and defines constants as operations with attributes (Section III).

E. Interoperability

Our work involves interoperation with a large number of existing systems, e.g., machine learning graphs encoded as protocol buffers, compiler IRs including LLVM IR, proprietary instruction sets, etc. Often the representation has a number of suboptimal or unfortunate decisions that made sense in the context of an existing system, but the capabilities of MLIR enable a more expressive representation. Because importers and exporters are notoriously difficult to test (test cases are often binary), we want to make sure their complexity is minimized.

The solution is to define a dialect that corresponds to the foreign system as directly as possible—allowing round-tripping to and from that format in a simple and predictable way. Once the IR is imported into MLIR, it can be raised and lowered to a more convenient IR using all of the MLIR infrastructure, which allows those transformations to be tested similarly to all the other MLIR passes.
There are numerous examples of such dialects, including the LLVM dialect, which maps LLVM IR into MLIR. This approach has worked well for us, and the MLIR tooling has also been useful to write tests for these foreign file formats.
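For example, LLVM IR maps nearly one-to-one into the LLVM dialect; the sketch below uses the dialect syntax roughly as of the time of publication:

// LLVM IR:  define i64 @sum(i64 %a, i64 %b) {
//             %0 = add i64 %a, %b
//             ret i64 %0
//           }
llvm.func @sum(%a: !llvm.i64, %b: !llvm.i64) -> !llvm.i64 {
  %0 = llvm.add %a, %b : !llvm.i64
  llvm.return %0 : !llvm.i64
}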
F. Unopinionated Design Provides New Challenges

While MLIR allows one to define almost arbitrary abstractions, it provides very little guidance on what should be done: what works better or worse in practice? We now have some experience with a number of engineers and researchers applying the techniques and technologies to new problem domains, and have realized that the “art” of compiler IR design and abstraction design is not well understood in the compiler and languages field—many people work within the constraints of established systems, but relatively few have had the opportunity to define the abstractions themselves.

This is a challenge, but it is also another set of opportunities for future research. The broader MLIR community is building expertise with these abstraction design trade-offs, and we expect this to be a fertile area of study over time.

G. Looking Forward

The design of MLIR is different enough from other compiler infrastructures that we are still learning—even after building and applying it to many different systems. We believe that there is still a lot to discover, and several years of research will be required to better understand the design points and establish best practices. For example, the rise of out-of-tree dialects, the increasing number of source-language frontends using MLIR, possible applications to Abstract Syntax Trees, and applications to structured data (like JSON, protocol buffers, etc.) are still very early and are likely to uncover interesting new challenges and opportunities. Better support for just-in-time compilation and precise garbage collection would also be interesting, leveraging the modularity and programmability of the IR.

VI. RELATED WORK

MLIR is a project that overlaps with multiple different domains. While the composed infrastructure provides a novel system, individual components have analogs in the literature. For references and discussion directly related to the IR design itself, please refer to Section II.

MLIR is a compiler infrastructure akin to LLVM [1], but where LLVM has been a great boon to scalar optimizations and homogeneous compilation, MLIR aims to model a rich set of data structures and algorithms as first-class values and operations, including tensor algebra and algorithms, graph representations, as well as heterogeneous compilation. MLIR allows mix-and-match optimization, decomposing compilation passes into components and redefining lowering and cleanup roles. This is largely attributed to the pattern rewriting infrastructure, capturing full-fledged transformations as a composition of small local patterns and controlling which pattern rewrites are applied at the granularity of an individual operation. Extending, formalizing, and verifying the rewriting logic automatically would be an important next step [36], [37]. On the backend side, MLIR’s DRR has an analogue to LLVM’s instruction selection infrastructure, supporting extensible operations with multi-result patterns and specification as constraints [38].

Numerous programming languages and models tackle hardware heterogeneity. Originally a homogeneous programming model, OpenMP added support for offloading tasks and parallel regions to accelerators [39], based on earlier proposals such as StarSs and OpenACC [40], [41]. C++ AMP, HCC and SyCL leverage a conventional Clang/LLVM flow and modern C++ to provide a high-level abstraction for hardware acceleration [42]. Unfortunately, all these examples very quickly lower high-level constructs to calls into a runtime execution environment, relying on pre-existing optimizations in the host language (typically C++) to alleviate the abstraction penalty. Far fewer efforts target the heterogeneous compilation process itself. Parallel intermediate representations extending LLVM IR address part of the issue but traditionally focus on the homogeneous setting [7], [8]. The most ambitious effort to date may be Liquid Metal [43], with a co-designed Domain Specific Language (DSL) and compilation flow converting managed object semantics into static, vector or reconfigurable hardware; yet most of the effort in its Lime compiler resides in fitting round objects into square hardware (paraphrasing Kou and Palsberg [44]). MLIR provides a direct embedding for high-level languages embracing heterogeneity through an extensible set of operations and types, while providing a common infrastructure for gradually lowering these constructs with maximal reuse of common components across the different targets.

Tackling language heterogeneity has been a long-term promise of metaprogramming systems, and of multistage programming in particular. Lightweight Modular Staging (LMS) [45] is a state-of-the-art framework and runtime code generator, providing a library of core components for generating efficient code and embedding DSLs in Scala. Delite [46] promises dramatic productivity improvements for DSL developers, while supporting parallel and heterogeneous execution. This approach is complementary to MLIR, providing a higher level of abstraction to embed DSLs and implement optimizations through generic metaprogramming constructs.

One step further up into the language syntax, ANTLR [47] is among a class of parser generators that aim to facilitate the development of compiler frontends. MLIR currently has no general parser generator and no AST construction or modeling functionality. Combining MLIR with a system such as ANTLR could expand reusability upstream all the way to frontends and development environments.

More narrowly construed by their application to machine learning, XLA [24], Glow [48] and TVM [49] address similar heterogeneous compilation objectives. These frameworks provide domain-specific code generation instances, starting from a graph abstraction and targeting multi-dimensional vector abstractions for accelerators. All of these could leverage MLIR as infrastructure, taking advantage of the common functionality while using their current code generation strategies. Similarly, the loop nest metaprogramming techniques from Halide [50]
and TVM [49], earlier loop nest metaprogramming [26], [51], [52], [53], and automatic flows such as PolyMage [54], Tensor Comprehensions [29], Stripe [55], Diesel [56], Tiramisu [57] and their underlying polyhedral compilation techniques [25], [27], [58], [28] could co-exist as different code generation paths within an MLIR-based framework. This would greatly increase code reuse, defragmentation of the landscape, interoperability across domains, and portability. This is actually one of the motivations for the IREE project,8 building on MLIR at multiple levels of abstraction, from tensor algebra and operator graphs down to the low-level orchestration of asynchronous coroutines and code generation for multiple CPU and GPU architectures (within the Vulkan/SPIR-V standard).

8 https://google.github.io/iree.

Finally, interoperability formats, such as ONNX [59], take a different approach towards addressing the diversity of ML frontends by providing a common set of ops that different frameworks could map onto. ONNX would be a candidate as a dialect in MLIR to and from which ops could be converted.

VII. CONCLUSION AND FUTURE WORK

We presented MLIR, a concrete answer to the dual scientific and engineering challenge of designing a flexible and extensible infrastructure for compiler construction, ranging from backend code generation and orchestration of heterogeneous systems, to graph-level modeling for machine learning, and to the high-level language semantics of programming languages and domain-specific frameworks. We demonstrated its applicability to a range of domains and discussed its research implications.

Motivated by the success of LLVM and looking ahead, we are eager to see how established communities in programming languages and high-performance computing, as well as domain experts, can benefit from the introduction of higher-level, language-specific IRs. We also believe MLIR catalyzes new areas of research, as well as new approaches to teaching the art of compiler and IR design.

ACKNOWLEDGMENTS

This paper and project would not have been possible without the contributions of numerous other individuals. We express our gratitude to all. We also acknowledge the Google Visiting Researcher Program for supporting the third author during the early days of MLIR.

APPENDIX

A. Abstract

The artifact for this paper includes the MLIR system, instructions on how to download and build it, and links to MLIR-related source code in TensorFlow.

B. Artifact Check-List (Meta-Information)

• Program: MLIR
• Compilation: LLVM C++ toolchain
• Run-time environment: Linux (recommended)
• Publicly available?: Yes
• Archived: DOI 10.5281/zenodo.4283090

C. Description

1) How Delivered: To download MLIR, please run:

git clone \
    https://github.com/llvm/llvm-project.git

Instructions for downloading and building MLIR are also available at https://mlir.llvm.org/getting_started. Additional information is available at mlir.llvm.org.

2) Software Dependencies: Downloading MLIR requires git. Building MLIR requires Ninja (https://ninja-build.org/) and a working C++ toolchain including clang and lld.

D. Installation

To build and test MLIR on Linux, execute the following commands:

mkdir llvm-project/build
cd llvm-project/build
cmake -G Ninja ../llvm \
    -DLLVM_ENABLE_PROJECTS=mlir \
    -DLLVM_BUILD_EXAMPLES=ON \
    -DLLVM_TARGETS_TO_BUILD="X86;NVPTX;AMDGPU" \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_CXX_COMPILER=clang++ \
    -DLLVM_ENABLE_LLD=ON
cmake --build . --target check-mlir

E. Applications

MLIR use in TensorFlow can be observed in the code located at https://github.com/tensorflow/tensorflow/tree/master/tensorflow/compiler/mlir/. Tests located in the tensorflow/tests subdirectory contain MLIR snippets illustrating TensorFlow graph representation and transformations. Instructions for building TensorFlow from source are available at https://www.tensorflow.org/install/source.

REFERENCES

[1] C. Lattner and V. Adve, “LLVM: A compilation framework for lifelong program analysis & transformation,” in Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization, ser. CGO ’04. Washington, DC, USA: IEEE Computer Society, 2004, pp. 75–. [Online]. Available: http://dl.acm.org/citation.cfm?id=977395.977673
[2] T. Lindholm and F. Yellin, Java Virtual Machine Specification, 2nd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999.
[3] R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck, “Efficiently computing static single assignment form and the control dependence graph,” ACM Trans. Program. Lang. Syst., vol. 13, no. 4, pp. 451–490, Oct. 1991. [Online]. Available: http://doi.acm.org/10.1145/115372.115320
[4] R. Johnson, D. Pearson, and K. Pingali, “The program structure tree: Computing control regions in linear time,” in Proceedings of the ACM SIGPLAN 1994 Conference on Programming Language Design and Implementation, ser. PLDI ’94. New York, NY, USA: ACM, 1994, pp. 171–185. [Online]. Available: http://doi.acm.org/10.1145/178243.178258
[5] W. A. Havanki, S. Banerjia, and T. M. Conte, “Treegion scheduling for wide issue processors,” in Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, Las Vegas, Nevada, USA, January 31 - February 4, 1998, 1998, pp. 266–276. [Online]. Available: https://doi.org/10.1109/HPCA.1998.650566
[6] G. Ramalingam, “On loops, dominators, and dominance frontiers,” ACM Trans. Program. Lang. Syst., vol. 24, no. 5, pp. 455–490, 2002. [Online]. Available: https://doi.org/10.1145/570886.570887
[7] D. Khaldi, P. Jouvelot, F. Irigoin, C. Ancourt, and B. Chapman, “LLVM parallel intermediate representation: Design and evaluation using OpenSHMEM communications,” in Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, ser. LLVM ’15. New York, NY, USA: ACM, 2015, pp. 2:1–2:8. [Online]. Available: http://doi.acm.org/10.1145/2833157.2833158
[8] T. B. Schardl, W. S. Moses, and C. E. Leiserson, “Tapir: Embedding fork-join parallelism into LLVM’s intermediate representation,” SIGPLAN Not., vol. 52, no. 8, pp. 249–265, Jan. 2017. [Online]. Available: http://doi.acm.org/10.1145/3155284.3018758
[9] Open64 Developers, “Open64 compiler and tools,” 2001.
[10] C. Click and K. D. Cooper, “Combining analyses, combining optimizations,” ACM Trans. Program. Lang. Syst., vol. 17, no. 2, pp. 181–196, Mar. 1995. [Online]. Available: http://doi.acm.org/10.1145/201059.201061
[11] A. Pnueli, M. Siegel, and E. Singerman, “Translation validation,” in Tools and Algorithms for Construction and Analysis of Systems, 4th International Conference, TACAS ’98, Held as Part of the European Joint Conferences on the Theory and Practice of Software, ETAPS ’98, Lisbon, Portugal, March 28 - April 4, 1998, Proceedings, 1998, pp. 151–166. [Online]. Available: https://doi.org/10.1007/BFb0054170
[12] G. C. Necula, “Translation validation for an optimizing compiler,” SIGPLAN Not., vol. 35, no. 5, pp. 83–94, May 2000. [Online]. Available: http://doi.acm.org/10.1145/358438.349314
[13] J. Tristan and X. Leroy, “Formal verification of translation validators: a case study on instruction scheduling optimizations,” in Proceedings of the 35th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2008, San Francisco, California, USA, January 7-12, 2008, 2008, pp. 17–27. [Online]. Available: https://doi.org/10.1145/1328438.1328444
[14] ——, “Verified validation of lazy code motion,” in Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2009, Dublin, Ireland, June 15-21, 2009, 2009, pp. 316–326. [Online]. Available: https://doi.org/10.1145/1542476.1542512
[15] Y. Chen, A. Groce, C. Zhang, W. Wong, X. Z. Fern, E. Eide, and J. Regehr, “Taming compiler fuzzers,” in ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’13, Seattle, WA, USA, June 16-19, 2013, 2013, pp. 197–208. [Online]. Available: https://doi.org/10.1145/2491956.2462173
[16] B. Schommer, C. Cullmann, G. Gebhard, X. Leroy, M. Schmidt, and S. Wegener, “Embedded Program Annotations for WCET Analysis,” in WCET 2018: 18th International Workshop on Worst-Case Execution Time Analysis, vol. 63. Barcelona, Spain: Dagstuhl Publishing, Jul. 2018. [Online]. Available: https://hal.inria.fr/hal-01848686
[17] S. T. Vu, K. Heydemann, A. de Grandmaison, and A. Cohen, “Secure delivery of program properties through optimizing compilation,” in ACM SIGPLAN 2020 International Conference on Compiler Construction (CC 2020), San Diego, CA, Feb. 2020.
[18] G. Balakrishnan and T. Reps, “WYSINWYX: What you see is not what you execute,” ACM Trans. Program. Lang. Syst., vol. 32, no. 6, pp. 23:1–23:84, Aug. 2010. [Online]. Available: http://doi.acm.org/10.1145/1749608.1749612
[19] “TableGen - LLVM 10 Documentation,” Online, accessed Nov 22, 2019. [Online]. Available: https://llvm.org/docs/TableGen/
[20] A. W. Appel, “SSA is functional programming,” ACM SIGPLAN Notices, vol. 33, no. 4, pp. 17–20, 1998.
[21] C. Click and M. Paleczny, “A simple graph-based intermediate representation,” in Papers from the 1995 ACM SIGPLAN Workshop on Intermediate Representations, ser. IR ’95. New York, NY, USA: Association for Computing Machinery, 1995, pp. 35–49. [Online]. Available: https://doi.org/10.1145/202529.202534
[22] A. Veen, “Dataflow machine architecture,” ACM Comput. Surv., vol. 18, pp. 365–396, Dec. 1986.
[23] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015, software available from tensorflow.org. [Online]. Available: https://www.tensorflow.org/
[24] “XLA - TensorFlow, compiled,” Google Developers Blog, Mar. 2017. [Online]. Available: https://developers.googleblog.com/2017/03/xla-tensorflow-compiled.html
[25] P. Feautrier, “Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time,” Int. J. Parallel Program., vol. 21, no. 6, pp. 389–420, 1992.
[26] S. Girbal, N. Vasilache, C. Bastoul, A. Cohen, D. Parello, M. Sigler, and O. Temam, “Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies,” Int. J. Parallel Program., vol. 34, no. 3, pp. 261–317, Jun. 2006. [Online]. Available: http://dx.doi.org/10.1007/s10766-006-0012-3
[27] S. Verdoolaege, “ISL: An integer set library for the polyhedral model,” in Proceedings of the Third International Congress Conference on Mathematical Software, ser. ICMS ’10. Berlin, Heidelberg: Springer-Verlag, 2010, pp. 299–302. [Online]. Available: http://dl.acm.org/citation.cfm?id=1888390.1888455
[28] S. Verdoolaege, J. Carlos Juega, A. Cohen, J. Ignacio Gómez, C. Tenllado, and F. Catthoor, “Polyhedral parallel code generation for CUDA,” ACM Trans. Archit. Code Optim., vol. 9, no. 4, pp. 54:1–54:23, Jan. 2013. [Online]. Available: http://doi.acm.org/10.1145/2400682.2400713
[29] N. Vasilache, O. Zinenko, T. Theodoridis, P. Goyal, Z. DeVito, W. S. Moses, S. Verdoolaege, A. Adams, and A. Cohen, “The next 700 accelerated layers: From mathematical expressions of network computation graphs to accelerated GPU kernels, automatically,” ACM Trans. Archit. Code Optim., vol. 16, no. 4, pp. 38:1–38:26, Oct. 2019. [Online]. Available: http://doi.acm.org/10.1145/3355606
[30] C. Reddy and U. Bondhugula, “Effective automatic computation placement and data allocation for parallelization of regular programs,” in Proceedings of the 28th ACM International Conference on Supercomputing, ser. ICS ’14. New York, NY, USA: ACM, 2014, pp. 13–22. [Online]. Available: http://doi.acm.org/10.1145/2597652.2597673
[31] T. Grosser, A. Größlinger, and C. Lengauer, “Polly - performing polyhedral optimizations on a low-level intermediate representation,” Parallel Processing Letters, vol. 22, no. 4, 2012. [Online]. Available: https://doi.org/10.1142/S0129626412500107
[32] L. Chelini, O. Zinenko, T. Grosser, and H. Corporaal, “Declarative loop tactics for domain-specific optimization,” TACO, vol. 16, no. 4, pp. 55:1–55:25, 2020. [Online]. Available: https://doi.org/10.1145/3372266
[33] C. Bastoul, “Code generation in the polyhedral model is easier than you think,” in Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, ser. PACT ’04. Washington, DC, USA: IEEE Computer Society, 2004, pp. 7–16. [Online]. Available: https://doi.org/10.1109/PACT.2004.11
[34] E. Schweitz, “An MLIR dialect for high-level optimization of Fortran,” LLVM Developer Meeting, Oct. 2019.
[35] E. Garcia and M. Gupta, “Lattice regression,” in Advances in Neural Information Processing Systems 22, Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, Eds. Curran Associates, Inc., 2009, pp. 594–602. [Online]. Available: http://papers.nips.cc/paper/3694-lattice-regression.pdf
[36] M. Bravenboer, K. T. Kalleberg, R. Vermaas, and E. Visser, “Stratego/XT 0.17. A language and toolset for program transformation,” Sci. Comput. Program., vol. 72, no. 1-2, pp. 52–70, 2008. [Online]. Available: https://doi.org/10.1016/j.scico.2007.11.003
[37] J. Meseguer, “Twenty years of rewriting logic,” in Proceedings of the 8th International Conference on Rewriting Logic and Its Applications, ser. WRLA ’10. Berlin, Heidelberg: Springer-Verlag, 2010, pp. 15–17. [Online]. Available: http://dl.acm.org/citation.cfm?id=1927806.1927809
[38] P. Thier, M. A. Ertl, and A. Krall, “Fast and flexible instruction selection with constraints,” in Proceedings of the 27th International Conference on Compiler Construction, ser. CC 2018. New York, NY, USA: ACM, 2018, pp. 93–103. [Online]. Available: http://doi.acm.org/10.1145/3178372.3179501
[39] OpenMP ARB, “The OpenMP API specification for parallel programming,” Online, https://www.openmp.org, accessed Feb 19, 2020.
[40] J. Planas, R. M. Badia, E. Ayguadé, and J. Labarta, “Hierarchical task-based programming with StarSs,” IJHPCA, vol. 23, no. 3, pp. 284–299, 2009. [Online]. Available: https://doi.org/10.1177/1094342009106195
[41] “OpenACC application programming interface,” Online, https://www.openacc.org, accessed Feb 19, 2020.
[42] “SyCL: C++ single-source heterogeneous programming for OpenCL,” Online, https://www.khronos.org/sycl, accessed Feb 19, 2020.
[43] J. Auerbach, D. F. Bacon, I. Burcea, P. Cheng, S. J. Fink, R. Rabbah, and S. Shukla, “A compiler and runtime for heterogeneous computing,” in Proceedings of the 49th Annual Design Automation Conference, ser. DAC ’12. New York, NY, USA: ACM, 2012, pp. 271–276. [Online]. Available: http://doi.acm.org/10.1145/2228360.2228411
[44] S. Kou and J. Palsberg, “From OO to FPGA: Fitting round objects into square hardware?” in Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, ser. OOPSLA ’10. New York, NY, USA: ACM, 2010, pp. 109–124. [Online]. Available: http://doi.acm.org/10.1145/1869459.1869470
[45] T. Rompf and M. Odersky, “Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs,” Commun. ACM, vol. 55, no. 6, pp. 121–130, 2012. [Online]. Available: https://doi.org/10.1145/2184319.2184345
[46] A. K. Sujeeth, K. J. Brown, H. Lee, T. Rompf, H. Chafi, M. Odersky, and K. Olukotun, “Delite: A compiler architecture for performance-oriented embedded domain-specific languages,” ACM Trans. Embedded Comput. Syst., vol. 13, no. 4s, pp. 134:1–134:25, 2014. [Online]. Available: https://doi.org/10.1145/2584665
[47] T. J. Parr and R. W. Quong, “ANTLR: A predicated-LL(k) parser generator,” Softw. Pract. Exper., vol. 25, no. 7, pp. 789–810, Jul. 1995. [Online]. Available: http://dx.doi.org/10.1002/spe.4380250705
[48] N. Rotem, J. Fix, S. Abdulrasool, G. Catron, S. Deng, R. Dzhabarov, N. Gibson, J. Hegeman, M. Lele, R. Levenstein, J. Montgomery, B. Maher, S. Nadathur, J. Olesen, J. Park, A. Rakhov, M. Smelyanskiy, and M. Wang, “Glow: Graph lowering compiler techniques for neural networks,” 2018.
[49] T. Chen, T. Moreau, Z. Jiang, L. Zheng, E. Yan, H. Shen, M. Cowan, L. Wang, Y. Hu, L. Ceze, C. Guestrin, and A. Krishnamurthy, “TVM: An automated end-to-end optimizing compiler for deep learning,” in 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). Carlsbad, CA: USENIX Association, Oct. 2018, pp. 578–594. [Online]. Available: https://www.usenix.org/conference/osdi18/presentation/chen
[50] J. Ragan-Kelley, A. Adams, D. Sharlet, C. Barnes, S. Paris, M. Levoy, S. Amarasinghe, and F. Durand, “Halide: Decoupling algorithms from schedules for high-performance image processing,” Commun. ACM, vol. 61, no. 1, pp. 106–115, Dec. 2017. [Online]. Available: http://doi.acm.org/10.1145/3150211
[51] G. Rudy, M. M. Khan, M. Hall, C. Chen, and J. Chame, “A programming language interface to describe transformations and code generation,” in Languages and Compilers for Parallel Computing, K. Cooper, J. Mellor-Crummey, and V. Sarkar, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 136–150.
[52] L. Bagnères, O. Zinenko, S. Huot, and C. Bastoul, “Opening polyhedral compiler’s black box,” in Proceedings of the 2016 International Symposium on Code Generation and Optimization, CGO 2016, Barcelona, Spain, March 12-18, 2016, 2016, pp. 128–138.
[53] A. Cohen, S. Donadio, M.-J. Garzaran, C. Herrmann, O. Kiselyov, and D. Padua, “In search of a program generator to implement generic transformations for high-performance computing,” Sci. Comput. Program., vol. 62, no. 1, pp. 25–46, Sep. 2006. [Online]. Available: http://dx.doi.org/10.1016/j.scico.2005.10.013
[54] R. T. Mullapudi, V. Vasista, and U. Bondhugula, “PolyMage: Automatic optimization for image processing pipelines,” in International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2015, pp. 429–443.
[55] T. Zerrell and J. Bruestle, “Stripe: Tensor compilation via the nested polyhedral model,” CoRR, vol. abs/1903.06498, 2019. [Online]. Available: http://arxiv.org/abs/1903.06498
[56] V. Elango, N. Rubin, M. Ravishankar, H. Sandanagobalane, and V. Grover, “Diesel: DSL for linear algebra and neural net computations on GPUs,” in Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, ser. MAPL 2018. New York, NY, USA: ACM, 2018, pp. 42–51. [Online]. Available: http://doi.acm.org/10.1145/3211346.3211354
[57] R. Baghdadi, J. Ray, M. B. Romdhane, E. Del Sozzo, A. Akkas, Y. Zhang, P. Suriana, S. Kamil, and S. Amarasinghe, “Tiramisu: A polyhedral compiler for expressing fast and portable code,” in Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization, ser. CGO 2019. IEEE Press, 2019, pp. 193–205.
[58] U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, “A practical automatic polyhedral parallelizer and locality optimizer,” in Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, Tucson, AZ, USA, June 7-13, 2008, 2008, pp. 101–113. [Online]. Available: https://doi.org/10.1145/1375581.1375595
[59] The Linux Foundation, “ONNX: Open neural network exchange,” Online, https://github.com/onnx/onnx, accessed Feb 19, 2020. [Online]. Available: https://github.com/onnx/onnx