Compiling ONNX Neural Network Models Using MLIR
Abstract: Deep neural network models are becoming popular and have been used in various tasks such as computer vision, speech recognition, and natural language processing. It is often the case that the training phase of a model is executed in one environment, while the inference phase is executed in another, because the optimization characteristics of the two phases differ significantly. Therefore, it is critical to efficiently compile a trained model for inference on different environments. To represent neural network models, users often use the Open Neural Network Exchange (ONNX), an open standard format for machine learning interoperability. We are developing a compiler for rewriting a model in ONNX into a standalone binary that is executable on different target hardware such as x86 machines, IBM Power Systems, and IBM System Z. The compiler was written using Multi-level Intermediate Representation (MLIR), a modern compiler infrastructure. In particular, we introduce two internal representations: ONNX IR for representing ONNX operators, and Kernel IR for efficiently lowering ONNX operators into LLVM bitcode. In this paper, we discuss the overall structure of our compiler and give some practical examples of converting ONNX operators and models. We also cover several issues related to endianness. Our framework is publicly available as an open source project under the ONNX project.
ject inside the ONNX project*1. Although it is still under development, it can already compile some popular models such as MNIST and ResNet50 to native code on x86 machines, IBM Power Systems*2, and IBM System Z*3. In this paper, we will introduce our compiler by

• presenting the overall design and architecture of the compiler,

• introducing two new IRs: ONNX IR for representing ONNX operators, and Kernel IR for efficiently lowering ONNX operators into LLVM bitcode,

• introducing optimization passes such as graph rewriting, constant propagation, and memory management, and

• discussing some problems we encountered when emitting native code for different architectures.

The remainder of the paper is organized as follows. In Sec. 2, we briefly discuss ONNX and MLIR, on which our compiler is based. In Sec. 3, we introduce our compiler, its design principles, and its architecture; we also discuss two new IRs, ONNX IR and Kernel IR, and some optimization passes. In Sec. 4, we present some preliminary experimental results for the MNIST and ResNet50 models on IBM Power Systems. Finally, we conclude the paper and discuss future work in Sec. 5.

2. Background

2.1 ONNX

Open Neural Network Exchange (ONNX) [1] is an open source format for artificial intelligence models, including both deep learning and traditional machine learning. It defines an extensible computational graph model, operators, and standard data types, which provide a common IR for different frameworks. There are two ONNX variants: the neural-network-only ONNX variant recognizes only tensors as input and output types, while the classic machine learning variant, ONNX-ML, also recognizes sequences and maps. ONNX-ML extends the ONNX operator set with machine learning algorithms that are not based on neural networks. In this paper, we focus on the neural-network-only ONNX variant and refer to it simply as ONNX. Support for ONNX-ML is under development in our compiler, so we do not discuss it in this paper.

In ONNX, the top-level structure is a 'Model', which associates metadata with a graph. Operators in ONNX are divided into a set of primitive operators and functions, where a function is an operator whose calculation can be expressed via a subgraph of other operators. A graph is used to describe a function. A graph has lists of nodes, inputs, outputs, and initializers (constant values or default values for inputs). An acyclic dataflow graph is constructed as a topological sort of the list of nodes in the graph. Each node in a graph contains the name of the operator it invokes, inputs, outputs, and attributes associated with the operator. Inputs and outputs can be marked as variadic or optional. There are three data types used to define inputs and outputs, i.e., 'Tensor', 'Sequence', and 'Map'.

ONNX uses the Protocol Buffers*4 definition language for its syntax. Listing 1 shows an example of an ONNX model for the LeakyRelu operator. There is one node in the graph (Lines 4–13), which is associated with LeakyRelu and has one input, one output, and one attribute. The input and output tensors have the shape ⟨3×4×5⟩ and element type float32 (elem_type: 1 at Lines 19 and 38).

*1 https://github.com/onnx/onnx-mlir
*2 https://www.ibm.com/it-infrastructure/power/power9
*3 https://www.ibm.com/it-infrastructure/z/hardware
*4 https://developers.google.com/protocol-buffers

Listing 1: ONNX model for the LeakyRelu operator (printed using the 'protoc' command).
 1 ir_version: 3
 2 producer_name: "backend-test"
 3 graph {
 4   node {
 5     input: "x"
 6     output: "y"
 7     op_type: "LeakyRelu"
 8     attribute {
 9       name: "alpha"
10       f: 0.1
11       type: FLOAT
12     }
13   }
14   name: "test_leakyrelu"
15   input {
16     name: "x"
17     type {
18       tensor_type {
19         elem_type: 1
20         shape {
21           dim {
22             dim_value: 3
23           }
24           dim {
25             dim_value: 4
26           }
27           dim {
28             dim_value: 5
29           }
30         }
31       }
32     }
33   }
34   output {
35     name: "y"
36     type {
37       tensor_type {
38         elem_type: 1
39         shape {
40           dim {
41             dim_value: 3
42           }
43           dim {
44             dim_value: 4
45           }
46           dim {
47             dim_value: 5
48           }
49         }
50       }
51     }
52   }
53 }
54 opset_import {
55   version: 9
56 }
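For reference, the model of Listing 1 can also be constructed programmatically. The sketch below is ours and is not part of onnx-mlir; it uses only the standard onnx Python package (helper, TensorProto, checker). Fields that the API fills in automatically, such as ir_version, may differ from Listing 1 depending on the installed onnx version.

import onnx
from onnx import helper, TensorProto

# One LeakyRelu node with attribute alpha = 0.1, as in Listing 1.
node = helper.make_node("LeakyRelu", inputs=["x"], outputs=["y"], alpha=0.1)

# Graph input/output: float32 tensors of shape <3x4x5> (elem_type 1 is FLOAT).
x = helper.make_tensor_value_info("x", TensorProto.FLOAT, [3, 4, 5])
y = helper.make_tensor_value_info("y", TensorProto.FLOAT, [3, 4, 5])

graph = helper.make_graph([node], "test_leakyrelu", inputs=[x], outputs=[y])
model = helper.make_model(graph, producer_name="backend-test",
                          opset_imports=[helper.make_opsetid("", 9)])

onnx.checker.check_model(model)
print(model)  # prints a protobuf text form similar to Listing 1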
2.2 MLIR

Multi-level Intermediate Representation (MLIR) [5] is a modern compiler infrastructure, developed by Google, that is reusable and extensible. It reduces the cost of building domain-specific compilers by facilitating the design and implementation of code generators, translators, and optimizers at different abstraction levels. MLIR is a subproject of the LLVM project [6] and has many similarities to the LLVM compiler infrastructure [4]. In this section, we briefly review the features of MLIR that were used to build our compiler. For more information about MLIR, one can refer to a previous study [5]. Readers who are familiar with MLIR can skip this section.

Similar to LLVM, MLIR is a three-address, static single assignment (SSA)-based IR, where values are defined before use and have a scope defined by their dominance relations. Operations may produce zero or more results, and each result is a distinct SSA value with its own type defined by the type system. The type system in MLIR is open, and one can define application-specific types. There are a number of primitive types, e.g., integers, as well as aggregate types for tensors and memory buffers, e.g., the 'Tensor' and 'MemRef' types. A Tensor type is abstract and does not have a pointer to the data, while a MemRef type is a lower-level representation referring to a region of memory. In MLIR, Tensor and MemRef types are syntactically represented as tensor⟨D1×D2×...×DN×dtype⟩ and memref⟨D1×D2×...×DN×dtype⟩, respectively, where D1, D2, ..., DN are integers representing the dimensions of a tensor or memref, and dtype is the type of its elements, e.g., f32 for float32. ⟨D1×D2×...×DN⟩ is called the shape of the tensor or memref. Tensor and MemRef types can be unranked when their shapes are unknown; unranked Tensor and MemRef types are syntactically represented as tensor⟨*×dtype⟩ and memref⟨*×dtype⟩, respectively.

An operation is the unit of code in MLIR. To define an operation, a TableGen-based [7] specification for an operation descriptor is used. Figure 1 shows the structure of an operation. An operation has a list of SSA operands and may have attributes that store static information. An operation can hold a region, which is a list of blocks. A block contains a list of operations and ends with a terminator operation that may have successor blocks to which control flow may be transferred. In this way, nested regions become a first-class concept in MLIR, which makes it efficient to represent control flow graphs. A function is an operation with a single region and attributes. A module is an operation with a single region containing a single block and terminated by a dummy operation.

Fig. 1: Operations and Regions in MLIR.

To develop a compiler using MLIR, users often need to define dialects and optimization passes. A dialect serves as an abstraction level or intermediate representation, and an optimization pass enables optimization at one abstraction level or transformation among abstraction levels.

There are dialects in MLIR that are ready to use, e.g., 'llvm', 'std', 'scf', and 'affine'. The 'llvm' dialect is a low-level dialect; it wraps LLVM IR types and instructions into MLIR types and operations. The 'std' dialect includes standard operations such as load, store, addi, addf, absf, and call. The 'scf' dialect defines control flow operations such as for and if. The 'affine' dialect provides an abstraction for affine operations and analyses.

Optimization passes can be roughly classified into three categories: general transformation, conversion, and dialect-specific. General transformation passes include common passes such as the 'canonicalize' pass for operation canonicalization, the 'cse' pass to eliminate common sub-expressions, and passes to print IR information such as 'print-op-graph', 'print-op-stats', and 'print-cfg-graph'. Conversion passes convert operations in one dialect to operations in another dialect, e.g., the 'convert-std-to-llvm' pass converts standard operations into LLVM instructions. Finally, dialect-specific passes perform transformations within a dialect, e.g., the 'affine-loop-unroll-jam' pass unrolls and jams affine loops in the 'affine' dialect. MLIR passes can be expressed via declarative rewriting rules (DRRs) using TableGen records or by writing code in C++.

To denote an operation in a dialect, we explicitly use the form dialect_name.operation_name. For example, std.load means the operation load of the dialect 'std'. Optimization passes are named with the prefix '--'; for example, --canonicalize is the canonicalization pass.

Listing 2 shows an example of calculating the exponential of a given input tensor, element-wise, using the 'std' and 'affine' dialects. The top level is a module containing a function 'exp'. The function 'exp' accepts one input of memref type and produces an output of the same type. The memory for the output is allocated via std.alloc (Line 3). A nested loop (Lines 4–11) iterates over the dimensions of the input using affine.for, loading each element from the input using affine.load (Line 6), computing the exponential using std.exp (Line 7), and storing the result in the output using affine.store (Line 8). The output of the function is finally returned using std.return.
Listing 2: Compute the exponential of a tensor in MLIR.
 1 module {
 2   func @exp(%arg0: memref<3x4xf32>) -> memref<3x4xf32> {
 3     %1 = std.alloc() : memref<3x4xf32>
 4     affine.for %arg1 = 0 to 3 {
 5       affine.for %arg2 = 0 to 4 {
 6         %2 = affine.load %arg0[%arg1, %arg2] : memref<3x4xf32>
 7         %3 = std.exp %2 : f32
 8         affine.store %3, %1[%arg1, %arg2] : memref<3x4xf32>
 9       }
10     }
11     std.return %1 : memref<3x4xf32>
12   }
13 }
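To make the semantics of Listing 2 concrete, the following plain Python/NumPy sketch (ours, for illustration only) mirrors the lowered loop nest: allocate the output buffer, iterate over both dimensions, and apply the exponential element-wise.

import numpy as np

def exp(arg0: np.ndarray) -> np.ndarray:
    out = np.empty((3, 4), dtype=np.float32)   # corresponds to std.alloc
    for i in range(3):                         # affine.for %arg1
        for j in range(4):                     # affine.for %arg2
            out[i, j] = np.exp(arg0[i, j])     # affine.load, std.exp, affine.store
    return out                                 # std.return

x = np.random.rand(3, 4).astype(np.float32)
assert np.allclose(exp(x), np.exp(x))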
Listing 3: ONNX IR for operation add, generated using the importer.
1 module {
2   func @main_graph(%arg0: tensor<3x4x5xf32>, %arg1: tensor<3x4x5xf32>) -> tensor<*xf32> {
3     %0 = "onnx.add"(%arg0, %arg1) : (tensor<3x4x5xf32>, tensor<3x4x5xf32>) -> tensor<*xf32>
4     std.return %0 : tensor<*xf32>
5   }
6   "onnx.EntryPoint"() {func = @main_graph, numInputs = 2 : i32, numOutputs = 1 : i32} : () -> ()
7 }
Listing 4: Kernel IR for operation add, generated by applying the passes --shape-inference and --convert-onnx-to-kernel.
 1 module {
 2   func @main_graph(%arg0: memref<3x4x5xf32>, %arg1: memref<3x4x5xf32>) -> memref<3x4x5xf32> {
 3     %0 = alloc() : memref<3x4x5xf32>
 4     %1:3 = krnl.define_loops 3
 5     krnl.iterate(%1#0, %1#1, %1#2) with (%1#0 -> %arg2 = 0 to 3, %1#1 -> %arg3 = 0 to 4, %1#2 -> %arg4 = 0 to 5) {
 6       %2 = affine.load %arg0[%arg2, %arg3, %arg4] : memref<3x4x5xf32>
 7       %3 = affine.load %arg1[%arg2, %arg3, %arg4] : memref<3x4x5xf32>
 8       %4 = std.addf %2, %3 : f32
 9       affine.store %4, %0[%arg2, %arg3, %arg4] : memref<3x4x5xf32>
10     }
11     std.return %0 : memref<3x4x5xf32>
12   }
13   "krnl.entry_point"() {func = @main_graph, numInputs = 2 : i32, numOutputs = 1 : i32} : () -> ()
14 }
Listing 5: AffineStd IR for operation add, generated by applying the pass --convert-kernel-to-affine.
 1 module {
 2   func @main_graph(%arg0: memref<3x4x5xf32>, %arg1: memref<3x4x5xf32>) -> memref<3x4x5xf32> {
 3     %0 = alloc() : memref<3x4x5xf32>
 4     affine.for %arg2 = 0 to 3 {
 5       affine.for %arg3 = 0 to 4 {
 6         affine.for %arg4 = 0 to 5 {
 7           %1 = affine.load %arg0[%arg2, %arg3, %arg4] : memref<3x4x5xf32>
 8           %2 = affine.load %arg1[%arg2, %arg3, %arg4] : memref<3x4x5xf32>
 9           %3 = std.addf %1, %2 : f32
10           affine.store %3, %0[%arg2, %arg3, %arg4] : memref<3x4x5xf32>
11         }
12       }
13     }
14     std.return %0 : memref<3x4x5xf32>
15   }
16   "krnl.entry_point"() {func = @main_graph, numInputs = 2 : i32, numOutputs = 1 : i32} : () -> ()
17 }
the operation due to space limitations.

At ONNX IR, operations are represented similarly to their descriptions in ONNX. The ONNX model is converted into the function main_graph. To generate an entry point function into which users feed their inputs, we create a helper operation in the ONNX dialect, i.e., onnx.EntryPoint, which keeps metadata in the operation's attributes, such as the name of the function to call and the number of inputs and outputs.

At Kernel IR, the operation onnx.add is translated into a loop-based computation represented by operations in the 'Kernel' dialect, where scalar computation is represented by primitive operations in the 'affine' and 'std' dialects. We can apply polyhedral optimizations, such as tile, skew, or transpose, to the loop-based computation. At this level, we allocate memory for output tensors, and memory management can be performed.

At AffineStd IR, the optimized loop-based computation in the 'Kernel' dialect is translated into affine.for loops. At this level, we still have one Kernel operation, i.e., krnl.entry_point. Such an operation is not related to the main computation and will be directly converted to LLVM IR. Operations in the 'affine' dialect will be converted to operations in the 'std' and 'scf' dialects before being lowered to instructions in the 'llvm' dialect.

3.2 ONNX IR

ONNX IR is the first abstraction level in onnx-mlir and represents an ONNX model in the MLIR language. We wrote a Python script to automatically import ONNX operations into TableGen-based operation definitions in MLIR. These imported operations are organized into the 'onnx' dialect. Thanks to TableGen, an operation definition in the 'onnx' dialect is quite similar to the operation description in ONNX: we are able to represent all necessary information, such as inputs, outputs, attributes, and description, in a single TableGen-based definition in human-readable form.
Listing 6: Tablegen-based definition for operation LeakyRelu.
1 def ONNXLeakyReluOp : ONNX_Op<"LeakyRelu",
2     [NoSideEffect, DeclareOpInterfaceMethods<ShapeInferenceOpInterface>]> {
3   let summary = "ONNX LeakyRelu operation";
4   let description = [{"LeakyRelu takes ..."}];
5   let arguments = (ins AnyTypeOf<[TensorOf<[F16]>, TensorOf<[F32]>, TensorOf<[F64]>]>:$X,
                         DefaultValuedAttr<F32Attr, "0.01">:$alpha);
6   let results = (outs AnyTypeOf<[TensorOf<[F16]>, TensorOf<[F32]>, TensorOf<[F64]>]>:$Y);
7   let extraClassDeclaration = [{ ... }];
8 }
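The definition in Listing 6 mirrors the operator schema that ONNX publishes for LeakyRelu. As a rough illustration of the information such a generator script reads (this sketch is ours and is not the actual onnx-mlir importer), the schema can be inspected with the onnx.defs Python API:

import onnx.defs

# Query the ONNX operator registry for the LeakyRelu schema.
schema = onnx.defs.get_schema("LeakyRelu")

print(schema.name)                          # LeakyRelu
print([i.name for i in schema.inputs])      # ['X']
print([o.name for o in schema.outputs])     # ['Y']
print(sorted(schema.attributes.keys()))     # ['alpha']
print(schema.doc.strip().splitlines()[0])   # first line of the operator description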
Many optimizations can be expressed easily via Declarative Rewriting Rules (DRRs) using TableGen records or by writing code in C++.

3.4.1 Operation Decomposition

In ONNX, many operations can be expressed using other basic operations. For example, ReduceL1 over a vector x is mathematically calculated by summing up the absolute values of the elements in x. In other words, we have

ReduceL1(x) = ReduceSum(Abs(x))

We only need to lower a subset of operations in the 'onnx' dialect to the 'kernel' dialect, while the remaining operations in the 'onnx' dialect are decomposed into operations in that subset. Using the DRRs in MLIR, operation decomposition is concisely written as the following pattern:

1 def ReduceL1Pattern : Pat<
2   (ReduceL1Op $x, $axes, $keepdims),
3   (ReduceSumOp (AbsOp $x), $axes, $keepdims)
4 >;

where ReduceL1Op, ReduceSumOp, and AbsOp are the programmable forms of the operations onnx.ReduceL1, onnx.ReduceSum, and onnx.Abs, respectively. The variables x, axes, and keepdims hold the input values of the operation ReduceL1Op. The pattern 'ReduceL1Pattern' contains a source pattern that matches a graph of one operation, ReduceL1Op (Line 2), and a destination pattern that generates a graph of two operations, ReduceSumOp and AbsOp (Line 3). Whenever an operation ReduceL1Op appears in an ONNX model, it is replaced with a combination of ReduceSumOp and AbsOp.
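As a quick numerical sanity check of the identity behind ReduceL1Pattern (our own illustration, not part of the compiler), the L1 reduction over all elements equals ReduceSum applied to Abs:

import numpy as np

x = np.random.randn(3, 4, 5).astype(np.float32)

l1_norm    = np.linalg.norm(x.ravel(), ord=1)   # ReduceL1 over all axes
decomposed = np.abs(x).sum()                    # ReduceSum(Abs(x))

assert np.isclose(l1_norm, decomposed)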
3.4.2 Shape Inference

The --shape-inference pass attempts to infer shapes for all tensors in a program at ONNX IR. The pass traverses all operations in a program, infers the shapes of tensors with unranked shapes (i.e., tensor⟨*×f32⟩), propagates the ranked shapes to consuming operations, and terminates once all tensors have ranked shapes. For an operation whose inputs have static shapes, it is likely that the --shape-inference pass will be able to infer static shapes for its outputs. If the inputs have dynamic shapes (e.g., tensor⟨?×?×?×f32⟩), the outputs will also have dynamic shapes, except for some operations whose output tensor shapes are specified in the operation attributes.
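The following toy sketch (ours; the actual pass is written against MLIR operation interfaces, not Python) illustrates the propagation idea on a two-node graph: shapes are recomputed until a fixed point is reached, and an output becomes ranked as soon as all operands of its producer are ranked.

def infer_shapes(nodes, value_shapes):
    """nodes: list of (op, inputs, outputs); value_shapes: name -> tuple or None."""
    changed = True
    while changed:
        changed = False
        for op, ins, outs in nodes:
            in_shapes = [value_shapes.get(i) for i in ins]
            if any(s is None for s in in_shapes):
                continue                      # operands not ranked yet
            if op == "Add":                   # element-wise: output shape equals input shape
                out_shape = in_shapes[0]
            elif op == "MatMul":              # (m, k) x (k, n) -> (m, n)
                out_shape = (in_shapes[0][0], in_shapes[1][1])
            else:
                continue
            for o in outs:
                if value_shapes.get(o) != out_shape:
                    value_shapes[o] = out_shape
                    changed = True
    return value_shapes

shapes = infer_shapes(
    [("MatMul", ["x", "w"], ["t"]), ("Add", ["t", "t"], ["y"])],
    {"x": (3, 4), "w": (4, 5), "t": None, "y": None},
)
assert shapes["y"] == (3, 5)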
3.4.3 Graph Rewriting

Graph rewriting is a powerful optimization tool. It is applied intensively to neural networks because the calculation in a neural network is expressed via a dataflow graph. In MLIR, graph rewriting rules are conveniently represented using DRRs.

For example, the following rule fuses onnx.add and onnx.MatMul into a single operation onnx.Gemm, under the condition that the result of MatMulOp is consumed only by AddOp:

1 def MulAddToGemmPattern : Pat<
2   (AddOp (MatMulOp:$res $m1, $m2), $m3),
3   (GemmOp $m1, $m2, $m3),
4   [(HasOneUse $res)]
5 >;

Another example removes an IdentityOp operation by passing its input directly to its consuming operations:

1 def IdentityEliminationPattern : Pat<
2   (ONNXIdentityOp $arg),
3   (replaceWithValue $arg)
4 >;

Users can write as many rewriting rules as needed in the same manner.
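A numerical check (again our own illustration) that the MulAddToGemmPattern rewrite is semantics-preserving: with the default attributes alpha = beta = 1 and no transposes, onnx.Gemm computes MatMul followed by Add.

import numpy as np

a = np.random.randn(3, 4).astype(np.float32)
b = np.random.randn(4, 5).astype(np.float32)
c = np.random.randn(3, 5).astype(np.float32)

unfused = np.matmul(a, b) + c         # onnx.MatMul followed by onnx.add
fused   = 1.0 * (a @ b) + 1.0 * c     # onnx.Gemm with alpha = beta = 1

assert np.allclose(unfused, fused)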
3.4.4 Constant Propagation

Constant propagation is a well-known compiler optimization. In onnx-mlir, we created a pass that performs it during compilation. There are two key ideas in constant propagation: (1) if all the inputs of an operation are constant, compute its outputs at compile time and remove the operation; (2) if there is a mix of constant and non-constant inputs, normalize the operation. Normalization increases the possibility of constant propagation and strongly depends on the mathematical properties of an operation. Below are some normalization rules in onnx-mlir for the onnx.add operation, whose properties are associativity and commutativity:

(1) c + x ⇒ x + c
(2) (x + c1) + c2 ⇒ x + (c1 + c2)
(3) (x + c) + y ⇒ (x + y) + c
(4) x + (y + c) ⇒ (x + y) + c
(5) (x + c1) + (y + c2) ⇒ (x + y) + (c1 + c2)

where x and y are non-constant values, and c, c1, and c2 are constant values. Normalization rules are expressed using the DRRs in MLIR.
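The sketch below (ours, for illustration only; onnx-mlir expresses these rules as DRRs over ONNX constant operations rather than Python tuples) shows how rule (2), together with rule (1) and plain folding, turns (x + 1) + 2 into x + 3 on a tiny expression tree:

def normalize_add(expr):
    # Expressions are ("const", v), ("var", name), or ("add", lhs, rhs).
    if expr[0] == "add":
        lhs, rhs = normalize_add(expr[1]), normalize_add(expr[2])
        if lhs[0] == "const" and rhs[0] != "const":      # rule (1): c + x => x + c
            lhs, rhs = rhs, lhs
        if lhs[0] == "const" and rhs[0] == "const":      # both constant: fold completely
            return ("const", lhs[1] + rhs[1])
        if rhs[0] == "const" and lhs[0] == "add" and lhs[2][0] == "const":
            # rule (2): (x + c1) + c2 => x + (c1 + c2), folding c1 + c2 at compile time
            return ("add", lhs[1], ("const", lhs[2][1] + rhs[1]))
        return ("add", lhs, rhs)
    return expr

result = normalize_add(("add", ("add", ("var", "x"), ("const", 1)), ("const", 2)))
assert result == ("add", ("var", "x"), ("const", 3))     # (x + 1) + 2 becomes x + 3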
3.4.5 Memory Management

This pass is under development. The central idea is to create a memory pool to efficiently manage memory usage in a program by statically analyzing memory allocations and deallocations. In the current version of onnx-mlir, the memory pool simply creates a single memory area for the tensors in a model. The mechanism for memory reuse has not yet been implemented.

4. Preliminary Experiments

4.1 ONNX operation support and test cases

ONNX provides a set of test cases for each operation. Whenever we support an operation in onnx-mlir, we enable its ONNX test cases to check whether the operation behaves correctly and produces correct results. At the time of writing, onnx-mlir supports 51 of the 139 operations in ONNX, including important operations such as convolution, pooling, Gemm, and LSTM. These are enough to compile and execute major networks such as MNIST and ResNet50. On the GitHub repository of onnx-mlir, we enable continuous integration on different environments, i.e., Windows, Linux, and Docker environments, and different systems, i.e., x86 machines, IBM Power Systems, and IBM System Z. All supported operations have passed the tests in these environments.
4.2 MNIST and ResNet50

In this section, we present some of our preliminary results for two neural network models from the ONNX Model Zoo: MNIST and ResNet50 [2]. The MNIST*5 and ResNet50*6 models were trained in the CNTK and Caffe2 frameworks, respectively. We ran inference on the test data set provided with each model. The experiments were conducted on a machine with 2.3-GHz POWER9 processors. For onnx-mlir, the graph rewriting and canonicalization passes were enabled. Polyhedral optimizations were turned off since they are still under development and not yet mature. The memory pool was applied to create a single memory area for all necessary tensors in a model, but there was no mechanism for memory reuse. Under these conditions, the results shown here are not suitable as a reference for performance comparison.

Table 1: Running inference with MNIST and ResNet50 on a POWER9 machine. Times are in seconds.

Model      Compilation time   Inference time
MNIST      0.237              0.001
ResNet50   7.661              7.540

Table 1 shows the running times for the MNIST and ResNet50 models when doing inference. For each model, we measured the compilation time for compiling the model to native code and the inference time for running the native code with real inputs. MNIST is a small model with two convolutional operations, one max pooling operation, and a matrix multiplication followed by an element-wise addition. Compiling the MNIST model and carrying out inference was rather fast, finishing in less than one second. In the MNIST model, the graph rewriting rule MulAddToGemmPattern mentioned in Sec. 3.4.3 was applied to fuse the matrix multiplication and element-wise addition into a Gemm operation. ResNet50 is a complex deep model consisting of 50 layers of operations such as convolutions and poolings; the model is about 100 megabytes including the learned weights. For ResNet50, the current version of onnx-mlir does not apply any optimization to the model during compilation. Nevertheless, we believe the compilation time looks reasonable and the inference time is not excessively slow. We expect that once we integrate important optimizations, such as polyhedral optimizations, SIMD optimization, and loop fusion, in the near future, the inference time will be significantly reduced.

*5 https://github.com/onnx/models/tree/master/vision/classification/mnist
*6 https://github.com/onnx/models/tree/master/vision/classification/resnet
4.3 Supported Systems

Although onnx-mlir is built entirely upon widely used open source software such as ONNX and MLIR, we found a problem related to supporting different systems. In particular, we could not run ONNX models on Linux on IBM System Z (s390-Linux) because the big-endian format was not well supported in ONNX and MLIR. There are two reasons for this problem. First, a large amount of public input data and models in ONNX are stored in little-endian format; hence, they must be converted to big-endian format before they are used on a big-endian system. Second, we found that constant values in ONNX models were not correctly loaded in MLIR: big-endian systems were well supported in LLVM, but not in MLIR. We created two patches to solve this problem, one in ONNX*7 and one in MLIR*8, and they are now available in the master branches of ONNX and MLIR. As a result, onnx-mlir now supports Linux on x86 (x86-Linux), Linux on Power Systems (ppc64le-Linux), Linux on IBM Z (s390-Linux), and Windows.

*7 https://github.com/onnx/onnx/pull/2633
*8 https://reviews.llvm.org/D78076
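To illustrate the first issue above: raw tensor data serialized in little-endian order must be byte-swapped before it can be used on a big-endian host such as IBM System Z. A minimal NumPy sketch of that conversion (ours, not the code used in the patches):

import numpy as np

raw = np.arange(6, dtype="<f4").tobytes()     # little-endian float32 payload, as stored in a model
le  = np.frombuffer(raw, dtype="<f4")         # interpret the bytes with an explicit little-endian dtype
be  = le.astype(">f4")                        # byte-swapped copy usable on a big-endian host

assert np.array_equal(le, be)                 # same values, different byte order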
5. Conclusion

We are developing an open source compiler called onnx-mlir for compiling ONNX models into native code. MLIR was used as the infrastructure to build the compiler, and two novel IRs were introduced, i.e., ONNX IR and Kernel IR. We also discussed some optimizations such as graph rewriting and constant propagation. It is worth noting that new optimizations can be easily integrated into onnx-mlir thanks to the MLIR infrastructure. In the future, we will add more optimizations, e.g., polyhedral optimization, loop fusion, and SIMD optimization, and enable code generation for accelerators.

References
[1] Bai, J., Lu, F., Zhang, K. et al.: ONNX: Open Neural Network Exchange, GitHub (online), available from ⟨https://github.com/onnx/onnx⟩ (accessed 2020-07-01).
[2] He, K., Zhang, X., Ren, S. and Sun, J.: Deep Residual Learning for Image Recognition, CoRR, Vol. abs/1512.03385 (online), available from ⟨http://arxiv.org/abs/1512.03385⟩ (2015).
[3] Krizhevsky, A., Sutskever, I. and Hinton, G. E.: ImageNet Classification with Deep Convolutional Neural Networks, International Conference on Neural Information Processing Systems (NIPS), pp. 1097–1105 (2012).
[4] Lattner, C. and Adve, V.: LLVM: A Compilation Framework for Lifelong Program Analysis and Transformation, San Jose, CA, USA, pp. 75–88 (2004).
[5] Lattner, C., Amini, M., Bondhugula, U., Cohen, A., Davis, A., Pienaar, J., Riddle, R., Shpeisman, T., Vasilache, N. and Zinenko, O.: MLIR: A Compiler Infrastructure for the End of Moore's Law, (online), available from ⟨http://arxiv.org/abs/2002.11054⟩ (2020).
[6] LLVM: The LLVM Project, LLVM (online), available from ⟨https://github.com/llvm/llvm-project⟩ (accessed 2020-07-01).
[7] LLVM: TableGen, LLVM (online), available from ⟨https://llvm.org/docs/TableGen/⟩ (accessed 2020-07-01).
[8] Pouchet, L.-N., Bastoul, C., Cohen, A. and Cavazos, J.: Iterative optimization in the polyhedral model: Part II, multidimensional time, ACM SIGPLAN Notices, Vol. 43, No. 6, pp. 90–100 (2008).