How to inline a C function during code generation?

Hi

I started hacking on clang itself and I have some questions that I couldn't figure out with the help of the source code or the documentation.

Looking at the IR generated at -O2 and -O0, it seems that function inlining is already done in clang. But I haven't figured out how that is actually done.

What I want to do is - during code generation - something to the effect of calling "GetOrCreateLLVMFunction" and then doing an "EmitCall", but getting the function body inlined instead of a regular call. The function is a static inline C function, it has a known name and known argument/return types, and it would be an error if that function were not available at that point (or maybe even not declared static inline). GetOrCreateLLVMFunction and EmitCall do not seem to do the work for me as I had hoped. The function is never inlined, regardless of optimization level. The same function is inlined by the compiler with no problem when it is called from within the source code itself.

So how can I do that?

Ciao
   Nat!

Here's some useful terminology for this discussion:

IRGen: Clang's generation of LLVM IR from Clang ASTs
CodeGen: LLVM's generation of machine code/assembly from LLVM IR
Clang: The Clang frontend, down to the point that it generates and passes
LLVM IR to the LLVM libraries.
LLVM: Everything else. (the middle and backend)

Clang does not do inlining.
LLVM does inlining.
Clang produces code that can be/is encouraged to be/is required to be
inlined by LLVM.

For example, if the user writes source code with the always_inline attribute
(spelled "alwaysinline" in LLVM IR), then Clang produces LLVM IR with
the matching attribute and LLVM obeys the request (where possible) and
inlines all calls to that function during LLVM's optimization passes.

> CodeGen: LLVM's generation of machine code/assembly from LLVM IR

OK, clang unfortunately calls IR generation also code generation, which makes it a bit confusing for the newcomer.

My mental model of what was happening was:

<file.c> > cc1 > <file.ir> and then llc < file.ir > file.o

and I expected all optimizations to be done after cc1 is complete. This is not the case. The inlining is done in the cc1 stage, during EmitBackendOutput. So what you would call the "middle", the code that does the inlining, is linked in from the llvm project. The IR that I got from -cc1 -emit-llvm was therefore not created by "pure" clang as I expected, but by a mix of clang and llvm.

I hope I can substitute debugging clang with running cc1 -emit-llvm -O0 and then llc on the result with optimization on, as that would make it a lot easier to debug why my code does not get inlined.

Thanks for the helpful reply!

Ciao
   Nat!

> OK, clang unfortunately calls IR generation also code generation, which
> makes it a bit confusing for the newcomer.

Yep, one day we'll rename it.

> My mental model of what was happening was:
>
> <file.c> > cc1 > <file.ir> and then llc < file.ir > file.o
>
> and I expected all optimizations to be done after cc1 is complete. This is
> not the case. The inlining is done in the cc1 stage, during
> EmitBackendOutput. So that what you would call the "middle" and the code
> that does the inlining is linked in from the llvm project.

That's sometimes called the "middle end" or "target independent
optimizations" or "IR optimizers/optimizations", etc.

> The ir that I got from -cc1 -emit-llvm was therefore not created by "pure"
> clang as I expected but a mix of clang and llvm.

Right. If you want literally the IR that Clang produced (and was going to
give to LLVM) you need to pass -mllvm -disable-llvm-optzns. Without that,
even at -O0, you'll get some LLVM optimizations running, including the
AlwaysInliner (the version of the inliner responsible for inlining
always_inline functions, since that has to be done even at -O0).

> I hope I can get substitute my debugging clang by doing cc1 -emit-llvm
> -O0 and then llc these with optimization on, as it would be a lot easier to
> debug, why my code does not get inlined.

And I'd suggest passing the .ll/.bc back to clang, rather than llc, as llc
and Clang don't necessarily use precisely the same optimization pipeline.

You can add -print-after-all to get llc to show what steps it takes to transform the IR into machine code.

Also, the whole idea behind LLVM is that the frontend can be quite simple [relatively speaking], and as long as it’s not insanely pessimising the code, the resulting machine-code will be good. I have written a Pascal compiler that, for the benchmarks I’ve done so far, beats the Free Pascal compiler - without any effort to write optimisation code on my side.

OK, I found out that llc doesn't do the inlining either. I was scratching my head why llc -O2 test.ir refused to inline my code. As it turns out, the optimization is done by a different program called "opt".

So for anyone else interested (and for myself looking this up in a year or so :))

  clang -S -emit-llvm -mllvm -disable-llvm-optzns -o test.ir test.c

produces the unoptimized readable IR file

  opt -O2 -o test.opt.bc test.ir

then creates the optimized binary IR, which then can be assembled into an .s file with

  llc -o test.opt.s test.opt.bc

or disassembled into a readable IR again

  llvm-dis -o test.opt.ir test.opt.bc

I am getting closer to the reason why my code does not get inlined, but I am not there yet. I can put the following through opt -O2 and it does not inline (though "inline_call" is even marked as alwaysinline):

Looks like you are calling a function with no prototype or with the wrong prototype - which is why it’s being bitcast.

i32 (i8*, i64, i8*)* is
int (*)(void *, int64_t, char *)

[or void * instead of char * - impossible to tell apart in LLVM IR, but I’m guessing based on str in your inline function] in C.

And your function is declared as
i8* (i8*, i64, i8*)*
so
char* (*)(char *, int64_t, char *)

Do you actually want to return an i32 as a pointer?

Maybe you should declare foo to return a pointer to char (or void).
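
To make the prototype point concrete, here is a small sketch under assumptions: inline_call is the name from the post, but its parameter names, its body, and the caller (called "caller" here) are guesses, since the original C source was not shown. The idea is that the call site sees the real prototype, which returns a pointer, and converts the result to int. That avoids calling through a mismatched int-returning declaration, which is what would force a bitcast of the function itself.

```c
#include <stdint.h>

/* Assumed shape of the inline function: it returns a pointer.
 * The body is a placeholder; the real one was not shown in the thread. */
static inline __attribute__((always_inline))
void *inline_call(void *self, int64_t sel, void *param)
{
    (void)self;
    (void)sel;
    return param;
}

/* Hypothetical caller: a direct call through the matching prototype.
 * The result is converted to int, not the function type. */
int caller(void *self, int64_t sel, void *param)
{
    return (int)(intptr_t)inline_call(self, sel, param);
}
```

With a direct call like this, the callee at the call site is the function itself rather than a cast of it, which is the form the inliner is expected to handle.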

The IR was the result of me trying to whittle the output down to something tiny. The code by itself does not make much sense ;)
But my intent is actually to return a void * and then return that cast to int.

In similar circumstances "clang/opt" has no problem inlining that.

> OK, I found out that llc doesn't do the inlining either. I was scratching
> my head why llc -O2 test.ir refused to inline my code. As it turns out,
> the optimization is done by a different program called "opt".

Well, yes and no. These utilities are wrappers around specific libraries,
but clang actually calls into all those libraries directly too - the
separate utilities are just conveniences for developers of Clang/LLVM, etc.
But, yes, clang + opt + llc is one way to break things down for testing.
But remember, as I said previously, the specific optimization pipeline used
by Clang isn't necessarily the same one used by opt, etc. So if you want to
test what's actually running, I'd suggest you test with Clang directly.

> I am getting closer to the reason, why my code does not get inlined, but I
> am not there yet. I can put the following through opt -O2 and it does not
> inline (though "inline_call " is even marked as alwaysinline):

I don't know the specific reason, but you could break into a debugger in
opt when the AlwaysInline pass is running and walk through it to see
why it's failing to inline.