I'm currently looking at LLVM as a possible back-end to a dynamic
programming system (in the tradition of Smalltalk) we are developing.
Neat!
I have read most of the llvmdev archives, and I'm aware that some
things are 'planned' but not implemented yet. We are willing to
contribute the code we'll need for our project, but before I can start
coding I'll have to submit to my boss a very concrete proposal on
which changes I'll make and how long they're going to take.
Sounds good!
1. Opcodes and intrinsics
What are the differences between opcodes and intrinsics? How is it
determined whether an operation should be implemented as an opcode or as
an intrinsic function?
Opcodes are operations that can usually be implemented in one or just a
few machine instructions, i.e., things that have a direct correlation to
a machine ISA, such as 'add', 'mul', 'div', etc.
Intrinsics are operations that involve a more complex interaction with
the processor (or the language runtime), where the implementation differs
so much from one platform to the next that it would take more than just a
"few instructions". An example would be the llvm.gc* intrinsics that you
mention, which are for garbage-collected languages. Other intrinsics
include debugging support, etc.
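To make the distinction concrete, here is a rough sketch in LLVM
assembly (%a, %b, %dst, %src, %len are placeholders, and the exact
llvm.memcpy signature may differ in the release you use):

  ; 'add' is an opcode: it maps directly onto a machine instruction
  %sum = add int %a, %b

  ; llvm.memcpy is an intrinsic: its expansion is target- and
  ; runtime-dependent, so it is expressed as a call
  call void %llvm.memcpy(sbyte* %dst, sbyte* %src, uint %len, uint 1)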
As I understand it, compilation passes can both lower intrinsics into
opcodes and also replace opcode sequences, so in the end some of them
are interchangeable.
That's not really correct. Intrinsics such as llvm.frameaddress and
llvm.returnaddress have no equivalents in LLVM opcodes -- the meaning of
these intrinsics is specifically machine-dependent, while LLVM (and its
opcodes) is machine-independent, so there is no valid interchange between
these intrinsics and any combination of LLVM opcodes.
The llvm.memcpy intrinsic can be lowered into an explicit LLVM loop, but
the intrinsic provides a succinct representation of a memory copy and so
stays as such. Plus, it is harder to do the reverse analysis --
proving that a loop is essentially a memcpy().
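To illustrate the asymmetry, this is roughly what the lowered form of a
memory copy looks like (a sketch; %src, %dst, %n and the labels are
placeholders, %n is assumed to be a positive long byte count, and the
exact syntax may vary by release). Recovering the memcpy() from this
requires real loop analysis:

  ; lowered form: a byte-copy loop; an analysis would have to prove
  ; that this is really a memcpy()
  loop:
    %i = phi long [ 0, %entry ], [ %i.next, %loop ]
    %sp = getelementptr sbyte* %src, long %i
    %dp = getelementptr sbyte* %dst, long %i
    %byte = load sbyte* %sp
    store sbyte %byte, sbyte* %dp
    %i.next = add long %i, 1
    %again = setlt long %i.next, %n
    br bool %again, label %loop, label %exit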
For example, why is there a store opcode and a llvm_gcwrite intrinsic?
The 'store' opcode is for generic writes; gcwrite is specifically
for garbage collection. I think someone else would be able to give you
more information about the GC intrinsics than I can, however.
Couldn't the front-end just produce stores/volatile stores and then a
compilation pass transform them into a write-barrier if necessary?
Again, see above: gcwrite is not a "generic store" in the way that the
'store' opcode is, nor is it a barrier.
A possible view of intrinsics could be "operations that don't depend
on the target architecture, but instead on the language runtime". But
then wouldn't malloc/free be intrinsics?
Good question. Due to the amount of pointer/data analysis in LLVM, it
is often necessary to consider memory allocation instructions to see
where memory is allocated and deallocated. As such, the malloc/free
instructions provide a higher-level construct that makes it easier to
manipulate instructions that operate on memory, since that is done so
often.
Consider: with an llvm.malloc intrinsic, one would have to write
something like this every time one wanted to analyze memory usage:
if (CallInst *CI = dyn_cast<CallInst>(I))
  if (Function *Callee = CI->getCalledFunction())  // null for indirect calls
    if (Callee->getName() == "malloc") {
      // ...
    }
whereas with the malloc instruction raising/lowering pass, you would say
if (MallocInst *MI = dyn_cast<MallocInst>(I)) {
  // ...
}
Given the prevalence of such code, this makes it more efficient and
cleaner to write the second way. Note that LLVM will work just fine if
you use direct calls to malloc() and never create malloc instructions.
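For reference, the two forms look like this side by side (a sketch in
the old assembly syntax; %size and %mem are placeholders):

  ; direct call to the C library function -- works fine, but every
  ; analysis has to pattern-match the callee's name:
  %mem = call sbyte* %malloc(uint %size)

  ; the malloc instruction produced by the raising pass -- directly
  ; visible to dyn_cast<MallocInst>:
  %mem = malloc sbyte, uint %size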
2. Stack and registers
As the LLVM instruction set has a potentially infinite number of
registers which are mapped onto target registers or the stack by the
register allocator, why is there a separate stack?
The stack is used for two things: storing structures allocated with
alloca() and automatic variables declared to be function-local. The
spilling of registers is done automatically (transparently), but the
allocation of DATA is explicit in LLVM so that pointer aliases can be
analyzed.
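For instance, an automatic variable becomes an explicit, analyzable
stack allocation (a sketch in LLVM assembly; %x and %v are
placeholders):

  %x = alloca int          ; %x has type int*; storage is on the stack
  store int 42, int* %x    ; writes to it are ordinary stores
  %v = load int* %x        ; every use of the memory is visible via %x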
I would understand it if the stack were more accessible, as a way to
implement closures, but it has been repeated here that the correct way
to do that is to use heap-allocated structures, since functions can't
access other functions' stacks. Is it to signal locations that need to
be changed in-place?
I'm sorry, I don't quite understand this question.
3. Control transfer
Why are the control transfer operations so high level when compared to
actual processors?
Because LLVM is target-independent. It is the job of the backend code
generator (see llvm/lib/Target/*) to convert the high-level 'call'
instruction (see below) into the stack-push/pop or register-passing
method of the target. These details differ widely from one machine to
the next, and have no relevance to LLVM analyses.
Usually processors have instructions to jump to a concrete location
and everything else is managed by the compiler (saving into the stack,
getting result parameters, etc.) depending on the language's calling
conventions.
LLVM has the 'call' instruction that abstracts the target machine's
calling convention. See http://llvm.cs.uiuc.edu/docs/LangRef.html#i_call
for more information.
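Schematically (a sketch; %compute is a hypothetical function), the
front-end emits only this, and the code generator decides how the
arguments and the return value actually travel:

  declare int %compute(int, int)    ; prototype only

  %res = call int %compute(int %a, int %b)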
In LLVM there's just one way to transfer control, and
the only proposal I've seen
(http://nondot.org/sabre/LLVMNotes/CustomCallingConventions.txt) keeps
this high level.
This is to allow more options for calling conventions that may be
specified for a target but are non-standard. Mostly these are
optimizations (e.g., some registers do not need to be saved when calling
a particular library routine).
What are the difficulties in having low-level control-transfer
operations, with explicitly managed arguments, register saving, etc.?
Choosing any lower-level mechanism would make LLVM no longer
target-independent. Supporting the union of all machines' conventions
is nearly impossible, and it would be overly complicated and unnecessary
for LLVM-to-LLVM analyses and transformations.
What is it that you are looking to express that isn't captured by the
`call' instruction?
Hope that helps,