Hi, I spoke with you briefly after your talk at Google; I've since been playing around with LLVM quite a bit and started working on a front end for my experimental programming language, Tart.
Cool, welcome
A couple of questions I wanted to ask (although feel free to answer by saying "go ask on the mailing list").
The questions you have may be interesting to others, so I'm cc'ing the llvm dev list.
I notice in clang that you define your own set of classes for representing Types, independent of the Type class in LLVM. I'm trying to decide if I should do the same. Is it generally recommended that front ends use the LLVM type classes directly during parsing and analysis, or do front ends generally define their own data structures and then convert to LLVM form when they communicate with the back end?
I strongly recommend defining your own type system. The LLVM type system has been built to do one thing well: support analyzing and optimizing programs in a low-level, language-independent form. You won't get much benefit from extending it, and you'd have to deal with a lot of complexity. Define your own types, then lower your front-end types to LLVM types during the "code generation" stage.
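To make that split concrete, here is a minimal C++ sketch of a front end that keeps its own type classes and only produces LLVM types at code generation time. All class and method names are hypothetical (they are not clang's or LLVM's), and to stay self-contained the sketch emits textual LLVM IR type names instead of calling the LLVM C++ API, assuming the opaque `ptr` pointer spelling used by recent LLVM versions. Note that a nullable class reference lowers to the same `ptr` as a non-nullable one; that distinction lives only in the front end's types.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical front-end type hierarchy. These classes are independent of
// llvm::Type and are only mapped to LLVM types during code generation.
struct FrontEndType {
  virtual ~FrontEndType() = default;
  // Source-level spelling, used for diagnostics.
  virtual std::string name() const = 0;
  // Textual LLVM IR spelling of the lowered type (assumes opaque pointers).
  virtual std::string lowerToLLVM() const = 0;
};

// Fixed-width integer type; the source-level name "intN" is hypothetical.
struct IntType : FrontEndType {
  explicit IntType(unsigned b) : bits(b) {}
  unsigned bits;
  std::string name() const override { return "int" + std::to_string(bits); }
  std::string lowerToLLVM() const override { return "i" + std::to_string(bits); }
};

// A class type; objects are handled through references, so it lowers to a pointer.
struct ClassType : FrontEndType {
  explicit ClassType(std::string n) : className(std::move(n)) {}
  std::string className;
  std::string name() const override { return className; }
  std::string lowerToLLVM() const override { return "ptr"; }
};

// 'T?' in the source: a reference that may be NULL. Nullability is recorded
// here for the front end's own checks and disappears when the type is lowered.
struct NullableType : FrontEndType {
  explicit NullableType(std::shared_ptr<ClassType> b) : base(std::move(b)) {
    assert(base != nullptr);
  }
  std::shared_ptr<ClassType> base;
  std::string name() const override { return base->name() + "?"; }
  std::string lowerToLLVM() const override { return base->lowerToLLVM(); }
};
```

For example, `NullableType(std::make_shared<ClassType>("Widget"))` reports its name as `Widget?` for diagnostics but lowers to `ptr`, exactly like `Widget` itself.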
There are some concepts in my language which I can't quite grok how to represent in terms of LLVM types.
For example, one of the type modifiers is the 'nullable' modifier. Normally, references to objects aren't allowed to be NULL; however, in this language you can put a question mark after a type name to indicate that this particular reference can be NULL. (Essentially, it declares a disjoint union of the object type and the NULL type.) Mixing nullable and non-nullable types produces a warning unless the compiler can deduce that the value is indeed non-null.
What I'd like to know is, is it possible to subclass or otherwise customize an LLVM type to maintain this information, or should I instead create a parallel type hierarchy that can store information like this?
One common question is "how do I expose certain high-level information to the optimizer?" Unfortunately, this question has many possible answers, depending on the specific information and the constraints of the problem. In LLVM, we tend to avoid encoding information into the type system, preferring instead to encode it into the operations. In any case, I'd suggest getting your language up and running correctly first; we can always worry about extending LLVM as needed in the future.
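As one illustration of putting the information into operations rather than types (a sketch under assumptions, not a prescribed approach): since `Foo` and `Foo?` lower to the same pointer type, the front end decides from its own types whether a conversion needs attention, and where it chooses to guard a `Foo?` to `Foo` conversion it cannot prove safe, it emits an explicit null test. Both helpers are hypothetical; emitting a runtime check in addition to the warning is an assumed design choice, and the IR is produced as text with opaque pointers. In a real front end the same `icmp`/`br` pair would be built with `IRBuilder`'s `CreateICmpEQ` and `CreateCondBr`.

```cpp
#include <cassert>
#include <string>

// Front-end decision (hypothetical): converting a nullable reference to a
// non-nullable one is only free if flow analysis proved the value non-null.
bool needsRuntimeNullCheck(bool fromNullable, bool toNullable, bool provenNonNull) {
  return fromNullable && !toNullable && !provenNonNull;
}

// Emits the null test as ordinary IR operations: compare the pointer against
// null, then branch. `value` is an IR local (e.g. "%p"); `onNull` and
// `notNull` are block labels chosen by the caller.
std::string emitNullCheck(const std::string& value,
                          const std::string& onNull,
                          const std::string& notNull) {
  assert(!value.empty() && value[0] == '%');
  std::string cond = value + ".isnull";
  return "  " + cond + " = icmp eq ptr " + value + ", null\n"
         "  br i1 " + cond + ", label %" + onNull + ", label %" + notNull + "\n";
}
```

Calling `emitNullCheck("%p", "on_null", "not_null")` yields an `icmp eq ptr %p, null` followed by a conditional branch; the LLVM type of `%p` stays plain `ptr` throughout.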
Another question is how to handle builtin functions. In this language, the various operators, such as '+', are just synonyms for builtin functions such as operator.add, which are overloaded by argument type. Generally, one would want to ensure that these functions are nearly always inlined. The question is, should the front end be responsible for the inlining, or should it rely on LLVM to do it?
There are actually multiple ways of doing this. I suspect that the LLVM inliner will have no problem inlining these operations, and defining them as function bodies with a standard C interface has a lot of advantages. A middle ground is to rely on the LLVM inliner to handle these, but then explicitly call the inliner on each call you want to ensure gets inlined. This way you get the convenience of dealing with calls in your front end, but don't have to worry about the whims of the inliner (which could matter, for example, when optimization is disabled).
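A minimal sketch of that middle ground, under stated assumptions: the front end resolves `operator.add` by argument type to a small builtin with a plain C-style signature, emits an ordinary call, and records each call it wants inlined. The `BuiltinTable` class, the symbol `tart_add_i32`, and the use of textual IR are all hypothetical. With the real LLVM API, the recorded entries would be call instructions handed to LLVM's `InlineFunction` utility (declared in `llvm/Transforms/Utils/Cloning.h`) after code generation, regardless of the optimization level.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical overload table: (operator name, LLVM operand type) -> builtin symbol.
struct BuiltinTable {
  std::map<std::pair<std::string, std::string>, std::string> overloads;
  // Results of calls that must be force-inlined later (stand-ins for call sites).
  std::vector<std::string> mustInline;

  void addOverload(const std::string& op, const std::string& ty,
                   const std::string& symbol) {
    overloads[{op, ty}] = symbol;
  }

  // Resolves the overload by argument type and emits a plain call.
  // Throws std::out_of_range if no overload exists for that type.
  std::string emitCall(const std::string& op, const std::string& ty,
                       const std::string& result, const std::string& lhs,
                       const std::string& rhs) {
    const std::string& symbol = overloads.at({op, ty});
    mustInline.push_back(result);
    return "  " + result + " = call " + ty + " @" + symbol + "(" + ty + " " +
           lhs + ", " + ty + " " + rhs + ")";
  }
};

// Body of an integer add builtin in textual IR (integer types only; floating
// point would need `fadd`).
std::string defineAddBuiltin(const std::string& ty, const std::string& symbol) {
  assert(!symbol.empty());
  return "define internal " + ty + " @" + symbol + "(" + ty + " %a, " + ty +
         " %b) {\n  %r = add " + ty + " %a, %b\n  ret " + ty + " %r\n}\n";
}
```

The front end only ever deals with calls; whether they end up inlined is decided by the explicit pass over `mustInline`, not by the inliner's cost heuristics.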
-Chris