Pertti Kellomäki wrote:
I managed to compile newlib with llvm-gcc yesterday. That
is, the machine independent part is now basically done, and
the syscall part contains no-op stubs provided by libgloss.
I haven't tested the port yet, but since newlib has already
been ported to many architectures, I would be pretty surprised
if there were any major problems.
A couple of things I noticed when configuring newlib for LLVM.
First, I did not find any preprocessor symbols that I could use
to identify that we are compiling to LLVM byte code. If there is
one, I'd be happy to hear it, but if not, then it might be a good
idea to define __LLVM__ or something like that in (by) llvm-gcc.
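To make the proposal concrete, here is a minimal sketch of how library source could branch on such a symbol. __LLVM__ is only the name suggested above, not a macro llvm-gcc is known to define, and BUILD_TARGET_NAME and build_target_name() are hypothetical names used for illustration.

```c
/* __LLVM__ is the symbol proposed in this message; it is an assumption,
   not something llvm-gcc is known to predefine. */
#if defined(__LLVM__)
#define BUILD_TARGET_NAME "llvm"     /* compiling to LLVM byte code */
#else
#define BUILD_TARGET_NAME "native"   /* any other compiler or target */
#endif

/* Hypothetical helper: reports which branch was selected at compile time. */
const char *build_target_name(void)
{
    return BUILD_TARGET_NAME;
}
```

Compiled with -D__LLVM__ on the command line, the first branch is taken; without it, the second.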
Another related thing is that even when I added -emit-llvm to
what I thought would be a global CFLAGS for all of newlib, it did
not get propagated to all subdirectories.
I solved both of these issues by creating a shell script that is
just a fall-through to llvm-gcc, but passes "-emit-llvm -D__LLVM__"
to it. It might be worthwhile to have a similar thing in the LLVM
distribution, that is, a compiler that would identify the target as
LLVM and produce byte code by default.
There was very little to do in terms of porting. Basically
the only thing I needed to tweak in the source code was to define
floating point endianness, which I randomly picked to be
__IEEE_BIG_ENDIAN. Hopefully someone can confirm or correct my
choice.
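One way to check the choice on a given machine is to look at how a double is laid out in memory. The following is a rough sketch that assumes IEEE 754 binary64 doubles; double_is_big_endian() is a hypothetical helper, not part of newlib.

```c
#include <string.h>

/* Returns 1 if a double is stored most significant byte first,
   0 if it is stored least significant byte first,
   and -1 for any other layout (for example word-swapped doubles). */
int double_is_big_endian(void)
{
    const double one = 1.0;   /* IEEE 754 binary64: 0x3FF0000000000000 */
    unsigned char bytes[sizeof one];

    memcpy(bytes, &one, sizeof one);
    if (bytes[0] == 0x3F && bytes[1] == 0xF0)
        return 1;
    if (bytes[sizeof one - 1] == 0x3F && bytes[sizeof one - 2] == 0xF0)
        return 0;
    return -1;
}
```

A result of 1 would match __IEEE_BIG_ENDIAN and a result of 0 would match __IEEE_LITTLE_ENDIAN, but only for the machine that actually runs the check.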
The next task is to go for the system calls. As I said earlier,
I plan to use intrinsic functions as placeholders. Any opinions
on how to name them? Currently there are a few intrinsics that have
to do with libc, like llvm.memcpy and llvm.memmove. However, I
would personally prefer less pollution in the intrinsic name space,
so I would propose naming the intrinsics with an llvm.libc prefix,
e.g. llvm.libc.open and so forth. Any strong opinions on this?
I agree with Reid; you should only need an intrinsic if you need to
inline the system call trapping code or want a singular function name
for system calls when performing analysis. Otherwise, the system call
functions (open(), read(), etc) can be implemented in a native code
run-time library.
In the LLVA-OS project, we designed an intrinsic called llva_syscall (in
LLVM, it would be llvm.syscall()) that takes a system call number and a
set of parameters and calls that system call number with those
parameters. It's a slightly higher level trap instruction that
encapsulates most of the OS system call calling conventions. All of the
system calls (open(), read(), etc) are just library function wrappers
around llva_syscall() that provide the right system call number and
re-arrange the input parameters if necessary.
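As a rough illustration of the wrapper idea, the sketch below shows two library functions built on a single syscall entry point. The fixed three-argument signature of llva_syscall(), the libsys_ names, and the use of the C library's syscall() on Linux as a stand-in for the real trap are all assumptions made for this example; they are not the actual LLVA-OS or libsys definitions.

```c
#define _GNU_SOURCE           /* exposes syscall() on glibc */
#include <stddef.h>
#include <sys/syscall.h>      /* SYS_write, SYS_getpid */
#include <unistd.h>           /* syscall() */

/* Stand-in for the intrinsic: forwards a system call number and up to
   three arguments to the kernel. The signature is an assumption. */
long llva_syscall(long number, long a1, long a2, long a3)
{
    return syscall(number, a1, a2, a3);
}

/* Hypothetical wrapper: supplies the system call number and converts
   the arguments to the form llva_syscall() expects. */
long libsys_write(int fd, const void *buf, size_t count)
{
    return llva_syscall(SYS_write, fd, (long)buf, (long)count);
}

/* Hypothetical wrapper for a call that takes no arguments. */
long libsys_getpid(void)
{
    return llva_syscall(SYS_getpid, 0, 0, 0);
}
```

Only llva_syscall() depends on the operating system's calling convention; the wrappers stay the same if the entry point is later replaced by an inlined trap.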
However, you'll notice that we've never implemented it in the LLVM code
generators. That's because there's no need to do so unless you want to
have the system call trapping instruction inlined and you can't use the
LLVM C backend for code generation (i.e. llc -march=c).
What we have done is to implement the llva_syscall() "intrinsic" as an
external function at the LLVM bytecode level. After code generation, we
can then link in a native code library that defines llva_syscall().
Furthermore, if using the C backend, we can define llva_syscall() in a
header file and #include it into the program using GCC's -include
option. This allows the llva_syscall() function to be inlined where
appropriate.
I have an implementation of the x86/Linux llva_syscall() header file
that I can give you, if you need it. I also have a prototype library,
libsys, which implements all of the Linux system calls as calls to
llva_syscall(). It's (mostly) right for Linux 2.4.
<shameless plug>
More information on the llva_syscall() instruction can be found in our
paper at http://llvm.org/pubs/2006-06-18-WIOSCA-LLVAOS.pdf in section III.F.
</shameless plug>
Regards,
-- John T.