Running Lua C modules in a pure Lua environment (1)


Most desktop scripting languages are incomplete without a way to call into native code. Modules written in C (or other compiled languages) are used to add extra functionality, expose system APIs, import high-performance libraries, and more. Lua stands out as a language that was built with C extensibility in mind – it has a simple C API that works both to extend the Lua environment and to create the environment itself. As such, there’s a wide ecosystem of Lua modules written in C, often wrapping system libraries in Lua APIs.

However, because Lua is targeted at embedded applications, developers often remove the ability to load C modules from their Lua environments, for security and/or compatibility reasons. This rules out a whole bunch of useful modules that aren’t available in pure Lua form, such as complex compressors. My favorite Lua environment, ComputerCraft, is one of those – since its VM, Cobalt, is written in Java, it can’t load C modules without a huge amount of work across the JNI boundary; and loading arbitrary native code on a shared server is super insecure anyway.

For no particular reason, I wanted to be able to load these C modules somehow – I already managed to run C DOOM on CC in a RISC-V VM, so there had to be a way to load the modules. I had a few main goals in mind:

The loader has to be able to load the modules unmodified – this meant following the standard Lua loading process of calling a shared library’s luaopen_* function.
The module code has to be compiled unmodified, using whatever compile steps the author provides (besides configuring it to target the right compiler). This means implementing the Lua API, plus any system APIs required (such as libc).
The module should integrate seamlessly with the surrounding Lua environment – all Lua values should be stored in the environment that’s loading the module, not inside the VM (i.e. not by running a second Lua VM on the C side and copying the values over).

My devious solution: implement a RISC-V VM that emulates a Linux system; implement the Lua API in Lua and expose it to the VM; dynamically link and load the shared module file into the VM; and execute the functions required as one-shot calls on the VM.

RISC-V

If you’re not familiar, RISC-V is a new-ish CPU architecture that focuses on being simple to implement while still scaling up to desktop-class processors. It has a grand total of 40 instructions in the base 32-bit integer instruction set, and uses a load/store, register-register RISC ISA.

Back when working on DOOM-CC, I chose it as my target because of its low instruction count. I only needed to implement 48 instructions (integer + multiplication extension) to have a fully working system. It also has strong compiler support in GCC, so I didn’t need to fiddle with finding a compiler and runtime library. Because I already had the emulator core written, I figured I’d just scoop it out and bring it over to this new project.

Since I didn’t write about the DOOM project, I’ll quickly go over what I remember of the development process for the RISC-V VM. I first spent some time studying the instruction formats. RISC-V has four core instruction formats (R, I, S, U) and two variant formats (B, J), and I wrote a decoder function for each. All formats share a 7-bit opcode field, and each opcode maps to exactly one format. To handle this, I made a table mapping each opcode to its decoder function; the clock function reads the opcode, finds the decoder function, and passes the decoded instruction to the execution function.

After that, I wrote a function for each instruction. The instruction table is indexed by the opcode first, but some instructions are further selected by the funct7 field in the instruction, so those instructions are further indexed by funct7 to get the executor function. Each function takes the CPU instance (self) and the decoded instruction, and executes the instruction requested before returning to the tick function.
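
To give a feel for the shape of this, here is a simplified sketch of the decode/dispatch scheme in Lua. The names are invented, only the I format and ADDI are shown, and real code would also keep x0 hard-wired to zero and consult funct7 for cases like ADD vs. SUB.

local CPU = {}
CPU.__index = CPU

-- One decoder per instruction format; here just the I format as an example.
local function decode_I(word)
  local rd     = math.floor(word / 2^7)  % 32
  local funct3 = math.floor(word / 2^12) % 8
  local rs1    = math.floor(word / 2^15) % 32
  local imm    = math.floor(word / 2^20)           -- 12-bit immediate
  if imm >= 2048 then imm = imm - 4096 end         -- sign-extend
  return { rd = rd, funct3 = funct3, rs1 = rs1, imm = imm }
end

local decoders = { [0x13] = decode_I --[[ , other opcodes -> R/S/B/U/J decoders ]] }

-- Executors are indexed by opcode, then by funct3 where one opcode covers
-- several instructions.
local executors = {
  [0x13] = {
    [0] = function(cpu, inst)                      -- ADDI
      cpu.regs[inst.rd] = (cpu.regs[inst.rs1] + inst.imm) % 2^32
    end,
  },
}

function CPU:clock()
  local word   = self:fetch32(self.pc)             -- fetch32 is a hypothetical helper
  local opcode = word % 0x80                       -- low 7 bits select the format
  local inst   = decoders[opcode](word)
  local exec   = executors[opcode]
  if type(exec) == "table" then exec = exec[inst.funct3] end
  exec(self, inst)
  self.pc = self.pc + 4                            -- branches/jumps would set pc themselves
end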

For expediency, I decided to use LuaJIT and its FFI library to store the CPU’s memory, which let me have views for 8-/16-/32-bit values over the same memory block, and is much faster than normal tables. This did mean that the code only worked under LuaJIT environments, but I could easily replace it with tables later once I wanted it to work with normal Lua.
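
The FFI setup for that is roughly the following (a sketch, not the project’s actual code):

local ffi = require("ffi")

local MEM_SIZE = 16 * 1024 * 1024                  -- guest RAM size; chosen arbitrarily here
local mem8  = ffi.new("uint8_t[?]", MEM_SIZE)      -- byte view; owns the allocation
local mem16 = ffi.cast("uint16_t*", mem8)          -- halfword view over the same block
local mem32 = ffi.cast("uint32_t*", mem8)          -- word view over the same block

-- Aligned loads/stores become a single indexed access instead of several
-- table reads plus shifts (addresses assumed to be suitably aligned).
local function load32(addr)         return mem32[addr / 4] end
local function store32(addr, value) mem32[addr / 4] = value end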

Once the instructions were initially written, I needed a way to load binaries into the CPU for testing. I wrote up a little ELF loader that would simply read the required file header fields (like entrypoint), and copy all of the program headers into the memory space. (More on ELF later.) After that, I started running the CPU tests provided by the RISC-V authors. A lot of tweaking was required, but eventually I got it to pass all the tests.
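
The loader boils down to something like this (a hedged sketch: the field offsets follow the ELF32 spec, string.unpack is assumed to be available, and write_bytes/zero_bytes stand in for whatever guest-memory helpers the CPU object provides):

-- Minimal ELF32 program loader sketch (little-endian; PT_LOAD segments only).
local function load_elf(cpu, data)
  assert(data:sub(1, 4) == "\127ELF", "not an ELF file")
  -- Pull the fields we need out of the ELF header.
  local e_entry     = string.unpack("<I4", data, 0x18 + 1)
  local e_phoff     = string.unpack("<I4", data, 0x1C + 1)
  local e_phentsize = string.unpack("<I2", data, 0x2A + 1)
  local e_phnum     = string.unpack("<I2", data, 0x2C + 1)

  -- Copy every PT_LOAD segment's contents into guest memory.
  for i = 0, e_phnum - 1 do
    local ph = e_phoff + i * e_phentsize
    local p_type, p_offset, p_vaddr = string.unpack("<I4I4I4", data, ph + 1)
    local p_filesz, p_memsz = string.unpack("<I4I4", data, ph + 0x10 + 1)
    if p_type == 1 then                                      -- PT_LOAD
      cpu:write_bytes(p_vaddr, data:sub(p_offset + 1, p_offset + p_filesz))
      cpu:zero_bytes(p_vaddr + p_filesz, p_memsz - p_filesz) -- zero the .bss tail
    end
  end

  cpu.pc = e_entry                                           -- start at the entrypoint
end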

Finally, I had to implement some way to access the CC environment from inside the VM. I used the RISC-V ECALL instruction for this, which is also what Linux uses for system calls (this will become relevant later). The function number is passed in register a7/x17, and arguments in registers a0-a6/x10-x16. I made a table of functions for this as well, with each of them implementing some operation like drawing the graphics to the screen or accessing files on the filesystem. After a bit of tweaking, I got it working… albeit at such a low framerate that it was completely unplayable (even with LuaJIT!).
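
The dispatch looks much like the instruction tables. Extending the CPU sketch from earlier (handler bodies here are stubs with invented behavior, not the real host functions):

-- Hypothetical ECALL dispatch: the call number lives in a7 (x17), arguments
-- in a0-a6 (x10-x16), and the return value goes back into a0.
local ecalls = {}

ecalls[1] = function(cpu)
  -- e.g. "draw the framebuffer": read pixels out of guest memory starting at
  -- the address in a0 and hand them to the CC terminal API
end

ecalls[2] = function(cpu)
  -- e.g. "read file": fd in a0, buffer pointer in a1, length in a2;
  -- number of bytes read is returned in a0
  cpu.regs[10] = 0
end

function CPU:ecall()
  local handler = ecalls[self.regs[17]]            -- a7 selects the host function
  if handler then handler(self) end
end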

Lua in Lua

The first part I worked on was implementing the Lua C API. The essence of it is that all of the state is stored in Lua outside of the VM as native objects, and a shim C library calls out to the Lua side to do whatever operations are necessary. API calls are made through a single ECALL/syscall with ID 0x1b52, and the function to call is the first argument.

On the Lua side, each lua_State object is stored in a table, with the coroutine it applies to, the CPU object for the system, and the “stack” which stores all of the values that the API uses. There is one lua_State object at the start, which represents the main/current calling thread; and the pointer passed to functions is actually a number indexing into the table. Because some value types that are accessible to the C API aren’t representable in plain Lua (like C functions and userdata pointers), those types are wrapped in tables (as well as tables themselves, so that it can differentiate between wrapped types and table values).
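
Concretely, the bookkeeping on the Lua side could look something like this (a simplified sketch with invented names; each api_* function would be selected by the first argument of the 0x1b52 syscall described above):

-- Sketch of the host-side lua_State bookkeeping (names invented).
local states = {}                        -- lua_State "pointer" -> state record

local function new_state(cpu, co)
  local state = {
    cpu = cpu,                           -- the RISC-V CPU this state belongs to
    coroutine = co,                      -- the coroutine the calls apply to
    stack = {},                          -- the C API stack, as a plain Lua table
  }
  states[#states + 1] = state
  return #states                         -- this index is what C code sees as the lua_State*
end

-- Values C can hold but plain Lua can't represent directly get wrapped; tables
-- are wrapped too so wrappers and ordinary table values can be told apart.
local function wrap(value)
  if type(value) == "table" then
    return { kind = "table", value = value }
  end
  return value                           -- numbers, strings, booleans, nil pass through
end

local function unwrap(value)
  if type(value) == "table" and value.kind then return value.value end
  return value
end

-- Example API function: lua_pushnumber(L, n) becomes a push onto state.stack.
local function api_pushnumber(state_id, n)
  local stack = states[state_id].stack
  stack[#stack + 1] = n
end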

On the C side, most of the functions are simple wrappers around syscall, casting types as necessary. However, some functions return memory blocks or strings, which the Lua side can’t allocate inside the VM’s memory – so instead of expecting Lua to allocate the memory region, the C wrapper allocates the necessary memory, then passes the pointer to Lua to fill up. There are also a few functions that only use static pointers, so those are implemented entirely in C.
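
For the string-returning case, the Lua half of such a call might look like this (continuing the previous sketch; resolve_index, write_bytes, and write_u32 are hypothetical helpers for index resolution and guest-memory writes):

-- Sketch of the host side of a string-returning call (e.g. a lua_tolstring
-- wrapper): the C side has already allocated buf inside the VM, and the host
-- fills it and reports the length back.
local function api_tolstring(state_id, index, buf, len_ptr)
  local state = states[state_id]
  local value = tostring(unwrap(state.stack[resolve_index(state, index)]))
  state.cpu:write_bytes(buf, value)      -- copy the bytes into guest memory
  state.cpu:write_u32(len_ptr, #value)   -- store the length where C expects it
  return buf                             -- pointer goes back to the C wrapper in a0
end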

Here’s a brief list of issues I ran into while testing, and how I fixed them:

Lua uses negative numbers to indicate stack positions relative to the top. However, I forgot that the RISC-V VM only deals with unsigned numbers. I had to add an extra function call to convert unsigned numbers to signed numbers before using the index.
There are also certain negative indices that point to special values – LUA_REGISTRYINDEX (-1001000) points to the registry, a predefined table that C code uses to store Lua values, and indices below that point to upvalues of the running C function. I eventually had to implement these to be able to use most modules.
Lua uses double-precision floating point for its native number type, but because I didn’t implement the floating-point extension for RISC-V, doubles are passed in two integer registers, per the ABI. I had to make a quick pair of functions to convert between the two integers and a native Lua number, using string.pack/string.unpack to reinterpret the register pair as a double (see the sketch after this list).
Moving values between tables and the stack involves wrapping/unwrapping the stack values. I had to make a few functions to do that sanely, and it also required rewriting some code I had previously to use them instead.
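
The register-pair conversion mentioned above boils down to this (a sketch assuming string.pack/string.unpack are available and the little-endian register order used by the ilp32 ABI):

-- Reinterpret two 32-bit register values (low word, high word) as one double,
-- and back again, by round-tripping through an 8-byte string.
local function regs_to_double(lo, hi)
  local d = string.unpack("<d", string.pack("<I4I4", lo, hi))
  return d
end

local function double_to_regs(n)
  local lo, hi = string.unpack("<I4I4", string.pack("<d", n))
  return lo, hi
end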

Setting up a cross-compiler

Now that I had some C code ready, I needed a compiler to turn it into RISC-V code. While I had a compiler set up for compiling 32-bit RISC-V already, it used the embedded Newlib C library and didn’t target Linux systems. Since I wanted to have off-the-shelf modules run unmodified, I wanted to use the canonical C library for Linux, glibc, on top of Linux syscalls. This required building all of those components from scratch in a proper cross-compile toolchain.

I first looked through the Arch Linux AUR to see if anyone had packaged something for it, but only 64-bit toolchains were available. I wanted to avoid a heavyweight toolchain anyway, since I only implemented the IM instruction set, while most Linux toolchains target the full GC (IMAFDC_Zicsr_Zifencei) instruction set. I then started building a cross-compiler by hand. I cloned the latest versions of GCC, glibc, and their dependencies, but I soon realized that I was in over my head a bit – compiling it all manually was gonna be pretty tough.

I then found that RISC-V has a build script for GCC+glibc already available, so I went and cloned that and built it. I had some issues with cloning one of the submodules (it was pointing to an invalid commit), but I was able to get around it by cloning each submodule individually, besides the one that was having problems (I didn’t need it). Then I built the toolchain with:

./configure --prefix=/usr/riscv32-linux-gnu --with-arch=rv32im --with-abi=ilp32
make linux

The build went pretty quickly on my Ryzen 9 5950X. However, shortly into the glibc build process, I was greeted with an error – glibc requires the A (atomic) extension on RISC-V. No problem, I could implement that once it was needed. After switching the arch to rv32ima, the build finished successfully, and I was ready to build the project.

I first went to build “liblua” into a static library, which was just a few gcc -c commands to build the object files, and then an ar to pack them into liblua.a. I then tried to build a test module file against this library. My original plan was to build everything statically, so I wouldn’t need any dynamic linking. I wanted to simply load the code from the module, find the symbol address in the symbol table, and then execute it. But no matter what I tried, I couldn’t statically link glibc because the static library wasn’t built to work with shared library files:

/usr/local/lib/gcc/riscv32-unknown-linux-gnu/13.2.0/../../../../riscv32-unknown-linux-gnu/bin/ld: /usr/local/sysroot/usr/lib/libc.a(malloc.o): relocation R_RISCV_TPREL_HI20 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
/usr/local/lib/gcc/riscv32-unknown-linux-gnu/13.2.0/../../../../riscv32-unknown-linux-gnu/bin/ld: /usr/local/sysroot/usr/lib/libc.a(sysdep.o): relocation R_RISCV_TPREL_HI20 against `__libc_errno' can not be used when making a shared object; recompile with -fPIC
/usr/local/lib/gcc/riscv32-unknown-linux-gnu/13.2.0/../../../../riscv32-unknown-linux-gnu/bin/ld: BFD (GNU Binutils) 2.42 assertion fail /home/jack/Downloads/riscv32-gcc/riscv-gnu-toolchain/binutils/bfd/elfnn-riscv.c:2565
collect2: fatal error: ld terminated with signal 11 [Segmentation fault], core dumped
compilation terminated.

I was able to successfully build the module as a shared library against the dynamically linked glibc – but this meant I now had to write a dynamic linker.

The next post in the series will describe how dynamic linking works, and what I had to do to implement a dynamic linker for the virtual machine in Lua.