Baby's first wasm compiler

Published 2023-08-28

(This is part of a series on the design of a language. See the list of posts here.)

I made a compiler from a toy language to wasm. The code quality is very much plan-to-throw-one-away. I'm just trying to get a feel for the amount of effort involved in a non-optimizing compiler.

The toy language is mostly unsurprising. It has numbers, strings, maps (hashtables) and first-class functions, and is dynamically typed. The only unusual part is mutable value semantics.

// nested maps
let mut x = [foo = [bar = 1]];

// y is a copy of x
let y = x;

// z captures a copy of x
let z = fn [] x;

// mutate x
set x.foo.bar = x.foo.bar + 1;
assert[x == ['foo' = ['bar' = 2]]];

// y and z are unchanged
assert[y == ['foo' = ['bar' = 1]]];
assert[z[] == ['foo' = ['bar' = 1]]];

// inc-x takes a mutable reference
let inc-x = fn [mut x] set x.foo.bar = x.foo.bar + 1;

// y is changed
inc-x[mut y];
assert[y == ['foo' = ['bar' = 2]]];

I already have a tree-walking interpreter lying around which mostly turns ast nodes into calls into the runtime. So the plan is to build the minimum viable compiler:

  1. Compile the runtime to wasm.
  2. Compile toy programs into a series of calls to the runtime.
  3. Link 1 and 2.

runtime -> wasm

The runtime is written in zig, so let's figure out how to compile it to wasm.

> zig version
0.11.0

> cat test.zig
export fn foo(x: usize) usize {
    return x + 1;
}

> zig build-lib test.zig -target wasm32-freestanding -O ReleaseSmall

> ls         
libtest.a  libtest.a.o  test.zig

That's what I'd expect to get from compiling native code. Where is the wasm?

> file libtest.a
libtest.a: current ar archive

> ar t libtest.a
./libtest.a.o

> file libtest.a.o
libtest.a.o: WebAssembly (wasm) binary module version 0x1 (MVP)

> wasm2wat libtest.a.o
(module
  (type (;0;) (func (param i32) (result i32)))
  (import "env" "__linear_memory" (memory (;0;) 0))
  (func $foo (type 0) (param i32) (result i32)
    local.get 0
    i32.const 1
    i32.add))

So libtest.a is just an archive containing a copy of libtest.a.o, which despite the name is actually a wasm file.

That wasm file isn't self-contained - it expects to import memory from elsewhere rather than defining the memory itself.

We can get a self-contained version by, weirdly, asking for a dynamically linked library.

> zig build-lib test.zig -target wasm32-freestanding -O ReleaseSmall -dynamic

> ls
test.wasm  test.wasm.o  test.zig

> wasm2wat test.wasm.o
(module
  (type (;0;) (func (param i32) (result i32)))
  (import "env" "__linear_memory" (memory (;0;) 0))
  (func $foo (type 0) (param i32) (result i32)
    local.get 0
    i32.const 1
    i32.add))

> wasm2wat test.wasm
(module
  (memory (;0;) 16)
  (global (;0;) (mut i32) (i32.const 1048576))
  (export "memory" (memory 0)))

test.wasm.o looks the same as libtest.a.o did before, but we now also have test.wasm, which contains a nice standalone module that defines its own memory and can be run in a wasm runtime without any further ado.

Why does asking for -dynamic give us this?

As best I can tell, it's because all of this tooling was designed for c, whose notions of linking are subtly different from wasm's. Plus, when working on a larger project you might do c-style linking to compile multiple c libraries into a single wasm module, which then takes part in wasm-style linking with other wasm libraries. To make matters worse, the standard for how wasm components will be linked isn't even finished.

So in the meantime people are just doing their best to map existing commands to something useful.

Another surprising outcome is that test.wasm doesn't contain the function foo. I found a comment explaining that it isn't obvious how c notions of symbol visibility should correspond to what is visible during wasm linking, so the current solution is to manually specify which symbols we want to make visible in the wasm module.

The easiest thing is to ask for 'everything' by passing -rdynamic.

> zig build-lib test.zig -target wasm32-freestanding -O ReleaseSmall -dynamic -rdynamic

> wasm2wat test.wasm 
(module
  (type (;0;) (func (param i32) (result i32)))
  (func (;0;) (type 0) (param i32) (result i32)
    local.get 0
    i32.const 1
    i32.add)
  (memory (;0;) 16)
  (global (;0;) (mut i32) (i32.const 1048576))
  (export "memory" (memory 0))
  (export "foo" (func 0)))

Now we have "foo" exported from the final module.

Also note that global. We'll come back to that soon.
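To sanity-check that shape from the host side, here's a hand-assembled module with the same signature - a function exported as "foo" computing x + 1 - instantiated with Node's built-in WebAssembly API. (The bytes here are written by hand for illustration; they aren't zig's actual output, which also carries the memory and global.)

```javascript
// A minimal wasm module, assembled by hand:
//   (func (export "foo") (param i32) (result i32)
//     local.get 0  i32.const 1  i32.add)
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // "\0asm" magic + version 1
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f,       // type section: (i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function section: one func, type 0
  0x07, 0x07, 0x01, 0x03, 0x66, 0x6f, 0x6f, 0x00, 0x00, // export section: "foo" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section: one body, no locals
  0x20, 0x00, 0x41, 0x01, 0x6a, 0x0b,                   // local.get 0; i32.const 1; i32.add; end
]);
const { exports } = new WebAssembly.Instance(new WebAssembly.Module(bytes));
console.log(exports.foo(41)); // 42
```

Forty bytes for the whole module - wasm's binary format is pleasantly compact for cases like this.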

c abi

So we started with fn foo(x: usize) usize and ended up with (func (param i32) (result i32)). That makes sense.

But wasm only has scalar types, so what happens if we pass a struct?

> cat test.zig
const Bar = extern struct { x: usize, y: usize };

export fn foo(bar: Bar) Bar {
    return Bar{ .x = bar.x + 1, .y = bar.y + 2 };
}

> zig build-lib test.zig -target wasm32-freestanding -O ReleaseSmall -dynamic -rdynamic && wasm2wat test.wasm -f
(module
  (type (;0;) (func (param i32 i32)))
  (func (;0;) (type 0) (param i32 i32)
    (i32.store
      (local.get 0)
      (i32.add
        (i32.load
          (local.get 1))
        (i32.const 1)))
    (i32.store offset=4
      (local.get 0)
      (i32.add
        (i32.load offset=4
          (local.get 1))
        (i32.const 2))))
  (memory (;0;) 16)
  (global (;0;) (mut i32) (i32.const 1048576))
  (export "memory" (memory 0))
  (export "foo" (func 0)))

Since we're passing and returning Bar by value we might have expected to get (func (param i32 i32) (result i32 i32)) - a function that takes two integers (x and y) and returns two integers.

Instead we got (func (param i32 i32)) which means that this function takes two integers and returns nothing. Uh oh.

Let's look at the body to figure out what's going on.

The use of store and load tells us that the locals are being used as pointers rather than values. We load from (local.get 1) and store to (local.get 0), so local 1 is a pointer to bar and local 0 is a pointer to the return value.

So the way this function is expected to be used is that the caller allocates stack space for the return value and then passes a pointer to the return value and a pointer to the input value.

A nice way to demonstrate this is to rewrite foo to use that calling convention explicitly. This generates exactly the same code!

> cat test.zig
const Bar = extern struct { x: usize, y: usize };

export fn foo(result: *Bar, bar: *Bar) void {
    result.* = Bar{ .x = bar.x + 1, .y = bar.y + 2 };
}

> zig build-lib test.zig -target wasm32-freestanding -O ReleaseSmall -dynamic -rdynamic && wasm2wat test.wasm -f
(module
  (type (;0;) (func (param i32 i32)))
  (func (;0;) (type 0) (param i32 i32)
    (i32.store
      (local.get 0)
      (i32.add
        (i32.load
          (local.get 1))
        (i32.const 1)))
    (i32.store offset=4
      (local.get 0)
      (i32.add
        (i32.load offset=4
          (local.get 1))
        (i32.const 2))))
  (memory (;0;) 16)
  (global (;0;) (mut i32) (i32.const 1048576))
  (export "memory" (memory 0))
  (export "foo" (func 0)))

Usually extern functions in zig implement the standard c calling convention for the target platform, but there is no standard c calling convention for wasm! So zig just tries to do whatever clang does.

Rather than trying to track this calling convention in my head, I found it easier to only ever use types that map to a single wasm scalar. So runtime_wasm.zig is full of glue code like this:

export fn equal(result: *Value, a: *Value, b: *Value) void {
    result.* = Value.fromBool(Value.equal(a.*, b.*));
}
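From the host's side, every one of these glue exports is just a function of i32 byte offsets into linear memory. Here's a hypothetical sketch of the calling pattern in JS - the one-byte Value layout and this `equal` are stand-ins for illustration, not the real runtime:

```javascript
// Stand-in for an instantiated runtime: `equal` takes three byte offsets
// and writes a one-byte boolean Value at `result`.
const memory = new Uint8Array(64);
const exports = {
  equal: (result, a, b) => { memory[result] = memory[a] === memory[b] ? 1 : 0; },
};

// The caller reserves scratch space, passes offsets, and reads the result back.
memory[8] = 42;  // Value a
memory[16] = 42; // Value b
exports.equal(0, 8, 16);
console.log(memory[0]); // 1
```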

mvs abi

So let's start thinking about how we're going to represent values.

We actually aren't faced with much choice. At some point we're going to call some kind of runtime function like:

export fn equal(result: *Value, a: *Value, b: *Value) void {
    result.* = Value.fromBool(Value.equal(a.*, b.*));
}

This expects a pointer to a value. But in wasm there is no way to take a pointer to a local variable. (This is a deliberate limitation - keeping locals out of addressable memory lets engines reason about them at compile-time.)

How does llvm deal with this? Let's try to observe a pointer to the stack.

> cat test.zig
export fn foo(x: **usize) void {
    var y: usize = 0;
    x.* = &y;
}

> zig build-lib test.zig -target wasm32-freestanding -O ReleaseSmall -dynamic -rdynamic && wasm2wat test.wasm -f
(module
  (type (;0;) (func (param i32)))
  (func (;0;) (type 0) (param i32)
    (global.set 0
      (i32.sub
        (global.get 0)
        (i32.const 16)))
    (i32.store
      (local.get 0)
      (i32.add
        (global.get 0)
        (i32.const 12)))
    (global.set 0
      (i32.add
        (global.get 0)
        (i32.const 16))))
  (memory (;0;) 16)
  (global (;0;) (mut i32) (i32.const 1048576))
  (export "memory" (memory 0))
  (export "foo" (func 0)))

(If you try this for yourself you'll get something slightly more complicated - I manually undid an optimization to make the code a little clearer).

That (global (;0;) ...) is the 'shadow stack pointer'. The shadow stack is a region of the wasm memory that llvm reserves for anything that is stack-allocated and whose pointer might escape or be observed.

Let's go through the body of foo in detail.

// Global 0 is the shadow stack pointer.

// Push the `foo` frame on the shadow stack.
// ssp -= 16
(global.set 0
  (i32.sub
    (global.get 0)
    (i32.const 16)))

// x.* = ssp + 12
(i32.store
  (local.get 0)
  (i32.add
    (global.get 0)
    (i32.const 12)))

// Pop the `foo` frame from the shadow stack.
// ssp += 16
(global.set 0
  (i32.add
    (global.get 0)
    (i32.const 16)))
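The same dance, simulated over a plain JS DataView (a sketch, not real wasm): global 0 becomes the `ssp` variable, and `y` lives at ssp+12 inside foo's 16-byte frame.

```javascript
const PAGE = 65536;
const mem = new DataView(new ArrayBuffer(16 * PAGE));
let ssp = 1048576; // initial value of global 0

// foo(x: **usize): write the address of stack-allocated y into *x.
function foo(x) {
  ssp -= 16;                        // push foo's 16-byte frame
  mem.setUint32(x, ssp + 12, true); // x.* = &y
  ssp += 16;                        // pop foo's frame
}

foo(0); // use address 0 as the out-parameter slot
const yAddr = mem.getUint32(0, true);
console.log(yAddr); // 1048572, ie 1048576 - 16 + 12
```

Note that the pointer we observed dangles as soon as foo pops its frame - just like taking the address of a local in c.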

Not every stack-allocated value ends up on the shadow stack! Many can be optimized into scalar local variables by the SROA pass. Let's try stack-allocating a struct but not observing the address.

> cat test.zig
const Bar = extern struct { x: usize, y: usize };

extern fn random() bool;

export fn foo() usize {
    var bar = Bar{ .x = 1, .y = 1 };
    while (random()) {
        bar.x += bar.y;
        bar.y *= bar.x;
    }
    return bar.x + bar.y;
}

> zig build-lib test.zig -target wasm32-freestanding -O ReleaseSmall -dynamic -rdynamic && wasm2wat test.wasm -f
(module
  (type (;0;) (func (result i32)))
  (import "env" "random" (func (;0;) (type 0)))
  (func (;1;) (type 0) (result i32)
    (local i32 i32)
    (local.set 0
      (i32.const 1))
    (local.set 1
      (i32.const 1))
    (block  ;; label = @1
      (loop  ;; label = @2
        (local.set 1
          (i32.add
            (local.get 1)
            (local.get 0)))
        (br_if 1 (;@1;)
          (i32.eqz
            (i32.and
              (call 0)
              (i32.const 1))))
        (local.set 0
          (i32.mul
            (local.get 1)
            (local.get 0)))
        (br 0 (;@2;))))
    (local.get 1))
  (memory (;0;) 16)
  (global (;0;) (mut i32) (i32.const 1048576))
  (export "memory" (memory 0))
  (export "foo" (func 1)))

Since we never observed the address of bar, the optimizer can split it into two wasm variables (local i32 i32) rather than using the shadow stack.

(It's really hard to come up with an example that actually produces two variables instead of reusing existing variables, or constant-folding away the entire calculation).

Anyway, the upshot of this is that the runtime wants pointers to linear memory. Since mutable value semantics produce values with nicely stack-shaped lifetimes, the path of least resistance is to store all values on the shadow stack and pass all function parameters (including the return value) by pointer. The c abi has metastasized into our language!

binaryen

While it's easy enough to emit wasm by hand, I wanted to try out binaryen anyway for its optimization passes.

Building it with clang and trying to link it to a zig program caused segfaults. This kinda surprised me - I know that mixing compilers is a bad idea with c++, but it's usually worked fine for me when I'm only consuming a c wrapper.

But I found a gist showing how to tweak the build to use zig cc and got it to work with only minor changes.

Here's the binaryen hello world translated directly to zig:

const c = @cImport({
    @cInclude("binaryen-c.h");
});

pub fn main() void {
    const module = c.BinaryenModuleCreate();

    // Create a function type for  i32 (i32, i32)
    var ii = [_:0]c.BinaryenType{ c.BinaryenTypeInt32(), c.BinaryenTypeInt32() };
    const params = c.BinaryenTypeCreate(&ii, 2);
    const results = c.BinaryenTypeInt32();

    // Get the 0 and 1 arguments, and add them
    const x = c.BinaryenLocalGet(module, 0, c.BinaryenTypeInt32());
    const y = c.BinaryenLocalGet(module, 1, c.BinaryenTypeInt32());
    const add = c.BinaryenBinary(module, c.BinaryenAddInt32(), x, y);

    // Create the add function
    // Note: no additional local variables
    // Note: no basic blocks here, we are an AST. The function body is just an
    // expression node.
    const adder = c.BinaryenAddFunction(module, "adder", params, results, null, 0, add);
    _ = adder;

    // Print it out
    c.BinaryenModulePrint(module);

    // Clean up the module, which owns all the objects we created above
    c.BinaryenModuleDispose(module);
}

And its output:

> zig run ./lib/binaryen.zig -Ideps/binaryen/src/ deps/binaryen/lib/libbinaryen.a -lc++
(module
 (type $i32_i32_=>_i32 (func (param i32 i32) (result i32)))
 (func $adder (param $0 i32) (param $1 i32) (result i32)
  (i32.add
   (local.get $0)
   (local.get $1)
  )
 )
)

I do wish the binaryen api was somewhat better documented. In particular, whether or not various pointers are allowed to be null is something I had to figure out by trial and error.

It's also kinda verbose - if I was going to use binaryen heavily I would want to write a zig wrapper to turn eg c.BinaryenLocalGet(module, 0, c.BinaryenTypeInt32()) into module.localGet(0, .int32), and to use slices instead of separate pointer/length pairs.

But the api is straightforward and the memory management is really simple. Everything belongs to module and is freed by BinaryenModuleDispose. All the other types are either simple value types or are integer handles to arena-allocated data in module.

linking

So I have the runtime compiled to wasm and I can emit wasm from binaryen. Now I need to figure out how to get one to call the other. After some extended googling I figured I have three options:

  1. Copy clang's linking format, which embeds custom sections in the wasm module to describe relocations etc.
  2. Use wasm imports/exports and either link at runtime or use wasm-merge.
  3. Read in runtime.wasm and directly modify it.

1 seemed like a lot of work. Binaryen doesn't provide support for emitting relocations or creating the custom sections, so I'd have to encode the custom sections myself, change indices to be relocatable as a post-pass on the binaryen output, and figure out any mistakes via trial-and-error with wasm-ld.

So I tried both 2 and 3 and ended up committing to 2, using wasm-merge. Neither worked particularly well, but mostly because of gaps in the binaryen api (see below).

For approach 3 the entry point is BinaryenModuleRead. This converts a binary wasm module into a mutable binaryen ast. Just one problem:

warning: unknown subopcode 183 (this may be an unsupported version of DWARF)
warning: unknown subopcode 226 (this may be an unsupported version of DWARF)
warning: unknown subopcode 41 (this may be an unsupported version of DWARF)
warning: unknown subopcode 6 (this may be an unsupported version of DWARF)
warning: unknown subopcode 6 (this may be an unsupported version of DWARF)
warning: unknown subopcode 6 (this may be an unsupported version of DWARF)
warning: unknown subopcode 5 (this may be an unsupported version of DWARF)
warning: unknown subopcode 5 (this may be an unsupported version of DWARF)
warning: unknown subopcode 5 (this may be an unsupported version of DWARF)
warning: unknown subopcode 5 (this may be an unsupported version of DWARF)
warning: unknown subopcode 5 (this may be an unsupported version of DWARF)
warning: unknown subopcode 5 (this may be an unsupported version of DWARF)
warning: unknown subopcode 5 (this may be an unsupported version of DWARF)
warning: unknown subopcode 5 (this may be an unsupported version of DWARF)
warning: unknown subopcode 91 (this may be an unsupported version of DWARF)
warning: unknown subopcode 58 (this may be an unsupported version of DWARF)
warning: unknown subopcode 58 (this may be an unsupported version of DWARF)
warning: unknown subopcode 6 (this may be an unsupported version of DWARF)
Fatal: TODO: DW_LNE_define_file

When I try to read a debug or release-safe build of runtime_wasm.zig I hit this todo which results in a panic. The same happens for approach 2 in wasm-merge.

Not the end of the world. I do want to use release-safe for all the bounds checks etc (I only used release-small for the examples above to make the output fit on the page). But I can strip out the debug info.

> zig build-lib lib/runtime_wasm.zig -target wasm32-freestanding -dynamic -rdynamic -O ReleaseSafe -fstrip

Now binaryen can read the module and we can try actually generating some interesting code.

shadow stack

First thing we need to do is access the shadow stack by reading the stack pointer address from the global variable. To emit a global.get in binaryen we use:

BINARYEN_API BinaryenExpressionRef BinaryenGlobalGet(BinaryenModuleRef module,
                                                     const char* name,
                                                     BinaryenType type);

It takes the name of the global rather than the index - using names makes it easier to parallelize codegen and then calculate the indexes later.

Unfortunately, stripping the debug info also strips the name of the global. There is no way to get around this in the binaryen api! We can get a handle to the global using BinaryenGetGlobalByIndex but this handle can't be used to set a name or to emit code referring to that global. We can add new globals, but we can't remove the existing global first because BinaryenRemoveGlobal also takes a name rather than an index.

Using approach 2 runs into similar problems. The runtime.wasm module doesn't export its stack pointer global. And there is no way to refer to it in zig code, so we can't export getters/setters.

If we go back to runtime.wasm.o, the non-self-contained version, it expects to import the stack pointer rather than defining it itself. But it also imports a bunch of functions which I don't know how to provide (presumably wasm-ld provides these during linking?):

  (import "env" "memset" (func (;1;) (type 5)))
  (import "env" "memcpy" (func (;2;) (type 5)))
  (import "env" "__stack_pointer" (global (;0;) (mut i32)))
  (import "env" "__multi3" (func (;3;) (type 14)))
  (import "env" "__fixunsdfti" (func (;4;) (type 2)))
  (import "env" "__floatuntidf" (func (;5;) (type 15)))
  (import "env" "__ashlti3" (func (;6;) (type 16)))
  (import "env" "__udivti3" (func (;7;) (type 14)))
  (import "env" "__umodti3" (func (;8;) (type 14)))

So, first gross hack of the project: we'll just create a second shadow stack rather than sharing the first one. We define a global stack pointer and call runtime.start to allocate space in the wasm memory:

_ = c.BinaryenAddGlobal(self.module.?, "__yet_another_stack_pointer", c.BinaryenTypeInt32(), true, c.BinaryenConst(self.module.?, c.BinaryenLiteralInt32(data_start)));

// ...

block[0] = self.runtimeCall0(
    "runtime_start",
    &.{
        c.BinaryenConst(self.module.?, c.BinaryenLiteralInt32(@bitCast(@as(u32, @intCast(
            std.math.divCeil(
                usize,
                data_start + self.strings.items.len,
                16 * 1024,
            ) catch unreachable,
        ))))),
    },
);

// ...

c.BinaryenSetStart(
    self.module.?,
    c.BinaryenAddFunction(
        self.module.?,
        "start",
        c.BinaryenTypeNone(),
        c.BinaryenTypeNone(),
        null,
        0,
        c.BinaryenBlock(self.module.?, null, block.ptr, @intCast(block.len), c.BinaryenTypeNone()),
    ),
);
export fn start(pages: usize) void {
    assert(@wasmMemoryGrow(0, pages - @wasmMemorySize(0)) >= 0);
}

The layout now looks like this:

The first shadow stack starts at 1048576 and counts down to 0, so when the stack overflows we'll get a trap.

The second shadow stack starts at 1048576*2 and counts down to, uh, wherever the end of the runtime data section is, at which point it will start corrupting memory. Not a great solution.

constants

Emitting numbers is easy enough.

// mvs
2.5

// wasm
// call runtime.createNumber
(call 1
  // ssp + 0
  (i32.add
    (global.get 0)
    (i32.const 0))
  // 2.5
  (f64.const 0x1.4p+1 (;=2.5;)))
// runtime
export fn createNumber(ptr: *Value, number: f64) void {
    ptr.* = .{ .number = number };
}

Emitting code for maps is also pretty reasonable. We first create an empty map and then set the keys one by one:

// mvs
[1 = 2, 3 = 4]

// wasm
// call runtime.createMap(ssp+0)
(call 3
  (i32.add
    (global.get 0)
    (i32.const 0)))
// call runtime.createNumber(ssp+32, 1)
(call 1
  (i32.add
    (global.get 0)
    (i32.const 32))
  (f64.const 0x1p+0 (;=1;)))
// call runtime.createNumber(ssp+64, 2)
(call 1
  (i32.add
    (global.get 0)
    (i32.const 64))
  (f64.const 0x1p+1 (;=2;)))
// call runtime.mapSet(ssp+0, ssp+32, ssp+64)
(call 11
  (i32.add
    (global.get 0)
    (i32.const 0))
  (i32.add
    (global.get 0)
    (i32.const 32))
  (i32.add
    (global.get 0)
    (i32.const 64)))
// etc...
// runtime
const global_allocator = std.heap.wasm_allocator;
export fn createMap(ptr: *Value) void {
    ptr.* = .{ .map = Map.init(global_allocator) };
}
export fn mapSet(map: *Value, key: *Value, value: *Value) void {
    map.*.map.put(key.*, value.*) catch oom();
}

Emitting code for strings is almost reasonable. We just call runtime.createString with a pointer to some constant data in memory:

// mvs
"foo"

// wasm
// call runtime.createString
(call 2
  // ssp+0
  (i32.add
    (global.get 0)
    (i32.const 0))
  // pointer to string data
  (i32.const 2097152)
  // string length
  (i32.const 3))
// runtime
export fn createString(ptr: *Value, string_ptr: [*]const u8, string_len: usize) void {
    ptr.* = .{ .string = global_allocator.dupe(u8, string_ptr[0..string_len]) catch oom() };
}

But how does that string data get into memory in the first place?

Wasm has data sections for this. Here's one from runtime.wasm:

(data $builtin.panic_messages.index_out_of_bounds__anon_1647 (i32.const 78) "index out of bounds\00")

You can see it has a name, an offset (78) in memory and some bytes to copy at that offset.

Data sections can also be marked passive, in which case there is no offset but we can use the memory.init instruction to copy the data into memory later.
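memory.init itself is simple - a sketch of its semantics in JS, with plain arrays standing in for linear memory and the segment:

```javascript
// memory.init $seg (dest) (offset) (size): copy `size` bytes starting at
// `offset` within a passive data segment into linear memory at `dest`.
function memoryInit(memory, segment, dest, offset, size) {
  for (let i = 0; i < size; i++) memory[dest + i] = segment[offset + i];
}

// Copy the string data "foo" into memory (addresses scaled down for the example).
const memory = new Uint8Array(16);
memoryInit(memory, [0x66, 0x6f, 0x6f], 4, 0, 3);
console.log(String.fromCharCode(...memory.subarray(4, 7))); // "foo"
```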

The api in binaryen for data sections is combined with the memory api:

// This will create a memory, overwriting any existing memory
// Each memory has data in segments, a start offset in segmentOffsets, and a
// size in segmentSizes. exportName can be NULL
BINARYEN_API void BinaryenSetMemory(BinaryenModuleRef module,
                                    BinaryenIndex initial,
                                    BinaryenIndex maximum,
                                    const char* exportName,
                                    const char** segments,
                                    bool* segmentPassive,
                                    BinaryenExpressionRef* segmentOffsets,
                                    BinaryenIndex* segmentSizes,
                                    BinaryenIndex numSegments,
                                    bool shared,
                                    bool memory64,
                                    const char* name);

I really just want to add a single passive segment but, fine, this is doable.

With segmentPassive set to true, passing any expression for segmentOffsets produces:

[wasm-validator error in module] 0x1c18308 != (nil): passive segment should not have an offset, on 
(i32.const 0)

I eventually figured out that I'm supposed to pass a null pointer for the segmentOffsets expression.

Now we have to get that data into memory using the memory.init instruction:

BINARYEN_API BinaryenExpressionRef
BinaryenMemoryInit(BinaryenModuleRef module,
                   const char* segment,
                   BinaryenExpressionRef dest,
                   BinaryenExpressionRef offset,
                   BinaryenExpressionRef size,
                   const char* memoryName);

It's not clear at first what const char* segment is supposed to be. It can't be the segment data because I set that in BinaryenSetMemory. Maybe it's the segment's name, but BinaryenSetMemory didn't ask me for segment names.

Looking at the implementation of BinaryenSetMemory though, I can see that it sets names using Name::fromInt(i), so the name for my string data segment is "0" :|

So with all that put together I get:

error: Uncaught CompileError: WebAssembly.Module(): memory count of 2 exceeds internal limit of 1 @+211
const wasmModule = new WebAssembly.Module(wasmCode);
                   ^

There is a memory in the runtime, and calling BinaryenSetMemory in the compiler creates another memory. We only want one! But there doesn't seem to be a way in binaryen either to create a passive data segment without creating a memory, or to delete a memory once it's created.

So, second gross hack of the project. We'll just turn all the string data into individual store instructions in the start function, byte by byte.

Here's the store instruction:

// Store: align can be 0, in which case it will be the natural alignment (equal
// to bytes)
BINARYEN_API BinaryenExpressionRef BinaryenStore(BinaryenModuleRef module,
                                                 uint32_t bytes,
                                                 uint32_t offset,
                                                 uint32_t align,
                                                 BinaryenExpressionRef ptr,
                                                 BinaryenExpressionRef value,
                                                 BinaryenType type,
                                                 const char* memoryName);

It takes the name of the memory. Which we don't have.

We could import the memory from the runtime, but although wasm-merge resolves this correctly it doesn't delete the import statement, leaving us with both a defined memory and a memory import with the same name, which doesn't work.

BinaryenStore doesn't allow memoryName to be null either - it segfaults.

The second gross hack is going to get grosser. We'll export a setByte function from the runtime and call that instead of store.

(func (;28;) (type 5)
  // Call runtime.start
  (call 0
    (i32.const 129))
  // runtime.setByte(2097152, 'f')
  (call 17
    (i32.const 2097152)
    (i32.const 102))
  // runtime.setByte(2097153, 'o')
  (call 17
    (i32.const 2097153)
    (i32.const 111))
  // runtime.setByte(2097154, 'o')
  (call 17
    (i32.const 2097154)
    (i32.const 111)))
(start 28)
export fn set_byte(ptr: *u8, byte: u8) void {
    ptr.* = byte;
}

This is incredibly wasteful, both in terms of code size and startup time. But it works.
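Just how wasteful? Each byte of string data becomes an `i32.const addr; i32.const byte; call setByte` sequence. A back-of-the-envelope estimate (my arithmetic, assuming the function index fits in one LEB128 byte):

```javascript
// Length in bytes of a signed LEB128 encoding, as used by i32.const immediates.
function slebLen(n) {
  let len = 0;
  while (true) {
    const byte = n & 0x7f;
    n >>= 7;
    len++;
    if ((n === 0 && (byte & 0x40) === 0) || (n === -1 && (byte & 0x40) !== 0)) break;
  }
  return len;
}

// Per byte stored at an address around 2MiB, with byte value 'o' (111):
// i32.const opcode + addr immediate, i32.const opcode + byte immediate,
// call opcode + function index.
const perByte = 1 + slebLen(2097152) + 1 + slebLen(111) + 1 + 1;
console.log(perByte); // 10
```

So roughly ten bytes of code per byte of data, versus one byte per byte in a data section.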

closures

Closures at least were easy.

There is no way to take a pointer to a function. Instead we can put function references into a special table and then call them via integer handles.

_ = c.BinaryenAddTable(
    self.module.?,
    "fns",
    @intCast(self.fns.items.len),
    @intCast(self.fns.items.len),
    c.BinaryenTypeFuncref(),
);
const fn_names = self.allocator.alloc([*c]const u8, self.fns.items.len) catch oom();
for (fn_names, self.fns.items) |*fn_name_c, fn_name| fn_name_c.* = fn_name.ptr;
_ = c.BinaryenAddActiveElementSegment(
    self.module.?,
    "fns",
    "fns_init",
    fn_names.ptr,
    @intCast(fn_names.len),
    c.BinaryenConst(self.module.?, c.BinaryenLiteralInt32(0)),
);

Closures are created much like maps.

// mvs
let x = 42;
let f = fn [y] x + y;
f[4]

// wasm
...
// creating `f`:
// call runtime.createFn(ssp+64, ix=0, arg_count=1, capture_count=1)
(call 4
  (i32.add
    (global.get 0)
    (i32.const 64))
  (i32.const 0)
  (i32.const 1)
  (i32.const 1))
// call runtime.copy(runtime.fnGetCapture(ssp+64, 0), ssp+32)
(call 14
  (call 9
    (i32.add
      (global.get 0)
      (i32.const 64))
    (i32.const 0))
  (i32.add
    (global.get 0)
    (i32.const 32)))
...
// calling `f`:
// call runtime.createNumber(ssp+128, 4)
(call 1
  (i32.add
    (global.get 0)
    (i32.const 128))
  (f64.const 0x1p+2 (;=4;)))
// call_indirect(runtime.fnGetIx(ssp+64), ssp+64, ssp+128)
(call_indirect (type 0)
  (call 10
    (i32.add
      (global.get 0)
      (i32.const 64)))
  (i32.add
    (global.get 0)
    (i32.const 96))
  (i32.add
    (global.get 0)
    (i32.const 128)))
// runtime
export fn createFn(ptr: *Value, fn_ix: u32, mut_count: u32, capture_count: u32) void {
    ptr.* = .{ .@"fn" = .{
        .ix = fn_ix,
        .muts = global_allocator.alloc(bool, mut_count) catch oom(),
        .captures = global_allocator.alloc(Value, capture_count) catch oom(),
    } };
}
export fn copy(ptr_a: *Value, ptr_b: *const Value) void {
    ptr_a.* = ptr_b.copy(global_allocator);
}
export fn fnGetCapture(ptr: *Value, capture_ix: u32) *Value {
    return &ptr.*.@"fn".captures[capture_ix];
}
export fn fnGetIx(ptr: *Value) u32 {
    return ptr.*.@"fn".ix;
}

The function body itself takes a couple of extra parameters:

(func (;27;) (type 0) (param i32 i32 i32)
  // ssp -= 64
  (global.set 0
    (i32.sub
      (global.get 0)
      (i32.const 64)))
  // call runtime.move(ssp+0, runtime.fnGetCapture(fn, 0))
  (call 13
    (i32.add
      (global.get 0)
      (i32.const 0))
    (call 9
      (local.get 1)
      (i32.const 0)))
  // call runtime.move(ssp+32, y)
  (call 13
    (i32.add
      (global.get 0)
      (i32.const 32))
    (local.get 2))
  // call runtime.add(return_value, ssp+0, ssp+32)
  (call 23
    (local.get 0)
    (i32.add
      (global.get 0)
      (i32.const 0))
    (i32.add
      (global.get 0)
      (i32.const 32)))
  // ssp += 64
  (global.set 0
    (i32.add
      (global.get 0)
      (i32.const 64))))

That's it. No gross hacks.

I did represent all functions as closures with indirect calls, which is pretty inefficient, but it should be pretty simple to add a constant folding pass before the compiler which turns statically-knowable call-sites into direct calls instead.

next

Aside from gaps in binaryen, writing the compiler in this runtime-heavy style was pretty trivial. The binaryen cpp internals seem much more capable than the c api, so I could just extend the c api. But binaryen isn't actually giving me that much anyway - the main pros of using it to emit wasm are the validation and optimization passes.

So I'm managing to hit a bunch of gaps in binaryen by doing weird linking shenanigans and then missing out on all the good stuff that it provides. For the next compiler I'm tempted to emit my own wasm in immediate-mode style. Something like:

// binaryen style

const block = self.allocator.alloc(c.BinaryenExpressionRef, map.keys.len + 1) catch oom();
block[0] = self.runtimeCall0("createMap", &.{map_location});
for (block[1..], map.keys, map.values) |*set_expr, key_expr_id, value_expr_id| {
    const key_offset = shadowPush(scope);
    const value_offset = shadowPush(scope);
    const key_location = self.shadowPtr(key_offset);
    const value_location = self.shadowPtr(value_offset);
    var set_block = [_]c.BinaryenExpressionRef{
        try self.compileExpr(scope, key_location, key_expr_id),
        try self.compileExpr(scope, value_location, value_expr_id),
        self.runtimeCall0("mapSet", &.{map_location, key_location, value_location}),
    };
    set_expr.* = c.BinaryenBlock(self.module.?, null, &set_block, @intCast(set_block.len), c.BinaryenTypeNone());
    shadowPop(scope, value_offset);
    shadowPop(scope, key_offset);
}
return c.BinaryenBlock(self.module.?, null, block.ptr, @intCast(block.len), c.BinaryenTypeNone());

// immediate-mode style

const map_location = self.shadowPush();

self.blockStart();
defer self.blockEnd();

self.runtimeCall0("createMap", &.{map_location});
for (map.keys, map.values) |key_expr_id, value_expr_id| {
    const key_location = try self.compileExpr(key_expr_id);
    defer self.shadowPop(key_location);

    const value_location = try self.compileExpr(value_expr_id);
    defer self.shadowPop(value_location);

    self.runtimeCall0("mapSet", &.{map_location, key_location, value_location});
}

return map_location;

Probably still worth using binaryen for validating and optimizing the output though.

I'll stick with using the wasm linking model instead of the c linking model. If nothing else, I want to reserve the option to reload individual modules rather than having to recompile and reload the entire program. I think I can make that work with wasm linking by emitting indirect calls between modules and editing the indirect function table when the new module is loaded. I do need to measure the overhead of calling between wasm modules in various runtimes though.
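The reload scheme, sketched in plain JS (the array here is a stand-in for a wasm funcref table; real modules would share an imported WebAssembly.Table):

```javascript
// Cross-module calls go through a shared table of function references.
const table = [null];

// "Module A" only ever calls module B through table slot 0.
const moduleA = { run: (x) => table[0](x) };

// Load v1 of module B into the table.
table[0] = (x) => x + 1;
console.log(moduleA.run(41)); // 42

// Reload: compile v2 of module B and patch the table slot; module A is untouched.
table[0] = (x) => x * 2;
console.log(moduleA.run(41)); // 82
```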

This prototype is no good for measuring performance because I didn't do any of the copy-elision or ref-counting optimizations that make mutable value semantics practical (don't tell anyone, but I didn't even emit the code to free values when they're dropped).

The value representation is also terrible. If I was planning on sticking with a fully dynamic language it would make sense to do pointer tagging and NaN-boxing, but I think something closer to Julia is a better bet.

The output size isn't bad: around 48kb for a small program that uses all the different value types. A huge chunk of that is the constant data section from the runtime and I have no idea what's in there.

I've proposed a separate, more explicit wasm abi for zig that only supports wasm scalars but allows using multivalue returns. This would allow eg returning small structs by value rather than being forced into adopting the c calling convention at the runtime boundary.

Probably the next step is to figure out if the performance (both compile-time and run-time) can meet my goals. Maybe measure existing languages that have non-llvm wasm backends, and also try compiling a simple static language to wasm. I'll have to look at different wasm backends too - there's a huge variety in compile-time / run-time tradeoffs.