Notes on compiler IRs

Published 2024-03-27

(This is part of a series on the design of a language. See the list of posts here.)


(Based on reading code.)

Zig has a bunch of IRs:

Ast -> Zir happens in AstGen.

Zir -> Air happens in Sema.

Air -> Mir happens in CodeGen. Nothing surprising, although it's worth looking at how WValue abstractly represents the result of a wasm instruction.

In both Zir/Air:

Specialization (Zir->Air) can happen many times per function, so can think of Ast->Zir as precomputing everything that can be precomputed before specialization.

All of Ast/Zir/Air/Mir are densely encoded for better memory access patterns (see this talk for early results).

Currently the zig compiler is focused on fast debug builds and still relies on llvm for release builds. If they end up also wanting to do their own optimizations they might need to tweak AIR or add another IR - direct references to instructions are fragile under transformations, and the phi-like block will make SRA tricky.


(Based on SIL.rst, this talk and skimming code.)

Swift goes ast -> raw sil -> canonical sil -> llvm.

Sema/type-checking happens at the ast level. (The sil type system erases parts of the swift type system and has more low-level information.)

Sil exists to perform specialization, language-guaranteed optimizations, dataflow analysis, and any optimizations that depend on knowledge of swift semantics/types. Also distributed in libraries for later specialization, allowing a softer tradeoff between specialization and separate compilation.

In raw sil all variables are boxed.

During raw sil -> canonical sil:

Swift is mostly implemented in itself - can use llvm intrinsics and primitive types directly in swift, and everything else is in stdlib. However, the compiler does have special knowledge about some stdlib functions that it uses for high-level optimizations.

Sil uses basic blocks with arguments. Some functions use 'ownership ssa' which enforces that every value is destroyed exactly once in every control flow path.

Optional debug info is stored per-instruction.

COW operations use matching begin/end_cow_mutation operations (which don't have to be in the same function?). I don't understand the need for the end instruction - presumably something to do with verifying ownership?

Stack dealloc instructions point directly at the corresponding alloc instruction. The stack must depend only on the current instruction, not how it was reached, and must be empty at return. Stack slots are deallocated in FILO order.

Primitive values are represented with literal instructions. Presumably more complex values are built out of sequences of literals and calls to stdlib functions. (No constant propagation for collections?)

The encoding of sil seems heavy:


(Based on and skimming code.)

Cranelift is a compiler backend, focused on generating reasonable code as fast as possible.

Basic blocks with arguments.

Stack slots are preallocated in the function preamble. Load/store via dedicated instructions, or explicitly ask for the address (makes escape analysis trivial, I guess).

No aggregate types, so all values represented by literal instructions.

Instructions can return multiple values.

Encoding is interesting:

Presumably the point of the doubly-linked lists is to support mutation (eg inserting instructions) while preserving unique ids. I'm not sure if the dataflow graph is kept in sync during mutation though. I think most of the work is done by egraph rewrite rules in isle so there are a lot of layers to read through.


(Based on and grimacing at code.)

Mir is an experimental compiler backend for jits, written by a gcc dev.

Compiler Performance Goals relative to GCC -O2: 70% of generated code speed, 100 times faster compilation speed, 100 times faster start-up, 100 times smaller code size

The original plan was to avoid ssa because of the cost of creating/destroying it. The current readme says it uses ssa but preserves 'conventional ssa'. I haven't seen that term anywhere else, but maybe it means that they avoid optimizations that would produce phi nodes originating from multiple different variables?

Uses intrusive doubly-linked lists of instructions. I can't find any explicit block structure.

Interesting mostly for the list of optimizations here - those that an experienced compiler developer thinks are worth the compile-time cost even for a jit.

So I'm thinking about ir for zest.

I need an interpreter for comptime evaluation. Both zig and mir have ir interpreters. The mir interpreter is only 6x slower than gcc -O2 in the tiny benchmark in the readme, but it looks like it converts the ir to a separate bytecode first. Looks like it shares the same opcodes etc but replaces the doubly-linked list with ordered instructions.

Almost every compiler I've ever looked at uses some kind of control-flow graph, except zig which preserves structured control flow. But! All those compilers had to support languages with irreducible control flow (eg goto in c) and compile to targets that support irreducible control flow (pretty much anything except wasm). Given that I'm compiling a language with reducible control flow (zest) to a target that only allows reducible control flow (wasm), is there any advantage to using a full control-flow graph in the middle? I definitely will need to do SRA to get structs off the shadow stack, but I think that could be done by allowing zig-style blocks to return multiple variables.

All the compilers that do optimization use doubly-linked lists of instructions, presumably to allow inserting/replacing instructions while still having unique instruction ids for sidetables. But following a double-linked list in an interpreter would be horribly slow. It seems reasonable to use the cranelift encoding where the instructions/blocks live in a vec and the doubly-linked list is a separate vec mapping to next/prev indexes. Before interpreting, the instruction/block vec can be shuffled so that non-jump instructions always fall through to the next instruction.

There is a lot of variety in how stack allocations are represented. I have no idea what the tradeoffs are. Probably the cranelift style is easier for an interpreter to deal with since the number of slots is known up-front. But maybe it makes operations like inlining more fiddly? I'm also a little worried that the sil style is somehow tied to refcount optimization and that I'll back myself into a corner there.

Sil covers the same range as zir and air put together. I'd guess the latter makes it easier to push assertions into the type system - it's easy to be sure that a particular operation gets removed by lowering if air doesn't have a way to represent it! But it means that any shared logic has to be implemented twice. In zig's case there is very little of that, but I imagine sil has a lot of helpers for dealing with types, ownership, aliasing etc. It's a much richer ir.

Zig's dense encoding is appealing, but probably worth waiting until the compiler is fairly settled first.