Notes on compiler IRs

(This is part of a series on the design of zest. See the list of posts here.)

zig

(Based on reading code.)

Zig has a bunch of IRs:

Ast. Basic parse tree. Potentially contains parse errors.
Zir. Untyped IR.
Air. Typed IR.
Mir. Arch-specific IR. I only looked at the wasm mir - it's pretty much just wasm plus some indirections for constants.

Ast -> Zir happens in AstGen.

Function-level names get resolved. No explicit variables in IR - uses address of instruction to refer to results. Variables become explicit stack slots using alloc/load/store.
Type-level names are still lazy - resolving something like Hashmap(...).KV requires comptime evaluation.
Block labels get resolved. break instructions refer to address of block instruction.
ResultInfo gets passed down, deciding where to store results of expressions and what type to cast literals to.
Produces errors for unreachable code.
Makes explicit any captures of comptime values.
Make explicit any syntactic rules eg there is an ensure_result_used instruciton.

Zir -> Air happens in Sema.

Shakes out the lazy const/function graph from some entry point.
Functions get specialized. Types get inferred.
Comptime values and control flow get evaluated.
Values are stored as handles to intern array. (Vs sil which deliberately avoided having a separate notion of values and just uses sequences of literal instructions. Probably the ubiquity of comptime values would make sil's strategy less workable for zig.)
Error return traces become explicit.
Presumably, checks that switches are exhaustive.
Probably a bunch else happens. Sema is huge.

Air -> Mir happens in CodeGen. Nothing surprising, although it's worth looking at how WValue abstractly represents the result of a wasm instruction.

In both Zir/Air:

Debug info is represented by special instructions. I assume each applies until the next debug instruction, or until the end of the block.
Function args are represented as instructions at the start of the body. This means that eg liveness analysis only has to look at instructions.
Control flow is reducible. Blocks can repeat or break to parent blocks. This makes block instructions behave kinda like a phi node. (Probably makes SRA hard? Maybe that happens in Mir instead?)

Specialization (Zir->Air) can happen many times per function, so can think of Ast->Zir as precomputing everything that can be precomputed before specialization.

All of Ast/Zir/Air/Mir are densely encoded for better memory access patterns (see this talk for early results).

Currently the zig compiler is focused on fast debug builds and still relies on llvm for release builds. If they end up also wanting to do their own optimizations they might need to tweak AIR or add another IR - direct references to instructions are fragile under transformations, and the phi-like block will make SRA tricky.

swift

(Based on SIL.rst, this talk and skimming code.)

Swift goes ast -> raw sil -> canonical sil -> llvm.

Sema/type-checking happens at the ast level. (The sil type system erases parts of the swift type system and has more low-level information.)

Sil exists to perform specialization, language-guaranteed optimizations, dataflow analysis, and any optimizations that depend on knowledge of swift semantics/types. Also distributed in libraries for later specialization, allowing a softer tradeoff between specialization and separate compilation.

In raw sil all variables are boxed.

During raw sil -> canonical sil:

Inline functions marked inline.
Unbox variables where possible (also SRA?).
Propagate constants.
Check dataflow (eg every code path returns).
Split critical edges.

Swift is mostly implemented in itself - can use llvm intrinsics and primitive types directly in swift, and everything else is in stdlib. However, the compiler does have special knowledge about some stdlib functions that it uses for high-level optimizations.

Sil uses basic blocks with arguments. Some functions use 'ownership ssa' which enforces that every value is destroyed exactly once in every control flow path.

Optional debug info is stored per-instruction.

COW operations use matching begin/end_cow_mutation operations (which don't have to be in the same function?). I don't understand the need for the end instruction - presumably something to do with verifying ownership?

Stack dealloc instructions point directly at the corresponding alloc instruction. The stack must depend only on the current instruction, not how it was reached, and must be empty at return. Stack slots are deallocated in FILO order.

Primitive values are represented with literal instructions. Presumably more complex values are built out of sequences of literals and calls to stdlib functions. (No constant propagation for collections?)

The encoding of sil seems heavy:

Instructions are objects inheriting from SILInstruction.
Basic blocks are intrusive doubly-linked lists of instructions, plus quite a lot of constant-sized metadata.
Functions are intrusive doubly-linked lists of basic blocks, plus even more metadata.

cranelift

(Based on ir.md and skimming code.)

Cranelift is a compiler backend, focused on generating reasonable code as fast as possible.

Basic blocks with arguments.

Stack slots are preallocated in the function preamble. Load/store via dedicated instructions, or explicitly ask for the address (makes escape analysis trivial, I guess).

No aggregate types, so all values represented by literal instructions.

Instructions can return multiple values.

Encoding is interesting:

Per function, blocks and instructions are allocated in two large vecs.
Instructions are 16 bytes, with large operands stored in a separate side-table and referenced by index. I haven't been able to figure out the encoding of operands - it's buried in codegen and macros.
A separate Layout stores control flow graph as doubly-linked lists of blocks/instructions, as pairs of u32 indexes in a vec. Also child/parent indexes. Total cost is ~16 bytes per block/instruction.
Also stores full dataflow graph per function.

Presumably the point of the doubly-linked lists is to support mutation (eg inserting instructions) while preserving unique ids. I'm not sure if the dataflow graph is kept in sync during mutation though. I think most of the work is done by egraph rewrite rules in isle so there are a lot of layers to read through.

mir

(Based on MIR.md and grimacing at code.)

Mir is an experimental compiler backend for jits, written by a gcc dev.

Compiler Performance Goals relative to GCC -O2: 70% of generated code speed, 100 times faster compilation speed, 100 times faster start-up, 100 times smaller code size

The original plan was to avoid ssa because of the cost of creating/destroying it. The current readme says it uses ssa but preserves 'conventional ssa'. I haven't seen that term anywhere else, but maybe it means that they avoid optimizations that would produce phi nodes originating from multiple different variables?

Uses intrusive doubly-linked lists of instructions. I can't find any explicit block structure.

Interesting mostly for the list of optimizations here - those that an experienced compiler developer thinks are worth the compile-time cost even for a jit.

So I'm thinking about ir for zest.

I need an interpreter for comptime evaluation. Both zig and mir have ir interpreters. The mir interpreter is only 6x slower than gcc -O2 in the tiny benchmark in the readme, but it looks like it converts the ir to a separate bytecode first. Looks like it shares the same opcodes etc but replaces the doubly-linked list with ordered instructions.

Almost every compiler I've ever looked at uses some kind of control-flow graph, except zig which preserves structured control flow. But! All those compilers had to support languages with irreducible control flow (eg goto in c) and compile to targets that support irreducible control flow (pretty much anything except wasm). Given that I'm compiling a language with reducible control flow (zest) to a target that only allows reducible control flow (wasm), is there any advantage to using a full control-flow graph in the middle? I definitely will need to do SRA to get structs off the shadow stack, but I think that could be done by allowing zig-style blocks to return multiple variables.

All the compilers that do optimization use doubly-linked lists of instructions, presumably to allow inserting/replacing instructions while still having unique instruction ids for sidetables. But following a double-linked list in an interpreter would be horribly slow. It seems reasonable to use the cranelift encoding where the instructions/blocks live in a vec and the doubly-linked list is a separate vec mapping to next/prev indexes. Before interpreting, the instruction/block vec can be shuffled so that non-jump instructions always fall through to the next instruction.

There is a lot of variety in how stack allocations are represented. I have no idea what the tradeoffs are. Probably the cranelift style is easier for an interpreter to deal with since the number of slots is known up-front. But maybe it makes operations like inlining more fiddly? I'm also a little worried that the sil style is somehow tied to refcount optimization and that I'll back myself into a corner there.

Sil covers the same range as zir and air put together. I'd guess the latter makes it easier to push assertions into the type system - it's easy to be sure that a particular operation gets removed by lowering if air doesn't have a way to represent it! But it means that any shared logic has to be implemented twice. In zig's case there is very little of that, but I imagine sil has a lot of helpers for dealing with types, ownership, aliasing etc. It's a much richer ir.

Zig's dense encoding is appealing, but probably worth waiting until the compiler is fairly settled first.