0038: cheap compilation, mvs-to-wasm, automatically isolating bugs, mastodone, other stuff

Published 2023-07-28

cheap compilation?

The most popular options for implementing programming languages and query languages are:

  1. Write a bytecode interpreter and try to amortize the runtime overhead through vectorized operations (eg python calling into c libraries or duckdb's vector operations).
  2. Use LLVM and suffer the slow compilation (and the constant dependency churn).

Neither is entirely satisfying. Vectorized interpreters provide a great performance-to-effort ratio but often lead to programmer contortions to avoid falling into slow scalar code. LLVM-backed languages like Julia can avoid that two-language problem but at the cost of horrific compile times. These are two extremes of some pareto-optimal curve and really we'd like to be able to smoothly slide along it without having to completely reimplement our language at each point.
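To make option 1 concrete, here's a toy register interpreter (entirely hypothetical names and shape, not from any real engine) showing where the dispatch overhead goes in scalar vs vectorized execution:

```typescript
// A single opcode for illustration. The point is where the switch sits:
// scalar execution pays interpreter dispatch per element, vectorized
// execution pays it once per column.

type Op = { code: "add"; dst: number; a: number; b: number };

// Scalar: dispatch inside the inner loop (once per element per op).
function runScalar(ops: Op[], regs: number[][], len: number): void {
  for (let i = 0; i < len; i++) {
    for (const op of ops) {
      switch (op.code) { // dispatch cost paid len * ops.length times
        case "add":
          regs[op.dst][i] = regs[op.a][i] + regs[op.b][i];
          break;
      }
    }
  }
}

// Vectorized: dispatch once per op, then a tight loop over the whole column.
function runVectorized(ops: Op[], regs: number[][], len: number): void {
  for (const op of ops) {
    switch (op.code) { // dispatch cost paid ops.length times
      case "add": {
        const dst = regs[op.dst], a = regs[op.a], b = regs[op.b];
        for (let i = 0; i < len; i++) dst[i] = a[i] + b[i];
        break;
      }
    }
  }
}
```

The vectorized version is why duckdb-style engines are fast despite interpreting: the per-op overhead is amortized over the column. The contortions start when your workload doesn't decompose into whole-column operations.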

(Tangentially related: In The Death Of Optimizing Compilers djb argues that most code doesn't really benefit from heavy optimization and that we'd be better off prioritizing compilation speed and then using specialized tools for optimizing hotspots. I'm not sure whether this is true for most server-side code - if you run the same binary across enough machines it's always worth spending another hour optimizing it. But it does seem to match the world of client-side code where multi-tier jits are the norm - emit barely optimized machine code as fast as possible and then tier-up on hotspots later.)

If you're Thomas Neumann you slide along this curve by writing your own entire pipeline from scratch. But I've been wondering if directly emitting wasm might be a good enough kludge. It's a stable format, easy to generate, and has a wealth of backend options from interpreters to aot compilers to multi-tier jits.

So the idea is to write the language runtime, including any vectorized operators, in c or whatever and compile it via llvm to wasm. And then for the language itself rather than writing a bytecode interpreter we do a minimum of language-specific optimizations and then emit fairly unoptimized wasm.

Where will this fall on that compile-time/run-time pareto curve? How much effort is emitting wasm compared to writing a fast interpreter? Will writing tooling (debuggers, profilers etc) be much harder than it is for interpreters, or will we benefit from existing tooling (when it eventually exists)? What will interactive usage look like - are there runtimes that can incrementally swap out compiled code?

Answering the performance questions will require some carefully controlled experiments, and the answers will probably depend a lot on the language itself and how much its current implementation depends on llvm to remove redundant code. I only have the barest sketches of how to approach those experiments, especially when trying to quantify how much is lost by not having embedded metadata like aliasing hints or branch hints.

But the other practical questions can be easily answered with a toy language.


So let's compile last month's toy language to wasm. I'm not very far along with this yet.

I'm using binaryen to emit wasm. It's pretty easy to emit wasm directly, but I'm interested in experimenting with binaryen's optimization passes later anyway.

When I built binaryen with clang following the instructions and linked it to a zig hello world, it segfaulted on the first call to the library. I don't have the skills to debug linking issues but I assume this is something to do with using two different versions of clang. Luckily I found kripken had a gist on compiling binaryen with zig-cc that worked with some minor tweaks.

It took me a while to figure out how to link the runtime to the code that I'm generating with binaryen. There's a separate spec for wasm object files which is what clang's lld expects as input, but those require generating some custom section with a list of relocations and oh no I still haven't read that linker textbook that's been sitting on my bookshelf for years...

Eventually I came up for air and realized that you only need to do that stuff if you're doing c-style linking. If you have the luxury of starting from scratch with a new language you can just use wasm imports/exports.

So my 'runtime' looks like:

export fn add(x: usize) usize {
    return x + 1;
}
And we can compile it like this:

> zig build-lib lib/runtime.zig -target wasm32-freestanding

> ls *runtime*
libruntime.a  libruntime.a.o

Uh, those aren't wasm.

We have to add -dynamic to get wasm output, which doesn't really make sense to me, but I guess clang cli options predate wasm so all this stuff is weirdly squished into a c-shaped box.

> zig build-lib lib/runtime.zig -target wasm32-freestanding -dynamic

> ls *runtime*
runtime.wasm  runtime.wasm.o

I think runtime.wasm.o is the wasm object file thing that I rabbit-holed on above, but we also get plain old runtime.wasm which is what we want here.

Let's look at what it contains.

> wasm2wat -f runtime.wasm
(module
  (memory (;0;) 16)
  (global $__stack_pointer (mut i32) (i32.const 1048576))
  (export "memory" (memory 0)))

Uh, where's our code?

This actually used to work but was changed for reasons that I would probably understand if I read that linker book that is sighing in the corner. The solution is to either explicitly list symbols that we want to export or use -rdynamic to automatically export them all.

> zig build-lib lib/runtime.zig -target wasm32-freestanding -dynamic -rdynamic

> wasm2wat -f runtime.wasm | grep export
  (export "memory" (memory 0))
  (export "add" (func $add))

Now in our compiler we can import that function:

// Import inc = runtime.add
    var i = [_]c.BinaryenType{c.BinaryenTypeInt32()};
    const inc_params = c.BinaryenTypeCreate(&i, i.len);
    c.BinaryenAddFunctionImport(module, "inc", "runtime", "add", inc_params, c.BinaryenTypeInt32());

// Define add_inc = fn [x, y] inc(x + y)
    var ii = [_]c.BinaryenType{ c.BinaryenTypeInt32(), c.BinaryenTypeInt32() };
    const params = c.BinaryenTypeCreate(&ii, ii.len);
    const results = c.BinaryenTypeInt32();
    const x = c.BinaryenLocalGet(module, 0, c.BinaryenTypeInt32());
    const y = c.BinaryenLocalGet(module, 1, c.BinaryenTypeInt32());
    const add = c.BinaryenBinary(module, c.BinaryenAddInt32(), x, y);
    var operands = [_]c.BinaryenExpressionRef{add};
    const inc = c.BinaryenCall(module, "inc", &operands, operands.len, c.BinaryenTypeInt32());
    _ = c.BinaryenAddFunction(module, "add_inc", params, results, null, 0, inc);

// Export add_inc.
    _ = c.BinaryenAddFunctionExport(module, "add_inc", "add_inc");

Running that code produces:

> zig run -Ideps/binaryen/src/ deps/binaryen/lib/libbinaryen.a ./lib/binaryen.zig -lc++

> wasm2wat -f hello.wasm
(module
  (type (;0;) (func (param i32) (result i32)))
  (type (;1;) (func (param i32 i32) (result i32)))
  (import "runtime" "add" (func (;0;) (type 0)))
  (func (;1;) (type 1) (param i32 i32) (result i32)
    (call 0
      (local.get 0)
      (local.get 1)))
  (export "add_inc" (func 1)))

We could load runtime.wasm and hello.wasm into v8 separately, and that might actually make sense for interactive usage where runtime.wasm will be fixed but hello.wasm will be changing. But we can also combine them in advance using binaryen's wasm-merge tool to cut out the runtime overhead of calling the runtime through js-land.
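For reference, the separate-loading version would look something like this (a sketch assuming the runtime.wasm and hello.wasm files built above; the key point is that hello's "runtime" imports get wired to runtime's exports at instantiation time, so every cross-module call bounces through js-land):

```typescript
// Hypothetical separate-loading setup: instantiate runtime.wasm first, then
// satisfy hello.wasm's "runtime" imports with the runtime instance's exports.
const runtimeBytes = Deno.readFileSync("./runtime.wasm");
const runtime = new WebAssembly.Instance(new WebAssembly.Module(runtimeBytes));

const helloBytes = Deno.readFileSync("./hello.wasm");
const hello = new WebAssembly.Instance(new WebAssembly.Module(helloBytes), {
  runtime: runtime.exports, // module name "runtime" matches the import above
});

console.log((hello.exports as any).add_inc(4, 2)); // 4 + 2 + 1
```

For interactive use this shape is attractive because only hello.wasm needs re-instantiating when code changes; the merge step avoids the per-call boundary crossing.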

> ./deps/binaryen/bin/wasm-merge runtime.wasm runtime hello.wasm hello -o merged.wasm      

> wasm2wat -f merged.wasm | grep export
  (export "memory" (memory 0))
  (export "add" (func 1))
  (export "add_inc" (func 2))

(This re-exports runtime.add which is maybe not ideal. Maybe I can read the code for wasm-merge and replicate what it does via the c api, rather than spitting all these files everywhere. But later.)

Finally we have a thing we can actually run.

> cat ./test.js
const wasmCode = Deno.readFileSync("./merged.wasm");
const wasmModule = new WebAssembly.Module(wasmCode);
const wasmInstance = new WebAssembly.Instance(wasmModule);
console.log(wasmInstance.exports.add_inc(4, 2));

> deno run --allow-read ./test.js
7

There's still a bit more finagling to do before jumping into actual compilation.

I want to implement the optimizations from the swiftlet paper, most of which boil down to replacing copies with pointers to stack-allocated values. But wasm has a protected stack - you can't pass around pointers to local variables.

The usual solution is to create a 'shadow stack' in linear memory which is managed by the language runtime. Any value whose address is taken has to be allocated on the shadow stack - that's almost every value if I do the swiftlet optimizations. This sucks. We can't do SROA on those values, and the wasm backend won't know that values on the shadow stack aren't aliased so it won't be able to move them into registers.
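To make that concrete, here's a toy shadow stack simulated in TypeScript, with a typed array standing in for linear memory (all names hypothetical):

```typescript
// Linear memory stand-in; in real wasm this would be the module's Memory.
const memory = new Int32Array(1024);
let shadowSp = memory.length; // shadow stack grows downward from the top

// Bump-allocate on the shadow stack; returns the address of the allocation.
function shadowPush(words: number): number {
  shadowSp -= words;
  return shadowSp;
}

function shadowPop(words: number): void {
  shadowSp += words;
}

// A callee that mutates through a pointer, as foo(inout x) would.
function foo(ptr: number): void {
  memory[ptr] += 1;
}

// Caller: x must be spilled to the shadow stack because its address escapes,
// so the backend can no longer keep x in a register or split it apart.
function caller(): number {
  const xPtr = shadowPush(1);
  memory[xPtr] = 41;
  foo(xPtr);
  const result = memory[xPtr];
  shadowPop(1);
  return result;
}
```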

The other option - one that isn't available in c because it doesn't limit aliasing - is to only elide the heap copies but keep the stack copies. So foo(inout x) would copy the stack value of x into foo and copy the modified version back into x afterwards. This potentially means more memory traffic and limits us to backends which support multivalue returns, but on the other hand it's much more legible to the backend optimizer.
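A sketch of the same `foo(inout x)` under copy-in/copy-out, with a tuple return standing in for wasm's multivalue returns (names hypothetical):

```typescript
// foo(inout x) lowered to copy-in/copy-out: x is passed by value and the
// updated value comes back as an extra return value - exactly the shape
// wasm multivalue returns support. No address escapes, so the backend can
// keep x in a register.
function foo(x: number): [newX: number, result: boolean] {
  x += 1; // mutate the local copy
  return [x, x > 0]; // copy-out: updated x travels back with foo's result
}

function caller(): number {
  let x = 41;
  let ok: boolean;
  [x, ok] = foo(x); // copy the modified value back into x
  return ok ? x : -1;
}
```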

Probably I'll end up doing both and comparing them. I also have some vague plans later to include julia-style layout and type inference, which might swing the results more in favor of the shadow stack because of the larger values. If you know of any experiments trying to measure the cost of the shadow stack in other languages, please let me know.

For the shadow stack, I'll likely fork WasmAllocator.zig and tweak it to reserve the lower end of the memory space.

I still have to figure out closure representation too, but I think that should be more standard. After that the actual compilation should be fairly mechanical.

automatically isolating bugs

This is fucking genius. Binaryen has a flag that records all the calls you make to their library and then emits a c program which makes the same calls. This means that they can easily reproduce any bug that you might report without having to deal with building and running your project.


mastodone

I've very tentatively made a mastodon account... Not sure yet that I'll keep it, but worth a go.

I ended up deleting it.

Drew Devault argues that twitter is optimized for parasocial relationships but mastodon is optimized for social relationships. But I don't really see the difference - they both encourage the same identity-performing, context-collapsed interactions. Not that it's not possible to have a social experience, but the design doesn't make it easy.

Both are very effective at consuming time and attention though, which I'd rather spend actually socializing.

Programming with union, intersection, and negation types

Lots of gradual typing systems sort-of treat types as sets of possible values and allow various set operations on them, because those operations correspond nicely to various primitive operations in dynamic languages. But then subtyping is still done syntactically, leaving weird holes where the set-theoretic interpretation leads one to expect that code should type-check, but limitations of the type-checker cause it to fail.

This paper is a deep dive into what happens if you take the set-theoretic interpretation all the way. This yields a type system that is simple and intuitive and whose subtyping is decidable (although describing the subtyping algorithm is deferred to a different paper, and it sounds like it might be kind of expensive).

They give a general type system which is sound but for which inference is not very tractable.

One problem is that for a given function there may be an infinite number of types which it satisfies, but the set of types is not closed under greatest-lower-bound of an infinite set so there isn't a unique most-specific type to infer. Eg the function x -> x can have the type (int -> int) & (string -> string) & ((int, string) -> (int, string)) & .... Since the number of concrete types is infinite we can always add more cases but the type system cannot represent the intersection of an infinite number of cases. (I didn't run into this problem in imp because the number of concrete types was finite - no tuples and only a restricted form of higher-order functions.)
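TypeScript's intersection types are a convenient place to poke at this: each added case is a correct type for the identity function, and nothing stops you adding more - the polymorphic type is the thing the finite intersections approximate but never reach.

```typescript
// Each intersected case is a valid type for the identity function; you could
// keep adding cases for every concrete type, without end.
type Id = ((x: number) => number) &
  ((x: string) => string) &
  ((x: [number, string]) => [number, string]);
// & ... more cases forever

const id: Id = <T>(x: T): T => x;

const n = id(1); // checks against the number case
const s = id("a"); // checks against the string case
```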

Another problem is that the combination of type-cases and recursive functions allows functions to ask about their own type, which can lead to contradictions eg let rec f = \x (f isa (true -> true)) ? false : true. This is much gnarlier than it first appears - the above problem tells us that we can't give a most-specific type to a function at compile time, so f isa T might require type-checking f against the type T at runtime! (Imp avoided this by monomorphizing all higher-order functions before type-checking. But this prevents dynamic dispatch on functions at runtime.)

They present 3 different restricted variants of the type system that each avoid the above problems, but with different tradeoffs.

In variant 1 all functions must have type annotations. Type-case expressions if (e isa T) ... require that e is a single variable rather than an expression and that T does not contain type variables. Type inference is now simple (unless polymorphism is added - in which case the details are deferred to two additional papers).

In variant 2 function types can be inferred, but inference won't infer intersection types or negated function types. Type-case expressions if (e isa T) ... additionally require that T is not a function type, to avoid the interaction between runtime and type inference. HM-style type-schemes and environments are used, but with some modifications to handle subtyping that I didn't follow at all. In comparison to regular HM systems this variant is able to give precise types to inexhaustive pattern matches, rather than approximating them with a supertype and failing at runtime on unmatched inputs.

In variant 3 intersection types can be inferred for functions and full occurrence typing is supported. Inference looks at the type-cases in the function to determine what splits of types might be useful in finding a proof that the function is well-typed. This type inference is performed over an intermediate language where all duplicate expressions have been combined - IIUC this is to avoid invalid proofs where two copies of the same expression were given different types by the search? This variant does not support polymorphism and cannot be trivially extended to handle side-effects!

Combining any of these variants with gradual typing is tricky. I believe that the root of this is that applying a function to the Any type should always type-check (albeit with an inserted runtime type assertion) but that applying a function to an incorrect non-Any type should fail to type-check. This means that they can't just treat Any as the set of all values. Instead, type expressions containing Any are treated as denoting an interval of types. An alternative I'm considering exploring might be to syntactically distinguish function calls which are not expected to type-check ie f(x!) will insert a runtime assertion that x is the right type and will only fail to type-check if type inference can prove that that assertion cannot succeed.

It's interesting that they interpret the type a->b as (roughly) P(~(a x ~b)) which makes the set-based subtyping contravariant for function domains, as is usual in other type systems. Whereas in imp the type a->b is P(a x b) which gives covariant subtyping for function domains - a little weird. But imp has to go this way, because imp treats functions in every way as infinite relations so they are forced to have the same variance as finite relations. You can even take the union of two functions - typeof(f|g) = typeof(f) | typeof(g) whereas in cduce typeof(f|g) = typeof(f) & typeof(g).

I have no idea how to combine any of these ideas with Julia-style typing where types are first-class, attached to values and drive layout. For example in Julia T1 and Union{T1,T2} have different layouts in memory - it's not safe to approximate the former with the latter.

Julia also doesn't try to assign types to functions, instead relying on monomorphization before type inference. Monomorphizing before code generation is very expensive, but I wonder if monomorphizing only for type inference would be reasonable. It might be worth trying to measure how much time Julia typically spends in type inference.

It's also notable that some of the difficulties come from the presence of recursive types. It might be worth exploring whether banning recursive types is enough of a simplification to be worthwhile.

The Design Principles of the Elixir Type System

Applying the above to the elixir language.

Like with the Any type in the previous paper, when inferring types from guards they track an interval of types - the 'true' type being contained somewhere in that interval.

They can't insert dynamic checks, so they have an interesting trick where they distinguish function types which throw an error on out-of-domain values from those which don't. The latter, when applied to an Any type, will still type-check but will return an Any type intersected with the usual inferred return type. IIUC the intersection allows them to make use of the inferred return type for later type-checking, but also track that it's not a sound inference.

The work isn't finished yet, so it's not clear what approach they're going to take in regards to the various restrictions in the paper above.

Using computers more freely and safely

Prefer software with thousands rather than millions of users, that doesn't change often, that seems to get forked a lot, that can be modified without specialized tools, and, ideally, that you can make small changes to. Yourself. In a single afternoon.

We implicitly assume that all software benefits from efficiencies of scale. The more users we can serve with a single piece of software, the more they'll all benefit from the pooling of developer time.

But this ends up being disempowering. Software with millions of users tends to be incredibly complex and is difficult to understand, maintain and modify. So even if the software is open-source, you don't have the economic freedom to maintain your own fork. You can only petition the maintainer team to satisfy your needs.

This isn't a call to go back to hand-writing bits with a magnetized needle, but just to recognize that big software ecosystems have costs as well as benefits, and that sometimes you might be better off with small, easily-tweaked, low-maintenance software.

See also an app can be a home-cooked meal.

Or the metaphor I've used before: the existence of skyscrapers and cranes didn't make home depot irrelevant. Sometimes you just want to fit a tiny bunk-bed in the back of your van and no-one is gonna run a business serving that niche. It's useful to keep home tools in existence.

The internet didn't kill counterculture - you just won't find it on Instagram

Argues that counter-cultures are about refusing the underlying value system of the mainstream society. The hippies refused to be professional and patriotic. The punks refused hierarchical systems and deference to authority. But a society based on neoliberalism and funded by advertising doesn't care how you dress or how you cut your hair. You can behave however you want as long as it gets eyeballs on ads. You can't rebel against youtube on youtube, because it doesn't care.

But behind 6ix9ine's self-loyalty is an unwitting loyalty to the platform and, by extension, to the shareholders of Alphabet and Facebook, Inc. And this is where it gets tricky. To be truly countercultural today, in a time of tech hegemony, one has to, above all, betray the platform, which may come in the form of betraying or divesting from your public online self.

So today's countercultures exist in private groups, or offline entirely. Because it isn't possible to resist the values of adtech while living in its spaces.

The crux of Liu Cixin's book is the creed, when called by the clearnet: "Do not answer! Do not answer!! Do not answer!!! But if you do answer, the source will be located right away. Your planet will be invaded. Your world will be conquered."

It hangs together as a theory, but how would you verify it?

The idea of countercultures does remind me though of SerenityOS. Not that they're hidden away in the dark forest, but that they resist in place by pouring their attention into a creation that can't possibly serve the hyperscaling industry and won't ever make anyone rich. "This is a system by us, for us, based on the things we like."

It's not that we should never let capitalism eat anything, but that it's worth understanding how to stop it from eating everything. We're the ones hiding in boxes now.

I Have America Surrounded

I really like that John Higgs can write about a controversial character and focus on describing them rather than judging them. It's an important skill for writing about someone like Timothy Leary, who inspired such affection and devotion from so many people, all of whom he appears to have treated with a sociopathic disregard.

Certainly the surrounding history is fascinating. Learning, for example, that the CIA regularly dosed their own agents with lsd without warning or consent certainly explains a lot of the weirder stuff they got up to in that period. Or, for another example, that at least part of the motivation for making lsd illegal was that Nixon was convinced that recreational drugs (and homosexuals) were part of a communist plot to weaken the US.

Saving Time

Opens with an examination of how industrial/capitalist notions of time and value made their way into general culture and affect the way we think about our own lives. Early industrialists regularly had to resort to violence to increase the amount of work per person. Much easier to employ workers who were raised with work as their source of self-worth and a deeply seated belief in the moral value of never being satisfied. It's particularly obvious in the contrast when colonists try to extract maximum labour from natives who haven't been soaked in that culture and so don't default to individualist, zero-sum thinking.

The rest of the book meanders around different ways to experience time and how notions of time relate to various social justice issues. I felt it held together less coherently - each individual idea made sense but trying to tie them together as different understandings of time didn't feel helpful. Perhaps the idea itself just isn't ready yet, as if the book was published before it finished baking.

[...] unlike the Ancient Greeks, who imagined that, someday, machines might replace slave labor so that everyone might enjoy some free time, capital only "frees time in order to appropriate it for itself." In other words, the goal of capitalism is not free time but economic growth; any time freed up goes right back into the machine to increase profits. Thus the paradox: The factory is efficient, but it also produces "the drive toward the consumption of the person's time up to its outermost, physical limit."

Although it answers to no one (else), an achievement-subject nonetheless "wears down in a rat race it runs against itself": "The disappearance of domination does not entail freedom. Instead, it makes freedom and constraint coincide. Thus, the achievement-subject gives itself over to compulsive freedom - that is, to the free constraint of maximizing achievement. Excess work and performance escalate into auto-exploitation. [...] This same limitlessness is what leads the achievement-subject toward burnout. Trained to set her sights on infinity, she never experiences the feeling of having actually reached a goal and, instead, exhibits the "auto-aggression" of the master and mastered rolled into one.

As Taylor's years-long battle showed, workers would have some control over the pace of work as long as they held knowledge about the work process.

...workers found they actually wanted more than just money and the trappings of leisure - they wanted not to have to sell their time in the first place.

If time management is not simply an issue of numerical hours but of some people having more control over their time than others, then the most realistic and expansive version of time management has to be collective.

In positing all of human existence as an endless striving toward market society, neoliberals had to erase not just the possibility of a future but all memory of a past when humans managed to organize themselves in other ways. The kinds of tools needed to navigate out of the climate crisis - things like public ownership, full employment, or even just tough regulations - have receded into memory.

The Boy Who Could Change the World

A collection of blog posts from Aaron Swartz. For all that I admire his work and his values, it was dull reading. Maybe I just saturated on that material in my teenage years.

Stranger Than We Can Imagine

Trying to explain the 20th century by linking together special relativity, modern art, freudian psychology, neoliberal individualism etc. The theme works superficially but I don't think it means anything.

Finite And Infinite Games

My second attempt to read this because people keep recommending it. Still couldn't get through it. I just don't think that redefining words and cleverly juxtaposing rearranged sentences is a path to insight about reality.

The contradiction of finite speech is that it must end by being heard. The paradox of infinite speech is that it continues only because it is a way of listening. Finite speech ends with a silence of closure. Infinite speech begins with a disclosure of silence.

Bunch of deepities.