0045: unexplanations, business things, zest progress, internal consistency repro, why murat blogs, compiler books + papers, compiling sql to wasm, other books

Published 2024-03-28

I wrote a lot this month.

More unexplanations:

Some musings about small software businesses:

And some notes from work on zest:

zest progress

I kept chipping away at the single-pass compiler for zest-kernel. It basically works, including function specialization, but the strain is showing. I don't think I'm going to be able to add anything else to it. I tried adding closures and it got gnarly. Hence the notes on IR design :D

I have a pretty poor understanding of the tradeoffs involved in various IR styles so I'm suffering a little paralysis by analysis at the moment.

Attempting to Understand Internal Consistency

They reproduced the results from my internal consistency test, and also found that when they enabled parallelism (which I didn't do in the original) they lost eventual consistency too.

I heard separately that enabling minibatches for joins hides many of the intermediate inconsistent values. But it doesn't actually solve the problem of synchronizing joins, so it's probably just making the inconsistency harder to demonstrate with a simple example.
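A toy model of why batching can hide the problem without fixing it. This is not the system's actual minibatch implementation. It assumes each transfer produces a debit (-1) and a credit (+1) that reach an unsynchronized operator as separate events, and that the operator emits its running total once per batch. The names `observed_totals`, `aligned` and `skewed` are made up for illustration.

```python
def observed_totals(deltas, batch_size):
    """Return the running totals a downstream reader sees, one per batch."""
    total = 0
    seen = []
    for count, delta in enumerate(deltas, start=1):
        total += delta
        # Only the value at the end of each batch is emitted downstream.
        if count % batch_size == 0:
            seen.append(total)
    return seen


# Every transfer nets to zero, so a consistent total is always 0.
aligned = [-1, +1, -1, +1, -1, +1]  # each credit arrives right after its debit
skewed = [-1, -1, +1, -1, +1, +1]   # credits lag behind debits

print(observed_totals(aligned, 1))  # unbatched: intermediate totals are visible
print(observed_totals(aligned, 2))  # batched, inputs happen to line up
print(observed_totals(skewed, 2))   # batched, inputs do not line up
```

In this model batching only helps when the batch boundaries happen to fall between transfers. Nothing forces the two inputs to line up, so a skewed arrival order still leaks nonzero totals.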

why murat blogs

I think my blog reviews of papers hits a good niche. Research papers are written for the wrong audience (or rather maybe the right audience but for the wrong reason): they are written to please 3 specific expert reviewers who are overwhelmingly from academia. Thus much of the benefit from the research and writing goes wasted. If we didn't have this objective of having to look impressive for peer-reviewing (and the resulting costly signaling effect), I believe we would be able to learn way more from the research papers. The authors would aim to educate rather than impress. They would not need to be defensive about their work, and would introspect about their learnings and their thought processes. In effect, this is what I do on their behalf when I write a blog review for my understanding of the papers.

compiler books

Essentials of compilation. Pedagogically excellent, as expected from racket folks. Probably not a practical way to actually build a compiler (the chez scheme nanopass compiler reportedly has >50 passes - without fusion that must soak up a ton of memory bandwidth). But it seems like a helpful way to think about compilation, even if I have to fuse the implementation into just a few passes.

Static program analysis. Clear and straightforward explanations (at least if you already have some math background). I don't have to deal with first-class functions or aliasing, so most of the book is probably not relevant right now. But it's nice to have a composable framework to think about the simple analyses that I was just intuiting so far.

Skimmed:

compiler papers

Intermediate Representations in Imperative Compilers: A Survey was useful for understanding the pros and cons of different IRs, and also which IRs have actually been used in practice (I assume the others will contain surprises). At least as of 2013.

Simple and Effective Type Check Removal through Lazy Basic Block Versioning. Given an IR using basic blocks with arguments, you can jit-specialize each block once the types of its arguments are known. Since block terminators are often branches which encode type-checks, this works pretty well with no additional tracing needed and with very little warmup time. Code explosion is avoided by limiting the number of versions of each block and compiling a non-specialized fallback if this number is exceeded.
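A minimal sketch of the versioning scheme, not the paper's implementation. It assumes a block can be modelled as a Python callable, that "compiling" a version just means picking a closure with the type checks removed, and that versions are keyed on the runtime types of the block arguments. `Block`, `MAX_VERSIONS`, `specialize_add` and `generic_add` are hypothetical names.

```python
MAX_VERSIONS = 2  # hypothetical cap on specialized versions per block


def generic_add(a, b):
    # Non-specialized fallback: pays for the type checks on every call.
    if isinstance(a, str) and isinstance(b, str):
        return a + b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a + b
    raise TypeError("unsupported operand types")


def specialize_add(arg_types):
    # Stand-in for code generation: the types are already known here,
    # so the specialized version contains no checks at all.
    if arg_types in ((int, int), (float, float), (str, str)):
        return lambda a, b: a + b
    return generic_add


class Block:
    def __init__(self, specialize, generic):
        self.specialize = specialize
        self.generic = generic
        self.versions = {}  # argument types -> specialized version

    def version_for(self, arg_types):
        if arg_types in self.versions:
            return self.versions[arg_types]
        if len(self.versions) >= MAX_VERSIONS:
            # Cap reached: fall back instead of generating more code.
            return self.generic
        # Lazy: a version is only created when these types first show up.
        version = self.specialize(arg_types)
        self.versions[arg_types] = version
        return version

    def run(self, *args):
        arg_types = tuple(type(a) for a in args)
        return self.version_for(arg_types)(*args)


add_block = Block(specialize_add, generic_add)
print(add_block.run(1, 2))
print(add_block.run(1.5, 2.5))
print(add_block.run("a", "b"))  # third type combination, served by the fallback
print(len(add_block.versions))
```

The cap is what bounds code size: once a block has `MAX_VERSIONS` versions, any new type combination runs the checked fallback instead of getting its own copy.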

Compiling without Continuations. Implemented in ghc. In a functional IR you can express something like basic blocks by using closures, but the compiler might later not be able to prove that the closure doesn't actually need to be allocated. Instead they add a 'join point', a closure-like thing that can only be tail-called. This unlocks some optimizations that are easy in cps, without dealing with the headache of trying to read cps. I'm not sure whether this style of IR makes sense for a strict language though, so not sure how applicable it is elsewhere.
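A small sketch of the "can only be tail-called" restriction, written as a checker over a made-up expression IR. It is not GHC's Core. The node types (`Lit`, `Var`, `Add`, `If`, `Join`, `Jump`) and the function `jumps_are_tail_calls` are hypothetical, and joins are assumed to be non-recursive. The idea is that the set of jumpable labels is emptied whenever we descend into a non-tail position.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Lit:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Add:
    left: Any
    right: Any


@dataclass
class If:
    cond: Any
    then: Any
    els: Any


@dataclass
class Join:  # join name(params) = body in rest
    name: str
    params: list
    body: Any
    rest: Any


@dataclass
class Jump:  # jump name(args)
    name: str
    args: list


def jumps_are_tail_calls(expr, labels=frozenset()):
    """True if every Jump targets a label that is reachable in tail position."""
    none = frozenset()
    if isinstance(expr, (Lit, Var)):
        return True
    if isinstance(expr, Add):
        # Operands are not in tail position, so no label may be jumped to.
        return (jumps_are_tail_calls(expr.left, none)
                and jumps_are_tail_calls(expr.right, none))
    if isinstance(expr, If):
        # The condition is not in tail position, but both branches are.
        return (jumps_are_tail_calls(expr.cond, none)
                and jumps_are_tail_calls(expr.then, labels)
                and jumps_are_tail_calls(expr.els, labels))
    if isinstance(expr, Join):
        # The new label is only in scope in `rest` (non-recursive join).
        return (jumps_are_tail_calls(expr.body, labels)
                and jumps_are_tail_calls(expr.rest, labels | {expr.name}))
    if isinstance(expr, Jump):
        return (expr.name in labels
                and all(jumps_are_tail_calls(arg, none) for arg in expr.args))
    raise TypeError(f"unknown node: {expr!r}")


# Both branches share one continuation `j` without duplicating its body.
shared = Join("j", ["x"], Add(Var("x"), Lit(1)),
              If(Var("c"), Jump("j", [Lit(10)]), Jump("j", [Lit(20)])))

# Here the result of the jump is used by Add, so `j` would need to return.
escaping = Join("j", ["x"], Var("x"), Add(Jump("j", [Lit(1)]), Lit(2)))

print(jumps_are_tail_calls(shared))
print(jumps_are_tail_calls(escaping))
```

Because every accepted jump is a tail call to a known label, a backend can lower it to a plain branch to a block with arguments and never has to allocate a closure for it.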

Fast Compilation and Execution of SQL Queries with WebAssembly

I'm hoping to piggyback on wasm backends to get fast compilation and reasonable performance for zest. It's promising to see good results from similar experiments.

Details are mostly unsurprising. They do full specialization of everything directly in their compiler. Hashtables etc are baked in.

They had to patch v8 to allow running wasm against existing memory allocations. Then provide callbacks that use virtual memory mappings to map chunks of their tables into the 4gb memory space. Output has to be chunked similarly. Wouldn't be necessary with wasm64.
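A rough sketch of the chunking arithmetic only, not of their v8 patch or their mapping callbacks. It assumes fixed-width rows and a single window of address space that gets remapped onto successive slices of a table. `chunk_windows` and its parameters are hypothetical names.

```python
def chunk_windows(table_rows, row_bytes, window_bytes):
    """Yield (first_row, row_count) slices that each fit in one mapped window."""
    rows_per_window = window_bytes // row_bytes
    if rows_per_window == 0:
        raise ValueError("window is smaller than one row")
    for first_row in range(0, table_rows, rows_per_window):
        # The last slice may be shorter than a full window.
        yield first_row, min(rows_per_window, table_rows - first_row)


# Tiny numbers for illustration: 10 rows of 8 bytes seen through a 32-byte window.
for first_row, row_count in chunk_windows(10, 8, 32):
    print(first_row, row_count)
```

With 32-bit wasm the window has to fit inside the 4gb linear memory alongside everything else the query needs, which is why the generated code has to loop over slices like this at all.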

Evaluated on some synthetic queries. Seems competitive with duckdb. But we're comparing both execution methods and query planning. Where did their query plans come from?

Compilation times in the ms. Comparable to duckdb planning times in the tidy tuples paper!

Not clear whether they're measuring this fairly though. Could be reporting compilation time for first tier, and runtime after tiering up. Would like to see a graph of number of queries executed vs time elapsed, to characterise the warmup time. I guess the queries they're using take long enough that they tier up during the query though.

other books

A life of meaning. I abandoned this pretty quickly.

Passion paradox. It didn't really make sense to read this after not liking the other two books by the same author last month, but I already have it on hold at the library...

Deadlift dynamite. Recommended to me by a tiny old lady at the gym who was deadlifting twice her bodyweight. Possibly out of concern that I was going to hurt myself. It's pretty chaotically written but the sections on technique and programming seem useful. Also in the mobility section they talk about opening your pelvic girdle which... isn't that just a solid bone? TIL.

The man from the future. A fun biography of von Neumann, written by someone who has a decent understanding of many of the subjects von Neumann pioneered. I enjoyed it. Fun fact: someone tried to patent the idea of the stored-program computer, and failed mainly because von Neumann had been mailing detailed technical reports to dozens of universities during the course of designing the computer. Johnny played innocent, but given that his express goal for much of his career was to accelerate the rise of computing as much as possible, it seems likely that he deliberately sabotaged the patent. I can't imagine how much the industry would have been held back if such a patent had been granted.