0011: dida release, DD reading, antirez and small tech, reactive ui, how to test, doom vs memory safety, state of academia, wasm-bindgen, apl compilers, relational.ai and salsa.jl

I made the dida repo public. I still have a lot of work to do but:

It works!
I came up with a slightly different way of handling progress tracking that I think simplifies the implementation. DD's algorithm encapsulates subgraphs so that they look like single nodes in the parent graph. This requires nodes to be able to have multiple outputs and report complex transformations on timestamps over each of the internal edges between their inputs and outputs. My tweaked version of the algorithm requires neither and also dispenses with the entire notion of path summaries.
I took a first pass at the docs explaining how everything works.
I added nodejs bindings. One of the goals is to be a better target for bindings to other languages than DD, so I figured I'd get that in from the beginning in case it revealed any major design issues. All good so far. The bindings are also full of neat examples of zig's metaprogramming abilities.
Porting strongly-typed apis between different languages is really difficult, so the plan for handling language bindings and being a better interpreter target is to separate the library into a minimally-typed data-centric core and then layering per-language sugar on top. I sketched out a proof-of-concept sugar layer for zig (example here).

The readme has a detailed list of the remaining things that need doing before reaching a usable version. There are many things on the list :S

For understanding how DD works, I relied heavily on:

Shared Arrangements: practical inter-query sharingfor streaming dataflows which is nominally just about the indexes, but incidentally has to explain how the latest version of DD works, albeit very briefly.
Verified Progress Tracking for Timely Dataflow which has an Isabelle proof of the progress tracking algorithm. I didn't quite get the right intuition the first time around - it's easy to overlook the single sentence "Importantly, only minimal timestamps may be propagated..." which is in fact the key part of the algorithm.

But in the end I still had to spend a lot of time puzzling over the source code.

In small tech I wrote:

One of the most interesting things about tarsnap is that it could easily be making more money. If tarsnap had investors or shareholders, they would have already forced Colin to do so. In other words, the only reason that Colin gets to continue running the kind of business that he wants to run is that he owns it entirely and noone can force him to maximize profit instead.

It seems like this is kinda what happened to antirez? Although maybe not so much by a heavy hand as by sustained social pressure and internalized responsibilities.

Now I'm asked more and more, by the circumstances created by a project that became so important, to express myself less and to maintain the project more. And this is indeed exactly what Redis needs right now. But this is not what I want to do, and I stretched myself enough during the past years.

A few years back Raph Linus wrote an excellent overview of possible approaches to incremental maintenance of functions that define UIs. It's an old post but I keep coming back to it, because my mental map of incremental systems doesn't quite capture all the variation on display here. It's not as simple as unstructured vs structured - many of the systems in Raph's post display a mixture of both.

How to test.

This video about an unlikely memory-safety bug in doom is interesting in it's own right, but I've also been using as an example of two different contexts for memory-safety. I see a lot of comments like:

C++ programming is not the morass of use-after-free that Rust people seem to think it is.

I think the disconnect here is the difference between memory safety bugs that a regular user will encounter by accident and memory safety bugs that an adversary can discover by deliberately forcing your program into extreme parts of the state space.

Doom is a pretty reliable game. It's programmers probably didn't struggle with memory safety day-to-day. But if there was something to be gained from breaking it, it wouldn't have been long before people were building piles of zombies.

Honk

Two recent takes on the state of academia:

If the world worked the way it does in the picture books, I would do research by getting a grant in a nice lab somewhere instead of panhandling on the internet. But computer science academia seems to put a lot of obstacles in the way of actually doing good work.

My plans for wasm bindings for dida (which also informed the node.js bindings, because I want to share most of the code between them) were made a lot easier by this incredibly informative talk on how wasm-bindgen works.

The release notes for zig 0.8.0 talk about the reworked memory layout for the IR in the new compiler.

The ensuing discussion lead to A data parallel compiler hosted on the GPU:

This work describes a general, scalable method for building data-parallel by construction tree transformations that exhibit simplicity, directness of expression, and high-performance on both CPU and GPU architectures when executed on either interpreted or compiled platforms across a wide range of data sizes, as exemplified and expounded by the exposition of a complete compiler for a lexically scoped, functionally oriented programming commercial language. The entire source code to the compiler written in this method requires only 17 lines of simple code compared to roughly 1000 lines of equivalent code in the domain-specific compiler construction framework, Nanopass, and requires no domain specific techniques, libraries, or infrastructure support. It requires no sophisticated abstraction barriers to retain its concision and simplicity of form. The execution performance of the compiler scales along multiple dimensions: it consistently outperforms the equivalent traditional compiler by orders of magnitude in memory usage and run time at all data sizes and achieves this performance on both interpreted and compiled platforms across CPU and GPU hardware using a single source code for both architectures and no hardware-specific annotations or code. It does not use any novel domain-specific inventions of technique or process, nor does it use any sophisticated language or platform support. Indeed, the source does not utilize branching, conditionals, if statements, pattern matching, ADTs, recursions, explicit looping, or other non-trivial control or dispatch, nor any specialized data models.

I've never had any interest in APL before, but this paper makes me want to spend some time in that world.

It's also of interest because I've been musing about how to write an imp compiler in imp. The tree representation used in this paper might be a good fit.

I worked at RelationalAI a while back (see a recent overview here).

Last week they put up a demo video. It looks like their language is coming along nicely.

They also released the library they use for incremental maintenance - salsa.jl, based heavily on salsa-rs. I have salsa pegged as an unstructured system - sacrificing some performance to allow writing arbitary imperative code. So it's an interesting choice for a declarative relational language, which makes me think they know something I don't. I look forward to figuring out what it is.