0024: HYTRADBOI postmortem, HYTWACFI?, preimp, emergent ventures, data and reality, merkle search trees, readyset, julia compilation times

All the HYTRADBOI videos are now up at hytradboi.com (click on the talk titles).

I'm still not sure whether I want to do another HYTRADBOI, since it took a huge chunk of energy and attention out of my year. But one option I'm considering is to aim for dramatically less polish:

Don't do ticketing at all - make everything open access.
Ask the speakers to provide their own captions.
Publish all the talks on the website.
Use a federated matrix space for the official chat and require attendees to sign up on a public homeserver.

I think I could probably get the total cost down to ~2 weeks of work and a few hundred dollars, which is a lot more appealing.

It looks like HYTRADBOI is going to have cousins!

OK, since I seem to have some real interest from my weekend musings, would you be interested in a compiler/runtimes-oriented conference in the vein of https://hytradboi? @sc13ts wdyt? I'd likely call it Have You Tried Writing a Compiler For It?
— Gerred (@devgerred) May 16, 2022

Working on getting at least the "Have You Tried Writing a Compiler For It?" CFP up for tomorrow. I have the domain, but it doesn't quite flow as well as @hytradboi. Does anyone care about phonetics here? Is there a more flow-y name? I'm fine with how it is now but figured I'd ask
— Gerred (@devgerred) May 23, 2022

Now that all that HYTRADBOI craziness is over I'm focusing on preimp. The goal is to work towards the programming experience that I envisioned for imp, but hack it together in clojure to avoid getting bottlenecked forever on language design.

So far I have a simple clojure notebook with coarse-grained incremental maintenance. There are data cells that can be mutated by other code, with the changes being persisted back into the cell. The notebook is backed by a simple crdt, so once I finish hooking it up to the server it will allow collaborative editing of code and data. Next step after that is rendering the output values as interactive widgets, and then adding bidirectional editing so that interacting with the widgets for derived views can push changes to upstream data.

It's all very hacky and I don't think I'll be able to achieve the same quality of experience on top of clojure (eg a live repl is out of reach), but I do think I'll be able to actually finish a usable demo. Quantity has a quality all of it's own.

I have a short list of simple CRUD apps that I want to build for specific people. I hope to be able to build everything on the list in preimp within the next few months, at which point I'll have some real experience to feed back into the design of the imp language.

I applied for an Emergent Ventures grant to work on imp. I have no idea how competitive it is but the initial application only took a few hours so it's low cost.

Also once I'd spent the time to explain the vision all in one place I was excited to get back to work.

I have been wanting to read Data and Reality for a long time but I've only been able to find the 3rd edition in print, which I've heard was ruined by the editor cutting out large sections and inserting his own heavy commentary throughout. Jonathan Edwards pointed me at this pdf of the 2nd edition.

It's slow reading, but I think it's helpful. I've been thinking a lot about the fine details of how to model data in imp, and this book is a cornucopia of counter-examples and tricky edge cases.

It's also lending weight to my belief that it's a tactical mistake to think of data-modelling as being about modelling the world, which quickly leads to being buried neck-deep in philophical questions. Better to focus on modelling the desired behavior of the computer system, which is at least tractable.

So rather than asking eg "when are two things actually the same thing", you can ask "when will our system need to treat these two things differently".

Merkle search trees are a kind of btree where the shape is entirely determined by the contents rather than by the order of operations. This means that you can efficiently diff them. The authors apply this to merging state-based CRDTs, but I think it could also be useful in DBSP. One of the pros of DBSP is that it allows easily mixing incremental and non-incremental operations, but at the cost of having to diff successive values at the boundaries. With most representations of collections that could be very expensive but with Merkle search trees the cost should be proportional to the size of the diff output.

The noria folks are working on a commericial version. It's not clear from the docs whether it's still not internally consistent (I'll test it once released) but I can imagine it being useful anyway - internal consistency issues are hard to trigger with simple CRUD queries.

Some criticism of julia and some more criticism of julia. I actually didn't run into many bugs myself (maybe because I didn't use many advanced libraries) but the compilation times are brutal and interactive development is not a panacea. Eg I have a script that reads a 1300 line csv, does some simple calculations and draws a graph. The first run in a repl takes 27s and subsequent runs still take 0.5s (of which 91% is compilation time). For more complex projects I would start seeing multi-second pauses when reloading a single module in the repl.

I really like the design of the language though, especially the way that it feels very dynamic but has predictable performance and catches type errors much closer to their source than other dynamic languages do. I've been wondering if it's possible to find a sweeter spot along the tradeoff between performance and compilation time. I suspect (with no measurement/evidence) that specialization (especially removing dynamic dispatch and pre-computing sizes on the stack) is responsible for a non-trivial percentage of the perfomance difference vs eg python, but that LLVM IR -> native code is what takes up most of the compilation time. So maybe a Julia-like language that specializes functions to interpreted bytecode rather than compiling everything to native bytecode would still be able to get a significant performance boost vs other dynamic languages.

Other things:

Tickets are up for Handmade Seattle 2022. I can't commit yet, but it's likely I'll go in person this year.
"...what I'm going to do is use a simplified machine model that allows us to make quantitative predictions about the behavior of straightforward compute-bound loops"
I somehow didn't know that SQL has asserts.
windbg lets you write linq queries over the debugger state instead of wresting with some adhoc dsl.
magic-trace seems neat but I don't have any perf problems to point it at at the moment.
Why is modern architecture so ugly?
ZNG is a neat data notation with first-class types. Comes with a query language and browser. At first glance it seems fairly well thought out.
A detailed history of websql. I can't say I would have liked the whole of sqlite to end up in the web spec, but I'm not wild about how hard it is to build anything useful on top of IndexededDB either. Among other limitations, making access async made it impossible to read from the db in event handlers, which means that any important state has to be replicated in memory.
Chiselstrike is another backend-as-a-service company along the same lines as eg convex. Notable for being open source and self-hostable.
One of the authors of Malloy started a series explaining the design decisions that went into it.