0024: HYTRADBOI postmortem, HYTWACFI?, preimp, emergent ventures, data and reality, merkle search trees, readyset, julia compilation times

Published 2022-05-28

All the HYTRADBOI videos are now up at hytradboi.com (click on the talk titles).

I also wrote a post-mortem.

I'm still not sure whether I want to do another HYTRADBOI, since it took a huge chunk of energy and attention out of my year. But one option I'm considering is to aim for dramatically less polish:

I think I could probably get the total cost down to ~2 weeks of work and a few hundred dollars, which is a lot more appealing.

It looks like HYTRADBOI is going to have cousins!

Now that all that HYTRADBOI craziness is over I'm focusing on preimp. The goal is to work towards the programming experience that I envisioned for imp, but hack it together in Clojure to avoid getting bottlenecked forever on language design.

So far I have a simple Clojure notebook with coarse-grained incremental maintenance. There are data cells that can be mutated by other code, with the changes being persisted back into the cell. The notebook is backed by a simple CRDT, so once I finish hooking it up to the server it will allow collaborative editing of code and data. The next step after that is rendering the output values as interactive widgets, and then adding bidirectional editing so that interacting with the widgets for derived views can push changes to upstream data.
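To make "coarse-grained incremental maintenance" a bit more concrete, here is a minimal sketch in Python (uses `graphlib`, so Python 3.9+). It is not preimp's Clojure code or its actual design: the `Notebook` class and its `set_data`/`set_code` methods are hypothetical. When a cell changes, every code cell downstream of it is re-run in full, and cells that don't depend on the change are left alone.

```python
from graphlib import TopologicalSorter


class Notebook:
    """Hypothetical notebook with coarse-grained incremental maintenance."""

    def __init__(self):
        self.code = {}    # code cell name -> (function, list of input cell names)
        self.values = {}  # cell name -> latest value (data cells and code cells)
        self.runs = []    # log of every code cell (re)computation, in order

    def set_data(self, name, value):
        # Data cells hold plain values and are never recomputed.
        self.values[name] = value
        self._recompute(changed={name})

    def set_code(self, name, fn, inputs):
        self.code[name] = (fn, inputs)
        self._recompute(changed={name})

    def _recompute(self, changed):
        # Each code cell depends on its inputs; static_order() yields inputs first.
        graph = {name: set(inputs) for name, (_, inputs) in self.code.items()}
        dirty = set(changed)
        for name in TopologicalSorter(graph).static_order():
            if name not in self.code:
                continue  # a data cell: nothing to run
            fn, inputs = self.code[name]
            stale = name in dirty or any(i in dirty for i in inputs)
            if stale and all(i in self.values for i in inputs):
                # Coarse-grained: re-run the whole cell, no finer-grained deltas.
                self.values[name] = fn(*(self.values[i] for i in inputs))
                self.runs.append(name)
                dirty.add(name)


nb = Notebook()
nb.set_data("prices", [3, 5, 8])
nb.set_code("total", sum, ["prices"])
nb.set_code("label", lambda t: f"total={t}", ["total"])
nb.set_data("tax", 0.2)        # no cell reads "tax", so nothing re-runs
nb.set_data("prices", [1, 2])  # re-runs "total", then "label"
print(nb.values["label"])  # total=3
print(nb.runs)             # ['total', 'label', 'total', 'label']
```

The granularity is the whole cell: changing one price re-runs `total` from scratch rather than adjusting the previous sum, which is what makes it coarse-grained.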

It's all very hacky and I don't think I'll be able to achieve the same quality of experience on top of Clojure (eg a live repl is out of reach), but I do think I'll be able to actually finish a usable demo. Quantity has a quality all its own.

I have a short list of simple CRUD apps that I want to build for specific people. I hope to be able to build everything on the list in preimp within the next few months, at which point I'll have some real experience to feed back into the design of the imp language.

I applied for an Emergent Ventures grant to work on imp. I have no idea how competitive it is but the initial application only took a few hours so it's low cost.

Also, once I'd spent the time to explain the whole vision in one place, I was excited to get back to work.

I have been wanting to read Data and Reality for a long time but I've only been able to find the 3rd edition in print, which I've heard was ruined by the editor cutting out large sections and inserting his own heavy commentary throughout. Jonathan Edwards pointed me at this pdf of the 2nd edition.

It's slow reading, but I think it's helpful. I've been thinking a lot about the fine details of how to model data in imp, and this book is a cornucopia of counter-examples and tricky edge cases.

It's also lending weight to my belief that it's a tactical mistake to think of data-modelling as being about modelling the world, which quickly leads to being buried neck-deep in philosophical questions. Better to focus on modelling the desired behavior of the computer system, which is at least tractable.

So rather than asking eg "when are two things actually the same thing", you can ask "when will our system need to treat these two things differently".

Merkle search trees are a kind of B-tree where the shape is entirely determined by the contents rather than by the order of operations. This means that you can efficiently diff them. The authors apply this to merging state-based CRDTs, but I think it could also be useful in DBSP. One of the pros of DBSP is that it allows easily mixing incremental and non-incremental operations, but at the cost of having to diff successive values at the boundaries. With most representations of collections that diff could be very expensive, but with Merkle search trees the cost should be roughly proportional to the size of the diff output.
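A rough illustration of why content-determined shape makes diffing cheap. This is a one-level simplification, not the actual Merkle search tree algorithm: `is_boundary`, `chunk` and `diff` are hypothetical names, and the 1-in-4 boundary rate is an arbitrary choice. Sorted keys are split into chunks wherever a key's hash meets a threshold, so chunk boundaries depend only on which keys are present and never on insertion order. Chunks whose hashes match on both sides are skipped without looking at their contents. A real MST stacks these layers into a tree, so whole unchanged subtrees can be skipped instead of comparing every chunk hash.

```python
import hashlib


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def is_boundary(key: str) -> bool:
    # A key closes a chunk if the first byte of its hash is < 64 (~1 in 4 keys).
    # This depends only on the key itself, never on insertion order.
    return _hash(key.encode())[0] < 64


def chunk(keys):
    """Split keys into content-defined chunks, returned as {chunk hash: keys}."""
    # Assumes keys never contain "\0", which is used as a separator.
    chunks, current = {}, []
    for key in sorted(keys):
        current.append(key)
        if is_boundary(key):
            chunks[_hash("\0".join(current).encode())] = tuple(current)
            current = []
    if current:  # keys after the last boundary form a final chunk
        chunks[_hash("\0".join(current).encode())] = tuple(current)
    return chunks


def diff(old_keys, new_keys):
    """Return (added, removed), only opening chunks whose hash differs."""
    old, new = chunk(old_keys), chunk(new_keys)
    old_rest = {k for h, c in old.items() if h not in new for k in c}
    new_rest = {k for h, c in new.items() if h not in old for k in c}
    return new_rest - old_rest, old_rest - new_rest


keys = {f"key{i}" for i in range(200)}
edited = (keys - {"key7"}) | {"key500"}
print(diff(keys, edited))  # ({'key500'}, {'key7'})
```

Inserting or deleting a single key only changes the chunk it falls into, so every other chunk hash stays the same and drops out of the comparison.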

The Noria folks are working on a commercial version, ReadySet. It's not clear from the docs whether it is still internally inconsistent (I'll test it once released), but I can imagine it being useful anyway: internal consistency issues are hard to trigger with simple CRUD queries.

Some criticism of Julia and some more criticism of Julia. I actually didn't run into many bugs myself (maybe because I didn't use many advanced libraries), but the compilation times are brutal and interactive development is not a panacea. Eg I have a script that reads a 1300-line CSV, does some simple calculations and draws a graph. The first run in a repl takes 27s and subsequent runs still take 0.5s (of which 91% is compilation time). For more complex projects I would start seeing multi-second pauses when reloading a single module in the repl.

I really like the design of the language though, especially the way that it feels very dynamic but has predictable performance and catches type errors much closer to their source than other dynamic languages do. I've been wondering if it's possible to find a sweeter spot along the tradeoff between performance and compilation time. I suspect (with no measurement/evidence) that specialization (especially removing dynamic dispatch and pre-computing value sizes so they can live on the stack) is responsible for a non-trivial percentage of the performance difference vs eg Python, but that LLVM IR -> native code is what takes up most of the compilation time. So maybe a Julia-like language that specializes functions to interpreted bytecode rather than compiling everything to native code would still be able to get a significant performance boost vs other dynamic languages.
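A toy illustration of specializing to bytecode, under heavy assumptions: this is a made-up stack VM in Python, not how Julia works, and `run`, `specialize` and the `ADD_METHODS` table are all hypothetical. The generic `ADD` looks up its method from the runtime types of its operands every time it executes. `specialize` propagates the argument types through the bytecode once and rewrites each `ADD` into a `CALL` of the already-chosen method, while the result is still interpreted rather than compiled to native code. As a side effect, a missing method becomes an error at specialization time instead of whenever the bad `ADD` eventually runs. It says nothing about how large the speedup would actually be.

```python
import operator

# Hypothetical method table for a generic ADD, keyed by operand types.
ADD_METHODS = {
    (int, int): operator.add,
    (float, float): operator.add,
    (str, str): operator.add,
}


def run(code, args):
    """Tiny stack VM. Ops: ("ARG", i), ("ADD",), ("CALL", fn), ("RET",)."""
    stack = []
    for op, *operand in code:
        if op == "ARG":
            stack.append(args[operand[0]])
        elif op == "ADD":
            # Generic path: dispatch on runtime types every time ADD executes.
            b, a = stack.pop(), stack.pop()
            stack.append(ADD_METHODS[(type(a), type(b))](a, b))
        elif op == "CALL":
            # Specialized path: the method was chosen ahead of time.
            b, a = stack.pop(), stack.pop()
            stack.append(operand[0](a, b))
        elif op == "RET":
            return stack.pop()


def specialize(code, arg_types):
    """Propagate argument types through the bytecode, resolving each ADD once."""
    types, out = [], []
    for op, *operand in code:
        if op == "ARG":
            types.append(arg_types[operand[0]])
            out.append((op, *operand))
        elif op == "ADD":
            tb, ta = types.pop(), types.pop()
            method = ADD_METHODS.get((ta, tb))
            if method is None:  # type error caught before anything runs
                raise TypeError(f"no ADD for ({ta.__name__}, {tb.__name__})")
            types.append(ta)  # every method in the table returns this type
            out.append(("CALL", method))
        else:
            out.append((op, *operand))
    return out


code = [("ARG", 0), ("ARG", 1), ("ADD",), ("ARG", 0), ("ADD",), ("RET",)]  # x + y + x
print(run(code, [1, 2]))                          # 4
print(run(specialize(code, [int, int]), [1, 2]))  # 4
```

The specialized bytecode is only valid for the argument types it was built for, so a real system along these lines would presumably cache one specialization per argument-type signature and pick the right one at call time.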

Other things: