0013: till death do us part, minimum wage, dida free, implicit ordering in relational languages, ultralearning, responses to against sql, oracle decorrelation, gede improvements, antisponsoring, convivial design heuristics, knowledge transfer, crafting databases, rust complexity, antitrust, gelly, shakti, lumosql, anti-marketing, NAAL, ledger of harms, tonsky icfp, debugging stories

New stuff:

I got married! Great for me, sucks for anyone hoping for more code / blog posts this month :)
At some point this month my income from sponsorships passed minimum wage. I'm about 2/3rds of the way towards meeting all my expenses and being able to do this indefinitely. Six months ago that seemed like an impossibly unrealistic goal.
Dida updates:
- The core now has actual memory management. All the tests are running with leak and double-free detection turned on.
- More tests. More bug fixes. Several known bugs remaining.
- Debugging this with printlns is very difficult, so I decided to leave the known bugs for now and start work on the graphical debugger. The initial version will just let me step forwards and backwards through text dumps with collapsible sections, which would already be a big improvement. So far I have code to capture debug events/state behind a compile-time switch and I've put down all the glue code for running the debugger in the browser, but I've yet to generate any actual html.
Implicit ordering in relational languages. See also similar thoughts from Frank McSherry the day before.
Notes on Ultralearning. One of those books that would have been incredibly useful if I had read it 10 years ago, before learning much of this the slow way. (I read this two years ago - I'm lagging behind just a little on transcribing notes.)
Responses to comments on 'Against SQL'. Clearly mentioning json at all was a mistake - I should have used windows as an example instead. I also added a link to an article on how hasura relies on using json with subqueries on postgres to solve the multiple roundtrip problem.
In How materialize and other databases optimize SQL subqueries I didn't test oracle because it seemed like a hassle to set up and instead just relied on the documentation. Someone from oracle got in touch this month to explain that the documented limitations are out of date and that recent versions do much better. They also tried to get me access to the oracle cloud but the free tier still requires a credit card and rejected mine, so no luck there so far. But apparently the documentation will be updated at some point.
The author of gede got in touch about the problems I ran into in Looking for debugger. Some have been fixed, some I can no longer reproduce in recent versions and some are inherent to the terrible gdb interface. I've updated the article accordingly. They also mentioned that they're planning support for lldb which has a much saner interface. Gede is the closest to actually working of any of the debuggers I tried on linux, so I'm looking forward to seeing what they can do when they aren't hamstrung by gdb.

I added this line to my sponsors profile:

Please don't sponsor me if it feels like a lot of money! I'm doing just fine and you have no obligation at all to support me. The sponsor button is for those folks who are making silly big-tech money and want to give some back to basic research.

Prompted by someone apologizing for cancelling their sponsorship.

And also by the uncomfortable feeling of seeing grad students as sponsors - y'all need that money for ramen.

Convivial Design Heuristics for Software Systems

As we push against the frontier of absolute capability of our systems, and increase the capability of their most expert and advanced teams of users, we tend to shrink both the relative and absolute capabilities of the 'unit individual'. I assert this without proof, but believe it is a phenomenon increasingly observed and accepted. It is familiar, for example, to the generation who grew up programming early microcomputers and nowadays wrestle to achieve comparably simple feats using a 'modern' web framework. Although the modern technology is more sophisticated in many ways, this rarely translates to less work being needed to accomplish a simple human-meaningful task.

...A further weakness of the movement is that many free software projects have proven amenable to corporate capture, and not coincidentally, this has most often occurred in those projects where a huge industrial-strength team is required simply to tread water. Therefore, perhaps we should start to recognise software projects not only on their quality or completeness, but on their tractability to individual contributors and customisers.

I think about this a lot.

Programming tools and practice are increasingly optimized for large, industrial-scale efforts. This isn't uniformly bad for individual efforts eg the widespread availability of reliable open-source libraries benefits efforts at both ends of the spectrum. But the needs of large companies not only direct funding, they also heavily influence what is even considered justifiable research. It's very difficult to remain focused on increasing the leverage of individual people when the surrounding culture barely understands the concept.

Perhaps the only way to avoid unconsciously absorbing the values of industrial software is to avoid contact with it entirely, and instead try to immerse oneself in communities with different values.

A related idea that keeps coming back to mind:

Knowledge isn't accretionary by default. Everything we know about computing, the bulk of which is tacit, has to be transmitted to or relearnt by the next generation. If all we do is keep adding layers, eventually parts of the body will fail to be transmitted. Someone has to be working on collapsing the layers.

The problem is similar to what distill calls research debt. I see efforts like nand to tetris, crafting interpreters and handmade hero as successes on this front.

But just as important as transmitting knowledge is figuring out what knowledge to transmit. I don't think much of the chances of trying to reduce complexity by going backwards in time to the 'good old days' - we've made real advances. But I want good value for money out of any complexity we adopt. For any given problem domain, what are the 5 most important ideas? Is there an 80/20 version of modern computing? One of the few projects that comes to mind along these lines is QBE whose mantra is '70% of the performance of advanced compilers in 10% of the code.'

Speaking of 'Crafting Interpreters', there seems to be very little information out there about how to build a database. I know of some collections of important papers about database design, but nothing that explains how to actually build one. Database Management Engines is the closest I could find and it's wildly out of date.

...Is it just too hard to implement sql in a single book?

Two recent posts on unstable rust features and stabilizing GATs fanned my worries about the growing complexity of rust.

I have a very hard time predicting what is and isn't possible in the rust type system. The LendingIterator in the GATs post is an example I ran into myself in the initial design for the Row in materialize. It took me a full day to understand that it simply wasn't possible to write an iterator over serialized Rows as initially designed, because there was no way to reference the lifetime of the buffer used for unpacking them. Even after years of full-time experience, it's not unusual for me to design an interface in rust and only days later realize that it will make some important usecase un-typeable.

It also seems increasingly likely that there will never be another rust compiler. At this point it needs a full-time team of engineers just to keep the wheels on. That isn't unusual - there are only 2 usable C++ compilers in the world, and 2-ish browser families. For a mega-scale company like google having a full-time team just to maintain one of your tools is trivial. But that level of complexity makes it much harder for an individual to understand and own their toolchain. At the other end of the spectrum are projects like zig or sqlite, both of which explicitly aim to keep the entire project maintainable by a small team indefinitely.

In case you missed it, Biden is trying to revive antitrust. Antitrust has had an anemic few decades in the US, culminating in judgments like allowing facebook to buy instagram because facebook is the underdog in the camera-app market. I have zero understanding of the obstacles facing the new antitrust appointees, but I hope they can do something about the current model where the big tech companies get away with things like not paying any taxes and justify it by pointing at their amazing innovations, most of which were actually created by other companies that they bought using all the spare cash they have from not paying their taxes.

Among the email responses to 'Against SQL' were:

Gelly - graphql extended with relational algebra operations.
Shakti - a database apparently written in k. The documentation is about what you'd expect from someone who likes k, so it's hard to figure out much about it. The benchmarks are interesting, but it's silly that they compare OLAP queries against a bunch of OLTP databases. I'd like to see numbers in there for memsql or duckdb.

LumoSQL is a project benchmarking sqlite with various different backends. This is beautiful thankless work.

The National Assessment of Adult Literacy says that half of US adults fail problems like "Determine whether a car has enough gasoline to get to the next gas station, based on a graphic of the car's fuel gauge, a sign stating the miles to the next gas station, and information given in the question about the car's fuel use."

The Center for Humane Technology keeps a Ledger of Harms - links to experiments purporting to show the harms of social media. I'm torn by this - on the one hand I see a lot of evidence around me that social media is harmful, but on the other I'm wildly dubious of the validity of most psychology research so a page full of single-study results is not compelling.

When some people treat the minds of other people as a resource, this is not 'creating wealth' it is a transfer.

From The World Beyond Your Head by Matthew Crawford.

Tonsky wrote about his ICFP contest experience. I was struck by how much value he got out of various off-the-cuff gui tools, and by the fact that I wouldn't have been able to build any of those in a week, let alone one weekend. And I do often have problems that would benefit from this kind of tooling.

The dida debugger is going to have to be web-based because I want to put interactive examples in the documentation, but that adds a huge amount of complexity. It took me a few days just to get all the glue code working and I haven't even started on the actual ui yet.

For future projects I think it might be well worth my time getting really fluent with something like imgui that I can easily drop into an existing project while debugging, without any elaborate glueing or build-code-editing.

I've also been thinking about how to get better at debugging, since that occupies the bulk of my time lately.

Part of the difficulty is that it's hard to practice. There isn't any nice body of practice problems. Artificial codebases are likely too small to test key skills, but learning a real body of code takes more time than most people would want to devote to a practice session. Also, I can't supply my own problem set because by the time I have the problem nicely isolated I already know the answers.

One thought I had is to make a tool that subtly mutates existing code eg deleting a free at random and then challenging the programmer to find the leak.
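
A minimal sketch of what such a mutation tool might look like, assuming C-style source where each deallocation sits on its own line as `free(...);`. The `delete_random_free` function, the `FREE_CALL` pattern and the sample program are all hypothetical names made up for illustration; a real tool would need to parse the code properly instead of matching lines with a regex.

```python
import random
import re

# Assumption: a deallocation is a whole line of the form `free(<anything>);`.
FREE_CALL = re.compile(r"^\s*free\s*\(.*\)\s*;\s*$")

def delete_random_free(source, rng):
    """Remove one randomly chosen free() line.

    Returns (mutated_source, removed_line_number). The line number is
    1-based and is the answer the player is challenged to find.
    Returns (source, None) if there is nothing to delete.
    """
    lines = source.splitlines()
    candidates = [i for i, line in enumerate(lines) if FREE_CALL.match(line)]
    if not candidates:
        return source, None
    victim = rng.choice(candidates)
    mutated = lines[:victim] + lines[victim + 1:]
    return "\n".join(mutated), victim + 1

# Hypothetical sample program with two allocations and two frees.
SAMPLE = """\
char *a = malloc(16);
char *b = malloc(32);
use(a, b);
free(a);
free(b);
"""

mutated, answer = delete_random_free(SAMPLE, random.Random(0))
print(mutated)  # the player sees this and has to find the leak
```

Passing in the random number generator keeps a given challenge reproducible, so the same mutated program could be handed to several people to compare how they debug it.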

I've also thought about trying to isolate example bugs from my own code - record my own debugging process and then see if other people can demonstrate more efficient processes for the same bug.

Andy Matuschak wrote about the idea of anti-marketing - focusing on talking about the parts of his projects that are uncertain or don't work well. In that spirit, here are my doubts about imp and dida:

I think it's going to be very difficult to predict or understand the performance of imp code. Especially the space usage - there is no indication of when indexes are being created for incremental maintenance.

Making everything a relation makes it easy to have bugs where eg something was expected to always have 1 element and instead has 0 or more than 1 elements in some edge case.
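
A rough illustration of this kind of bug, written in Python and not in imp, with a relation modelled as a set of tuples. The `manager_of` data and the `lookup` / `lookup_one` helpers are hypothetical and exist only to show the shape of the problem.

```python
# A relation modelled as a set of (key, value) tuples. The data is made up.
manager_of = {
    ("alice", "carol"),
    ("bob", "carol"),
    ("bob", "dave"),  # edge case: bob has ended up with two managers
}

def lookup(relation, key):
    """Return the set of values for key. It may hold 0, 1 or many elements."""
    return {v for (k, v) in relation if k == key}

def lookup_one(relation, key):
    """Like lookup, but fail loudly unless there is exactly one value."""
    values = lookup(relation, key)
    if len(values) != 1:
        raise ValueError(f"expected exactly 1 value for {key!r}, got {len(values)}")
    (value,) = values
    return value

print(lookup(manager_of, "alice"))  # one manager, as expected
print(lookup(manager_of, "erin"))   # silently empty
print(lookup(manager_of, "bob"))    # silently two managers
```

Code downstream of `lookup` keeps running in the empty and duplicate cases and just produces fewer or more rows than expected. Something like `lookup_one` turns the assumption into an explicit check, at the cost of having to remember to write it.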

Set semantics make it easy to accidentally drop rows in aggregates.
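
A small example of how this happens, again in Python as a stand-in for imp. The `salaries` data is made up. Projecting away the name column before summing collapses the two equal salaries into a single row.

```python
# Made-up data: two people happen to earn the same amount.
salaries = [("alice", 100), ("bob", 100), ("carol", 50)]

# Bag semantics: project to the amount column and keep duplicates.
bag_total = sum(amount for (_, amount) in salaries)

# Set semantics: the same projection collapses the two 100s into one row.
set_total = sum({amount for (_, amount) in salaries})

# Keeping a key alongside the value preserves both rows under set semantics.
keyed_rows = {(name, amount) for (name, amount) in salaries}
keyed_total = sum(amount for (_, amount) in keyed_rows)

print(bag_total, set_total, keyed_total)
```

The set-based total comes out lower than the other two, and nothing in the code flags that a row went missing.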

The language is built around incremental maintenance, but I'm not confident I can get incremental maintenance to perform well enough or predictably enough to make it worth using.

More than anything else, I'm intimidated by the amount of work required to bring both imp and dida to a usable level, especially when compared to my typical rate of progress and level of motivation. Can I actually finish this thing?