0021: hytradboi schedule + tickets, imp v3 ideas, real world of technology, changing minds, essence of software, typed image-based programming with structure editing, fosdem 2022, introspecting async

Published 2022-02-21

HYTRADBOI

I just published the schedule and you can now buy tickets!

Imp

In imp v2, the database was accessible as a value. You could write code that described a transaction and then use a keyboard shortcut in the editor to apply it to the database (demo). These transactions formed a crdt, so you could make changes to offline copies of the database on different devices and merge them later. The database itself was schemaless and self-describing but the language allowed building up gradually typed views over the database.

So far so good. But where does the code go? I had just sort of assumed that I would figure out a way to insert code as a value in the database, but nothing I came up with felt satisfying. The problem was that the code would have to be stored unevaluated, which led to all sorts of questions about evaluation order and environment that were not answered by the language itself. Tacking those answers on after the fact felt clumsy.

After taking a step back and thinking about the problem from scratch I had an epiphany - why is there even a database? If you have a declarative language which can describe data and computation, why use it to compute an imperative action to apply to a totally different system for describing the same data? This is implementation-driven thinking.

Let's just mutate the source code instead.

This totally solves the question of evaluation semantics, because it's just the same old code. And it means both data and code can live together in a single file with a single textual representation.

The only hard question is how to describe applying a mutation to a program.

I decided to start out easy with a simple datalog implementation. Right now I have a very basic interpreter and some cli tools. The history of the program lives in a sqlite database. You can use the cli tools to spit out the current version of the program into a text file, edit it in a text editor and then commit the diff back into the database.

> zig build run -- checkout tmp/db1 tmp/code1

> cat tmp/code1                              
#2280651848495541
parent("Bob", "Eve").

#3378495017200132
ancestor(x, z) <-
  parent(x, y),
  ancestor(y, z).

#3531648068531767
ancestor(x, y) <-
  parent(x, y).

#4337023847553130
parent("Alice", "Bob").

> cat > tmp/code1                            
#2280651848495541
parent("Bob", "Charlie").

parent("Charlie", "Eve").

#3378495017200132
ancestor(x, z) <-
  parent(x, y),
  ancestor(y, z).

#3531648068531767
ancestor(x, y) <-
  parent(x, y).

#4337023847553130
parent("Alice", "Bob").

> zig build run -- checkin tmp/db1 tmp/code1 

> cat tmp/code1                              
#2280651848495541
parent("Bob", "Charlie").

#3378495017200132
ancestor(x, z) <-
  parent(x, y),
  ancestor(y, z).

#3531648068531767
ancestor(x, y) <-
  parent(x, y).

#3610246625719134
parent("Charlie", "Eve").

#4337023847553130
parent("Alice", "Bob").

> zig build run -- run tmp/db1               
ancestor("Alice", "Bob").
ancestor("Alice", "Charlie").
ancestor("Alice", "Eve").
ancestor("Bob", "Charlie").
ancestor("Bob", "Eve").
ancestor("Charlie", "Eve").
parent("Alice", "Bob").
parent("Bob", "Charlie").
parent("Charlie", "Eve").

You can also make multiple copies of a database, edit each copy and then merge the changes back together. Each rule gets assigned a random id when it's first seen. The programmer never has to type these ids themselves, only avoid deleting them when editing code. When merging changes, the ids are used together with causal information to detect concurrent edits of the same rule (a sketch of how that detection might work follows the transcript below).

> cp tmp/db1 tmp/db2                                                        

> cat > tmp/code1                                                           
#2280651848495541
parent("Bob", "Charlie").

#3378495017200132
ancestor(x, z) <-
  parent(x, y),
  ancestor(y, z).

#3531648068531767
ancestor(x, y) <-
  parent(x, y).

#3610246625719134
parent("Charlie", "Eve").

#4337023847553130
parent("Alice", "Qube").


> cat > tmp/code2                                                           
#2280651848495541
parent("Bob", "Charlie").

#3378495017200132
ancestor(x, z) <-
  parent(x, y),
  ancestor(y, z).

#3531648068531767
ancestor(x, y) <-
  parent(x, y).

#3610246625719134
parent("Charlie", "Eve").

#4337023847553130
parent("Alice", "Zed").

> zig build run -- checkin tmp/db1 tmp/code1                                

> zig build run -- checkin tmp/db2 tmp/code2                                

> zig build run -- pull tmp/db1 tmp/db2                                     

> zig build run -- run tmp/db1                                              
WARNING: Conflicting definitions for id 4337023847553130.

First rule:
#4337023847553130
parent("Alice", "Zed").

Second rule:
#4337023847553130
parent("Alice", "Qube").

ancestor("Alice", "Qube").
ancestor("Alice", "Zed").
ancestor("Bob", "Charlie").
ancestor("Bob", "Eve").
ancestor("Charlie", "Eve").
parent("Alice", "Qube").
parent("Alice", "Zed").
parent("Bob", "Charlie").
parent("Charlie", "Eve").

With the cli tools alone this is a horrible workflow, but with a little editor support 'checkin and run' could be a single keypress.

For any reasonably sized database, converting the entire thing into text in a single file is probably not gonna fly either, so I imagine the workflow would have to involve picking subsets of the database to check out and edit. The unison folks have spent a lot of time thinking about this and have what looks like a reasonable developer experience.

Recordings

For one week I recorded all the time I spent coding. I can't share the recordings because I also looked at email etc while recording, but they're useful for checking where my time goes.

For the toy datalog implementation, for example, it looked like this:

40m parser
27m parser tests and debugging
  formatting these tests was time consuming and I deleted them later, so mostly a waste of time - would have been better off just getting to end-to-end tests quicker
32m planner
29m planner tests and debugging
  switched to tests in a separate file - less time formatting
  but still manually edited the tests instead of adding an automatic rewrite
  couldn't remember @embedFile; spent 3m looking it up because it wasn't under the name I expected, and I checked the zig issues to see if it had been removed - should have just read the names of all the builtin functions first
30m interpreter
23m interpreter test setup
  mostly just mucking around with formatting and trimming whitespace from the tests
10m debugging a segfault caused by self-assignment
10m being confused about sort order
  the rows in the test output weren't printing in sorted order
  took me a while to realize that the generic comparison function I copied over from imp v2 compares slices by length before contents
  this is fine for internal data-structures, but not nice for user-visible output (see the sketch below)
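
The sort-order confusion is the difference between shortlex order (length first, then contents) and plain lexicographic order. A minimal sketch, with made-up function names:

const std = @import("std");

// Shortlex: compare by length first, then contents. Cheap for internal
// data-structures, but sorts "b" before "aa".
fn shortlexLess(a: []const u8, b: []const u8) bool {
    if (a.len != b.len) return a.len < b.len;
    return std.mem.lessThan(u8, a, b);
}

// Plain lexicographic: what users expect to see in printed output.
fn lexLess(a: []const u8, b: []const u8) bool {
    return std.mem.lessThan(u8, a, b);
}

test "shortlex vs lexicographic" {
    try std.testing.expect(shortlexLess("b", "aa")); // "b" is shorter
    try std.testing.expect(lexLess("aa", "b")); // "aa" comes first alphabetically
}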

First thing I noticed is that I spent a bunch of time writing tests that I later deleted. I would have been better off writing the whole thing up-front and just doing end-to-end tests.

I also spent a lot of time, maybe 30-50m total, just on formatting objects. Makes me appreciate edn more. The existing formatting machinery in zig has many flaws and is due to be overhauled at some point before 1.0. In the meantime I've been copy-pasting my own hacks between several projects without ever cleaning up. I could probably save a lot of time in future by turning those hacks into something sensible.
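
A concrete example of the kind of formatting machinery involved: std.fmt lets a type supply its own format method. A minimal sketch as of zig 0.9 - the Row type and its output are made up, and I'm guessing at what the hacks look like:

const std = @import("std");

// std.fmt picks up a public declaration named `format` on a type, so
// every project grows ad-hoc pretty-printers like this one.
const Row = struct {
    values: []const []const u8,

    pub fn format(
        self: Row,
        comptime fmt: []const u8,
        options: std.fmt.FormatOptions,
        writer: anytype,
    ) !void {
        _ = fmt;
        _ = options;
        for (self.values) |value, i| {
            if (i != 0) try writer.writeAll(", ");
            try writer.print("\"{s}\"", .{value});
        }
    }
};

test "custom format" {
    var buf: [64]u8 = undefined;
    const row = Row{ .values = &[_][]const u8{ "Alice", "Bob" } };
    try std.testing.expectEqualStrings("\"Alice\", \"Bob\"", try std.fmt.bufPrint(&buf, "{}", .{row}));
}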

The segfault was due to a nasty cluster of footguns in the current version of zig - 1, 2, 3. All involve pointer aliasing that is not obvious when reading the code. I also ran into #6043 at the same time and it took me a while to realize I had two separate bugs.
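
The linked issues are worth reading in full, but a sketch of the general shape of the hazard - not the actual bugs above, just one concrete way this class of aliasing can bite - is appending a list's own items to itself:

const std = @import("std");

test "self-aliasing footgun (sketch)" {
    var list = std.ArrayList(u8).init(std.testing.allocator);
    defer list.deinit();
    try list.appendSlice("abc");

    // `list.items` aliases the list's own buffer. If the append below had
    // to grow the list, the old buffer would be freed while still being
    // copied from - a use-after-free that can segfault. Nothing at the
    // call site hints that the argument and the receiver overlap.
    // try list.appendSlice(list.items);

    // The safe version makes the copy explicit:
    const copy = try std.testing.allocator.dupe(u8, list.items);
    defer std.testing.allocator.free(copy);
    try list.appendSlice(copy);
    try std.testing.expectEqualStrings("abcabc", list.items);
}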

Books

The real world of technology. The notion of holistic vs prescriptive technologies is useful. Much of the rest of the book discusses the replacement of various holistic technologies by prescriptive technologies over the last century, and how this ties in with globalization and the transfer of many government functions to private companies.

It is characteristic of prescriptive technologies that they require external management, control, and planning. They reduce workers' skill and autonomy.

What the Luddites and other groups of the period clearly perceived was the difference between work-related and control-related technologies.

In order to operate successfully, the industrial production technologies require permanent transportation and distribution structures. In all countries the public sphere has supplied these infrastructures and has adjusted itself accordingly. Arranging to provide such infrastructures has become a normal and legitimate function of all governments.

The political systems in most of today's real world of technology are not structured to allow public debate and public input at the point of planning technological enterprises of national scope.

Public planning for the needs of private industry and for the expansion of technology has gone well beyond the provision of physical infrastructures. There are tax and grant structures, and there is the impact of the needs of technology on the preparation and training of the labour force.

The early phase of technology often occurs in a take-it-or-leave-it atmosphere. Users are involved and have a feeling of control that gives them the impression that they are entirely free to accept or reject a particular technology and its products. But when a technology, together with the supporting infrastructures, becomes institutionalized, users often become captive supporters of both the technology and the infrastructures. (At this point, the technology itself may stagnate, improvements may become cosmetic or marginal, and competition becomes ritualized.) In the case of the automobile, the railways are gone - the choice of taking the car or leaving it at home no longer exists.

...assumed that the introduction of the sewing machine would result in more sewing - and easier sewing - by those who had always sewn. They would do the work they had always done in an unchanged setting. Reality turned out to be quite different. With the help of the new machines, sewing came to be done in a factory setting, in sweatshops that exploited the labour of women and particularly the labour of women immigrants. Sewing machines became, in fact, synonymous not with liberation but with exploitation.

Changing minds. An older book in the same vein as Mindstorms. 20 years later, computers have failed to transform education. Hard to say whether the vision is just wrong or just incompatible with the current education system, but either way I wasn't excited by it.

The essence of software. Describes a design/modeling methodology for software. It's far from fleshed out, but it's the first methodology I've come across that felt like it would actually help me design better software. You can see an early case study in the design of gitless (paper, website) - tackling hard problems and producing results lends it a lot of credibility in my mind. I think I will try to apply it.

Papers

Typed Image-based Programming with Structure Editing. Deals with type/schema migration by recording changes to types in a structural editor and using an OT-like process to reconcile conflicts. I'm not sold on this approach - it spends a lot of complexity dealing with changes to anonymous product types, but given that you're editing in a structural editor already there is no need for anonymous product types - just insert ids under the hood (see the sketch below). But I think the paper is still valuable for elucidating the problem. Version control of code and schema migration of persistent data are clearly two facets of the same problem, but our current tools treat them as entirely separate domains.
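
To make the 'ids under the hood' point concrete, here is a minimal sketch (all names are mine, not the paper's): the editor displays field names, but the stored type keys everything by a stable id, so a rename is an edit to display metadata rather than a type change that needs reconciling.

// Sketch: the structural editor displays `name`, but data is keyed by `id`.
const FieldId = u64;

const Field = struct {
    id: FieldId, // stable across renames and reorders
    name: []const u8, // display-only
};

const RecordType = struct {
    fields: []const Field,
};

// Renaming `x` to `horizontal` only rewrites `name`; values stored
// against id 1 need no migration and there is no conflict to reconcile.
const point_v1 = RecordType{ .fields = &[_]Field{
    .{ .id = 1, .name = "x" },
    .{ .id = 2, .name = "y" },
} };
const point_v2 = RecordType{ .fields = &[_]Field{
    .{ .id = 1, .name = "horizontal" },
    .{ .id = 2, .name = "y" },
} };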

FOSDEM

My favorite talks from FOSDEM 2022:

Introspecting async

I'm still thinking about this async gui pattern.

If I convert code that uses explicit state machines to use async/await then it's easier to read, easier to write, and I can use defer to manage lifetimes. But I can no longer just print out the state, or make debugging tools that tell me eg which users have requests that are currently waiting on database io. Are there any implementations of async that let you inspect the closed-over state of awaited futures/promises/frames?
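
For contrast, here is a sketch of what the explicit-state-machine version buys you (the Request type and its states are made up): the state is plain data, so a debugging tool can just walk it.

const std = @import("std");

// With an explicit state machine, a request's progress is plain data...
const Request = struct {
    user: u64,
    state: enum { parsing, waiting_on_db, rendering },
};

// ...so a debug tool can answer "which requests are waiting on database
// io?" by walking the requests. Async/await moves this same state into
// the compiler-generated frame, where nothing can see it.
fn countWaitingOnDb(requests: []const Request) usize {
    var count: usize = 0;
    for (requests) |request| {
        if (request.state == .waiting_on_db) count += 1;
    }
    return count;
}

test "inspecting explicit state" {
    const requests = [_]Request{
        .{ .user = 1, .state = .waiting_on_db },
        .{ .user = 2, .state = .rendering },
    };
    try std.testing.expectEqual(@as(usize, 1), countWaitingOnDb(&requests));
}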