There are no strings on me

(This is part of a series on the design of a language. See the list of posts here.)

Long ago Steve Yegge wrote about software that feels alive - emacs, smalltalk, lisp machines etc - and lamented that the industry prefers to create dead things. Puppets on strings, not real boys.

I'm sympathetic. There is a kind of magic to those systems that is worth experiencing. But it's also worth examining why we prefer to build puppets.

Because I've had days where I've had to debug my surly emacs boy, and I've quickly discovered that his behaviour has very little to do with the code that I'm reading. Methods overridden at runtime, traces that end with a call to a closure that no longer exists, event handlers whose execution order depends on side-effects during module loading, stack-traces which contain multiple different versions of the same function. On the worst days I find myself debugging code that doesn't even exist on disk but was evaluated in the repl weeks before.

While such systems may be alive, they are only pretending to be a real boy. Lurking inside is something much more hostile to human understanding.

At some point the only option is to ~~kill it with fire~~ turn it off and on again. Extinguish that spark of life and turn it back into a puppet.

Why are dead systems easier to wrangle?

You can understand the behaviour by reading the code. For a start, you can actually find the code as a single artifact rather than it being the product of a log of mutations. The code is written in a language that supports local reasoning - if you see foo() you can assume it calls the code you see for foo without having to read the entire codebase to check if the definition might be overridden at runtime. You can start a new process using the same code and it will behave the same way, without having to try to reproduce whatever sequence of actions mutated the code of the old process.

You can partially reset data to get back to a known state. Bugs are inevitable. It's really useful for data to be divided by some kind of bulkhead so that we can recover from mistakes without resetting everything. Eg if your app stores all persistent data in a database then you can recover from many bugs by restarting the process and resetting all in-memory state. This works because we already have code to initialize the in-memory state to some reasonable default state, and because the database cannot contain references to in-memory state that would be left as dangling pointers when that state is reset.

Most live systems gain their interactivity from mutable environments and late binding. Code loading is an imperative process that can cause arbitrary side effects. The behaviour of the system can depend on what order code was loaded in, and when. There isn't even any guarantee that loading the same code in the same order will reproduce the same system.

if ((new Date()).getDay() == 1) { 
    Date.prototype.getDay = function() { return "funday" } 
}

A saner option is to recompile and reload the entire codebase whenever a change is made, while preserving the state of the heap. Recompilation might be implemented by some sophisticated incremental change tracking, or by just writing a really fast compiler - the implementation details don't matter so long as they always produce the same result as compiling from scratch. This guarantees that the current behaviour of the system can be understood by reading the current version of the code.

Removing the possibility of mutating the environment at runtime also aids reasoning about the code, both for the person writing the code and for the IDE that is trying to help them.

Reloading code in the middle of a function call is gnarly. If we return to the old version of the caller, then we're stuck executing old code. But the new version of the caller might not even call the current function.

Simpler to insist that we can only reload code when we're at the top of the stack. For gui programs, this might be the end of a frame. For servers, the start of a request handler. This makes it easy to predict the effect of reloading.

It's not obvious what to do with long-running background tasks though.

Erlang handles live upgrades by providing the programmer with primitives to control when each task switches to new code. If a task hasn't managed to switch after two upgrades then it gets killed. This works pretty well for carefully planned and tested deployments, but I suspect it might work less well for live coding on the fly. It also relies on not sharing memory between tasks, which is ruinous for performance in some domains.

It might be feasible to just cancel all concurrent tasks and require that the heap contain enough information to correctly restart them. For tasks that are pure functions it's easy to wrap them in a polling interface that also handles restarts, but for more complex stateful tasks this could be tricky.

What if we have closures on the heap? Does calling them call the new version or the old version? What if that function has been deleted in the new version of the code?

Again, erlang solves this by allowing old closures to be called, but only across one upgrade - the process crashes if you call a closure from two upgrades ago. Again, this works well with careful planning and tesing but will probably work less well for live coding.

A simpler solution would be to not put closures on the heap.

This would make functions second-class. They can be passed as arguments to other functions, but not returned from functions or stashed inside data-structures. We can still use 2nd-class functions to abstract over control flow with old favourites like each/map/reduce. But we can't register event handlers by adding a function to a mutable list.

Is that really much of a loss? When writing an event handler you have to be aware that it could be called at any point in time, maybe even recursively. And when reading the code that fires an event you have to be aware that literally any other code could run in response to the event. The result is that most event handlers end up just putting an event in a queue and then actually handling the event at some other well-defined point in time where we can reason about the state of the world. Without first-class functions we can still have events and event queues.

It's not just humans that struggle to reason about this kind of situation either. Almost every useful static analysis is confounded by first-class functions. They're the reason we can't have nice things at compile time.

Second-class functions can also be implemented very efficiently. Since they can never outlive the scope they were created in, we don't have to copy the variables they close over - a second-class function can always be represented by a pointer to the code and a pointer to the stack frame.

Suppose our old code had type foo struct { x int } and our new code has type foo struct { y float }. What happens when we reload and the new code finds an old foo on the heap?

The type system didn't create this problem - we changed our data model and now we need to do some kind of migration. But most type systems make it difficult to express that migration because we can't manipulate data without knowing its type at compile-time. Nominal types and type erasure both assume a closed universe of types which are fully known at compile-time.

In many type systems, inference also influences semantics (eg dispatch on return type in haskell/rust). This means that we can't run programs at all until the type-system is fully satisfied, which is painful for live editing.

On the other hand, gradual structural type systems are good at dealing with open systems which might encounter new types at runtime. But typically they are either unsound (eg typescript, mypy) or require wrapping values in expensive runtime contracts (eg nickel). The root problem is the combination of subtyping and mutation:

function foo(x: (number | string)[]) {
  x.push("foo");
}

function bar(x: number[]): number {
  foo(x);
  return x[0];
}

let y: int = bar([]);

// Surprise, y is a string!

I've only seen one satisfying solution to this problem. In Julia, types are first-class and every value has a type:

julia> x = Int64[]
Int64[]

julia> typeof(x)
Vector{Int64} (alias for Array{Int64, 1})

julia> typeof(x).super
DenseVector{Int64} (alias for DenseArray{Int64, 1})

Putting a string in a Vector{Int64} is simply not allowed.

julia> push!(x, "foo")
ERROR: MethodError: Cannot `convert` an object of type String to an object of type Int64

Because types belong to values and not to expressions, and because the type of a value can never change, once we have a Vector{Int64} we know that it will only every contain Int64 and so we don't need to check on every access that it hasn't suddenly acquired a string. And thanks to type inference we can usually prove that the things we are putting in the vector are Int64s so we don't need to check those either.

julia> function inc(x)
           for i in keys(x)
               x[i] += 1
           end
       end

julia> @code_typed inc([1,2,3])
CodeInfo(
...
│    %18 = Base.arrayref(true, x, %16)::Int64
│    %19 = Base.add_int(%18, 1)::Int64
│          Base.arrayset(true, x, %19, %16)::Vector{Int64}
...
) => Vector{Int64}

Since there is no subtyping relationship between Vector{Int64} and Vector{Union{Int64, String}} and we can't turn one into the other without making a new value, Julia is free to use different representations for each type. A Vector{Int64} in Julia is actually an array of contiguous integers in memory, not an array of pointers to Int64 objects. This avoids the constant pointer-chasing which is one of the biggest performance hits in most dynamic languages.

This suggests a rough plan for typing a live system:

First-class structural types.
Every value has an associated type which cannot be changed.
The program semantics are defined entirely in terms of run-time operations on types.
The compiler uses type inference to elide run-time checks in most code.
The type-checker reports the locations of any remaining run-time checks as errors, unless they are flagged as intentional.

This provides sound structural type checking, allows running programs that don't fully type-check yet, makes it easy to write code that migrates data between different versions of a program, and yet doesn't require the performance sacrifices of classic dynamic languages.

Part of the reasons systems become incomprehensible is that it's difficult to look at their data. Consistent notation is a start, but the object graphs prevalent in most system resist good notation.

Dave Abrahams argues pretty convincingly that graphs of mutably aliased pointers are not tractable to local reasoning, and that we should replace them with something that more carefully separates ownership vs reference.

With mutable value semantics, every value is a tree rooted somewhere in the call-stack. References between values are represented by handles rather than pointers. The type system allows mutable pointers to exist on the stack, but statically prevents aliasing.

(I see mutable value semantics as the natural endpoint of the facilities for local mutation that seem to evolve in every pure functional language eg ST in haskell, transients in clojure. Baking them into the language enables more fine-grained checking, interior pointers, refcount elision, copy-on-write etc.)

To handle live reloads we can make a slight tweak to these rules - pass a mutable pointer as an argument to the programs entry point. Any changes made to this root pointer will be preserved across reloads.

The improved local reasoning is the main selling point relative to other imperative languages, but mutable value semantics also solve a host of minor problems almost by accident:

Printing and inspecting - everything is a tree so we don't have to worry about how to print pointer cycles.
Pasting - we can guarantee that copy-pasting a printed value back into code will produce an equal value.
Mutable inline data (eg julia doesn't allow mutable structs to be stored inline because the language doesn't provide a way to disinguish values vs pointers-to-values).
Safely using compound values as map keys (unlike eg javascript or go, which insist on stringifying mutable types).
Trivial heap snapshots - just increment the refcount of the root pointer.
Safe(ish) panic recovery - the compiler can trivially determine which mutable state might be affected by an expression so it can check that none of that state is reachable after catching a panic.

The dream workflow is to be able to pause a app in the middle of using it, open the dev tools, edit the code, try out the changes, roll back to a previous heap snapshot when we inevitably screw up, run unit tests, commit the changes and push them upstream, all without restarting the app and without sacrificing all the language affordances that we've achieved in dead systems.

Trying to design a system that is this malleable, but that remains comprehensible, lead to an unusual combination of ideas:

2nd-class functions.
1st-class structural types.
Mutable value semantics.

Each of these is individually under-explored, so what could possibly go wrong when we combine them?