0037: dynamic mutable value semantics, interior pointers, uninterning, functionless effects, papers, books

Published 2023-06-27

dynamic mutable value semantics

I worked through a simple implementation of mutable value semantics here (tree-walking interpreter, no optimizations). The main difference from swiftlet/val is that it's a dynamic language - demonstrating that nothing about MVS requires static typing.

The swiftlet paper goes into some depth on optimizations to reduce copying. I've also sketched out some alternatives. But all of them make it easy to accidentally copy a large object. So I like the general direction, but I really want to find a way to make copying vs sharing explicit, if it's possible to do that without introducing lifetimes.

The semantics given for inout parameters in swiftlet are that the argument is passed by value, possibly mutated inside the callee, and then on return copied back into the original location. So x = foo(inout y) becomes something like [x, new_y] = foo(y); y = new_y;. But what if foo throws an exception? In swiftlet/val these semantics are optimized into calling foo with a mutable pointer to y, so if foo throws an exception then all the changes to y so far remain applied. But the semantics hint at another option - we could roll back the changes on error. This is harder to implement efficiently but would lead to something like the transactional mutation in noether.
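To make the copy-in/copy-out reading concrete, here's a minimal Python sketch (all names are mine, not swiftlet's). Because the write-back only happens on normal return, an exception leaves y untouched - the rollback behaviour - whereas the pointer-based optimization would leave partial mutations visible:

import copy

def call_with_inout(env, name, foo):
    # x = foo(inout y) under copy-in/copy-out semantics
    arg = copy.deepcopy(env[name])   # copy-in: foo gets its own copy of y
    result, new_arg = foo(arg)       # foo may mutate its copy freely
    env[name] = new_arg              # copy-out: skipped entirely if foo raised
    return result

env = {"y": [1, 2, 3]}
def foo(ys):
    ys.append(4)
    return sum(ys), ys
env["x"] = call_with_inout(env, "y", foo)   # env == {"y": [1, 2, 3, 4], "x": 10}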

interior pointers

Most high-level languages don't support interesting memory layouts. Julia can control layout somewhat eg if MyStruct is immutable and does not contain any pointers then Vector{MyStruct} will be a contiguous array-of-structs rather than an array of pointers to separate heap allocations. This is a big deal because on today's machines access to memory is often the bottleneck. Programs with dense layouts and predictable access patterns can be orders of magnitude faster than the pointer-chasing produced by most high-level language runtimes.

But in julia, to mutate one of those structs in that vector, rather than writing something like vec[i].z += 1 you have to write old = vec[i]; vec[i] = MyStruct(old.x, old.y, old.z + 1) and then hope that the optimizer will notice that this is silly. If your update to vec[i] is complicated enough that you want to wrap it in a function vec[i] = foo(vec[i]) then the chances of avoiding copying become much slimmer.

Julia behaves like this because it's really hard to expose interior pointers like &vec[i].z in a memory-safe way. If some other code changes the size of vec while that pointer exists then we're left with a dangling reference. Also when the garbage collector encounters &vec[i].z it has to somehow figure out where the beginning of vec is.

This imo explains the appeal of rust. Other languages offered either memory safety or efficient access to memory. Rust was afaik the first production-quality language to offer both.

MVS offers an appealing alternative. It's a much simpler mental model than rust and it's easy to implement (and can even be implemented in a dynamic language) but it still allows the use of interior pointers in many situations.
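In a naive dynamic interpreter a projection like vec[i].z += 1 can be evaluated as a path update against the uniquely-owned value, so no interior pointer ever escapes; an optimizing implementation should then be able to use a real interior pointer, since exclusive access means vec can't be resized in the middle of the update. A rough Python sketch of the naive version (details are mine, not from the interpreter linked above):

def update_in(owner, path, f):
    # walk the access path; because the value has a unique owner we can mutate in place
    key, *rest = path
    if rest:
        update_in(owner[key], rest, f)
    else:
        owner[key] = f(owner[key])

vec = [{"x": 0, "y": 0, "z": 0}]
update_in(vec, [0, "z"], lambda z: z + 1)   # vec[0]["z"] == 1, no interior pointer escapes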

uninterning

Many dynamic languages have an immutable interned string type. Keywords in clojure, atoms in erlang, symbols in julia etc. Interning strings reduces equality checks to a pointer comparison, which makes these really useful as keys in hashtables.
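For illustration (Python rather than any of the languages above, but the mechanism is the same): once two equal strings are interned they are the same object, so equality is just a pointer comparison:

import sys

a = sys.intern("user/" + "name")
b = sys.intern("user/name")
assert a == b and a is b   # identity check suffices, no need to touch the characters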

Clojure and erlang both have a serialization format that supports almost all values in the language (edn, etf). This is incredibly useful for gluing systems together.

The serialization format for both clojure and erlang does interning of keywords/atoms on read. Which means that if you read data from an untrusted source you might end up interning more strings than you bargained for.

In erlang atoms are interned in a global table which supports a maximum of 2^20 atoms by default. Filling it up will crash or hang the node. EEP-0020 proposes that only atoms found in compiled code be interned in the global table and that atoms from read data are not interned. This makes comparisons much more complicated though.

Clojure has it easier thanks to the jvm having concurrent garbage collection. Keywords are interned in a weak hashmap and are garbage collected when no longer used, like any other value.

Julia's symbols are not a big problem because they're generally only used for representing julia code rather than external data. But julia also interns types and jitted code, so I wonder if it's possible to DoS a julia system by tricking it into jitting too many different types or function specializations.

I don't see a good solution for a runtime without garbage collection. Interning is not safe. Reference counting would introduce far too many cache misses. Comparing hashes instead of pointer values works, but requires also touching the string data when comparing two equal atoms.

functionless effects

Most of the difficulties of effects seem to be in the interaction with higher-order functions. Parametric effects, row types, lexical scoping, closing over handlers/effects etc. But if you have effects, do you even need higher-order functions?

A common motivating problem is how to type map - usually something like map: (a -> b / e) -> [a] -> [b] / e where the / e is the effect being propagated. This quickly gets complicated if you call multiple functions or handle some effects but propagate others.

But we can write map using effects directly instead (excuse the terrible notation) map: [a] -> [b] / (a -> b). This version of map yields an a and expects to be resumed with a b. The handler for that effect might itself yield some effects, but map doesn't have to know about that.
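A rough sketch of that shape using Python generators (the 'handler' here is just a plain function, but the point is that effect_map never needs to know what it does):

def effect_map(xs):
    out = []
    for x in xs:
        y = yield x          # perform the effect: yield an a, get resumed with a b
        out.append(y)
    return out

def handle(gen, answer):
    # the handler: answer each request and resume the generator with the result
    try:
        request = next(gen)
        while True:
            request = gen.send(answer(request))
    except StopIteration as done:
        return done.value

print(handle(effect_map([1, 2, 3]), lambda x: x * x))   # [1, 4, 9]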

In fact whenever you would take a function as an argument and use it in a 2nd-class way, you could instead yield an effect with the same types. The only thing we can't replicate with effects is 1st-class functions eg mutably attaching an event handler to an object which will be called later in a different context. But I don't like to use 1st-class functions anyway, so it would be interesting to see how far you could push a language using just effects instead.

Counting Immutable Beans

The new compiler for the lean language infers whether function parameters should be owned or borrowed, optimizes away reference count updates and reuses allocations when not shared.

All heap allocations are refcounted.

Reusing allocations is handled by looking for paths where one variable is no longer used, and another of the same size is allocated. A reset instruction is inserted after the former variable goes out of scope, and a reuse instruction is added to the constructor of the latter. If the variable turns out at runtime to have a refcount of 1 then the reuse instruction will overwrite, otherwise it will allocate.
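Roughly (a Python stand-in for what are really compiler-inserted instructions - the real reset also releases the cell's components, which I'm skipping here):

class Cell:
    def __init__(self, *fields):
        self.rc = 1
        self.fields = list(fields)

def reset(cell):
    # inserted after the last use of cell
    if cell.rc == 1:
        return cell            # unique: hand the allocation on as a reuse token
    cell.rc -= 1
    return None                # shared: the constructor below must allocate

def reuse(token, *fields):
    # inserted at a later constructor of the same size
    if token is None:
        return Cell(*fields)       # fall back to a fresh allocation
    token.fields = list(fields)    # overwrite in place, no allocation
    return token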

A param is inferred to be owned if a reset instruction has already been inserted for that param or if the param is ever passed to a function that takes owned references. Otherwise it is borrowed. This inference is monotonic, so for mutually recursive functions it can just be run to fixpoint. This heuristic sometimes breaks tail calls, so there is some additional tweak, but the examples contain a bunch of typos (peer review ftw).

A refcount inc is inserted for every function call taking an owned param, and a dec after the call, unless that call is the last use of that variable.

It's not totally clear to me how the owned/borrowed params work when calling a closure. I think some sort of wrapper is generated to create a uniform calling convention?

I'm also not sure why there is no consideration of whether or not a variable escapes. I would expect that returning a param would prevent it from being borrowed because it may outlive the reference from which it is borrowed, but the definition collect(ret x) = {} seems to explicitly contradict that.

This scheme seems to require that all pointers are to individual rc-ed heap allocations. It's unclear whether they allow interior pointers at all.

Since it's all heuristic, writing efficient code required some tooling support to show the authors which constructors would reuse memory.

Perceus: Garbage Free Reference Counting with Reuse

Builds on top of 'Counting Immutable Beans'.

All functions take ownership of their arguments. The drop function is used to free an owned value immediately after its last use (usually just decrements the refcount). The dup function is used to copy an owned value (usually just increments the refcount).

On finding a drop paired with a constructor of the same size, the drop is replaced by drop-reuse, which doesn't free the heap allocation so it can be reused by the constructor.

The implementation of drop is inlined. This allows pushing dup calls down into branches, where they may meet and cancel out nested drop calls.

If a reuse constructor uses some of the same fields as the reused allocation, the constructor is specialized to only update the changed fields.

The precise reference counting is not compatible with implicit control flow like exceptions, continuations etc. But koka compiles effects to explicit control flow before doing all the above analysis.

Like the previous work this doesn't seem to handle interior pointers.

The ability to write one function and effectively get both mutable and persistent versions is appealing, but the reuse being based on heuristics makes me nervous about its reliability in more complex programs.

Destination-Passing Style for Efficient Memory Management

Given x = f(y) compile it into var x = undefined; f(y, &x). (Somewhat similar to result location semantics in zig.)

This is trivial if the size of x is known, but the paper also handles the case where it must be determined by analysis of f. That requires a very restrictive language, which I'm not that interested in.

A weird side-effect of this transformation is that more functions become tail recursive:

// original
map(f,xs) = case xs of 
  nil => nil
  cons(x,xx) => cons(f(x),map(f,xx))

// transformed
map(f,xs,out) = case xs of
  nil => out := nil
  cons(x, xx) => out.head := f(x); out.tail := alloc_cons_cell(); map(f,xx,out.tail)

This feels like it should be applicable to languages with mutable value semantics, where the return value becomes just another inout parameter.

Go 1.18 Implementation of Generics via Dictionaries and Gcshape Stenciling

Generics are typically implemented either by dictionary passing (java, haskell) or monomorphization (rust, julia). Monomorphization opens way more optimization opportunities but slows compilation, bloats code and makes separate compilation difficult.

Go does a mix of both. Types are grouped by gcshape, which provides enough information for allocation and garbage collection.

IIUC all the types in a given gc shape are passed the same way in the abi, so this approach avoids needing to box all compound values but without generating too much code bloat. But most dynamic dispatch is still present and there is little opportunity for inlining. An interesting tradeoff.

Generalized Evidence Passing for Effect Handlers

Effect handlers are usually implemented like delimited continuations, relying on being able to dictate and read/write the representation of the call stack. This isn't always possible eg when compiling to wasm.

Instead koka uses a monadic-ish internal representation. When calling an effectful function the list of handlers is passed as an implicit argument. If the function needs to perform an effect, it returns a Yield containing the id for the handler (found by searching the list) and a closure representing the continuation.
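A Python caricature of that representation (names and details are mine, not koka's):

from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Yield:
    handler: Callable[[Any], Any]    # found by searching the implicit handler list
    payload: Any
    resume: Callable[[Any], Any]     # the continuation from the yield point onward

def read_config(handlers, key):
    # an effectful function: performs the "ask" effect by returning a Yield
    return Yield(handlers["ask"], key, lambda value: f"{key}={value}")

def run(result):
    # the caller checks for Yield; inlining this check gives the fast path
    # for the non-yielding case mentioned below
    while isinstance(result, Yield):
        result = result.resume(result.handler(result.payload))
    return result

print(run(read_config({"ask": lambda key: "42"}, "answer")))   # answer=42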

By inlining the checks for these Yield results the compiler is able to generate a fast path for the non-yielding case (although this requires some care to avoid an explosion in code size). And handlers which always immediately resume can be executed in place, avoiding the Yield process entirely.

The construction of the closure is optimized by representing it as an array of closures - rather than composing all of them and running the result, the runtime can pop them off and run them one at a time. This feels awfully like just returning a shadow stack, so I'm not really sure what is gained vs using a traditional implementation of delimited continuations but applying them to a shadow stack in wasm. Perhaps this gradual transformation is easier to reason about correctly.

I wonder how this affects debugging. After all the transformations the resulting call stack bears no resemblance to the source so native debuggers will be useless. Even with a custom debugger, it might be difficult to recreate the stack.

All of this also applies only to pure languages. A language with mutable state would have to be careful about copying the stacks correctly, in which case I'm not sure if the performance benefits would still hold.

Compiler and Runtime Support for Continuation Marks

Support for (something like) a subset of effects, but unfortunately requires control of the stack representation so iiuc is not applicable when targeting wasm.

Compiling Effect Handlers in Capability-Passing Style

An efficient compilation scheme for effects, but requires monomorphizing all code.

WASM Typed Continuations proposal

So there is an official proposal to just add effects to the wasm vm itself. Not sure how likely it is to be accepted or how long it will take or how to keep up to date on the state of wasm proposals ¯\_(ツ)_/¯

Effects as Capabilities: Effect Handlers and Lightweight Effect Polymorphism

Effekt avoids the complexity that comes with parametric effect polymorphism simply by omitting the feature from the language.

Type systems for effects get gnarly when higher-order functions are involved. The authors argue that this comes from the requirement that effect types bubble up, so that the types of higher-order functions have to reflect how they propagate effects.

If we instead treat effects as values then higher-order functions don't need to know what effects their arguments close over. Only functions that actually perform effects need to declare them.

This leaves us with a safety problem though - how to stop the effect value from escaping the handler? The authors' solution is to make both effects and functions 2nd-class values, so that by construction they can never escape their lexical scope. Functions can take values, functions and effects as arguments. Effects can take only values as arguments.

To ease the syntactic burden, effects are nominal and are passed implicitly.

Because functions and effects can't escape, they can also close over mutable stack-allocated variables! Unlike the other effect papers above, this style of effects apparently plays well with imperative programming.

The monad for delimited control is extended by also storing state frames on the stack. On continuation capture, the state is copied and on a call to the continuation it is restored. This way, it is guaranteed that we get the expected interaction between mutable state and delimited control.

Of course, that requires a sane semantics for deep-copying values. Seems like it might blend well with MVS.

They note several limitations from 2nd-class functions eg:

Effect handlers can express structured concurrency with user defined schedulers [Dolan et al . 2015]. In Effekt, this is not possible since it would require to store the continuation in a data structure (such as a queue).

This seems possibly too strict? They require that functions can't escape their lexical scope, but they enforce this by making them entirely 2nd-class values. It seems that there might be space to allow placing a function in a queue, so long as the queue is also confined to the lexical scope?

Effects, Capabilities, and Boxes

Ok, yeah, this extends the above paper to allow 1st-class functions.

Basically a 2nd-class function can be put in a box. The type of this box reflects all the capabilities the function closes over. The only thing you can do with a box is unbox it, and only in a scope where the same capabilities are present.

I don't see how this is compatible with their treatment of mutable variables as effects in the previous paper. It surely matters which variable you closed over, not just that there is a similarly named variable in the current scope.

Some of the complexity seems to result from their effect arguments being implicit. If they were explicit, we could just say that a function is first-class whenever it doesn't close over a capability. Then the boxing operation is really just ... not closing over ambient capabilities. And the unboxing operation is currying some capabilities from the current scope. In that light, it seems like the treatment of mutable variables should be similar - the boxed version isn't actually closing over the mutable variable, it just expects to be passed a mutable variable of the same type when unboxed. Which doesn't seem that useful?

Second-class modules for the Effekt programming language

The main thing here seems to be that modules use the same implicit argument passing as effects, meaning that most use of modules doesn't require any extra typing vs typeclass/trait system.

Life Inc

The subtitle is 'How the World Became a Corporation and How to Take it Back' but the 'How to Take it Back' is confined to ~10 pages at the end that are essentially 'maybe local currencies'.

I'm getting tired of how many books promise to say something and are actually just a bunch of shallow summaries of other books. And then when I go read those they're the same, and it's just half-assed summaries all the way down. Maybe I should stop reading books written by journalists.

The one citation that seems interesting is Money, whose author has actually helped establish several local currencies and so probably has something interesting to say about them.

Donut Economics

I couldn't finish it. Go read The Value of Everything instead.

How to Do Nothing

My third read of this book. It keeps growing on me.

It's notable that one of the author's main points is the need to disconnect and find solitude in order to be able to actually think. And then they wrote a book about the attention economy which actually contains novel, interesting ideas rather than just repeating the same memes that everyone else is writing. So maybe there is something to that :)

...the villain here is not necessarily the Internet, or even the idea of social media; it is the invasive logic of commercial social media and its financial incentive to keep us in a profitable state of anxiety, envy, and distraction. It is furthermore the cult of individuality and personal branding that grow out of such platforms and affect the way we think about our offline selves and the places where we actually live.

To resist in place is to make oneself into a shape that cannot so easily be appropriated by a capitalist value system. To do this means refusing the frame of reference: in this case, a frame of reference in which value is determined by productivity, the strength of one’s career, and individual entrepreneurship. It means embracing and trying to inhabit somewhat fuzzier or blobbier ideas: of maintenance as productivity, of the importance of nonverbal communication, and of the mere experience of life as the highest goal.

This kind of resistance still manifests as participating, but participating in the 'wrong way'.

Anyone who has ever tried any funny business in a faux public space knows that such spaces do not just script actions, they police them. In a public space, ideally, you are a citizen with agency; in a faux public space, you are either a consumer or a threat to the design of the place.

...colonization of the self by capitalist ideas of productivity and efficiency. One might say the parks and libraries of the self are always about to be turned into condos.

Just as public space gives way to faux public retail spaces or weird corporate privatized parks, so we are sold the idea of compromised leisure, a freemium leisure that is a very far cry from 'what we will.'

'[y]ou are marinating yourself in the conventional wisdom. In other people's reality: for others, not for yourself. You are creating a cacophony in which it is impossible to hear your own voice, whether it's yourself you're thinking about or anything else.'

This is why it's even more important for anyone who does have a margin - even the tiniest one - to put it to use in opening up margins further down the line.

Practices of attention and curiosity are inherently open-ended, oriented toward something outside of ourselves.

When the language of advertising and personal branding enjoins you to 'be yourself,' what it really means is 'be more yourself,' where 'yourself' is a consistent and recognizable pattern of habits, desires, and drives that can be more easily advertised to and appropriated, like units of capital.

For a brand as for a public figure - which, as we now know, any Twitter user can accidentally become overnight - change, ambiguity, and contradiction are anathema. 'You have one identity,' Mark Zuckerberg famously said. 'The days of you having a different image for your work friends or co-workers and for the other people you know are probably coming to an end pretty quickly.' He added that 'having two identities for yourself is an example of a lack of integrity.'

I THINK OFTEN about how much time and energy we use thinking up things to say that would go over well with a context-collapsed crowd - not to mention checking back on how that crowd is responding.

Subprime Attention Crisis

Arguing that programmatic advertising is ripe for a 2008-style crash. Brand safety is still a problem - it's hard to tell where (or if) your ad will actually appear. Much of the customer data being sold is inaccurate. There's moderate evidence that targeted advertising isn't very effective at actually generating sales. Adblocking is pervasive and on the rise. Etc.

It's held up against the 2008 crash where most of the big players knew that the market was becoming dangerous but for structural reasons were incentivized to keep playing. In the case of ads, one structural problem is that the user-side dominance of google/facebook/etc has collapsed most other advertising options, and the ad spend has to go somewhere.

If the ad market did crash it would take much of the web with it. We've become very used to free ad-supported infrastructure. There isn't an obvious alternative to transition to right now, and whatever we ended up with would probably exclude a big chunk of the world that can't afford a bazillion subscriptions.

I was more interested in the passing side argument that advertising shaped the web we have today. People reminisce about when the web was younger and cozier and weirder, and the transition to everyone spending all their time on the same 5 megacorp sites is typically put down to maturity. But it's probably as much to do with the need to make user activity legible to targeted advertising. The author posits a thought experiment - a social network made up of multiplayer whiteboards where people can doodle and scribble notes for each other. Might be fun, might not be fun, but we'll never know because handwritten doodles are totally illegible from the point of view of ad targeting.

Another Now

Using fiction to share visions for the future can be really compelling if the author is good at writing fiction. Yanis is not. I only made it about a third of the way through.