0054: zest namespaces, store tags after payloads, go allocation probe, everyones got one, pprof labelguns, go value types, go perf probe, tpde, anyblox, books

I forgot to write a devlog for a while :)

I finished consulting, but then I got covid and was sick for a while. Now I'm slowly getting back to working on zest, starting with figuring out namespaces/imports (design doc).

Assorted writing:

Store tags after payloads. Tiny musings on improving the space usage of sum types.
Go allocation probe. Using bpf uprobes to count allocations in a go program by type.
Everyones got one. Inevitable opinioning on llms.

pprof labelguns

The go api for pprof labels is mostly footgun.

I wanted to label samples by customer id, so that when we're looking at a big spike in the profile in datadog we can tell which customer caused it.

The api lets you set labels on a goroutine, which is where the profiler reads them from, and also on a context. But you can only read labels from a context. So if you want to label a particular function call, you have to:

read the labels from a context
add your label to the label list
set the new label list on the goroutine
call the function
reset the old label list on the goroutine

But what if the labels on the context don't match the labels that are currently on the goroutine? Or what if you don't have the matching context? Sucks to be you.

I made a note when writing that code that we'll have to be careful reviewing other uses of labels, because it's really easy to accidentally overwrite the label set. What's that on the mantelpiece? Oh, that's just my old footgun. Don't worry about it.

The labels work fine locally, but when we ship it and look at the profiles on datadog they're all messed up. Turns out that datadog themselves, in their own client library, are erasing our labels whenever we report a span with no context.

		// For root span's without context, there is no pprofContext, but we need
		// one to avoid a panic() in pprof.WithLabels(). Using context.Background()
		// is not ideal here, as it will cause us to remove all labels from the
		// goroutine when the span finishes. However, the alternatives of not
		// applying labels for such spans or to leave the endpoint/hotspot labels
		// on the goroutine after it finishes are even less appealing. We'll have
		// to properly document this for users.

Spoiler alert - they did not document this for users.

But it's not really their fault. If go just provided a way to read labels from the goroutine, datadog would have used it to restore my labels.

go value types

C# and julia split types into reference types (always heap-allocated, passed by reference) and value types (may be allocated inline or on the stack, passed by value). Coming from rust, it's kind of annoying to have to decide in advance how each type is used, rather than having the flexibility to do different things in different situations.

So I expected to like having this flexibility in go. But after a few months of shooting myself in the foot, I think go is actually at a pretty bad point in the tradeoff space.

The first problem is that there is no equivalent to rust's Copy trait, so it's really easy to accidentally copy a big struct. This is bad enough when it's just a performance hit, but it also causes bugs when you accidentally mutate a copy of a thing instead of the original. This is exacerbated by the fact that most of the builtin apis default to pass-by-value eg:

for _, thing := range things {
  // If things has type []*Thing, this does what you expect.
  // If things has type []Thing, this is a no-op.
  thing.mutate()
}

// If things has type map[Key]*Thing, this does what you expect.
// If things has type map[Key]Thing, this mutates a copy and the map is unchanged.
thing := things[key]
thing.mutate()
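
A self-contained version of the same footgun, as a sketch: Thing, its count field and the bump helpers are made up for illustration, and mutate is assumed to have a pointer receiver.

```go
package main

import "fmt"

// Thing is a hypothetical struct standing in for any mutable value type.
type Thing struct{ count int }

// mutate has a pointer receiver, so it modifies whatever it is called on.
func (t *Thing) mutate() { t.count++ }

// bumpByValue ranges over copies: each mutate call hits the loop variable,
// not the slice element.
func bumpByValue(things []Thing) {
	for _, thing := range things {
		thing.mutate()
	}
}

// bumpByIndex addresses the slice elements directly, so the mutation sticks.
func bumpByIndex(things []Thing) {
	for i := range things {
		things[i].mutate()
	}
}

func main() {
	things := []Thing{{}, {}}
	bumpByValue(things)
	fmt.Println(things[0].count) // still 0
	bumpByIndex(things)
	fmt.Println(things[0].count) // now 1

	// Map values are not addressable, so read, mutate and write back.
	byKey := map[string]Thing{"a": {}}
	thing := byKey["a"]
	thing.mutate()
	fmt.Println(byKey["a"].count) // still 0, only the copy changed
	byKey["a"] = thing
	fmt.Println(byKey["a"].count) // now 1
}
```

Nothing in the type of bumpByValue hints that it does nothing, which is why this tends to get caught in testing rather than by the compiler.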

The second problem is that you can't reliably stack-allocate a value and pass around references (because the reference might escape).

thing := Thing{}
// This will probably force thing to be heap-allocated, depending on inlining and escape analysis.
thing.mutate()
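
One way to check what escape analysis actually decided is to count allocations. The sketch below uses hypothetical names (Thing, mutate, register, sink) and testing.AllocsPerRun from the standard library; the results depend on the compiler version and on what the callee does with the pointer.

```go
package main

import (
	"fmt"
	"testing"
)

// Thing is a hypothetical struct, large enough that a heap allocation matters.
type Thing struct{ buf [64]int }

// mutate only writes through its receiver and does not keep the pointer.
func (t *Thing) mutate() { t.buf[0]++ }

// sink is a package-level variable; storing a pointer in it makes it escape.
var sink *Thing

// register keeps the pointer alive after the call returns.
func (t *Thing) register() { sink = t }

// staysLocal takes a reference to a local value that never leaves the call.
func staysLocal() int {
	thing := Thing{}
	thing.mutate()
	return thing.buf[0]
}

// escapes takes a reference that outlives the call, forcing thing onto the heap.
func escapes() int {
	thing := Thing{}
	thing.register()
	return thing.buf[0]
}

func main() {
	local := testing.AllocsPerRun(100, func() { staysLocal() })
	escaped := testing.AllocsPerRun(100, func() { escapes() })
	fmt.Println("allocations per call, pointer stays local:", local)
	fmt.Println("allocations per call, pointer escapes:", escaped)
}
```

Building with go build -gcflags=-m prints the compiler's escape decisions directly. Either way the outcome is decided inside the callee, so a change to register-like code far away can silently move a caller's value to the heap.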

The way people seem to deal with these problems is to adopt a coding style where, for each struct, either:

The struct has a constructor that immediately heap-allocates it, and the struct is always passed by reference.
The struct is small, is usually not mutated, and is always passed by value.

So we're back to the c#/julia world, except without the compiler support. There are linting tools that will catch some of these mistakes, but not all of them, and they produce false positives too.

In the codebase I was working on I found plenty of examples of people messing this up. Mistakes that wouldn't happen in either rust or c#/julia.

go perf probe

You can also use uprobes to create new perf events.

# Create an event that triggers when GetEntityForID is called.
perf probe -x ./run_snapshot_test --add 'GetEntityForID=github.com/runway/runway/api-server/app/calculator.(*RunwayCalculator).GetEntityForID'

# Count the number of calls to GetEntityForID.
perf stat -e probe_run_snapshot_test:GetEntityForID

# Record a stacktrace every time GetEntityForID is called.
perf record --call-graph fp -e probe_run_snapshot_test:GetEntityForID

Using perf stat is faster than adding a counter and recompiling, and perf record is nice for functions which aren't called often enough to get accurate results from random sampling. Sadly hotspot can't open the results of perf record for these probes, even though it can open them for builtin events.

I also tried to get processor trace working, but intel's viewer is painful to install on nixos and neither magic-trace nor perf2perfetto could successfully translate a 1gb trace.

TPDE: A Fast Adaptable Compiler Back-End Framework

For compiling quickly, the compelling options are 1-2 pass compilers like baseline compilers in most js jits or template jits like copy-and-patch. Copy-and-patch doesn't allow doing any inter-operator optimizations or register allocation, so it tends to generate worse code (cf the cpython jit), but it has the advantage of requiring very little target-specific effort.

TPDE combines some of the advantages of both. It allows implementing most of your opcodes in C for portability, but it also does baseline-style optimizations and register allocation against the resulting llvm mir. As a bonus, it can also operate directly on your own internal ir rather than requiring a conversion pass.

It seems like a strict improvement on llvm o0 and is competitive with DirectEmit while easily supporting more platforms (I hear the arm backend written for this paper ended up being too hard to merge). But their wasm backend seems barely competitive with the winch interpreter, which is hard to reconcile. Maybe a lot of the possible optimizations were already done when compiling to wasm, compared to the opportunities available in unoptimized llvm ir?

AnyBlox: A Framework for Self-Decoding Datasets

I love the idea for this. There have been a ton of improved data encodings coming out of both academic and industrial research. But everyone still uses parquet because everything supports parquet. The proposed solution is to standardize on a new data format where the decoding logic is embedded in the data itself as a small wasm binary, so that new encodings can be adopted without requiring everyone to add support for them.

Wasm support for simd isn't nearly as good as native, so the wasm version of REE is something like 4x slower to decode than the native version using avx512. They point at flexible-vectors as a potential solution but that proposal doesn't seem to be moving at the moment.

By far the biggest weakness though is that they don't support filter pushdown. That would require standardizing an expression language across multiple query engines, which seems unlikely. (They can't just write the expressions in wasm because different encodings will shortcut the filter expression in different ways - we need an abstract representation.)
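
To make the shortcut point concrete, here is a sketch assuming REE means run-end encoding (each value is stored once alongside the row index where its run ends). The REE type and its methods are hypothetical and not taken from the paper. An engine that understands the encoding can evaluate a filter once per run, whereas an opaque decoder has to hand back every row first.

```go
package main

import "fmt"

// REE is a hypothetical run-end encoded column: Values[i] repeats for every
// row from the previous run's end up to (but not including) Ends[i].
type REE struct {
	Values []int64
	Ends   []int // exclusive row indexes, strictly increasing
}

// Decode expands the column to one value per row, which is all an opaque
// decoder can offer to a query engine.
func (r REE) Decode() []int64 {
	if len(r.Ends) == 0 {
		return nil
	}
	out := make([]int64, 0, r.Ends[len(r.Ends)-1])
	start := 0
	for i, end := range r.Ends {
		for ; start < end; start++ {
			out = append(out, r.Values[i])
		}
	}
	return out
}

// CountWhere is the encoding-aware shortcut: the predicate runs once per run
// and the run length is added in a single step.
func (r REE) CountWhere(pred func(int64) bool) int {
	count, start := 0, 0
	for i, end := range r.Ends {
		if pred(r.Values[i]) {
			count += end - start
		}
		start = end
	}
	return count
}

func main() {
	col := REE{Values: []int64{7, 3, 7}, Ends: []int{4, 6, 9}}
	fmt.Println(col.Decode())                                         // [7 7 7 7 3 3 7 7 7]
	fmt.Println(col.CountWhere(func(v int64) bool { return v == 7 })) // 7
}
```

CountWhere only works because it can call the predicate on a single value and reason about the result. A dictionary or bit-packed encoding would want a different shortcut, which is why the filter has to arrive as an abstract expression rather than as compiled code.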

books

The unaccountability machine. Part history of cybernetics, part cheerleading. The history was interesting, but I found the actual cybernetics ideas to be vague to the point of being unfalsifiable. Something something feedback loops.

The black swan. I liked the basic idea of noticing that in some areas results are mostly dictated by out-of-distribution events and so you can't really protect yourself there by forecasting everything as a normal distribution. Adam Mastroianni's idea of strong-link problems seems closely related, and also didn't get padded into a whole book so it's much easier to read.

Change your diet, change your mind. The basic premise of nutritional psychiatry - that some mental illnesses might be caused by, or at least exacerbated by, metabolic illnesses - is really exciting. There is a plausible mechanism - insulin resistance prevents insulin from crossing the blood-brain barrier, preventing your brain cells from burning glucose - and it would explain why mental illnesses have soared over the last few decades and also provide an avenue for actual treatment rather than just ameliorating the symptoms. Unfortunately this book reads less like a new area of research and more like a cross between a fad diet book and one of those psychiatry books where every single patient is enlightened in a single conversation. Not the most trustworthy overview.

The multivitamin lie. This probably could have been just a blog post, but it's at least a convincing argument. Big diet surveys in the US show that almost everyone has enough of every micronutrient in their diet. Big blood tests show that most people have no deficiencies, and the exceptions are in a few specific nutrients that you would be better off testing and treating individually (eg iron/calcium in women, vitamin b12 in vegans, vitamin d in general). In high-powered epidemiological experiments we don't see multivitamin intake cause any reduction in illness or all-cause mortality. All micronutrient stores take weeks to months to deplete and your body upregulates absorption for anything you are low on, so your diet only needs to be balanced on the scale of a week or so rather than each day. Finally, the author puts together several week-long diet plans using typical western foods that hit all the RDAs in only 1350 calories per day, leaving a lot of slack for junk food.

Antimemetics. Incredibly vague - antimemes seem to include ideas that are boring, ideas that are exciting but not immediately actionable, ideas that are dangerous, ideas that are impolite to talk about in public, ideas that are actively suppressed by authorities, and anything that is discussed in a private chat because town-square social media is incapable of nuance. I didn't find any explanatory power here. Also the author says that her previous book working in public, which I thought was pretty insightful about the realities of maintaining a small open-source project, was actually a covert critique of democracy. Which would make sense if the country was run by three people in their spare time and strangers kept barging into their office and throwing half-completed paperwork at them. But if we look at more complex projects like linux, which is surely still much less complex than running an entire country, we see a big assortment of long-term expert maintainers divided into different (occasionally squabbling) departments, and a foundation that dictates funding and provides feedback/direction from stakeholders. Looks awfully like a civil service and an elected executive branch. And any big company has a similar split into employees and board, who are elected by shareholders. It seems like the structure we always land on for large institutions.

Kind of a disappointing summer as far as non-fiction goes.