0005: internal consistency in streaming systems, MMIO in zig, a small matter of programming, rxi, martin kleppmann's new patreon, redpanda benchmarks

Published 2021-02-27

New stuff:

I'm also making the map of incremental and streaming systems public.

Work on reproducing inconsistency in Kafka Streams is not going well.

I have been able to stand up their page views example and feed it data, despite the best efforts of their documentation. Unfortunately the left join always returns nulls regardless of the input data. I'll continue banging my head against it, but if anyone would like to join in you can find the code here.
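For anyone following along, this is roughly how I expect a stream-table left join to behave. It is a toy model in Python, not Kafka Streams' implementation. It assumes (my reading of the KStream-KTable leftJoin semantics) that each stream record is matched against whatever the table holds at the moment that record is processed, and that later table updates never revise earlier output. The user/page/region data is made up and only loosely shaped like the page views example. In this model a null on the right-hand side means one of two things: the table row hadn't arrived yet, or the keys never matched.

```python
from typing import Dict, List, Optional, Tuple

# (kind, key, value) where kind is "table" or "stream". Hypothetical format.
Event = Tuple[str, str, str]


def stream_table_left_join(events: List[Event]) -> List[Tuple[str, str, Optional[str]]]:
    """Toy stream-table left join: each stream record sees the table as of now."""
    table: Dict[str, str] = {}
    output: List[Tuple[str, str, Optional[str]]] = []
    for kind, key, value in events:
        if kind == "table":
            # Table updates only change state; they emit no output themselves.
            table[key] = value
        else:
            # Left join: always emit the stream record, with None if no match yet.
            output.append((key, value, table.get(key)))
    return output


# Made-up page views keyed by user, plus a user -> region table.
events: List[Event] = [
    ("stream", "alice", "/index.html"),  # arrives before alice's table row
    ("table", "alice", "europe"),
    ("stream", "alice", "/about.html"),
    ("stream", "bob", "/index.html"),    # bob never appears in the table
]

for row in stream_table_left_join(events):
    print(row)
```

Only the second alice page view picks up a region. If every row comes back null, as in my reproduction, then either the table is never populated before the stream is processed or the keys on the two sides never compare equal.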

One thing to note is that when I asked for help on the Confluent Slack, I got suggestions like "maybe it's a race condition" or "maybe you need to tweak this config variable". What I didn't get was "wow, it's really surprising that the examples don't work correctly out of the box".

I think one of the best ways to make a complex idea understandable is to be able to boil it down to an implementation small and clear enough that a motivated student can read and understand it. rxi has a knack for doing this:

Martin Kleppmann is starting a Patreon. He's probably best known for Designing Data-Intensive Applications, but I'm more excited about his work on making CRDTs practical.

This paragraph resonated with me:

For me it is important to have this mixture of research, open source software development, and teaching (through speaking and writing), because all of these activities feed off each other. I don't want to just work on open source without doing research, because that only leads to incremental improvements, no fundamental breakthroughs. I don't want to just do research without applying it, because that would mean losing touch with reality. And I don't want to just be a YouTuber or writer without doing original research, because I would run out of ideas and my content would get stale and boring; good teaching requires actively working in the area.

In case you missed it, Vectorized released a set of benchmarks comparing Redpanda and Kafka. The article itself is not great (just enough detail to sound impressive but not enough to be educational), but I'm struck by how Redpanda's line in every graph is totally flat.

I'm also struck that they mention running more than 400 hours of actual benchmarks. With the machine configurations they describe, that adds up to ~$5k. Chump change for a business, but kind of intimidating when you're living on sponsorships.

I'm going to be very distracted for the next two weeks. If I get a chance I'll write up some more old work, but more likely the next update will be in late March.