0004: map of incremental/streaming systems, draft of thoughts on benchmarking streaming systems, the mature optimization handbook, various dataflow and database talks

Published 2021-02-16

New:


I'm making the subquery optimization post public. Feel free to share it, tweet it, print it out and give copies to strangers on the street, etc.

I'm not settled yet on how to decide which posts will be public or not, but this one I feel I owe to the Materialize engineers since I wrote this gnarly code and then left without fixing the downstream optimization issues.


A lot of the complexity of building eg web applications at scale is in propagating changes between different layers and reasoning about the consistency of the system as a whole. Database consistency models are primarily about data at rest, which makes it very difficult to compose them with the rest of the system. Much of my interest in incremental/streaming/continuous systems is because they provide a way to describe the motion of data over time across the entire system. I find timestamps and watermarks much easier to understand and compose than the zoo of database consistency models, and they also translate naturally into reasoning about other distributed systems.

So I was excited to see in Joe Hellerstein's POPL keynote that his lab is going to be revisiting the CALM work that got me so excited about this field in the first place.


This led me on a talk-watching binge. Highlights:


What I'm working on: