Somehow I am organizing a conference called "Have you tried rubbing a database on it?"
This isn't what I planned to do this year. How did this happen?
The answer is that I drank too much coffee before breakfast and ended up on Twitter, where I saw someone joking that the solution to thinking that everything is a compiler problem is to learn about databases - now you have two problems.
Which hits a little too close to home. But while we're thinking about it...
Are these actually two separate problems?
Fundamentally, every computer system is about storing, moving and transforming data. The line between operating system, database and programming language is somewhat arbitrary - a product of specific problems, available hardware and historical accident.
But today the problems and the hardware have changed dramatically, and as a result we're starting to see people experimenting with redrawing the lines.
Here are some of the changes:
- Cheaper RAM, faster storage hardware and better IO APIs. It used to make sense to think of the database as the thing that works with data on the slow, block-addressable, persistent memory and the programming language as the thing that works with data on the fast, byte-addressable, volatile memory. Those two different kinds of hardware require very different techniques. But over the last decade that distinction has narrowed: in-memory databases have crossed over to the other side, and with byte-addressable non-volatile memory finally starting to become commercially available the tradeoffs are going to change radically.
- While storage speeds increased, memory latency and CPU speed stagnated. Taking full advantage of today's hardware requires being aware of data locality, prefetching, pipelining, branch prediction etc, not to mention taking advantage of increasing core counts. This requires writing code in a style that is very foreign to most programmers and for which most programming languages don't offer much assistance. Which is why eg a SQL query in a modern analytics database will often outperform a hand-written C++ program.
- Distributed systems have become much more prevalent, in large part due to everything moving to the web. On the server side we already have to deal with horizontal scaling, and possibly the disaggregation of the server in the future. On the client side most people now have multiple devices, each of which is moving towards more heterogeneous hardware (eg Arm's big.LITTLE). We now spend much more time describing the movement of data between systems than in the past, and we've discovered that expecting programmers to reason about consistency by hand is a losing battle (see eg Jepsen breaking everything, Google's heavy investment in Spanner for global consistency).
- The table stakes have risen for both user-facing applications and dev-facing APIs - real-time updates, collaborative editing, sync across devices and with cloud servers, interop between different services etc. In particular the complexity of both distributed webapps and complex GUIs makes change management appealing (eg React hooks, Realm's live objects).
- The increasing ops burden on the server side means people are open to radical changes eg 'serverless' computing.
- The growing value of big data analysis and the fragmentation of its ecosystem results in needing to be able to easily share data between many different languages and across many machines (eg via Arrow).
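The data-locality point above can be made concrete with a toy sketch (the data here is invented, and this illustrates only the memory-layout idea, not a benchmark): the same table stored row-wise, as most programming languages encourage, versus column-wise, as analytics databases store it. Columnar layout keeps each column in one contiguous buffer, so scans touch sequential cache lines and are easy to vectorize.

```python
from array import array

# Row-oriented: each row is a separate heap object, scattered in memory.
rows = [{"id": i, "price": float(i), "qty": i % 10} for i in range(1000)]
revenue_rows = sum(r["price"] * r["qty"] for r in rows)

# Column-oriented: one contiguous buffer of doubles per column.
price = array("d", (float(i) for i in range(1000)))
qty = array("d", (i % 10 for i in range(1000)))
revenue_cols = sum(p * q for p, q in zip(price, qty))

# Same answer, very different memory traffic.
assert revenue_rows == revenue_cols
```

In plain CPython both loops are slow; the payoff comes when an engine compiles or vectorizes the columnar scan, which is exactly what modern analytics databases do.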
Here are some of the directions people are exploring to deal with these changes:
- Pushing runtime state and logic into the database eg Facebook Messenger moved much of their coordination and caching logic into SQLite, Our Machinery stores their entire runtime state in a custom in-memory database, Fossil writes large parts of its logic in SQL, Unity encourages storing all runtime state in their entity component system.
- Abandoning global control flow in favor of scheduling systems which understand dataflow and dependencies eg streaming systems like Flink or Kafka Streams, incremental systems like React, Salsa or Nix, parallel execution systems like the Unity job system, event-triggered systems like AWS Lambda functions or Airtable automations.
- Moving data-intensive code into specialized systems eg vectorized libraries like NumPy or Gandiva, specialized compilers like Unity's Burst, Weld or Futhark.
- Taking advantage of uniform data models to provide generic solutions to hard problems eg Automerge provides offline collaboration, Ultorg turns databases into CRUD apps, the Our Machinery database provides undo, online collaboration and automatically generated editor UI, Matrix piggybacks all their other functionality on top of their generic message sync protocol.
- Unbundling the monolithic database server into separate tools eg turning the database inside out, turning the database into a toolkit.
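To make the first direction above tangible, here is a minimal sketch of keeping runtime state in an in-memory SQLite database instead of ad-hoc objects (the table and values are invented for illustration, not taken from any of the projects mentioned): queries, indexes and transactions then come for free.

```python
import sqlite3

# Invented example: GUI window state lives in SQLite rather than in objects.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE windows (id INTEGER PRIMARY KEY, title TEXT, focused INTEGER)")
db.executemany(
    "INSERT INTO windows (title, focused) VALUES (?, ?)",
    [("editor", 1), ("console", 0), ("profiler", 0)],
)

# 'Which window is focused?' becomes a query instead of a pointer chase.
focused = db.execute("SELECT title FROM windows WHERE focused = 1").fetchone()[0]

# State changes become transactions, so multi-row updates are atomic.
with db:
    db.execute("UPDATE windows SET focused = 0")
    db.execute("UPDATE windows SET focused = 1 WHERE title = 'console'")
```

The appeal is that cross-cutting features - undo, sync, debugging views - can be written once against the uniform data model instead of once per object type.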
I don't know what the future is going to look like, but a rewarding avenue of experimentation is to take the tools and techniques developed in the database world and recombine them in new ways and apply them to new problems - redrawing the lines between operating system, database and programming language.
But those lines are still encoded in the structure of our fields. Database people go to database conferences. Programming language people go to programming language conferences. Game engine developers are not going to either of those conferences.
What if we tried to get all these people in the same room?
Well, 'room'. This isn't exactly the best year for international travel. So it will have to be online.
And at first that felt like a problem. I usually go to conferences to meet people, and only later watch the recorded talks at 2x speed, because most people talk too slowly and rehearse too little. A conference where I just watch the talks at regular speed and then don't get to talk to anyone doesn't sound that appealing.
Passive lectures aren't an effective way to convey information anyway. So I prefer the model used by eg !!con - all talks are ten minutes long, and they're treated as a starting point to get people talking rather than as a complete lecture.
You can pack a lot of ideas into a heavily edited, well rehearsed 10 minute recording (I've been using medc as my favorite example). And we can show a lot of those 10 minute recordings in one day and still reserve half the time for discussion over text and video chat.
Many tech conferences end up being a series of hour-long ads from software vendors. Which is fine. Sometimes people need to buy software. But that means that it's all about the immediate future - things that you can already sell. It doesn't leave much room for people doing things out on the fringes. So I wanted to run a conference that wouldn't accept talks that are just 'doing boring task X with expensive SaaS Y'.
But then how do I get sponsors to pay for the conference?
This year I 'went' to Handmade. It had no ads and no sponsors - it was supported entirely by ticket sales. How did they do it? They just... didn't spend loads of money.
And Handmade was a hybrid event. They had to book a venue, hire staff to sign people in etc. Plus they live-streamed and live-captioned everything. That's expensive and difficult.
If I cut out the live-streaming too and do entirely pre-recorded talks, what costs am I even left with? Video hosting. Captioning. Text and video chat. Maybe hiring moderators, if there are a lot of attendees. That's it.
So I expect the costs to be low enough that I can go out on a limb and set the ticket price to pay-what-you-want.
(There's also labour. A lot of labour. But that's already been paid for.)
I thought that the hard part would be finding speakers, but a week after publishing the site the lineup is already pretty exciting. Turns out you can just email strangers and ask them to give a talk at your just-recently-invented conference and they'll agree. (Track record so far - 8 yes, 3 maybe, 1 no reply).
There is also a form to propose a talk. There have already been a few solid proposals, some of them for projects that I never would have heard of otherwise.
One of the nice things about being a purely online conference is that there isn't a capacity limit. So rather than thinking about accepting or rejecting talks, I'm thinking about it as a curation problem. I'll add the talks that I think people will be most excited about to the main schedule, but I'll also intersperse it with choose-your-own-adventure blocks where attendees can individually choose which of the remaining talks to watch. And all of the talks will be publicly available after the event.
If this sounds like something you'd enjoy you can sign up here to be notified when tickets are available:
Or you can submit a talk proposal here: