HYTRADBOI 2022 postmortem

Genesis

One morning I drank too much coffee, got over-excited and on a whim proposed a conference - Have you tried rubbing a database on it?.

Would you watch a "have you tried rubbing a database on it" conference?

Thinking short demo videos of weird and non-traditional uses of database ideas eg writing a dvcs in sql (https://t.co/qM8Js39hvs) or building a game engine around a shared database (https://t.co/eU9NTxJZqv)
— Jamie Brandon (unresponsive here, email in bio) (@sc13ts) November 13, 2021

(I talked more about the reasons in Why start a new database conference.)

~80 people said they would attend, which was enough to make this seem worth doing.

A few twitter polls later I set the date for Friday (most preferred a weekday) Apr 29 (avoids major DB conferences) 0900-1500 PDT (as early as I was willing to start, but finishes early enough to be just about accessible in Europe).

Format

In my mind the main point of conferences is to get people talking to each other - the talks just act as jumping-off points for discussion. I'd also been to a few online conferences that year and found that I often got bored during hour-long live talks, and missed the ability to watch on 2x speed or skip around. So the format I decided on was blocks of 10 minute pre-recorded talks, where the speakers were asked to try to make their talks dense enough that no one wants to watch them at higher speed, interspersed with lots of social time to discuss the talks. Each talk also included links to further information if viewers wanted to go into more depth.

I also set the requirement that talks can't just be thinly-veiled ads for some SaaS, which is my major pet peeve at database conferences. It would be ok to talk about work that came out of some commercial project, but it had to be something that was worth watching even to someone who wasn't interested in buying the product behind it.

I find Q&A at a lot of conferences is low-value - everyone hates 'this is less of a question and more my life story' and even when there are good questions the speaker doesn't have time to answer in depth. So I decided to combine Q&A with the general discussion in the text chat.

Together these decisions also totally removed the need for live broadcasting, which is one of the most technically difficult parts of most online conferences.

I looked at a lot of existing platforms for running online conferences. They were all limited to ~100 people for broadcast. And all had great support for video chat but had text chat as an afterthought.

Picking a text chat as the main focus meant better a11y and easier lurking - the latter being important if people are attending part-time while at work. I decided to provide captions on all the videos for the same reasons.

I wavered back and forth between discord (for the voice channels) and matrix (for the admin api and out of a general preference for open infrastructure). Slack and zulip weren't in the running at all because I've had so many showstopping bugs in both (eg for a few months slack messages didn't render for me at all - I just saw empty rooms). Matrix won out in the end mostly because I'd already seen it used successfully at both Handmade Seattle and FOSDEM.

So the rough plan at this point was:

pre-recorded talks on vimeo
chat in matrix, hosted by EMS
some sort of optional video chat, tbd later

Talks

I started by asking potential speakers who I knew well, and then used their names for credibility when asking people I knew less well. Once I had ~10 speakers confirmed I put up a conference website at hytradboi.com with links to propose talks via a google form and to sign up for a mailchimp mailing list to be notified when tickets were available.

I ended up with 15 speakers that I had explicitly asked, plus another 36 submissions from the website. My original plan was 6 blocks of 3 talks, so I had to expand this to two tracks and even then still reject a lot of proposals.

Both on the proposal form and when reaching out to speakers I gave Mar 29 as the deadline for submitting videos (and I later shifted this to Apr 1 by accident). I expected some people to miss it and thought that a month would be enough buffer before the conference. In the end, ~1/2 the speakers submitted videos on the deadline and the last video came in 3 days before the conference. It's hard to blame the speakers for this when they're volunteering to give talks in the first place - I should have just put a lot more buffer between the deadline and the conference.

Two speakers dropped out between announcing the schedule and the day of the conference, so the final total was 34 talks.

Tickets

I really liked that Handmade Seattle had been able to completely avoid ads and sponsors. They ran the conference over matrix and hosted videos on vimeo, embedded in the handmade website so that they could change providers without breaking links.

Other conferences seemed to split into two groups. Community-run conferences ran to around $60-80 for an online-only ticket. For-profit conferences tended to be more like $400-2000.

Some back-of-the-envelope math told me that if 100 people bought $64 tickets then that would cover the expected costs of running HYTRADBOI. If I wanted to pay the speakers too I'd have to charge closer to the for-profit conferences, which I think would change the vibe a lot.

I wanted to offer pay-what-you-want with $64 as a recommended amount, but:

on stripe this requires running my own billing server instead of using payment links
on square I can only charge tickets in CAD, which nobody knows the conversion rate for

To keep things simple I compromised on having stripe payment links for $64, $32, $16 and $8.

The final distribution looked like:

$8 => 70
$16 => 30
$32 => 36
$64 => 393

This has been my experience of pay-what-you-want elsewhere too. Most people paid full price but we also let in 136 people who might not otherwise have attended - win-win.
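As a quick check on those numbers, the distribution can be tallied in a few lines of Python. This is only arithmetic over the four counts listed above. The variable names are made up for the example, and the gross figure is in USD ticket prices before any fees or currency conversion.

```python
# Tickets sold at each price point (USD), copied from the list above.
tickets_sold = {8: 70, 16: 30, 32: 36, 64: 393}

total_tickets = sum(tickets_sold.values())
discounted = sum(n for price, n in tickets_sold.items() if price < 64)
gross_usd = sum(price * n for price, n in tickets_sold.items())
full_price_share = round(100 * tickets_sold[64] / total_tickets)

print(total_tickets)     # 529
print(discounted)        # 136
print(gross_usd)         # 27344
print(full_price_share)  # 74 (percent of buyers who paid $64)
```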

Registration

I wrote a registration script that would:

Read completed checkout sessions from stripe
Generate a username/password based on the attendee's email
Register that username/password on chat.hytradboi.com and add them to all the conference rooms.
Send an invitation email from info@hytradboi.com
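The second step might look something like the sketch below. This is not the actual script: the naming scheme, the `make_login` helper and the choice of a random password are all assumptions made for illustration, and the Stripe, chat server and email steps are left out entirely.

```python
import re
import secrets


def make_login(email: str, taken: set[str]) -> tuple[str, str]:
    """Derive a chat username from an email address and pick a random password.

    Hypothetical helper - the real script's naming scheme is not described.
    """
    local_part = email.split("@", 1)[0].lower()
    # Replace anything outside a conservative set of username characters.
    base = re.sub(r"[^a-z0-9._-]", "_", local_part) or "attendee"
    username = base
    suffix = 2
    # Different emails can share a local part, so de-duplicate.
    while username in taken:
        username = f"{base}{suffix}"
        suffix += 1
    taken.add(username)
    password = secrets.token_urlsafe(12)
    return username, password


taken: set[str] = set()
print(make_login("Jane.Doe+conf@example.com", taken)[0])  # jane.doe_conf
print(make_login("jane.doe+conf@example.org", taken)[0])  # jane.doe_conf2
```

Keying the username on the email address is what makes a purchase of several tickets under one email awkward: every ticket after the first needs a different address from somewhere.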

Delivery rate on the invitation emails was terrible - in many cases not even landing in spam. Maybe I should have used a commercial service rather than sending them straight out of fastmail, but delivery rates on the mailchimp mailing list were also reportedly poor.

Linking the username to email was also a pain when someone bought multiple tickets. I dealt with these by hand - asking the attendee what email addresses they wanted to send the invites to and triggering the registration by hand. This worked but was tedious.

Captions

I wanted captions on all the talks. Captions make life easier for viewers who have hearing problems, viewers who don't speak english fluently, speakers with strong accents, viewers watching while working or skimming the video etc.

I had assumed I could just use a commercial service and not worry about it. I picked one talk as a sample and sent it out to 4 different services (2 human, 2 machine) and none of the results were legible enough that I felt good about using them.

I tried captioning the same talk by hand using amara and it took me ~50 minutes, compared to ~20 minutes to correct the machine transcription. So I settled on paying for machine transcription on rev and then hand-correcting. I was able to get through ~3 talks per hour so this was only a few days of work, but it was deeply boring.

Also the rev editor is supposed to preserve timestamps when making small edits, but in some cases 20-30s of captions ended up getting mashed together into one screen.

Some speakers volunteered their own captions and these were typically much higher quality.

Video hosting

About 6 weeks before the conference vimeo announced that they were changing their pricing for anyone using >2TB of video from a fixed monthly fee to 'call us'. This was accompanied by horror stories of relatively small users being faced with multi-thousand dollar bills per month.

I asked around for recommendations and by far the most common was bunny. They're very cheap - $5/TB for the volume service.

Bunny offers a simple rest api but I ended up doing everything by hand on the web gui. Because I asked speakers to submit their talks via google forms I ended up with a spreadsheet containing a bunch of links to google drive files. I couldn't figure out how to download these via the drive api without attaching it to a google cloud account and giving them a credit card. So I ended up downloading the videos individually by hand, then uploading them in the bunny web gui and attaching the captions. If I did this again I would ditch google forms and just write a 50 line flask app, so that I can automate this kind of work.

I also uploaded everything to the internet archive as a backup. (I searched the forums and found some threads where they explicitly stated that they're ok with people uploading conference talks). On firefox, unfortunately, the links to individual videos within an archive instead always redirect to the beginning of the playlist.

Text chat

I set up a test server on EMS. I found out that I couldn't make any changes to the synapse or element configs directly. Support offered to make the changes for me but I didn't want to be pinging them constantly while I figured out the setup.

So I moved the test server to vultr, copying the nixops config I use for my own server.

I made these changes to the default config:

Disable presence indicators (I was warned this can generate a lot of load in some cases)
Disable guest access
Enable threads (still in beta, but I thought 500 people in one room without threads would be chaos)
Change the permalink prefix from matrix.to to chat.hytradboi.com (matrix.to is really confusing if you've never used matrix before)
Turn off federated login and disable registration (I would have liked to allow federated login but this would have complicated the registration flow, which was already committed to using payment links)
Switch the integration manager to dimension (because the default integration manager required explicit action before showing widgets, which I thought would confuse people)

Once I had everything set up I couldn't be bothered to move it all back to EMS. (This was a bad decision).

I started sending out logins a few weeks before the conference, thinking it might help to shake out any problems in the login flow, but everything went smoothly.

I expected 100-300 attendees. In the last few days before the conference started the total number jumped to 568. This was much more than I was expecting to handle so the night before the conference I upgraded the server to 24 vcpus, 94 gb ram and fast nvme drive - wildly overprovisioned compared to the hardware estimates that I'd heard from other conferences but at $1/hour a small price to pay for peace of mind.

The chat was broken down into 4 main rooms: announcements, hallway and a room for each of the two tracks. The announcements channel had a big pinned post explaining how everything was going to work. The talks appeared on schedule in widgets embedded in the corresponding track rooms.

Video chat

This ended up being something of an afterthought. Ideally I wanted something like the voice rooms in discord where people can see at a glance how full each room is. Matrix has video rooms in progress but they're not ready yet.

Instead I just set up a jitsi server. I couldn't find any estimates on hardware requirements at all except that it's cpu-heavy so I again wildly overprovisioned - 32 vcpus and 64gb ram. I also set last_n to 1 to limit the bandwidth requirements. In the end, for reasons discussed below, hardly anyone used the video chat so I have no idea whether this was adequate.

I also made a little chatroulette-style service - a button embedded in the hallway room that assigned people to random jitsi rooms.
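The post doesn't describe how the assignment worked, so the following is just one way such a button could behave: send each click to a random room that still has space, and open a new room when all of them are full. The `RoomAssigner` class, the room size of 4 and the `meet.example.org` host are all hypothetical.

```python
import random
import secrets


class RoomAssigner:
    """Hand out video room URLs, filling existing rooms before opening new ones."""

    def __init__(self, base_url: str, room_size: int = 4):
        self.base_url = base_url.rstrip("/")
        self.room_size = room_size
        self.sent = {}  # room name -> number of people sent there so far

    def assign(self) -> str:
        """Return a room URL for the next person who clicks the button."""
        open_rooms = [name for name, n in self.sent.items() if n < self.room_size]
        if open_rooms:
            room = random.choice(open_rooms)
        else:
            # Random names so that rooms can't be guessed from the outside.
            room = "hallway-" + secrets.token_hex(4)
            self.sent[room] = 0
        self.sent[room] += 1
        return f"{self.base_url}/{room}"


assigner = RoomAssigner("https://meet.example.org", room_size=2)
for _ in range(5):
    print(assigner.assign())
```

This version only counts people sent to a room and never learns when they leave, so a long-running service would need some way to expire rooms.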

Misc

I made sure that I had a backup laptop and backup internet connection for the day.

I wrote down a rough schedule - each action that I had to take at a specific time - and did a quick test run of each.

Outage

On the morning of the conference I had originally planned to shut down the ticket links at 0800, but a lot of people were still trying to buy tickets so I left the ticket links up with a note that the conference was already starting.

Everything looked good until around 0850 when the chat server started falling apart. This caused multiple problems:

Some people couldn't login at all.
The widgets wouldn't load reliably so people couldn't see the talks.
The registration script couldn't add new users, so people were buying tickets and then not getting invites.
I couldn't reliably connect to explain what was going on.
Because the chat was unreliable people just stayed in the track rooms and didn't look at announcements, so even when I did explain things most people didn't see it.

I ended up being split in many directions: trying to keep the conference limping along, responding to emails and twitter dms from confused attendees who didn't receive invites or couldn't log in, manually completing registrations and invites, and trying to diagnose the performance problems. The constant context-switching made me pretty bad at all these tasks.

When it became clear that the perf problems weren't going to be fixed quickly I tried to take the ticket links down to stop the flow in. But I didn't notice that the netlify cli failed with 'cannot read property 0 of undefined' - a non-deterministic error that I see occasionally and usually just retry. So the ticket links stayed up for a few extra hours.

From my notes written down that evening, this is what I did to diagnose the perf issue. Much of it doesn't make sense in hindsight but that's panic for you.

Check htop, iotop, iftop. Utilization of ram and network is low. Cpu is low overall, but one synapse process is regularly hitting 100%. There is heavy disk traffic from journald, much more than from postgres.
I found an issue suggesting that logging might be a bottleneck for synapse. Rate limiting journald is an easy change so I try it. Doesn't help much.
Cpu usage for that one synapse process is high so maybe I've misconfigured synapse somehow. I find a guide on troubleshooting synapse using prometheus/grafana. I'm not familiar with these so I sink a bunch of time trying to hook them up and never manage to get data out.
Take a deep breath. Step away, make a cup of tea, and think.
Check logs. I see a lot of messages about the kernel refusing connections.
Pull out Systems Performance and start working through the network checklist.
Throughput is low. Number of connections is low. No errors.
Check system limits (fs.file-max etc). All are set high and we're not near the limits.
Kernel is probably fine, so check the next layer. Look at nginx logs. See warnings that '512 worker_connections is not enough'. Nginx defaults to only allowing 512 simultaneous tcp connections per worker process!
Change worker_connections to 10000 (picked arbitrarily) and everything starts working smoothly.

The actual debugging itself didn't take a long time, but juggling everything else slowed me down a lot. It would have been a lot easier with two people.

Realistic load-testing would have caught this in advance. I didn't find any existing load-test code. I did attach 500 browser sessions to the test server and type a bunch without any problems, but this probably didn't generate as many simultaneous connections.
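A test aimed specifically at simultaneous connections could be as small as the sketch below. It is not something that was run for the conference. It opens a batch of raw TCP connections, leaves them idle, and counts how many are still open after a few seconds. The `hold_connections` name and its parameters are made up for this example. It doesn't speak HTTP or matrix, so it can only reveal connection limits, not application-level load.

```python
import asyncio


async def hold_connections(host: str, port: int, count: int, hold_seconds: float = 5.0) -> int:
    """Open `count` TCP connections at once and return how many stay open."""

    async def one_connection() -> bool:
        try:
            reader, writer = await asyncio.open_connection(host, port)
        except OSError:
            return False  # refused, reset, or out of local file descriptors
        try:
            # Sit idle like a chat client. If the server hangs up early,
            # read() returns b"" before the timeout fires.
            data = await asyncio.wait_for(reader.read(1), timeout=hold_seconds)
            return data != b""
        except asyncio.TimeoutError:
            return True  # still connected after hold_seconds
        except OSError:
            return False
        finally:
            writer.close()

    results = await asyncio.gather(*(one_connection() for _ in range(count)))
    return sum(results)
```

Running something like `asyncio.run(hold_connections("127.0.0.1", 8080, 600))` against a test server (the address is a placeholder) and getting back noticeably fewer than 600 would point at a connection limit somewhere in the stack. The client machine's own open-file limit can also cap the result, so that needs to be ruled out first.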

One idea I had considered earlier was having a pre-conference a week earlier where we just show a few videos and test everything out. This would only have caught the issue if most people attended - the server was doing fine at ~60-80% of attendees and then fell off a cliff once the connection limit was reached.

I imagine EMS wouldn't have had this problem.

There were 571 attendees total, including speakers. 108 never logged in. I reached out to all of these to offer refunds if they didn't receive their invite or couldn't login because of the outage. So far only 2 people have told me that they tried to attend but couldn't - the other replies have all said that they were just busy and couldn't make it.

Experience

Aside from the initial chaos...

People seem to really like the talks and the dense 10 minute format.

Prerecording talks let people watch them at their own pace. Some people just watched a few talks and paused a lot to think. Others tried to watch all of them at 2x speed. It's nice to be able to go to the bathroom and not miss anything.

34 talks in 6 hours was maybe too much to handle. I had wanted to spread it over two days but didn't want to ask the speakers who had already made room in their schedule to make a last-minute change.

The chat was very active but difficult to follow. In hindsight one room per track makes no sense. The speakers quickly re-organized into one room per talk which worked much better.

Turning on threads didn't work out great in the end. 500 people was chaos either way, and the notification UI for threads isn't finished yet so I think a lot of conversations got dropped in the chaos.

Since most people didn't read the announcements they didn't know that the video chat existed, so it didn't see much use. But we did have an afterparty call that apparently continued a few hours after I left.

People wanted to hop around but the widgets are specific to the room they're in. Some people accidentally deleted them and then didn't know how to get them back. Around halfway I deleted all the widgets and just posted direct links to the schedule page.

The chat is still active a few days later. I'll leave it up as long as the conversation is still going.

A lot of people expressed a surprising amount of support for the conference. I think there is a lot of pent up demand for a database conference that isn't just SaaS ads. That support meant that people were very willing to help promote the conference and were forgiving of the many technical issues. Many people bought tickets knowing that they wouldn't be able to attend, because they wanted something like this to exist.

Expenses

(All the below is in CAD, not USD).

tickets: 34243
stripe: -1547
vultr: -232
rev: -116
bunny: -13/130
netlify: -52
google drive: -13
namecheap: -9
donation to matrix.org: -1005
donation to nixos.org: -1005
donation to archive.org: -1028
(jitsi.org doesn't take donations)

I plan to serve the videos indefinitely. So far I've only racked up 10 USD on bunny, but I've deposited 100 USD and I'll top it up if needed.
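For anyone who wants the totals, the list above adds up as follows. The post doesn't state a net figure, so this is plain subtraction under one assumption: 'bunny: -13/130' is read as 13 spent so far out of 130 deposited, and the full deposit is counted as a cost. Labor is not included.

```python
# All figures in CAD, copied from the list above.
ticket_income = 34243
costs = {
    "stripe": 1547,
    "vultr": 232,
    "rev": 116,
    "bunny": 130,  # assumption: count the full deposit, not the 13 spent so far
    "netlify": 52,
    "google drive": 13,
    "namecheap": 9,
}
donations = {"matrix.org": 1005, "nixos.org": 1005, "archive.org": 1028}

running_costs = sum(costs.values())
total_donations = sum(donations.values())
remainder = ticket_income - running_costs - total_donations

print(running_costs)    # 2099
print(total_donations)  # 3038
print(remainder)        # 29106
```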

I didn't use a time tracker. I think the total labor was ~6-8 weeks spread over the last 4 months, but much of that was exploratory and if I did the same again I'd be able to reuse most of the existing decisions and code.

Encore

I'm not sure if I'll try to do this again next year. If I do, there are a lot of changes I'd make.

Make a static landing page with status, instructions, links to talks and chat. Auto-refresh every minute. This takes chat (the most complex moving part) out of the critical path and gives me a way to message attendees that they are more likely to actually see.
Talk to someone with experience with mass email and figure out what I need to do to improve delivery rates.
If possible, take email out of the critical path entirely. Eg go directly from the tickets page to the landing page, with the chat login embedded and filled out so it can go straight into the password manager. Send the landing page via email as a backup.
Only one track, and split over multiple days if necessary.
Organize chat into one room per talk.
Maybe take advantage of the new matrix video rooms.
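The landing page in the first item could be generated by something as small as the sketch below. It is an illustration rather than a plan: `render_status_page`, the page layout and the example.org links are all made up. The auto-refresh uses a plain `meta http-equiv="refresh"` tag, so the page works without any javascript.

```python
from html import escape


def render_status_page(status: str, links: list[tuple[str, str]], refresh_seconds: int = 60) -> str:
    """Build a self-refreshing static status page as an HTML string."""
    items = "\n".join(
        f'    <li><a href="{escape(url, quote=True)}">{escape(label)}</a></li>'
        for label, url in links
    )
    return f"""<!doctype html>
<html>
<head>
  <meta charset="utf-8">
  <meta http-equiv="refresh" content="{int(refresh_seconds)}">
  <title>Conference status</title>
</head>
<body>
  <p>{escape(status)}</p>
  <ul>
{items}
  </ul>
</body>
</html>
"""


page = render_status_page(
    "Chat is having problems. Talks are still playing on schedule.",
    [("Schedule", "https://example.org/schedule"), ("Chat", "https://example.org/chat")],
)
print(page)
```

Regenerating this file and pushing it to static hosting whenever the status changes would keep announcements working even when the chat server isn't.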

I'd still want to think more about the format too. Conferences serve many different goals - education, disseminating new results, building community, being a Schelling point for attention etc. I don't feel like we've figured out yet how to adapt these to an online world, and I'm not even sure that it still makes sense to try to serve all those goals in one event.