0017: hytradboi updates, imp stonks, misparaphrasing oracle, technical books, rum, creator economy, friend groups, ub, omg design principles, zig build, fossil and indexes, flatpak, skiplang, convex, fuzzing beyond testing, tigerbeetle dev videos, wafl

HYTRADBOI now has 10 confirmed speakers and a couple more maybes. The submissions will stay open until 2022 Feb 28 - if there is someone you would like to see speak, send them to hytradboi.com to submit a talk.

Also, if anyone can intro me to someone involved with fossil, project lightspeed or our machinery, I'd love to invite them.

I'm leaning towards hosting the conference on matrix and jitsi, which were used successfully for handmade and fosdem this year. Platforms like gather or rally have much slicker UX for video chat but have anaemic text chat - no code blocks, no images, no threads etc.

Also considering treating the main schedule as a curation problem rather than accept/reject, and having a 'bonus talks' section where we accept every remaining submission and let attendees pick and choose which to watch.

Some tricky bugfixes in imp as a result of adding a new example.

The interesting thing about this example is that order books don't have a natural id column, so normalizing feels very unnatural.

The planned addition of modules/structs might make this somewhat nicer, and I've also been considering switching from sets to bags after repeatedly messing up aggregates over duplicate values.
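A rough sketch of the duplicate-values problem, in python rather than imp and with made-up order data. Under set semantics, projecting away the distinguishing columns before aggregating silently merges equal values; a bag keeps a count per value so the duplicates survive.

```python
from collections import Counter

# Hypothetical orders: (side, price, quantity). No natural id column.
orders = [
    ("buy", 100, 5),
    ("buy", 101, 5),
    ("sell", 102, 3),
]

# Set semantics: projecting down to quantity collapses the two 5s.
quantities_set = {qty for (_side, _price, qty) in orders}
total_set = sum(quantities_set)

# Bag semantics: each value carries a multiplicity, so duplicates survive.
quantities_bag = Counter(qty for (_side, _price, qty) in orders)
total_bag = sum(qty * count for qty, count in quantities_bag.items())

print(total_set, total_bag)  # 8 13
```

The set version is only correct if every row keeps some column that makes it unique, which is exactly what an order book lacks.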

In a previous post I said of my conversation with an oracle dev:

In the last month I asked for an up-to-date description of the current capabilities and limitations. They replied that it can now decorrelate any subquery except those which are not possible to decorrelate correctly, giving an example. I demonstrated that both cockroachdb and materialize decorrelate the example correctly.

I ran across that email again recently while looking for examples, and realized that I read it unfairly.

Here is the actual text:

For the most part the "restrictions" are non-bypassable checks. These ensure that converting the subquery to a join is guaranteed to give the same result. Unnesting can take place if this check passes.

For example, converting count(*) in a scalar subquery in the select to a join can give different results in the presence of nulls (see example below). Whereas it's safe to unnest many other aggregates (min, avg, etc.)

In hindsight, it seems clear that they were talking about the specific transformations used in oracle, not about what is possible in general.
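For anyone who hasn't run into the count(*) problem, here is a minimal reproduction using sqlite from python. This is my own toy example, not the one from the email, and the table names are made up. The naive rewrite to an outer join reports 1 for a customer with no orders, because the null-padded row still counts as a row.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    create table customers (id integer primary key);
    create table orders (customer_id integer);
    insert into customers values (1), (2);
    insert into orders values (1), (1);
""")

# Correlated scalar subquery: customer 2 has no orders, so count(*) is 0.
subquery = db.execute("""
    select c.id, (select count(*) from orders o where o.customer_id = c.id)
    from customers c order by c.id
""").fetchall()

# Naive unnesting into an outer join: the null-padded row for customer 2
# is still a row, so count(*) reports 1.
naive_join = db.execute("""
    select c.id, count(*)
    from customers c left join orders o on o.customer_id = c.id
    group by c.id order by c.id
""").fetchall()

# Counting a column from the right side skips the null padding.
fixed_join = db.execute("""
    select c.id, count(o.customer_id)
    from customers c left join orders o on o.customer_id = c.id
    group by c.id order by c.id
""").fetchall()

print(subquery)    # [(1, 2), (2, 0)]
print(naive_join)  # [(1, 2), (2, 1)]
print(fixed_join)  # [(1, 2), (2, 0)]
```

Aggregates like min return null both for an empty subquery and for a null-padded group, which is why they don't need the same care.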

Prompted by this poll, I've been trying to find more technical books and experience reports to learn from.

I read The Performance of Open Source Applications but the vast majority of the chapters didn't have enough detail to really learn anything at all. For a start, only the chrome chapter established baselines to determine whether their final performance was anywhere near reasonable upper limits.

Others currently on the list, in no particular order:

https://craftinginterpreters.com/ (for the bytecode section)
https://www.goodreads.com/book/show/44647144-database-internals (for the storage engine section)
https://github.com/tpn/pdfs/blob/master/The%20Design%20and%20Implementation%20of%20Modern%20Column-Oriented%20Database%20Systems%20(abadi-column-stores).pdf
https://www.goodreads.com/book/show/56124593-invisible-learning
https://www.goodreads.com/book/show/48816586-software-engineering-at-google
https://www.goodreads.com/book/show/27968891-site-reliability-engineering
https://www.goodreads.com/book/show/18058001-systems-performance (part way through, waiting till I have some perf problems to apply it to)
https://github.com/dendibakh/perf-book
https://www.goodreads.com/book/show/2913411-modern-processor-design

https://dl.acm.org/doi/pdf/10.1145/2882903.2912569

A nice overview of the RUM tradeoff.

https://nadia.xyz/creator-economy

When I imagine a cultural renaissance that inspires me, I think about working together to address unsolved questions, tugging on threads in conversations that need unraveling, creating enduring artifacts for generations to pore over and iterate upon. The 'publish or perish' model that nudges people to rack up more followers is not the pinnacle of creative freedom; it's indentured spiritual servitude.

https://nayafia.substack.com/p/27-friend-groups

To try on a more normative version of friendship, perhaps we could say: "A group of friends who enjoy each others' company ought to build something together."

This isn't a shill for startups; "building something together" doesn't have to mean starting a company. In its most primitive form, it might mean acknowledging the existence of a shared group identity, one that can exist outside of any individual member, and that everyone is building towards.

Reminds me of krewe.

https://www.ralfj.de/blog/2021/11/18/ub-good-idea.html

I will look at this topic from a PL perspective, and argue that Undefined Behavior (or UB for short) is a valuable tool in a language designer's toolbox, and that it can be used responsibly to convey more of the programmer's insight about their code to the compiler with the goal of enabling more optimizations.

https://web.archive.org/web/20220127004837/https://ourmachinery.com/files/guidebook.md.html#omg-design%3Adesignprinciples

Design principles from Our Machinery, ranging from high-level to tiny details of writing c++.

https://zig.news/xq/zig-build-explained-part-1-59lf

Zig comes with an (optional) programmatic build system. You write a small zig program that constructs a graph of build steps and then the zig build command can run any of those steps.

It's also completely undocumented atm. This series explains how it works and how to handle more complex tasks like repackaging c++ projects.

https://www.youtube.com/watch?v=ghtpJnrdgbo

A list of problems that are hard to solve in git, but easy in fossil. Many of these come down to the fact that fossil uses sqlite for storage, so tasks that require a full scan of the repo in git are just a 'CREATE INDEX' away in fossil.
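A small sketch of what 'a CREATE INDEX away' looks like in sqlite, driven from python. The commits table here is hypothetical and is not fossil's actual schema; the point is only that the same query goes from a full scan to an index lookup without changing anything else.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical table for illustration; this is not fossil's actual schema.
db.execute("create table commits (hash text primary key, author text, message text)")
db.executemany(
    "insert into commits values (?, ?, ?)",
    [(f"{i:040x}", f"author{i % 10}", f"change {i}") for i in range(1000)],
)

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " | ".join(row[-1] for row in db.execute("explain query plan " + sql))

query = "select hash, message from commits where author = 'author3'"
plan_before = plan(query)  # full scan of the table
db.execute("create index commits_by_author on commits (author)")
plan_after = plan(query)   # lookup via the new index

print(plan_before)
print(plan_after)
```

The exact wording of the plan varies between sqlite versions, but the first should mention a scan and the second should name commits_by_author.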

https://ludocode.com/blog/flatpak-is-not-the-future

All of these technologies are essentially building an entire OS on top of another OS just to avoid the challenges of backwards compatibility. In doing so, they create far more problems than they solve. Problems of compatibility are best solved by the OS, the real one, not some containerized bastardization on top. We need to make apps that run natively, that use the system libraries as much as possible. We need to drastically simplify everything if we have any hope of attracting proprietary software to Linux.

Personally, I'm much more interested in how to get Excel and Photoshop on Linux rather than untrustworthy drive-by apps and games, so I don't really care about sandboxing, permissions, portals, app stores, alternate runtimes or really any of the stuff Flatpak does. Those are all counter-productive to convincing Microsoft and Adobe to port their software suites to Linux. Attracting these vendors will only happen by empowering them with a stable platform, not locking them in a box.

I wonder if we're looking at the decline of desktop linux (to the extent that it ever had a rise). Things were approaching actually usable around 2010 but since then I've experienced a steady fall in the number of working features. Restarting pulseaudio has become a daily routine for me. Screensharing on wayland still regularly crashes my entire desktop. Automounting USB drives stopped working for me at some point this year - I'm stuck with bashmount now.

http://skiplang.com/blog/2017/01/04/how-memoization-works.html

Sounds somewhat like embedding salsa directly into the language. Unfortunately most of the key steps are only hinted at, and it doesn't seem like the project is still active.

https://m.youtube.com/watch?v=iizcidmSwJ4&list=PLSE8ODhjZXjbDOFN4U4-Uv95-N8sgzs5D&index=12

I'm still sad that google made such a mess of firebase. Convex seems like an attempt to provide the same experience as OG firebase while also providing a real query language, transactions, online schema migrations and direct integration into react.

https://kripken.github.io/blog/binaryen/2019/06/11/fuzz-reduce-productivity.html

Examples of non-testing uses of fuzzers, like finding examples of why a given line of code needs to be present.
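A toy version of the 'why is this line here' trick, using plain random search in python rather than the tooling from the linked post. The clamp function and the input ranges are made up. Delete the questioned lines, then search for an input where the original and the variant disagree; any hit is a concrete example of why the lines matter.

```python
import random

def clamp(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:  # Is this check really needed?
        return hi
    return x

def clamp_without_check(x, lo, hi):
    # Same function with the questioned lines deleted.
    if x < lo:
        return lo
    return x

def find_witness(original, variant, seed=0, tries=10_000):
    """Randomly search for an input where the two versions disagree."""
    rng = random.Random(seed)
    for _ in range(tries):
        args = (rng.randint(-100, 100), 0, 10)
        if original(*args) != variant(*args):
            return args
    return None

witness = find_witness(clamp, clamp_without_check)
print(witness)
```

If the search comes back empty, that is weak evidence that the lines are dead code, at least for the input distribution tried.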

https://www.twitch.tv/videos/1229342192

A series of detailed explanations of TigerBeetle (which is one of the upcoming talks at HYTRADBOI). Too much to summarize but highlights include adapting consensus algorithms to account for storage faults, designing a storage engine that runs without any dynamic memory allocation, using zig's comptime to calculate on-disk layout at compile-time and make assertions about the resulting layout...

https://github.com/fgsect/WAFL/blob/main/roots21-3.pdf

A while ago I linked to a demo of fuzzing using a risc-v emulator, which has some overhead vs fuzzing native code but makes up the difference by allowing easy memory snapshots and emulating syscalls to enable fuzzing in parallel.

This paper does something very similar, but reduces the initial overhead by fuzzing wasm in a fork of WAVM.

Also led me to:

https://aflplus.plus/

https://github.com/AFLplusplus/LibAFL