How (memory) safe is zig?

I keep seeing discussions that equate zig's level of memory safety with c (or occasionally with rust!). Neither is particularly accurate. This is an attempt at a more detailed breakdown.

This article is limited to memory safety. See Assorted thoughts on zig and rust for a more general comparison.

I'm concerned mostly with security. In practice, it doesn't seem that any level of testing is sufficient to prevent memory safety vulnerabilities in large programs. So I'm not covering tools like AddressSanitizer that are intended for testing and are not recommended for production use. Instead I'll focus on tools which can systematically rule out errors (eg compiler-inserted bounds checks completely prevent out-of-bounds heap read/write).

I'm also focusing on software as it is typically shipped, ignoring eg bounds checking compilers like tcc or quarantining allocators like hardened_malloc which are rarely used because of the performance overhead.

Finally, note the 'Updated' date below the title. Zig in particular is still under rapid development and will likely change faster than this article updates. (See the tracking issue for safety mechanisms).

I see two major categories of safety mechanisms:

Adhoc runtime checks. These appear in all zig and rust codebases but are very rare in idiomatic c. Many of these checks are also idiomatic in modern c++ codebases but are hamstrung by backwards-compatible interfaces. These checks are easy to implement and probably sufficiently non-controversial that any new systems language will have similar features. Examples include:

Pervasive use of a slice type (pointer + length) and bounds-checking reads/writes of those slices.
Disallowing null pointers, except via an 'optional' type which cannot be dereferenced without checking for null.
Builtin support for tagged unions which cannot be accessed without checking the tag.
Automatic checking of over/underflow in arithmetic and when casting between numeric types.
Using a separate type for null-terminated strings, to prevent accidentally passing a non-null-terminated string when a null-terminated string was expected (usually when interfacing with c/c++ code).
Tracking pointer alignment in the type system and checking for correct alignment when casting between pointer types.
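
As a rough sketch of what a few of these checks look like in zig (the names firstByte, describe, checkedAdd and the Value type are made up for illustration, and details may differ between zig versions):

```zig
const std = @import("std");

// Hypothetical tagged union, used only for this illustration.
const Value = union(enum) {
    number: usize,
    string: []const u8,
};

// The optional return type forces callers to handle the null case
// before they can use the byte.
fn firstByte(bytes: []const u8) ?u8 {
    if (bytes.len == 0) return null;
    // Indexing a slice is bounds-checked in debug/release-safe builds.
    return bytes[0];
}

// A switch on a tagged union must cover every tag, and each branch
// only sees the payload that belongs to its tag.
fn describe(value: Value) usize {
    return switch (value) {
        .number => |n| n,
        .string => |s| s.len,
    };
}

// A plain `a + b` panics on overflow in debug/release-safe builds.
// std.math.add reports the overflow as an error value instead.
fn checkedAdd(a: u8, b: u8) error{Overflow}!u8 {
    return std.math.add(u8, a, b);
}

pub fn main() void {
    // `orelse` supplies a fallback for the null case.
    const byte = firstByte("") orelse 0;
    std.debug.print("{d} {d}\n", .{ byte, describe(.{ .string = "hello" }) });
}
```

None of this needs proofs or annotations. The checks are just part of how slices, optionals, unions and integers work, which is why I expect any new systems language to have something similar.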

Composable compile-time proofs. These are unique to rust and are novel, non-trivial to implement and add a significant amount of complexity to the language. The 'composable' part is key. 'Unsafe' code allows adding new axioms to the system, and the compiler verifies that those axioms are composed in valid ways. All rust code is built out of combinations of a small number of such axioms. This is why it has been possible to write complex systems in rust with very little unsafe code and a high level of memory safety, whereas after-the-fact global static analysis has been limited to much more restrictive coding styles. This is also why I don't expect to see a post-hoc static analysis tool for zig that approaches the same level of safety and flexibility that rust achieves - the library apis have to be designed with the proof system in mind.

Zig also has some improvements over c which don't fit into either of these categories:

The standard library includes a set of allocators which catch use-after-free and double-free at runtime. It's not yet clear how high the runtime and memory overhead will be. Similar allocators do exist for c and are not widely used, which makes me somewhat pessimistic, but I'd be happy to be proved wrong.
The pervasive allocator api makes it easier to use arena allocation or garbage-collected pools to simplify lifetime management.
Using defer and errdefer simplifies resource cleanup inside complicated control flow, reducing the possibility of mistakes.
Support for generics almost entirely eliminates the need to cast to/from void pointers.
In zig creating an uninitialized variable also requires using the undefined keyword which helps flag such cases for review. In debug/release-safe, assigning undefined to a variable or pointer also fills that memory region with 0xAA, increasing the chance of an immediate crash on access and making debugging easier.
In c uninitialized variables are often used when the variable can't easily be initialized by a single expression. In zig it's often possible to avoid uninitialized variables by using a labeled block that returns the initial value, or using an optional type and initializing it to null.
@bitCast produces compile errors if you try to cast between two types with incompatible or undefined representations.
When compiling c code, the zig compiler has safer default options than gcc or clang (eg ubsan is enabled by default).
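
A minimal sketch of two of these points, errdefer for cleanup and a labeled block instead of an uninitialized variable (Pair and classify are hypothetical names invented for this example, and the allocator api may differ between zig versions):

```zig
const std = @import("std");

// Hypothetical type for illustration: two buffers that are allocated
// and freed together.
const Pair = struct {
    a: []u8,
    b: []u8,

    fn init(allocator: std.mem.Allocator, len: usize) !Pair {
        const a = try allocator.alloc(u8, len);
        // Runs only if a later step in this function returns an error,
        // so `a` is not leaked when the second allocation fails.
        errdefer allocator.free(a);
        const b = try allocator.alloc(u8, len);
        return Pair{ .a = a, .b = b };
    }

    fn deinit(self: Pair, allocator: std.mem.Allocator) void {
        allocator.free(self.a);
        allocator.free(self.b);
    }
};

// A labeled block computes the initial value across several branches,
// so the variable never has to start out as `undefined`.
fn classify(n: u32) []const u8 {
    const label: []const u8 = blk: {
        if (n == 0) break :blk "zero";
        if (n % 2 == 0) break :blk "even";
        break :blk "odd";
    };
    return label;
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const pair = try Pair.init(allocator, 16);
    // Runs on every exit path from main, including error returns.
    defer pair.deinit(allocator);
    std.debug.print("{s}\n", .{classify(3)});
}
```

This only makes the cleanup harder to forget. Nothing stops me from keeping a copy of pair.a around and using it after deinit has run.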

To summarize: Zig removes some of the most egregious footguns from c, has better defaults, makes some good practices more ergonomic, and benefits from a fresh start in the standard library (eg using slices everywhere). But it does not nearly approach the level of systematic prevention of memory unsafety that rust achieves. It is still trivial to violate memory safety in zig. Here are some categories I often run into:

// use after free
var hello = try allocator.dupe(u8, "hello world");
allocator.free(hello);
std.debug.print("{s}\n", .{hello});

// use after realloc / iterator invalidation
const init_queue = [5]usize{ 0, 1, 2, 3, 4 };
var queue = try std.ArrayList(usize).initCapacity(allocator, init_queue.len);
try queue.appendSlice(&init_queue);
for (queue.items) |*item| {
    item.* += 1;
    try queue.append(item.*);
}
std.debug.print("{any}\n", .{queue.items});

// invalidating an interior pointer
const Value = union(enum) {
    string: []const u8,
    number: usize,
};
var value = Value{ .number = 42 };
const number = &value.number;
value = Value{ .string = "hello world" };
number.* -= 42;
std.debug.print("{s}\n", .{value.string});

I've seen claims (not from the zig team) that zig has 'complete spatial memory safety'. I suspect this is based on misreading earlier versions of this article where I used 'spatial safety' and 'temporal safety' as the names of the two groups above. I don't know of any formal definition of 'complete spatial memory safety', but any reasonable definition would surely be violated by the interior pointer example above.

How do these differences in mitigations actually translate to numbers of bugs?

In materialize we wrote ~140kloc of rust in the first 14 months while growing the team from ~3 to ~20 people. It's a complex system with high demands on both throughput and latency. We reached that point with (IIRC) only 9 unsafe blocks, all of which were in a single module and existed to work around a performance bug in the equivalent safe api. Despite heavy generative testing and fuzzing, we only discovered one memory safety bug (in the unsafe module, naturally) which was easy to debug and fix.

By comparison, in several much smaller and much simpler zig codebases where I am the only developer, I run into multiple memory safety bugs per week. This isn't a perfect comparison, because my throwaway research projects in zig are not written carefully (=> more bugs added) but are also not tested thoroughly (=> fewer bugs detected). But it does make me doubt my ability to ship secure zig programs without substantial additional mitigations.

In at least one of those codebases, the memory safety bugs are outnumbered 20:1 by bounds-check panics. So I assume that if I wrote that same project in idiomatic c (ie without bounds checks) then I would encounter at least 20x as many memory safety bugs per week.

In an older version of this article I tried to use CVE reports to guesstimate how many CVEs would have been prevented by using zig instead of c or c++. This involved far too much guesswork to be informative so I have removed it.

I work mostly on query languages, database engines, streaming systems etc. Latency, memory usage and memory access patterns are critical. Until recently almost all of these systems were written in c, c++ or java.

In java, the typical strategy is to have the data plane operate on hand-packed off-heap buffers (eg arrow) so that the gc only has to traverse the much smaller heap in the control plane. This does work, but it's painful and the performance ceiling is usually lower than c++ (see eg redpanda vs kafka, scylladb vs cassandra, java vs c++ implementations of aeron).

On the other hand, it seems impossible to secure c or c++. Even a codebase as heavily tested as sqlite is vulnerable to code execution from untrusted sql. This isn't a fatal problem for traditional database deployments hidden behind a trusted backend server (as long as the backend prevents sql injection attacks). But it's a big deal for backend-as-a-service companies, multi-tenant cloud databases, and even operating systems like android/ios where apps aren't trusted but still need to access shared databases.

In this context rust is wildly appealing. The performance ceiling is similar to c++, the effort required for security is similar to java, and all of this comes packaged in a clean-slate design that had the chance to avoid the biggest mistakes of both. There are still some major pain points: the allocator api is still unstable and used by very few libraries which means that eg arena/slab allocation requires rewriting libraries, self-referential objects are still very limited, pinning is error-prone, there's no equivalent to placement new etc. But it's not as painful as manually packing bytes in java and trying to reason about performance of the jit and gc, or trying to expose a c++ program to untrusted input.

Despite that, I don't think that the future of data systems is rust, only rust and nothing but rust.

First, managed languages continue to push the performance ceiling:

Garbage collectors are constantly improving eg azul zing consistently trounces hotspot in latency benchmarks.

Historically, garbage-collected languages have been pointer-heavy and provided little control over memory layout, because a) it's easier to write a garbage collector for a language with simple layout and b) several decades ago, when most of these languages were designed, the relative cost of a cache miss was much lower. But this is changing. C# has value types and java is working on them. Julia's combination of value types and parametric struct types provides even more control - allowing writing eg btrees which store data inline. Newer languages also make it easier to provide zero-cost abstractions over manual byte-packing (eg blobs.jl).

There are also promising experiments with non-garbage-collected managed languages. Eg both val and roc feature deterministic memory management and c/rust-like levels of control over memory layout while still preserving the simple experience of a garbage-collected language. I think it's likely that we'll see at least one production-ready language in this vein within the next decade.

Second, there are several niches where rust's memory safety isn't as huge an advantage:

In contexts where memory safety bugs are harder to exploit there is less pressure on language-level guarantees of memory safety. Eg TigerBeetle is a single-tenant database which talks only to trusted clients using an easily-parseable binary protocol over a private network and doesn't dynamically allocate memory. Memory safety bugs are still bugs and so need to be prevented, but they are less likely to occur and it's hard to see how they could be exploited. Writing TigerBeetle in rust instead of zig might make some bugs easier to catch but would also make other areas more error-prone eg the compile-time configuration would have to be replaced by adhoc code generation.

There may also be a niche for zig in untrusted plugins that are already sandboxed. Eg in a serverless http handler where every request is a fresh wasm sandbox, a zig program that has runtime checks turned on and mostly relies on arena allocation (or even static pre-allocation) might be reasonably secure. Zig's ability to produce very small wasm binaries, start quickly and keep memory usage low seems appealing here. I'm also interested in using comptime configuration to aggressively specialize eg html templates and database queries. (My experiments in julia were promising but the julia runtime is not well-suited to wasm. My experiments in rust were a Turing-tarpit puzzle session. But zig is practically tailor-made for this.)

Larger zig programs might potentially be secured by building programs out of multiple wasm sandboxes (see eg rlbox, wasmboxc). This is untested as yet - we don't know how much safety vs how much performance overhead we'll get - but I expect to see this explored anyway for hardening legacy code and protecting against supply chain attacks, and then maybe we can extrapolate the results to zig. Similarly for hardware-assisted mitigations like cheri.

In the long run, if rust (and other memory-safe languages) can be used everywhere but zig is only usable in certain niches, then network effects will give rust a huge advantage even in those niches. So I suspect zig's future, at least in my field, hinges on the successful development of cheap runtime mitigations.

I hope that this doesn't cause PL researchers and language designers to ignore zig though. The comptime mechanism dramatically simplifies the language and enables new kinds of abstractions, and I'm only just starting to explore the possibilities for aggressive compile-time specialization of all kinds of libraries. Even if the lack of memory safety makes industrial adoption harder, I want to see other languages explore this mechanism and push it even further. Could we combine it with a Julia-like dynamic type system (the version of zig executed at comptime is dynamically-typed and garbage-collected)? Could we remove the two stage limit and have a builtin function for runtime specialization? Could we mix memory-safe and -unsafe subsets within a language (like terra but with a single language)? Maybe even limit the unsafe subset to running inside wasm sandboxes? There is so much potential here.