How safe is zig?

Published 2021-03-19
Updated 2021-10-04

I keep seeing discussions that equate zig's level of memory safety with c, or occasionally with rust. Neither is particularly accurate. This is an attempt at a more detailed breakdown.

I'm concerned mostly with security. In practice, it doesn't seem that any level of testing is sufficient to prevent vulnerabilities due to memory safety violations in large programs. So I'm not covering tools like AddressSanitizer that are intended for testing and are not recommended for production use. Instead I'll focus on tools which can systematically rule out errors (eg compiler-inserted bounds checks completely prevent out-of-bounds heap read/write).
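To make "systematically rule out" concrete, here is a minimal sketch in rust of what a compiler-inserted bounds check looks like from the programmer's side. The function names are hypothetical and only for illustration; the slice operations are the standard ones.

```rust
/// Hypothetical helper: read one byte from a heap buffer at an untrusted index.
/// `get` performs the bounds check and returns `None` instead of reading
/// past the end of the allocation.
fn read_byte(buf: &[u8], index: usize) -> Option<u8> {
    buf.get(index).copied()
}

/// Plain indexing is checked too: an out-of-range index panics at runtime
/// rather than touching adjacent memory.
fn read_byte_or_panic(buf: &[u8], index: usize) -> u8 {
    buf[index]
}
```

With a three-byte buffer, `read_byte(&buf, 3)` returns `None` and `read_byte_or_panic(&buf, 3)` panics. Either way the out-of-bounds read never happens, no matter what inputs the tests happened to cover.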

I'm also focusing on software as it is typically shipped, ignoring eg bounds checking compilers like tcc or quarantining allocators like hardened_malloc which are rarely used because of the performance overhead.

Finally, note the 'Updated' date below the title. Zig in particular is still under rapid development and will likely change faster than this article updates. (See the tracking issue for safety mechanisms).

Here are the issues against which c/zig/rust have systematic protection:

| issue | c | zig (release-safe) | rust (release) |
| --- | --- | --- | --- |
| out-of-bounds heap read/write | none | runtime | runtime |
| null pointer dereference | none | runtime⁰ | runtime⁰ |
| type confusion | none | runtime, partial¹ | runtime² |
| integer overflow | none | runtime | runtime³ |
| use after free | none | none⁴ | compile time |
| double free | none | none⁴ | compile time |
| invalid stack read/write | none | none | compile time |
| uninitialized memory | none | none | compile time |
| data race | none | none | compile time |

  0. optional types
  1. tagged unions, doesn't protect against holding a pointer to value while changing tag
  2. tagged unions
  3. not by default, but available via compiler setting or by linting against unchecked arithmetic
  4. optional protections exist but I expect the runtime overhead to be unacceptable in many domains - see discussion here and here
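As a rough illustration of the runtime rows on the rust side, here is a small sketch covering optional types, tagged unions and explicitly checked arithmetic. All of the function and type names are hypothetical. For the integer overflow note, I'm assuming the compiler setting in question is the `overflow-checks` profile option; the `checked_*` methods below behave the same regardless of that setting.

```rust
/// Absence is an `Option`, not a null pointer, so the caller has to
/// handle `None` before it can touch the value.
fn first_even(values: &[i32]) -> Option<&i32> {
    values.iter().find(|v| **v % 2 == 0)
}

/// A tagged union: the payload can only be reached by matching on the tag,
/// so a `Rect` can't be read as if it were a `Circle`.
enum Shape {
    Circle { radius: f64 },
    Rect { w: f64, h: f64 },
}

fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rect { w, h } => w * h,
    }
}

/// Size computation with explicit overflow handling.
/// `checked_mul` returns `None` on overflow in every build profile.
fn alloc_size(count: usize, elem_size: usize) -> Option<usize> {
    count.checked_mul(elem_size)
}
```

For example, `alloc_size(usize::MAX, 2)` returns `None` rather than silently wrapping to a small allocation size, which is the kind of bug that otherwise turns an integer overflow into an out-of-bounds write.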

There are two clear groups here:

So we can say that zig's spatial memory safety is roughly comparable to rust, and its temporal memory safety and data race safety are roughly comparable to c.
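To show what the compile-time side of that split looks like, here is a minimal rust sketch of the two temporal cases: a use after free that the compiler rejects, and shared state across threads that only compiles when it goes through a synchronised type. The function names are hypothetical, and the offending line is left as a comment so that the block still builds.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Takes ownership of the buffer; it is freed when this function returns.
fn consume(buf: Vec<u8>) -> usize {
    buf.len()
}

fn use_after_move() -> usize {
    let buf = vec![1u8, 2, 3];
    let len = consume(buf); // `buf` is moved here and freed inside `consume`
    // let first = buf[0]; // rejected at compile time (E0382): `buf` was moved
    len
}

/// Mutating a counter from several threads has to go through something like
/// `Arc<Mutex<_>>`; handing each thread a bare `&mut u64` does not compile.
fn parallel_count(threads: usize, per_thread: u64) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```

The equivalent c or zig code with the commented-out line restored would compile and read freed memory. That difference is what the compile time entries in the table refer to.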

Zig also has some non-systematic improvements over c with regard to temporal memory safety:

Zig also has a number of tools to help detect violations of temporal memory safety during testing. These are very helpful for development, but experience with c indicates that they won't be sufficient to eliminate vulnerabilities.

I tried looking at some public breakdowns of security issues from various projects written in c and c++ (mostly sourced from Alex Gaynor's handy summary) to get a sense of the relative frequencies of different kinds of errors:

This isn't a very clear picture. The percentages vary wildly between projects. The categories are sufficiently vague that I could be classifying them all wrong. Looking only at fixed issues tells us nothing about how easy they are to exploit, but looking at existing exploits limits us to a very small dataset.

It certainly seems like just fixing spatial memory safety (going from c to zig) is a non-trivial improvement. But I'd like to better understand why actual exploits appear here to rely more often on violating temporal memory safety.

When does this matter?

Rust bears additional complexity and friction to buy temporal memory safety and data race safety. But sometimes we might be able to buy those more cheaply eg:

Sometimes we might also just choose to bear the cost. For systems with low risk profiles (eg internal software that is never exposed to hostile input) we might decide that debugging the occasional use-after-free is preferable to adding development friction.

There are certainly systems though where none of the above are options. For example, the web spec pretty much mandates that browsers must have complicated ownership models, use pervasive sharing between threads and be constantly exposed to hostile inputs. In such cases it's hard to make an argument for zig, unless alongside some additional system of protection like memgc or rlbox.