How enum-specific compiler optimizations keep our programs performant
Recently, while touring the std::io::Result
source code, I found something that challenged my understanding of Rust’s enum types.
On 64-bit systems, std::io::Error
is a wrapper around a bit-packed internal representation, Repr
:
pub struct Error {
repr: Repr,
}
struct Repr(NonNull<()>, PhantomData<ErrorData<Box<Custom>>>);
The definition of Repr looks spooky, but the details aren’t important. All you need to know is that, despite representing several possible kinds of IO error, clever bit packing means that Repr
(and therefore io::Error
) fits into a single, 64-bit machine word.
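If you’d like to verify that for yourself, a quick size_of check will do it. This is a minimal sketch that assumes a 64-bit target and a reasonably recent toolchain (the bit-packed Repr is an implementation detail):
use std::mem::size_of;

fn main() {
    // On a 64-bit target with the bit-packed Repr, io::Error is one machine word.
    println!("{}", size_of::<std::io::Error>());
    // => 8
}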
From the documentation on Repr
’s bit packing, the following commentary about io::Result
caught my eye:
"This optimization not only allows io::Error
to occupy a single pointer, but improves io::Result
as well, especially for situations like io::Result<()>
(which is now 64 bits) […]."
Recall that io::Result<()>
is an alias for
std::result::Result<(), std::io::Error>
And that result::Result
is an enum with two variants:
enum Result<T, E> {
Ok(T),
Err(E),
}
We’ve learned that io::Error
is exactly 64 bits. So, how is io::Result<()>
, a type that seems to convey substantially more information than a lone io::Error
, still only 64 bits?
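You don’t have to take the claim on faith; the puzzle can be confirmed empirically before we unpack it (again assuming a 64-bit target):
use std::mem::size_of;

fn main() {
    // The whole Result is no bigger than the error it might carry.
    println!("{}", size_of::<std::io::Result<()>>());
    // => 8
}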
To answer this question, let’s first recap how enums are laid out in memory.
If you want all the gory details, Amos at fasterthanli.me drills down to enum bedrock in his classic investigation into the size of small string types. For now, all we need to know is that an enum value typically comprises two things:
1. A discriminant (or tag) that records which variant the value holds.
2. The data associated with that variant. In io::Result, the value associated with the Err variant is an instance of io::Error.
The total size of an enum is thus the size of the discriminant plus the size of the largest possible associated field. Rust doesn’t know which variant will appear at runtime, so it always allocates enough space for the biggest variant.
The default representation of a discriminant is an isize
value — eight bytes on 64-bit systems. However, the compiler is allowed to use a smaller type if it chooses. The exact circumstances under which this happens are unspecified. The size may even change between compilations on the same machine!
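As a quick illustration of that freedom, here’s a hypothetical two-variant enum (Small is my own example, not from the article’s code) whose payloads are tiny. On current compilers the chosen tag is a single byte, though, as noted above, none of this is guaranteed:
use std::mem::size_of;

#[allow(dead_code)]
enum Small {
    A(u8),
    B(u8),
}

fn main() {
    // One byte of payload plus a one-byte tag: the compiler did not spend an isize here.
    println!("{}", size_of::<Small>());
    // => 2
}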
To avoid confusion, the next few examples use the #[repr(u64)]
attribute when defining enum types. This tells the compiler to use a C-compatible layout for the type, with u64
as the type of the enum discriminant.
Here’s an enum representing a variety of input events:
#[repr(u64)]
enum InputEvent {
KeyPress(char), // discriminant = 0
MouseClick(u64, u64), // discriminant = 1
}
The size of a KeyPress
on its own would be 4 bytes for the char
plus 8 bytes for the discriminant. A total of 12 bytes. But KeyPress
doesn’t exist in isolation. Rust allocates enough space to store the largest field — MouseClick
’s (u64, u64)
— and pads any unfilled space in variants with smaller fields. The size of an InputEvent
is, therefore, 24 bytes: three u64
s.
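If you want to double-check that arithmetic, a minimal sketch with the same definition prints the total:
use std::mem::size_of;

#[repr(u64)]
#[allow(dead_code)]
enum InputEvent {
    KeyPress(char),       // discriminant = 0
    MouseClick(u64, u64), // discriminant = 1
}

fn main() {
    // 8-byte discriminant + 16 bytes for MouseClick's (u64, u64) payload.
    println!("{}", size_of::<InputEvent>());
    // => 24
}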
With the basic behavior of enums established, let me ask you: what is the size of Result<T, E>
? Result<T, E>
has two variants: Ok(T)
and Err(E)
.
Hence, its size is typically that of its discriminant plus the larger of T
and E
. Let’s see an example:
use std::error::Error;
use std::mem::size_of;

#[repr(u64)]
enum Result<T, E> {
    Ok(T),
    Err(E),
}

fn main() {
    println!("{}", size_of::<Result<u128, Box<dyn Error>>>());
    // => 24
    println!("{}", size_of::<Result<u64, Box<dyn Error>>>());
    // => 24
    println!("{}", size_of::<Result<u32, Box<dyn Error>>>());
    // => 24
    println!("{}", size_of::<Result<(), Box<dyn Error>>>());
    // => 24
}
A boxed trait object like our Box<dyn Error>
is two pointers wide: 16 bytes on 64-bit platforms. In all of these examples, T
is no larger than the Box
, so the size of Result
holds steady at 24 bytes: enough room for the boxed Err
variant, should it occur, plus the eight-byte discriminant.
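The two-pointer claim is itself easy to check: a Box of a trait object is a fat pointer (data pointer plus vtable pointer), while a Box of a sized type is a thin one. A quick sketch, assuming a 64-bit target:
use std::error::Error;
use std::mem::size_of;

fn main() {
    // Fat pointer: data pointer + vtable pointer.
    println!("{}", size_of::<Box<dyn Error>>());
    // => 16
    // Thin pointer, for comparison.
    println!("{}", size_of::<Box<u64>>());
    // => 8
}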
Right. Let’s get weird with this. Here’s the same example with the repr
attribute removed from the definition of Result
, meaning the Rust compiler can choose its own representation for the discriminant.
use std::error::Error;
use std::mem::size_of;

enum Result<T, E> {
    Ok(T),
    Err(E),
}

fn main() {
    println!("{}", size_of::<Result<u128, Box<dyn Error>>>());
    // => 24
    println!("{}", size_of::<Result<u64, Box<dyn Error>>>());
    // => 24
    println!("{}", size_of::<Result<u32, Box<dyn Error>>>());
    // => 24
    println!("{}", size_of::<Result<(), Box<dyn Error>>>());
    // => 16
}
In the first three cases, the 24-byte result is consistent with the default representation for a discriminant: an isize
value, which is 64 bits on my machine. Remember, the compiler can change its mind about this, so there’s no guarantee you’ll see the same results.
The fourth print statement reveals a special case! Just like io::Result<()>
, result::Result<(), Box<dyn Error>>
is precisely the size of its error variant. When you give the compiler free rein, the discriminant seems to vanish into thin air.
Since this is black magic, only the Rustonomicon can tell us what’s happening. Under Data Layout: repr(Rust), we find:
"Naively, an enum such as:
enum Foo {
A(u32),
B(u64),
C(u8),
}
"might be laid out as
struct FooRepr {
data: u64, // this is either a u64, u32, or u8 based on `tag`
tag: u8, // 0 = A, 1 = B, 2 = C
}
"However there are several cases where such a representation is inefficient. The classic case of this is Rust’s “null pointer optimization”: an enum consisting of a single outer unit variant (e.g., None
) and a (potentially nested) non-nullable pointer variant (e.g., Some(&T)
) makes the tag unnecessary. A null pointer can safely be interpreted as the unit (None
) variant. The net result is that, for example, size_of::<Option<&T>>() == size_of::<&T>()
."
There you have it. Whenever you have a two-variant enum, such as Option
or Result
, where one variant carries no data (or only the unit type, ()
) and the other contains a non-nullable pointer, Rust optimizes away the discriminant by encoding the data-free variant as a null pointer.
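To see that the non-nullable pointer is what makes this work, compare an error payload with no forbidden bit pattern (a plain u64, chosen here purely for illustration) against a Box:
use std::mem::size_of;

fn main() {
    // A u64 can hold any bit pattern, so there is no spare value to encode Ok(()):
    // the discriminant comes back, and padding rounds the type up to 16 bytes.
    println!("{}", size_of::<Result<(), u64>>());
    // => 16
    // A Box is a non-nullable pointer, so Ok(()) hides in the null value.
    println!("{}", size_of::<Result<(), Box<u64>>>());
    // => 8
}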
This optimization is entirely transparent. You’ll never handle this null pointer directly.
In enums, as in everything else, Rust gives us the best possible performance while shielding us from unsafe code.