DISCLAIMER: I talk a bit about red-black trees but only to provide motivation for the concepts in this post. No extensive knowledge is required. You should however be very familiar with some more advanced syntax like labeled breaks and align(size) to understand some of the code.
Hi there ziggers! (well, new to the community, is ziggits the term?)
This is my first post and it is sort of an idea dump which I thought could be useful. It came to be after looking at the arcane zig std red-black tree code in std.rb, which contains this familiar struct:
const Color = enum(u1) {
red,
black,
}
pub const Node = struct {
left: ?*Node,
right: ?*Node,
/// parent | color
parent_and_color: usize,
// ...
};
This definition closely resembles the Linux kernel red black tree implementation and is right up zig's data oriented design philosophy: we join an enum(u1)
with a pointer and create a tagged pointer. This saves up to 7 bytes of padding, per node.
Why tagged pointers?
Consider the following naive red-black tree node definition:
const Color = enum(u1) {
red,
black,
}
pub const Node = struct {
left: ?*Node,
right: ?*Node,
parent: ?*Node,
color: Color,
// Oh oh - we're only using 1 bit for color, but the
// compiler can add up to 7 bytes of padding, massively
// increasing the struct size.
// DISCALIMER: this is not valid syntax
compiler_padding: UP TO 7 BYTES
// ...
};
Notice compiler_padding
- There's a huge price for just 1 bit of information. This is a classic data oriented design anti-pattern of storing booleans or short enums in structures, which has a multitude of solutions (for example SOA).
A solution
Come fourth tagged pointers - take an aligned pointer. such a pointer must have it's log_2(alignment)
bits unset via the definition of an aligned pointer:
const a: align(4) u8 = 1;
const p: *align(4) u8 = &some_ptr;
const int_value = @intFromPtr(some_ptr); // Assume 32-bits
// int_value == xxxxxxxx xxxxxxxx xxxxxxxx xxxxxx00
We can abuse that fact to store values in the unused bits, which is perfect for our situation - just store color in the lowest bit of the pointer and then mask appropriately. This can be done by converting the pointer to an integer address and doing some bit-fiddling with it. We obviously benefit from smaller memory usage, and it has other advantages.
The problem is that it's hard to create and manage these pointers manually as in std.rb
. Each time you use such an optimization you can accidentally mess up the sizes or arithmetic and end up with UB. It's too much work reasoning about this for many, differently sized types. Luckily, zig comptime offers us an easy way to build an API that just works while removing a lot of the unsafety-ness.
Type-safe tagged pointers in zig
Using a bit of comptime magic, we can create a generic tagged pointer with some comptime checked safety guarantees and ease-of-use ergonomics. I managed to come up with two ways for this, but there might be more.
Thanks to @xouncle in discord, it seems to be that both versions emit the same assembly if you make sure that the order of the fields in the packed struct conform to the same layout as the non-packed struct, which is great news for zig and for packed struct usage in general. see it here. This means that we can use the much less error-prone and more readable implementation using a packed struct.
Before we continue to the implementations, here's the aforementioned safety check:
if (@ctz(@as(usize, @alignOf(PtrType))) < @bitSizeOf(TagType)) {
@compileError("Not enough unused bits in " ++ @typeName(PtrType) ++ " to store a tag");
}
This simply checks that there are enough trailing zeroes[1] to accommodate for all possible values of TagType
. If there's not enough space, the compiler will stop you.
First attempt: packed struct
pub fn TaggedPackedPtr(
comptime PtrType: type,
comptime TagType: type,
) type {
return packed struct(usize) {
const BackingIntegerType = backing_integer: {
var info = @typeInfo(usize);
info.int.bits -= @bitSizeOf(TagType);
break :backing_integer @Type(info);
};
tag: TagType,
ptr: BackingIntegerType,
pub fn from(ptr: ?*PtrType, tag: TagType) @This() {
return @This(){ .tag = tag, .ptr = @intCast(@intFromPtr(ptr) >> @bitSizeOf(TagType)) };
}
pub fn getPtr(self: @This()) ?*PtrType {
return @ptrFromInt(@as(usize, self.ptr) << @bitSizeOf(TagType));
}
pub fn setPtr(self: *@This(), ptr: ?*PtrType) void {
self.ptr = @intCast(@intFromPtr(ptr) >> @bitSizeOf(TagType));
}
};
}
Let's digest this code:
- First, we dynamically create the appropriate integer-bit-width type
BackingIntegerType
that would store the address of the pointer minus it's unused bits. Since the struct must the same size as ausize
to accomodate for any aligned pointer,@bitSizeOf(BackingIntegerType) + @bitSizeOf(TagType) = @bitSizeOf(usize)
. In the red-black tree example, this type would be au63
because we need 1 bit to storeColor
. - Secondly, we create some getters and setters that transform the original
ptr
back and fourth between?*PtrType
andBackingIntegerType
by bit-shifting it the same amount of bits as@bitSizeOf(TagType)
, eliminating the unused bits and storing only the necessary information.
And that's it. If we want to change our red-black tree implementation now, we would use it like so:
const Color = enum(u1) {
red,
black,
}
pub const Node = struct {
left: ?*Node,
right: ?*Node,
parent_and_color: TaggedPackedPtr(Node, Color), // NO PADDING!
fn getParent(self: *Self) ?*Self {
return self.parent_and_color.getPtr();
}
fn setParent(self: *Self, parent: ?*Self) void {
self.parent_and_color.setPtr(parent);
}
fn getColor(self: *Self) Color {
return self.parent_and_color.tag;
}
fn setColor(self: *Self, color: Color) void {
self.parent_and_color.tag = color;
}
};
This works, but I was worried of some performance pitfalls - I wasn't exactly sure how zig handles packed struct field access. Therefore we can use a second solution - one that simply generalizes the implementation in std.rb
to support any bit-width tag, with the same safety checks.
Second attempt: oh no, where did the types go!
pub fn TaggedPtr(comptime PtrType: type, comptime TagType: type) type {
return struct {
data: usize = 0,
// Returns a number with @bitSizeOf(TagType) LSB set.
const TAG_MASK: usize = (1 << @bitSizeOf(TagType)) - 1;
pub inline fn from(ptr: ?*PtrType, meta: TagType) @This() {
return @This(){ .data = @intFromPtr(ptr) | @intFromEnum(meta) };
}
pub inline fn getPtr(self: @This()) ?*PtrType {
return @ptrFromInt(self.data & ~TAG_MASK);
}
pub inline fn setPtr(self: *@This(), ptr: ?*PtrType) void {
self.data = @intFromPtr(ptr) | (self.data & TAG_MASK);
}
pub inline fn getTag(self: @This()) TagType {
return @enumFromInt(self.data & TAG_MASK);
}
pub inline fn setTag(self: *@This(), meta: TagType) void {
self.data = (self.data & ~TAG_MASK) | @intFromEnum(meta);
}
};
}
This is the manual, C-like way of doing stuff - cast the pointer and manually mask out the correct bits. Pray you don't make a mistake. Thankfully, this is simple enough and self-contained, which means we only have to do this once then have zig calcualte all the correct offsets for any type at compile time.
This approach assumes that zig will ensure that the size of the struct is indeed the same as usize
, it's only field (in the actual code it's checked at comptime), it does not rely on field accesses to packed structs and we don't need to bit-shift the pointer each time we set or get it.
Since the API is the same, the "client" code is hardly different (in fact, the APIs of both approaches, packed and non-packed can be made identical and switched between transparently simply via adding the setTag
and getTag
functions).
To give some other motivating examples which might be more relevant than packing struct fields, it can be used as a tagged union to either the same type of pointer or some opaque pointer with known minimum alignment:
const UnionTag = enum(u3) {
tag1,
tag2,
tag3,
tag4,
};
const LargeTaggedUnion = union(UnionTag) {
tag1: SomeOpaquePtr,
tag2: SomeOpaquePtr,
tag3: SomeOpaquePtr,
tag4: SomeOpaquePtr,
};
const SmallTaggedUnion = TaggedPtr(SomeOpaquePtr, UnionTag);
// @sizeOf(LargeTaggedUnion) == 16;
// @sizeOf(SmallTaggedUnion) = 8;
// Using SmallTaggedUnion is hardly any different
const small_union = SmallTaggedUnion{ .tag1 = some_ptr };
const payload = small_union.getPtr();
switch (small_union.getTag()) {
tag1 => { payload.doSomethingFirstVariant() },
tag2 => { payload.doSomethingSecondVariant() },
tag3 => { payload.doSomethingThirdVariant() },
else => {},
}
Zig will enforce that the number of variants in UnionTag
can fit inside the unused bits of SomeOpaquePtr
. There is of course some remaining unsafety when using it as a tagged union, since it allows you to access "dead" fields, bypassing zig's union field access safety checks. If it's chosen to be used as such, use it with care.
Caveats
There are a couple of important caveats:
- The code isn't benchmarked, so I cannot give a good performance guarantee. This might really suck, or the cache benefits are great. Only profiling will let us know.
- This can be somewhat of a niche pattern and might not be worth the added complexity, although very minimal, in the general case.
- Portability. Tagged Pointers might not be portable for all platforms, since there's no guarantee that your OS or even hardware will not temper with these unused low bits.
Possible improvements
- We might be able to abuse the high bits of pointers and not only the aligned low bits in places where it's guaranteed by the architecture and the OS that such bits are unused.
- The size of
TaggedPtr
doesn't necessarily need to be usize, and an extra genericcomptime_int
parameter can be added to the type enforce the size of theTaggedPtr
struct. - The second implementation assumes
TagType
is an enum but it shouldn't be mandatory, and with a bit of compile time complexity it should be a very easy fix.
That is all! If I've made any mistakes or incorrect statements please share them with me and I will improve this article as needed. Also, in case anyone has any concrete information on the memory access patterns of packed structs I would love to read on them. Hope you had a good reading!
Footnotes:
- Trailing zeroes are the number of unset bits after the lowest set least-significant-bit, or in simple terms, the number of consecutive zeroes starting from the right hand side of a binary number.
Top comments (1)
Neat!
On x86_64, or possibly any target with 64-bit pointers, I think the top 16-bits are unused. I recently used the space for a enum for IO_URING which only allows a single 64-bit context/user field. Your comptime is much nicer than what I did.
Thanks for posting.