Zig NEWS

Cover image for Type-safe Tagged Pointers with comptime
Or-Gold
Or-Gold

Posted on • Updated on

Type-safe Tagged Pointers with comptime

DISCLAIMER: I talk a bit about red-black trees but only to provide motivation for the concepts in this post. No extensive knowledge is required. You should however be very familiar with some more advanced syntax like labeled breaks and align(size) to understand some of the code.

Hi there ziggers! (well, new to the community, is ziggits the term?)

This is my first post and it is sort of an idea dump which I thought could be useful. It came to be after looking at the arcane zig std red-black tree code in std.rb, which contains this familiar struct:

const Color = enum(u1) {
    red,
    black,
}

pub const Node = struct {
    left: ?*Node,
    right: ?*Node,
    /// parent | color
    parent_and_color: usize,
    // ...
};
Enter fullscreen mode Exit fullscreen mode

This definition closely resembles the Linux kernel red black tree implementation and is right up zig's data oriented design philosophy: we join an enum(u1) with a pointer and create a tagged pointer. This saves up to 7 bytes of padding, per node.

Why tagged pointers?

Consider the following naive red-black tree node definition:

const Color = enum(u1) {
    red,
    black,
}

pub const Node = struct {
    left: ?*Node,
    right: ?*Node,
    parent: ?*Node,
    color: Color,
    // Oh oh - we're only using 1 bit for color, but the
    // compiler can add up to 7 bytes of padding, massively
    // increasing the struct size.
    // DISCALIMER: this is not valid syntax
    compiler_padding: UP TO 7 BYTES
    // ...
};
Enter fullscreen mode Exit fullscreen mode

Notice compiler_padding - There's a huge price for just 1 bit of information. This is a classic data oriented design anti-pattern of storing booleans or short enums in structures, which has a multitude of solutions (for example SOA).

A solution

Come fourth tagged pointers - take an aligned pointer. such a pointer must have it's log_2(alignment) bits unset via the definition of an aligned pointer:

const a: align(4) u8 = 1;
const p: *align(4) u8 = &some_ptr;
const int_value = @intFromPtr(some_ptr); // Assume 32-bits
// int_value == xxxxxxxx xxxxxxxx xxxxxxxx xxxxxx00
Enter fullscreen mode Exit fullscreen mode

We can abuse that fact to store values in the unused bits, which is perfect for our situation - just store color in the lowest bit of the pointer and then mask appropriately. This can be done by converting the pointer to an integer address and doing some bit-fiddling with it. We obviously benefit from smaller memory usage, and it has other advantages.

The problem is that it's hard to create and manage these pointers manually as in std.rb. Each time you use such an optimization you can accidentally mess up the sizes or arithmetic and end up with UB. It's too much work reasoning about this for many, differently sized types. Luckily, zig comptime offers us an easy way to build an API that just works while removing a lot of the unsafety-ness.

Type-safe tagged pointers in zig

Using a bit of comptime magic, we can create a generic tagged pointer with some comptime checked safety guarantees and ease-of-use ergonomics. I managed to come up with two ways for this, but there might be more.

Before we continue to the implementations, here's the aforementioned safety check:

if (@ctz(@as(usize, @alignOf(PtrType))) < @bitSizeOf(TagType)) {
    @compileError("Not enough unused bits in " ++ @typeName(PtrType) ++ " to store a tag");
}
Enter fullscreen mode Exit fullscreen mode

This simply checks that there are enough trailing zeroes[1] to accommodate for all possible values of TagType. If there's not enough space, the compiler will stop you.

First attempt: packed struct

pub fn TaggedPackedPtr(
    comptime PtrType: type,
    comptime TagType: type,
) type {
    return packed struct(usize) {
        const BackingIntegerType = backing_integer: {
            var usize_type_info = @typeInfo(usize);
            usize_type_info.int.bits -= @bitSizeOf(TagType);
            break :backing_integer @Type(usize_type_info);
        };

        ptr: BackingIntegerType,
        tag: TagType,

        pub fn from(ptr: ?*PtrType, tag: TagType) @This() {
            return @This(){ .tag = tag, .ptr = @intCast(@intFromPtr(ptr) >> @bitSizeOf(TagType)) };
        }

        pub fn getPtr(self: @This()) ?*PtrType {
            return @ptrFromInt(@as(usize, self.ptr) << @bitSizeOf(TagType));
        }

        pub fn setPtr(self: *@This(), ptr: ?*PtrType) void {
            self.ptr = @intCast(@intFromPtr(ptr) >> @bitSizeOf(TagType));
        }
    };
}
Enter fullscreen mode Exit fullscreen mode

Let's digest this code:

  1. First, we dynamically create the appropriate integer-bit-width type BackingIntegerType that would store the address of the pointer minus it's unused bits. Since the struct must the same size as a usize to accomodate for any aligned pointer, @bitSizeOf(BackingIntegerType) + @bitSizeOf(TagType) = @bitSizeOf(usize). In the red-black tree example, this type would be a u63 because we need 1 bit to store Color.
  2. Secondly, we create some getters and setters that transform the original ptr back and fourth between ?*PtrType and BackingIntegerType by bit-shifting it the same amount of bits as @bitSizeOf(TagType), eliminating the unused bits and storing only the necessary information.

And that's it. If we want to change our red-black tree implementation now, we would use it like so:

const Color = enum(u1) {
    red,
    black,
}

pub const Node = struct {
    left: ?*Node,
    right: ?*Node,
    parent_and_color: TaggedPackedPtr(Node, Color), // NO PADDING!

    fn getParent(self: *Self) ?*Self {
        return self.parent_and_color.getPtr();
    }

    fn setParent(self: *Self, parent: ?*Self) void {
        self.parent_and_color.setPtr(parent);
    }

    fn getColor(self: *Self) Color {
        return self.parent_and_color.tag;
    }

    fn setColor(self: *Self, color: Color) void {
        self.parent_and_color.tag = color;
    }
};
Enter fullscreen mode Exit fullscreen mode

This works, but I was worried of some performance pitfalls - I am not actually exactly sure how zig handles packed struct fields access. Therefore we can use a second solution - one that simply generalizes the implementation in std.rb to support any bit-width tag, with the same safety checks.

Second attempt: oh no, where did the types go!

pub fn TaggedPtr(comptime PtrType: type, comptime TagType: type) type {
    return struct {
        data: usize = 0,

        // Returns a number with @bitSizeOf(TagType) LSB set.
        const METADATA_MASK: usize = (1 << @bitSizeOf(TagType)) - 1;

        pub inline fn from(ptr: ?*PtrType, meta: TagType) @This() {
            return @This(){ .data = @intFromPtr(ptr) | @intFromEnum(meta) };
        }

        pub inline fn getPtr(self: @This()) ?*PtrType {
            return @ptrFromInt(self.data & ~METADATA_MASK);
        }

        pub inline fn setPtr(self: *@This(), ptr: ?*PtrType) void {
            self.data = @intFromPtr(ptr) | (self.data & METADATA_MASK);
        }

        pub inline fn getTag(self: @This()) TagType {
            return @enumFromInt(self.data & METADATA_MASK);
        }

        pub inline fn setTag(self: *@This(), meta: TagType) void {
            self.data = (self.data & ~METADATA_MASK) | @intFromEnum(meta);
        }
    };
}
Enter fullscreen mode Exit fullscreen mode

This is the manual, C-like way of doing stuff - cast the pointer and manually mask out the correct bits. Pray you don't make a mistake. Thankfully, this is simple enough and self-contained, which means we only have to do this once then have zig calcualte all the correct offsets for any type at compile time.
This approach assumes that zig will ensure that the size of the struct is indeed the same as usize, it's only field (in the actual code it's checked at comptime), it does not rely on field accesses to packed structs and we don't need to bit-shift the pointer each time we set or get it.

Since the API is the same, the "client" code is hardly different (in fact, the APIs of both approaches, packed and non-packed can be made identical and switched between transparently simply via adding the setTag and getTag functions).

To give some other motivating examples which might be more relevant than packing struct fields, it can be used as a tagged union to either the same type of pointer or some opaque pointer with known minimum alignment:

const UnionTag = enum(u3) {
    tag1,
    tag2,
    tag3,
    tag4,
};

const LargeTaggedUnion = union(UnionTag) {
    tag1: SomeOpaquePtr,
    tag2: SomeOpaquePtr,
    tag3: SomeOpaquePtr,
    tag4: SomeOpaquePtr,
};


const SmallTaggedUnion = TaggedPtr(SomeOpaquePtr, UnionTag);

// @sizeOf(LargeTaggedUnion) == 16;
// @sizeOf(SmallTaggedUnion) = 8;

// Using SmallTaggedUnion is hardly any different
const small_union = SmallTaggedUnion{ .tag1 = some_ptr };

const payload = small_union.getPtr();

switch (small_union.getTag()) {
    tag1 => { payload.doSomethingFirstVariant() },
    tag2 => { payload.doSomethingSecondVariant() },
    tag3 => { payload.doSomethingThirdVariant() },
    else => {},
}
Enter fullscreen mode Exit fullscreen mode

Zig will enforce that the number of variants in UnionTag can fit inside the unused bits of SomeOpaquePtr. There is of course some remaining unsafety when using it as a tagged union, since it allows you to access "dead" fields, bypassing zig's union field access safety checks. If it's chosen to be used as such, use it with care.

Caveats

There are a couple of important caveats:

  1. The code isn't benchmarked, so I cannot give a good performance guarantee. This might really suck, or the cache benefits are great. Only profiling will let us know.
  2. This can be somewhat of a niche pattern and might not be worth the added complexity, although very minimal, in the general case.
  3. Portability. Tagged Pointers might not be portable for all platforms, since there's no guarantee that your OS or even hardware will not temper with these unused low bits.

Possible improvements

  1. We might be able to abuse the high bits of pointers and not only the aligned low bits in places where it's guaranteed by the architecture and the OS that such bits are unused.
  2. The size of TaggedPtr doesn't necessarily need to be usize, and an extra generic comptime_int parameter can be added to the type enforce the size of the TaggedPtr struct.
  3. The second implementation assumes TagType is an enum but it shouldn't be mandatory, and with a bit of compile time complexity it should be a very easy fix.

That is all! If I've made any mistakes or incorrect statements please share them with me and I will improve this article as needed. Also, in case anyone has any concrete information on the memory access patterns of packed structs I would love to read on them. Hope you had a good reading!


Footnotes:

  1. Trailing zeroes are the number of unset bits after the lowest set least-significant-bit, or in simple terms, the number of consecutive zeroes starting from the right hand side of a binary number.

Latest comments (0)