Martin

Posted on Mar 5, 2024 • Updated on Mar 9, 2024

A "distinct index types" journey

#learn

A little journey report: how to get distinct types with the same underlying representation for different handle spaces / (internal) indices

I was faced with a programming error, and I was wondering how I'd overcome it. See, I had assigned a value of datatype A to a variable of datatype B, and the compiler didn't tell me. Well, naturally: under the hood, A == B.

const CatCounter = usize;
const DogLegIndex = usize;

const some_cats: CatCounter = 127;
var paw_index : DogLegIndex = undefined;
pub fn main() !void {
    paw_index = some_cats;
}

I was using these as indices and thought it wise to mark in my context struct which "Index" or "ID" is which: here, the CatCounter and DogLegIndex types. I was quite surprised when I managed to assign one to the other without a warning or error. (OK, it's quite obvious when you boil it down like this; now imagine various indirections, loops, collecting bits and pieces from structs, arrays, what have you.)
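To make the problem concrete: a `const` alias is just a second name bound to the same type value, so the compiler has nothing to tell apart, and even `@typeName` only knows the underlying type. A minimal sketch, runnable with `zig test` (assuming a recent Zig release):

```zig
const std = @import("std");

// Plain aliases: both names are bound to the very same type value, usize.
const CatCounter = usize;
const DogLegIndex = usize;

test "aliases are the same type" {
    // Types are ordinary comptime values; these two compare equal.
    try std.testing.expect(CatCounter == DogLegIndex);
    // The alias name is not remembered anywhere; the type is simply "usize".
    try std.testing.expect(std.mem.eql(u8, @typeName(DogLegIndex), "usize"));
}
```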

So what I was looking for was a way to generate distinct types with the same underlying representation. My first idea was to use a struct and access the real index through a member.

const CatCounter = struct { idx: usize };
const DogLegIndex = struct { idx: usize };
const some_cats = CatCounter{ .idx = 127 };
var paw_index: DogLegIndex = undefined;
pub fn main() !void {
    paw_index = some_cats;
}

Here we've gained type safety, at the cost of a few more characters to type on each use. The above fails to compile:

src/stage3.zig:6:17: error: expected type 'stage3.DogLegIndex', found 'stage3.CatCounter'
    paw_index = some_cats;
                ^~~~~~~~~
src/stage3.zig:1:20: note: struct declared here
const CatCounter = struct { idx: usize };
                   ^~~~~~~~~~~~~~~~~~~~~
src/stage3.zig:2:21: note: struct declared here
const DogLegIndex = struct { idx: usize };
                    ^~~~~~~~~~~~~~~~~~~~~
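In use, the wrapper looks roughly like this minimal sketch: the raw index is read through `.idx`, and moving a value from one index space to the other has to be written out. `paw_sizes` is a hypothetical array, only there to be indexed.

```zig
const std = @import("std");

const CatCounter = struct { idx: usize };
const DogLegIndex = struct { idx: usize };

// Hypothetical per-leg data, only here to be indexed.
const paw_sizes = [_]u8{ 3, 3, 4, 4 };

test "struct-wrapped indices" {
    const paw_index = DogLegIndex{ .idx = 2 };
    // The raw usize is reached through the field.
    try std.testing.expectEqual(@as(u8, 4), paw_sizes[paw_index.idx]);

    // Converting between index spaces still works, but only explicitly.
    const some_cats = CatCounter{ .idx = 1 };
    const converted = DogLegIndex{ .idx = some_cats.idx };
    try std.testing.expectEqual(@as(usize, 1), converted.idx);
}
```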

Nice. I was wondering whether that's a good way, or whether there's another, better, more common pattern. So I asked over on #zig-help and learned: "(..) a common way to have distinct int types is non exhaustive enums", so:

const CatCounter = enum(usize) { _ };
const DogLegIndex = enum(usize) { _ };

You get to and from the backing int with @enumFromInt and @intFromEnum.
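A minimal sketch of that round trip, assuming Zig 0.11 or newer, where `@enumFromInt` takes its target type from the result location (as in the code further down):

```zig
const std = @import("std");

// Two distinct non-exhaustive enums, both backed by usize.
const CatCounter = enum(usize) { _ };
const DogLegIndex = enum(usize) { _ };

test "round-trip through the backing integer" {
    // @enumFromInt takes its target type from the annotated constant.
    const some_cats: CatCounter = @enumFromInt(127);
    // @intFromEnum hands back the backing usize.
    try std.testing.expectEqual(@as(usize, 127), @intFromEnum(some_cats));

    // Moving a value into the other index space needs an explicit double conversion.
    const paw_index: DogLegIndex = @enumFromInt(@intFromEnum(some_cats));
    try std.testing.expectEqual(@as(usize, 127), @intFromEnum(paw_index));
}
```

Crossing index spaces is still possible, but only through a conversion that is visible in the code.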

Fine then, let's see that compiler error:

src/stage4.zig:6:17: error: expected type 'stage4.DogLegIndex', found 'stage4.CatCounter'
    paw_index = some_cats;
                ^~~~~~~~~
src/stage4.zig:1:20: note: enum declared here
const CatCounter = enum(usize) { _ };
                   ^~~~~~~~~~~~~~~~~
src/stage4.zig:2:21: note: enum declared here
const DogLegIndex = enum(usize) { _ };
                    ^~~~~~~~~~~~~~~~~

Identical error, nice. I preferred the .idx though over having to type so many camelCasedThings, so of course, let's add some warts:

const CatCounter = enum(usize) {
    _,
    pub fn make(val: usize) CatCounter {
        return @enumFromInt(val);
    }
    fn idx(self: CatCounter) usize {
        return @intFromEnum(self);
    }
};
const DogLegIndex = enum(usize) {
    _,
    pub fn make(val: usize) DogLegIndex {
        return @enumFromInt(val);
    }
    fn idx(self: DogLegIndex) usize {
        return @intFromEnum(self);
    }
};
const some_cats = CatCounter.make(17);
var paw_index: DogLegIndex = undefined;
pub fn main() !void {
    paw_index = some_cats;
}

The compiler error is still the same, of course, but now when I want to use paw_index, I write paw_index.idx() instead of @intFromEnum(paw_index); I prefer the former.
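A minimal sketch of the method-style access in an array lookup; `paw_sizes` is a hypothetical array. Here `idx` is marked `pub` (an assumption, not in the code above): a non-`pub` method works within the same file but could not be called from another file.

```zig
const std = @import("std");

const DogLegIndex = enum(usize) {
    _,
    pub fn make(val: usize) DogLegIndex {
        return @enumFromInt(val);
    }
    // pub here so that other files could call it too.
    pub fn idx(self: DogLegIndex) usize {
        return @intFromEnum(self);
    }
};

// Hypothetical per-leg data, only here to be indexed.
const paw_sizes = [_]u8{ 3, 3, 4, 4 };

test "method-style access" {
    const paw_index = DogLegIndex.make(2);
    // Same value as @intFromEnum(paw_index), read as a method call.
    try std.testing.expectEqual(@intFromEnum(paw_index), paw_index.idx());
    try std.testing.expectEqual(@as(u8, 4), paw_sizes[paw_index.idx()]);
}
```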

Fine, but that's a lot of copy-pasting. I'm not even sure I didn't make a pasto above, so let's use a function to create those types for me. Also, let's use a smaller underlying type: I only want to address a few thousand things, not the ridiculous range of a usize. Given we're writing this as a function now anyway, let's parametrize it:

fn MakeID(comptime t: type) type {
    return enum(t) {
        _,
        pub fn make(val: t) @This() {
            return @enumFromInt(val);
        }
        fn id(self: @This()) t {
            return @intFromEnum(self);
        }
    };
}

const CatCounter = MakeID(u12);
const DogLegIndex = MakeID(u16);

const some_cats = CatCounter.make(17);
var paw_index: DogLegIndex = undefined;
pub fn main() !void {
    paw_index = some_cats;
}

Compiler error's still here, of course, isn't it?

src/stage6.zig:19:17: error: expected type 'stage6.MakeID(u16)', found 'stage6.MakeID(u12)'
    paw_index = some_cats;
                ^~~~~~~~~
src/stage6.zig:2:12: note: enum declared here (2 times)
    return enum(t) {
           ^~~~

Nice... wait. MakeID(u16) != MakeID(u12)?
This sounds like... MakeID(u16) == MakeID(u16)?

Let's swap the u12 for a u16 above and try:

@@ -10,7 +10,7 @@
     };
 }

-const CatCounter = MakeID(u12);
+const CatCounter = MakeID(u16);
 const DogLegIndex = MakeID(u16);

 const some_cats = CatCounter.make(17);

Compiler error:

Yeah. There's no compiler error. Ooops, we're back at square one.

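So it seems the compiler treats comptime functions as referentially transparent and caches (memoizes) the result of each call. Every enum(u16) { _ } written out in the source creates a new type, so enum(u16) { _ } != enum(u16) { _ }. But MakeID(u16) == MakeID(u16), because the second call simply returns the cached result of the first: we're really looking at m = MakeID(u16); m == m.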
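The caching can be observed directly with comptime type comparisons. A minimal sketch, using a trimmed-down MakeID without the `make`/`id` helpers; `A` and `B` are hypothetical names:

```zig
const std = @import("std");

// Trimmed-down MakeID: a single enum literal, evaluated once per distinct argument.
fn MakeID(comptime t: type) type {
    return enum(t) { _ };
}

// Two separate enum literals in the source: two separate types.
const A = enum(u16) { _ };
const B = enum(u16) { _ };

test "comptime calls are memoized" {
    try std.testing.expect(A != B);
    // Same argument, cached result: one and the same type.
    try std.testing.expect(MakeID(u16) == MakeID(u16));
    // Different argument, different call, different type.
    try std.testing.expect(MakeID(u12) != MakeID(u16));
}
```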

So how to get my nice little compile error back?

Enter @squirl:

The compiler memoizes comptime calls, so type functions with the same arguments return the same type
You can fix it by just adding another argument that's a string or something (just pass the name of the type) and putting comptime { _ = type_name } inside the enum

This results in this solution:

fn MakeID(comptime t: type, comptime n: []const u8) type {
    return enum(t) {
        _,
        pub fn make(val: t) @This() {
            return @enumFromInt(val);
        }
        fn id(self: @This()) t {
            return @intFromEnum(self);
        }
        comptime {
            _ = n;
        }
    };
}

const CatCounter = MakeID(u16, "Cats");
const DogLegIndex = MakeID(u16, "Dogs");

const some_cats = CatCounter.make(17);
var paw_index: DogLegIndex = undefined;
pub fn main() !void {
    paw_index = some_cats;
}

and now we have our beautiful compiler error back!

src/stage8.zig:22:17: error: expected type 'stage8.MakeID(u16,"Dogs")', found 'stage8.MakeID(u16,"Cats")'
    paw_index = some_cats;
                ^~~~~~~~~
src/stage8.zig:2:12: note: enum declared here (2 times)
    return enum(t) {
           ^~~~

Frankly, I'd prefer it if the error message used the names I've assigned to the types (CatCounter, DogLegIndex), but this'll do.

For what it's worth, I haven't decided which I like better: const IdxType = struct { idx: u16 }; or const IdyTxpe = MakeID(u16, "IDY");. They both use the same memory, the amount of typing is similar, and they ought to be equally efficient in every respect as far as I can tell (but correct me if I'm wrong; thanks in advance).
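At least the memory part of that comparison is easy to check. A minimal sketch, assuming u16 as the backing integer and a trimmed-down copy of the two-parameter MakeID from above:

```zig
const std = @import("std");

// Trimmed-down copy of the two-parameter MakeID (no make/id helpers).
fn MakeID(comptime t: type, comptime n: []const u8) type {
    return enum(t) {
        _,
        comptime {
            _ = n;
        }
    };
}

const IdxType = struct { idx: u16 };
const IdyTxpe = MakeID(u16, "IDY");

test "both wrappers cost exactly one u16" {
    try std.testing.expectEqual(@as(usize, 2), @sizeOf(IdxType));
    try std.testing.expectEqual(@as(usize, 2), @sizeOf(IdyTxpe));
}
```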

There's this little tidbit squirl also shared:

Oh, also, a neat trick if you need "nullable" handles: enum(u32) { invalid = std.math.maxInt(u32), _ }

So now you can initialize a handle with .invalid and compare against it. This also works with other underlying types, and each enum type can, of course, map .invalid to a different value.
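A minimal sketch of such a nullable handle; `PawHandle` is a hypothetical name:

```zig
const std = @import("std");

// Hypothetical handle type: the largest u32 is reserved to mean "no handle".
const PawHandle = enum(u32) {
    invalid = std.math.maxInt(u32),
    _,
};

test "nullable handle via a sentinel" {
    var handle: PawHandle = .invalid; // starts out as "null"
    try std.testing.expect(handle == .invalid);

    handle = @enumFromInt(3);
    try std.testing.expect(handle != .invalid);
    try std.testing.expectEqual(@as(u32, 3), @intFromEnum(handle));
}
```

Compared with an optional `?PawHandle`, the sentinel keeps the handle at the size of its backing u32, whereas an optional of an integer-like type typically needs extra room for its null flag.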

Thanks to squirl & Not no ones uncle for giving me directions. And this is probably not the end of the voyage as I bet one of you will show me an even better way in the comments, won't you?

Top comments (6)

Nairou • Mar 11 '24

Another possible tweak, if you are creating index types for existing structs and don't care about the index size, is to use the existing struct as the parameter.

fn MakeID(comptime t: type) type {
    return enum(usize) {
        _,
        pub fn make(val: usize) @This() {
            return @enumFromInt(val);
        }
        fn id(self: @This()) usize {
            return @intFromEnum(self);
        }
        comptime {
            _ = t;
        }
    };
}

const Cat = struct {
    // ...
};
const CatCounter = MakeID(Cat);

const DogLeg = struct {
    // ...
};
const DogLegIndex = MakeID(DogLeg);

const some_cats = CatCounter.make(17);
var paw_index: DogLegIndex = undefined;
pub fn main() !void {
    paw_index = some_cats;
}

Martin • Mar 12 '24

Nice. I think I'd rather replace the string with the type and keep it at two params in my case. That way the ID would be tied to its type in code (which I like about your alteration). It's just that I have many indices (hundreds of thousands), and I'd rather save space where I can.

I think I'd rather drop the explicit integer type, though, so MakeID(type, max_index) is my shower-thoughts favorite so far (but I've other things to do than keep refactoring, hehe). MakeID would simply determine the number of bits needed for max_index and synthesize the corresponding enum.
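A rough sketch of that idea, under assumptions: it uses `std.math.IntFittingRange` from the standard library to pick the bit width, and `Cat`/`DogLeg` are hypothetical marker types. Referencing `T` inside the enum plays the same role as the name string in the two-parameter version.

```zig
const std = @import("std");

// Hypothetical MakeID variant: ties the ID to a type T and picks the smallest
// unsigned integer that can hold every index from 0 up to max_index.
fn MakeID(comptime T: type, comptime max_index: comptime_int) type {
    const Int = std.math.IntFittingRange(0, max_index);
    return enum(Int) {
        _,
        pub fn make(val: Int) @This() {
            return @enumFromInt(val);
        }
        pub fn id(self: @This()) Int {
            return @intFromEnum(self);
        }
        comptime {
            _ = T; // reference T inside the enum, like the name string before
        }
    };
}

const Cat = struct {};
const DogLeg = struct {};
const CatCounter = MakeID(Cat, 4095);
const DogLegIndex = MakeID(DogLeg, 4095);

test "bit width derived from max_index" {
    // 4095 fits in 12 bits, so the tag type is u12.
    try std.testing.expect(std.meta.Tag(CatCounter) == u12);
    // Same max_index, different T: still distinct types.
    try std.testing.expect(CatCounter != DogLegIndex);
    try std.testing.expectEqual(@as(u12, 17), CatCounter.make(17).id());
}
```

One caveat: in an ordinary array a u12 still occupies two bytes per element (`@sizeOf(u12)` is 2), so saving memory below byte granularity would need a packed container such as `std.PackedIntArray`.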

GigaGrunch • Mar 16 '24 • Edited

I think the reason why the initial version doesn't work becomes very obvious when you take away the concrete values from those two variable definitions (which is exactly what they are). Would anyone think that a != b in this example?

const a = value;
const b = value;

Types are just regular values in Zig (at comptime) so replacing value with usize doesn't change anything. Let's go one step further and write a function:

fn isEqualToSelf(comptime T: type, value: T) bool {
    const a = value;
    const b = value;
    return a == b;
}

This should always return true, right? If const CatIndex = usize were != const DogIndex = usize, then isEqualToSelf(type, usize) would have to return false.

I'm not saying that it wouldn't be cool to be able to do what you wanted to do there. But it cannot be by simply assigning the same value to two variables. Maybe there could be an extra keyword or comptime-function to do it 🤷

Edit: “No hidden control flow” is basically the first bullet point about Zig on the website. Creating a different value from a value based on its type sounds like hidden control flow to me.

Stéphane Bortzmeyer • Mar 6 '24

This raises a question: would it be better to modify the Zig language so that, in the very first example, CatCounter and DogLegIndex would be two different types (as is the case in Ada, for instance)?

Martin • Mar 6 '24

I think I'd feel better if the custom name stuck, i.e., if type assignments created distinct types. So the first example would always fail. I'm not sure about the implications throughout the language though. On one hand, I was surprised to see the types being equivalent, on the other hand, I kinda wasn't.
There's a disparity between enum/struct and naming basic types; I'm confident there's a good reason for it.
This isn't meant to be a "how zig could be better" criticism, though, I just documented my journey and options to actually make these types distinct.
The next thing is to actually wrap arrays so that if I were to try and index my paw array with a CatCounter, the compiler would complain...
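One possible shape for that, as a sketch: a hypothetical `IndexedArray` wrapper whose accessors only accept one index type, so indexing the paw array with a CatCounter is rejected at compile time.

```zig
const std = @import("std");

const CatCounter = enum(u16) { _ };
const DogLegIndex = enum(u16) { _ };

// Hypothetical wrapper: a fixed-size array that only accepts one index type.
fn IndexedArray(comptime Index: type, comptime T: type, comptime len: usize) type {
    return struct {
        items: [len]T,

        pub fn get(self: *const @This(), i: Index) T {
            return self.items[@intFromEnum(i)];
        }

        pub fn set(self: *@This(), i: Index, value: T) void {
            self.items[@intFromEnum(i)] = value;
        }
    };
}

const PawArray = IndexedArray(DogLegIndex, u8, 4);

test "array indexed only by DogLegIndex" {
    var paws = PawArray{ .items = .{ 3, 3, 4, 4 } };
    const leg: DogLegIndex = @enumFromInt(1);
    paws.set(leg, 7);
    try std.testing.expectEqual(@as(u8, 7), paws.get(leg));
    // Passing a CatCounter to get/set would not compile: the parameter type is DogLegIndex.
}
```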

Nairou • Mar 11 '24

I greatly appreciate seeing your journey through this issue, and the various options for solving it. I was facing this same question just yesterday!

Zig NEWS
