Zig NEWS

KilianVounckx


Zig Interfaces for the Uninitiated, an update

Way back in June 2020, Nathan Michaels published a post about how to do runtime polymorphism (interfaces) in Zig. Since then, however, the community has shifted from the @fieldParentPtr idiom to fat pointers: a pointer to the implementor's data paired with pointers to its functions. This is now the idiom the standard library uses, for example in Allocator and Random. This post covers the new idiom and how to use it. Just as in Nathan's original post, I will create a formal Iterator interface, which can be used like so:

while (iterator.next()) |val| {
    // do something with val
}

Some notes before reading

Before reading the rest of this post, I highly recommend going through the standard library source code to look at the idiom yourself. See if you can understand it. If so, great, you don't need to read the rest. Some places to start are Allocator and Random.
Another resource is this post: https://revivalizer.xyz/post/the-missing-zig-polymorphism-reference/. It covers another use case (nodes for an expression calculator) and does things in a similar way.

The Iterator Interface

At its simplest, an iterator needs only one function. It should return the next value in the iterator, or null if the iterator is done. (This is one of the many reasons why optionals are awesome. Compare this, for example, to Java, which has no optionals built into the language: its Iterator interface is built around hasNext and next, plus an optional remove.)

The interface also needs some way to access the implementor's data, so we store a pointer to it as well. We use *anyopaque because we don't know the size and alignment of the implementors. We could also use a usize and convert between pointer and integer every time. The standard library uses *anyopaque, so that is what I will do here.
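To make the two options concrete, here is a small sketch of both round trips. It assumes Zig 0.11 or later, where @ptrCast, @alignCast and @ptrFromInt infer their result type and @intFromPtr replaced @ptrToInt; older releases spell these differently. Range is a trimmed copy of the struct defined later in this post, and erase, recover, eraseToInt and recoverFromInt are hypothetical helper names.

```zig
const std = @import("std");

// Trimmed copy of the Range implementor used later in the post.
const Range = struct {
    start: u32 = 0,
    end: u32,
    step: u32 = 1,
};

// *anyopaque option: any single-item mutable pointer coerces to it implicitly.
fn erase(ptr: *Range) *anyopaque {
    return ptr;
}

// Casting back: *anyopaque carries no alignment information, so @alignCast
// restores Range's alignment and @ptrCast restores the pointee type.
fn recover(erased: *anyopaque) *Range {
    return @ptrCast(@alignCast(erased));
}

// usize option: store the address as a plain integer...
fn eraseToInt(ptr: *Range) usize {
    return @intFromPtr(ptr);
}

// ...and turn it back into a typed pointer when needed.
fn recoverFromInt(addr: usize) *Range {
    return @ptrFromInt(addr);
}
```

Either way the stored value no longer knows its type; the difference is mostly in how explicit the cast back has to be.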

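const Iterator = struct {
    const Self = @This();

    // Type-erased pointer to the implementor's data.
    ptr: *anyopaque,
    // Function that casts ptr back to the implementor's type and calls its next.
    // (Zig 0.10 and later require the pointer form: *const fn (*anyopaque) ?u32.)
    nextFn: fn (*anyopaque) ?u32,
};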

This will be the basis of our interface. Right now, to implement Iterator, you need a lot of knowledge about its internals. So let's create a helper initialization method.

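// Written for Zig 0.9.x; newer releases changed @ptrCast, @alignCast and @call.
pub fn init(ptr: anytype) Self {
    const Ptr = @TypeOf(ptr);
    const ptr_info = @typeInfo(Ptr);

    // Reject anything that is not a single-item pointer at compile time.
    if (ptr_info != .Pointer) @compileError("ptr must be a pointer");
    if (ptr_info.Pointer.size != .One) @compileError("ptr must be a single item pointer");

    // Alignment of the implementor, needed to cast back from *anyopaque.
    const alignment = ptr_info.Pointer.alignment;

    // Zig has no anonymous functions, so nextImpl lives in a local struct.
    const gen = struct {
        pub fn nextImpl(pointer: *anyopaque) ?u32 {
            // Restore the original pointer type and alignment.
            const self = @ptrCast(Ptr, @alignCast(alignment, pointer));

            // Forward to the implementor's own next method.
            return @call(.{ .modifier = .always_inline }, ptr_info.Pointer.child.next, .{self});
        }
    };

    return .{
        .ptr = ptr,
        .nextFn = gen.nextImpl,
    };
}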

There is a lot going on in this new function, so let's break it down.

First of all, we check that ptr has the right type: it should be a single-item pointer. If not, we emit a compile error so the implementor knows the problem. Afterwards we get the alignment. This is needed because a *anyopaque carries no alignment information (it is treated as alignment 1), so casting it back to a pointer with a stricter alignment requires @alignCast. Since Zig doesn't have anonymous functions yet, we declare the function inside a local struct, gen, and take it from there. This will also help us later to more easily create a vtable.
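The gen trick is not specific to interfaces. A struct declared inside a function is a namespace that can hold functions, and since it is created at compile time, one copy exists per distinct set of comptime parameters, not one per call. A minimal sketch, assuming Zig 0.10 or later for the *const fn spelling; makeAdder is a made-up example name:

```zig
const std = @import("std");

// Returns a pointer to a function that adds the comptime-known n to its argument.
// Zig has no anonymous functions, so the function is declared inside a local struct.
fn makeAdder(comptime n: u32) *const fn (u32) u32 {
    const gen = struct {
        fn add(x: u32) u32 {
            // n can be used here because it is known at compile time.
            return x + n;
        }
    };
    // The function coerces to a function pointer.
    return gen.add;
}
```

In init, the role of n is played by the pointer type, so two calls of Iterator.init with a *Range share the same nextImpl.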

Inside the implementation, we do two things. First, we cast the pointer to the correct type and alignment. Second, we call the underlying function. This is where I take a different approach from most of the standard library. In the standard library, the convention is to pass all needed methods to the init function. As far as I can see, this has two main benefits:
* It allows data and functions to be separated. Personally I can't think of an example of why you would do this, but the option is there.
* It allows the methods in the implementor to be private, so users must call the method via the interface.
The first one is very rare in my experience. The second one is more useful, but I prefer my way because it asks less of the implementor. My approach relies on the ptr_info.Pointer.child.next part, which looks up the function on the implementor and also gives a user-friendly compile error if the next function does not exist. Everything else is exactly the same as in the standard library examples. We inline the call for performance, since nextImpl just relays to another function.
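For comparison, the standard-library convention of passing the function to init could be sketched as follows. It is loosely modelled on how Random.init took its fill function at the time, but it is an illustration assuming Zig 0.11 or later, not a copy of any std API; nextFunc is a hypothetical parameter name.

```zig
const std = @import("std");

const Iterator = struct {
    ptr: *anyopaque,
    nextFn: *const fn (*anyopaque) ?u32,

    // std-style init: the implementor passes its next function explicitly,
    // and its type is checked against the pointer type at compile time.
    pub fn init(
        pointer: anytype,
        comptime nextFunc: fn (ptr: @TypeOf(pointer)) ?u32,
    ) Iterator {
        const Ptr = @TypeOf(pointer);
        const gen = struct {
            fn nextImpl(ptr: *anyopaque) ?u32 {
                const self: Ptr = @ptrCast(@alignCast(ptr));
                return @call(.always_inline, nextFunc, .{self});
            }
        };
        return .{ .ptr = pointer, .nextFn = gen.nextImpl };
    }

    pub fn next(self: Iterator) ?u32 {
        return self.nextFn(self.ptr);
    }
};

const Range = struct {
    start: u32 = 0,
    end: u32,
    step: u32 = 1,

    // Not pub: code outside this file can only reach it through Iterator.
    fn next(self: *Range) ?u32 {
        if (self.start >= self.end) return null;
        const result = self.start;
        self.start += self.step;
        return result;
    }

    pub fn iterator(self: *Range) Iterator {
        return Iterator.init(self, next);
    }
};
```

The cost is visible in iterator: every implementor has to name its functions when it builds the interface.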

That was the biggest part. Afterwards, we just create the struct with the pointer to the data as well as the function.

We still need a way to call the next function, so the last function we add to finish the interface is:

pub inline fn next(self: Self) ?u32 {
    return self.nextFn(self.ptr);
}

Again, we inline the function for performance. We call the function on the stored pointer. The interface is now ready to use.

Implementing Iterator

On its own, the interface is pretty useless, so let's create a range iterator that iterates from a starting value up to (but not including) an end, with an optional step. It needs only three fields. (If you want to be able to reset it, or add some other functionality, you can add more fields; this is the bare minimum.)

const Range = struct {
    const Self = @This();

    start: u32 = 0,
    end: u32,
    step: u32 = 1,
};

Of course it also needs an implementation of next:

pub fn next(self: *Self) ?u32 {
    if (self.start >= self.end) return null;
    const result = self.start;
    self.start += self.step;
    return result;
}

That's all. To create an iterator, just call Iterator.init(&range), where range is an instance of Range. To make people's lives easier, let's follow the standard library conventions again and create a function inside Range itself that initializes the iterator:

pub fn iterator(self: *Self) Iterator {
    return Iterator.init(self);
}

Now users can just call range.iterator() to create an iterator. Looks a lot like arena.allocator(), doesn't it? It's exactly the same pattern.
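Because every implementor is reached through the same Iterator type, a function can consume any of them without knowing which one it got. A sketch assuming Zig 0.11 or later (single-argument @ptrCast/@alignCast, *const fn fields); instead of @typeInfo it simply calls self.next(), and the Countdown implementor and the sum function are hypothetical additions for illustration.

```zig
const std = @import("std");

const Iterator = struct {
    ptr: *anyopaque,
    nextFn: *const fn (*anyopaque) ?u32,

    pub fn init(ptr: anytype) Iterator {
        const Ptr = @TypeOf(ptr);
        const gen = struct {
            fn nextImpl(pointer: *anyopaque) ?u32 {
                const self: Ptr = @ptrCast(@alignCast(pointer));
                return self.next(); // resolved at compile time for each Ptr
            }
        };
        return .{ .ptr = ptr, .nextFn = gen.nextImpl };
    }

    pub inline fn next(self: Iterator) ?u32 {
        return self.nextFn(self.ptr);
    }
};

const Range = struct {
    start: u32 = 0,
    end: u32,
    step: u32 = 1,

    pub fn next(self: *Range) ?u32 {
        if (self.start >= self.end) return null;
        const result = self.start;
        self.start += self.step;
        return result;
    }

    pub fn iterator(self: *Range) Iterator {
        return Iterator.init(self);
    }
};

// Hypothetical second implementor: yields n, n - 1, ..., 1.
const Countdown = struct {
    n: u32,

    pub fn next(self: *Countdown) ?u32 {
        if (self.n == 0) return null;
        self.n -= 1;
        return self.n + 1;
    }

    pub fn iterator(self: *Countdown) Iterator {
        return Iterator.init(self);
    }
};

// Works with any implementor, because it only sees the interface.
fn sum(it: Iterator) u64 {
    var total: u64 = 0;
    while (it.next()) |val| total += val;
    return total;
}
```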

To be good programmers, let's write a test before wrapping things up:

const std = @import("std");
test "Range" {
    var range = Range{ .end=5 };
    const iter = range.iterator();

    try std.testing.expectEqual(@as(?u32, 0), iter.next());
    try std.testing.expectEqual(@as(?u32, 1), iter.next());
    try std.testing.expectEqual(@as(?u32, 2), iter.next());
    try std.testing.expectEqual(@as(?u32, 3), iter.next());
    try std.testing.expectEqual(@as(?u32, 4), iter.next());
    try std.testing.expectEqual(@as(?u32, null), iter.next());
    try std.testing.expectEqual(@as(?u32, null), iter.next());
}

This test should pass, and it shows how to use the interface.

Drawbacks

This pattern is really useful in some cases. However, before ending, I would like to point out a few drawbacks.

The first is performance. This pattern can get really slow. Every call has to follow a pointer to the data and then call through a function pointer, and calls through function pointers are generally slower than direct calls, partly because the compiler usually cannot inline them. If you can, see whether something like a tagged union can do the same job.
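A tagged union trades openness for speed: the set of implementors is fixed at compile time, but dispatch becomes a switch on the tag instead of a call through a function pointer. A sketch of that alternative, assuming Zig 0.11 or later; AnyIterator and Countdown are hypothetical names, and Range is a copy of the struct from this post.

```zig
const std = @import("std");

const Range = struct {
    start: u32 = 0,
    end: u32,
    step: u32 = 1,

    pub fn next(self: *Range) ?u32 {
        if (self.start >= self.end) return null;
        const result = self.start;
        self.start += self.step;
        return result;
    }
};

// Hypothetical second implementor: yields n, n - 1, ..., 1.
const Countdown = struct {
    n: u32,

    pub fn next(self: *Countdown) ?u32 {
        if (self.n == 0) return null;
        self.n -= 1;
        return self.n + 1;
    }
};

// Closed set of implementors: adding one means adding a variant here.
const AnyIterator = union(enum) {
    range: Range,
    countdown: Countdown,

    pub fn next(self: *AnyIterator) ?u32 {
        // Capture by pointer so the active variant's state is updated in place.
        return switch (self.*) {
            .range => |*r| r.next(),
            .countdown => |*c| c.next(),
        };
    }
};
```

Because the union stores each implementor by value, it also does not depend on some other variable staying alive.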

Secondly, a more subtle problem is that the original implementor has to live at least as long as the interface it creates. The interface stores a pointer, so once the implementor no longer exists, that pointer is dangling. This means you can't return an interface built from a local variable out of the function that created it:

fn thisWillCauseUndefinedBehaviour() Iterator {
    var range = Range{.end=10};
    return range.iterator();
}

Of course, this dummy example will almost never occur in real code, but something similar could. You could solve this by passing an allocator and storing range on the heap. Make sure to free it afterwards, however.
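One way to apply the heap-allocation fix is to hand the caller the allocated Range itself, so it can both create the iterator and free the memory later. A sketch assuming Zig 0.11 or later; makeRange is a hypothetical helper, and the test uses std.testing.allocator because it reports leaks.

```zig
const std = @import("std");

const Iterator = struct {
    ptr: *anyopaque,
    nextFn: *const fn (*anyopaque) ?u32,

    pub fn init(ptr: anytype) Iterator {
        const Ptr = @TypeOf(ptr);
        const gen = struct {
            fn nextImpl(pointer: *anyopaque) ?u32 {
                const self: Ptr = @ptrCast(@alignCast(pointer));
                return self.next();
            }
        };
        return .{ .ptr = ptr, .nextFn = gen.nextImpl };
    }

    pub fn next(self: Iterator) ?u32 {
        return self.nextFn(self.ptr);
    }
};

const Range = struct {
    start: u32 = 0,
    end: u32,
    step: u32 = 1,

    pub fn next(self: *Range) ?u32 {
        if (self.start >= self.end) return null;
        const result = self.start;
        self.start += self.step;
        return result;
    }

    pub fn iterator(self: *Range) Iterator {
        return Iterator.init(self);
    }
};

// Allocates the Range on the heap, so it outlives this function call.
// The caller owns the result and must release it with allocator.destroy.
fn makeRange(allocator: std.mem.Allocator, end: u32) !*Range {
    const range = try allocator.create(Range);
    range.* = .{ .end = end };
    return range;
}
```

Any iterator created from the returned pointer stays valid until that destroy call.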

Conclusion

I hope you now have a better understanding of how to implement interfaces and how they work in the standard library. All the code is available on my GitHub. (post.zig contains the code from this post. main.zig contains a lot more, like generics and many more implementors such as map and filter.)

This is my first time writing, so any feedback on both the technical and the writing aspects is appreciated. Also, English is not my native language, so feel free to correct me anywhere.

Discussion (4)

Alex G Rice

Really nice post, thanks @kilianvounckx !

Newb Question: the linked post "the-missing-zig-polymorphism-reference" says:

You could also use anyopaque but it introduces alignment problems that only complicates the solution.

The zig language reference is a bit muddled about this, as it does say to use anyopaque for type erased pointers. But it also says anyopaque is for interop with C void pointers. (and it seems c_void was renamed to anyopaque at some time).

I would definitely lean towards simply using usize as in the linked post. But I'm probably not understanding the pros/cons?

Govind

Great article @kilianvounckx .

Question about the gen struct. We need it because, as of now, function definitions are not expressions (so you can't write return { .nextFn = fn () {} })?
And it looks like the gen struct is created once for every invocation of the Interface's init.
Where, then, is the nextImpl fn located? Is it on the stack (as it is returned as part of the stack), or is it part of the .text section (as it is executable), in which case gen is a struct with an embedded function pointer?

Alex G Rice

Some examples of alternative solution w/ tagged unions here:
reddit.com/r/Zig/comments/st4ikn/r...

David Vanderson

Great explanation!