Way back in June 2020, Nathan Michaels published a post about how to do runtime polymorphism (interfaces) in Zig. Since then, however, the community has shifted from the `@fieldParentPtr` idiom to fat pointers. This is now the idiom that the standard library uses, for example in allocator and rand. This post covers the new idiom and how to use it. Just as in Nathan's original post, I will create a formal Iterator interface, which can be used like so:
while (iterator.next()) |val| {
    // do something with val
}
Some notes before reading
Before reading the rest of this post, I highly recommend going through the standard library source code to look at the idiom yourself. See if you can understand it. If so, great, you don't need to read the rest. Some places to start are Allocator and Random.
Another resource is [this one](https://revivalizer.xyz/post/the-missing-zig-polymorphism-reference/). This covers another use case (nodes for an expression calculator), and does things in a similar way.
The Iterator Interface
An iterator at its simplest needs only one function. It should return the next value in the iterator, or `null` if the iterator is done. (One of the many reasons why optionals are awesome. Compare this, for example, to Java, which doesn't have them: the equivalent Iterator interface has 3 methods to implement.)
The interface also needs some way to access the implementor's data, so we store a pointer to it as well. We use `*anyopaque` because we don't know the size and alignment of the implementors. We could use a `usize` as well and convert between pointer and integer every time. The standard library uses `*anyopaque`, so that is what I will do here.
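To make the two options concrete, here is a minimal sketch of erasing a pointer both ways and getting it back. It assumes the builtin syntax of Zig 0.11 or newer (`@ptrCast`/`@alignCast` with an inferred result type, `@intFromPtr`, `@ptrFromInt`), which differs from the older syntax used in the rest of this post. The variable names are only for illustration.

```zig
const std = @import("std");

test "type-erase a pointer as *anyopaque or as usize" {
    var value: u32 = 42;

    // Variant 1: *anyopaque. A typed pointer coerces to it implicitly;
    // getting the typed pointer back needs an explicit cast.
    const erased: *anyopaque = &value;
    const from_opaque: *u32 = @ptrCast(@alignCast(erased));

    // Variant 2: usize. Both directions need an explicit conversion.
    const address: usize = @intFromPtr(&value);
    const from_int: *u32 = @ptrFromInt(address);

    // Both recovered pointers refer to the same u32.
    from_opaque.* += 1;
    try std.testing.expectEqual(@as(u32, 43), from_int.*);
}
```

Either way the type information is gone, so the code that casts back has to know the real type on its own.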
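const Iterator = struct {
    const Self = @This();
    ptr: *anyopaque,
    // With the self-hosted compiler (Zig 0.10+), a function pointer
    // is spelled `*const fn (*anyopaque) ?u32` instead.
    nextFn: fn (*anyopaque) ?u32,
};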
This will be the basis of our interface. Right now, to implement `Iterator`, you need a lot of knowledge about its internals. So let's create a helper initialization method.
pub fn init(ptr: anytype) Self {
    const Ptr = @TypeOf(ptr);
    const ptr_info = @typeInfo(Ptr);
    if (ptr_info != .Pointer) @compileError("ptr must be a pointer");
    if (ptr_info.Pointer.size != .One) @compileError("ptr must be a single item pointer");
    const alignment = ptr_info.Pointer.alignment;
    const gen = struct {
        pub fn nextImpl(pointer: *anyopaque) ?u32 {
            const self = @ptrCast(Ptr, @alignCast(alignment, pointer));
            return @call(.{ .modifier = .always_inline }, ptr_info.Pointer.child.next, .{self});
        }
    };
    return .{
        .ptr = ptr,
        .nextFn = gen.nextImpl,
    };
}
There is a lot going on in this new function, so let's break it down.
First of all, we check whether `ptr` has the right type. It should be a single-item pointer. If not, we give a compile error, so the implementor knows the problem. Afterwards we get the alignment. This is needed since we work with `*anyopaque`, which can have any alignment, so we have to restore the implementor's alignment when casting back. Since Zig doesn't have anonymous functions yet, we declare the function inside a struct, `gen`, and take it from there afterwards. This will also help us later to more easily create a vtable.
Inside the implementation, we do two things. First, we cast the pointer to the correct type and alignment. Second, we call the underlying function. This is where I take a different approach than most of the standard library. In the standard library, the convention is to pass all needed methods to the init function. As far as I can see, this has two main benefits:
* It allows for data and functions to be separated. Personally, I can't think of an example of why you would do this, but the option is there.
* It allows for the methods in the implementor to be private, so users must call the method via the interface.
The first one is very rare in my experience. The second one is more useful, but I like my way more, because it asks less from the implementor. My way of doing things requires the `ptr_info.Pointer.child.next` part, which gets the function from the implementor and also gives a user-friendly compiler error in case the `next` function does not exist. Everything else is exactly the same as in the standard library examples. We inline the function for performance reasons since it just relays to another function call.
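For comparison, here is a rough sketch of the other convention, where the implementor passes the method to `init` explicitly. This is my approximation of that style and not code copied from the standard library. It assumes Zig 0.11 or newer syntax, and `Countdown` and its private `step` function are hypothetical names made up for the example.

```zig
const std = @import("std");

const Iterator = struct {
    ptr: *anyopaque,
    nextFn: *const fn (*anyopaque) ?u32,

    // The caller hands over the method, so it does not have to be
    // called `next` and does not have to be public.
    pub fn init(pointer: anytype, comptime nextFn: fn (@TypeOf(pointer)) ?u32) Iterator {
        const Ptr = @TypeOf(pointer);
        const gen = struct {
            fn nextImpl(ptr: *anyopaque) ?u32 {
                // Restore the implementor's pointer type and alignment.
                const self: Ptr = @ptrCast(@alignCast(ptr));
                return nextFn(self);
            }
        };
        return .{ .ptr = pointer, .nextFn = gen.nextImpl };
    }

    pub fn next(self: Iterator) ?u32 {
        return self.nextFn(self.ptr);
    }
};

// Hypothetical implementor that counts down to zero.
const Countdown = struct {
    remaining: u32,

    // Private: outside this file it is only reachable through the interface.
    fn step(self: *Countdown) ?u32 {
        if (self.remaining == 0) return null;
        self.remaining -= 1;
        return self.remaining;
    }

    pub fn iterator(self: *Countdown) Iterator {
        return Iterator.init(self, step);
    }
};

test "init that receives the method explicitly" {
    var countdown = Countdown{ .remaining = 2 };
    const iter = countdown.iterator();
    try std.testing.expectEqual(@as(?u32, 1), iter.next());
    try std.testing.expectEqual(@as(?u32, 0), iter.next());
    try std.testing.expectEqual(@as(?u32, null), iter.next());
}
```

The price is that every implementor has to list its methods when it builds the interface, which is the extra work I wanted to avoid.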
That was the biggest part. Afterwards, we just create the struct with the pointer to the data as well as the function.
We still need a way to call the next function, so the last function we add to finish the interface is:
pub inline fn next(self: Self) ?u32 {
    return self.nextFn(self.ptr);
}
Again, we inline the function for performance. We call the function on the stored pointer. The interface is now ready to use.
Implementing Iterator
On its own, the interface is pretty useless, so let's create a range iterator that iterates from a starting value to an end with an optional step. It needs only 3 fields, which is the bare minimum (add more if you want to be able to reset it or need other functionality):
const Range = struct {
    const Self = @This();
    start: u32 = 0,
    end: u32,
    step: u32 = 1,
};
Of course it also needs an implementation of `next`:
pub fn next(self: *Self) ?u32 {
    if (self.start >= self.end) return null;
    const result = self.start;
    self.start += self.step;
    return result;
}
That's all. If we want to create an iterator, we just call `Iterator.init(&range)`, where `range` is an instance of `Range`. To make people's lives easier, let's follow the standard library conventions again and create a function to initialize the iterator inside `Range` itself:
pub fn iterator(self: *Self) Iterator {
    return Iterator.init(self);
}
Now users can just do `range.iterator()` to create an iterator. Looks a lot like `arena.allocator()`, doesn't it? It's exactly the same pattern.
To be good programmers let's create a test case before wrapping things up:
const std = @import("std");
test "Range" {
    var range = Range{ .end = 5 };
    const iter = range.iterator();
    try std.testing.expectEqual(@as(?u32, 0), iter.next());
    try std.testing.expectEqual(@as(?u32, 1), iter.next());
    try std.testing.expectEqual(@as(?u32, 2), iter.next());
    try std.testing.expectEqual(@as(?u32, 3), iter.next());
    try std.testing.expectEqual(@as(?u32, 4), iter.next());
    try std.testing.expectEqual(@as(?u32, null), iter.next());
    try std.testing.expectEqual(@as(?u32, null), iter.next());
}
This should now pass and give you an idea of how to use the interface.
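The point of the interface is that a consumer does not need to know which implementor sits behind it. Here is a small sketch of that, assuming Zig 0.11 or newer syntax. The `sum` function is a hypothetical consumer I made up for the example, and `Range` wires up its own type-erased function by hand (`nextErased`, also a made-up name) to keep the block short.

```zig
const std = @import("std");

const Iterator = struct {
    ptr: *anyopaque,
    nextFn: *const fn (*anyopaque) ?u32,

    pub fn next(self: Iterator) ?u32 {
        return self.nextFn(self.ptr);
    }
};

const Range = struct {
    start: u32 = 0,
    end: u32,
    step: u32 = 1,

    pub fn next(self: *Range) ?u32 {
        if (self.start >= self.end) return null;
        const result = self.start;
        self.start += self.step;
        return result;
    }

    // Type-erased wrapper: casts the opaque pointer back and forwards the call.
    fn nextErased(ptr: *anyopaque) ?u32 {
        const self: *Range = @ptrCast(@alignCast(ptr));
        return self.next();
    }

    pub fn iterator(self: *Range) Iterator {
        return .{ .ptr = self, .nextFn = nextErased };
    }
};

// Hypothetical consumer: it only sees `Iterator`, never `Range`.
fn sum(iter: Iterator) u32 {
    var total: u32 = 0;
    while (iter.next()) |value| total += value;
    return total;
}

test "consume any Iterator" {
    var range = Range{ .start = 1, .end = 10, .step = 3 };
    // The range yields 1, 4 and 7.
    try std.testing.expectEqual(@as(u32, 12), sum(range.iterator()));
}
```

Any other type that can produce an `Iterator` can be passed to `sum` without changing it.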
Drawbacks
This pattern is really useful in some cases. However, before ending, I would like to point out a few drawbacks.
The first is performance. Every call goes through a pointer to the data and a function pointer, and a call through a function pointer is usually harder for the compiler to inline and optimize than a direct call, so this pattern can get slow in hot loops. If you know all implementors up front, see if you can use something like a tagged union to implement something similar.
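As an illustration of the tagged union alternative, here is a minimal sketch. `Repeat` and `AnyIterator` are hypothetical names invented for the example, and the set of implementors has to be known when the union is written.

```zig
const std = @import("std");

const Range = struct {
    start: u32 = 0,
    end: u32,
    step: u32 = 1,

    pub fn next(self: *Range) ?u32 {
        if (self.start >= self.end) return null;
        const result = self.start;
        self.start += self.step;
        return result;
    }
};

// Hypothetical second implementor, only here so the union has two cases.
const Repeat = struct {
    value: u32,
    remaining: u32,

    pub fn next(self: *Repeat) ?u32 {
        if (self.remaining == 0) return null;
        self.remaining -= 1;
        return self.value;
    }
};

// The union stores the implementor by value and dispatches with a switch,
// so there is no type-erased pointer and no function pointer.
const AnyIterator = union(enum) {
    range: Range,
    repeat: Repeat,

    pub fn next(self: *AnyIterator) ?u32 {
        return switch (self.*) {
            .range => |*impl| impl.next(),
            .repeat => |*impl| impl.next(),
        };
    }
};

test "tagged union dispatch" {
    var iter = AnyIterator{ .repeat = .{ .value = 7, .remaining = 2 } };
    try std.testing.expectEqual(@as(?u32, 7), iter.next());
    try std.testing.expectEqual(@as(?u32, 7), iter.next());
    try std.testing.expectEqual(@as(?u32, null), iter.next());
}
```

The trade-off is that adding an implementor means editing the union, whereas the pointer-based interface accepts types it has never heard of.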
Secondly, a more subtle problem is that the original implementor has to live for at least as long as the interface it creates. This is because the interface stores a pointer, so if the implementor isn't alive anymore, the pointer is invalid. This means you can't return an interface from a function if its implementor is a local variable of that function:
fn thisWillCauseUndefinedBehaviour() Iterator {
    var range = Range{ .end = 10 };
    return range.iterator();
}
Of course, this dummy example will almost never occur in real code, but something similar could occur. You could solve this by passing an allocator and storing `range` on the heap. Make sure to free it afterwards, however.
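A minimal sketch of the heap-based fix could look like this, assuming Zig 0.11 or newer syntax. `createRange` and `nextErased` are hypothetical helper names, and the caller owns the returned `Range` and has to destroy it.

```zig
const std = @import("std");

const Iterator = struct {
    ptr: *anyopaque,
    nextFn: *const fn (*anyopaque) ?u32,

    pub fn next(self: Iterator) ?u32 {
        return self.nextFn(self.ptr);
    }
};

const Range = struct {
    start: u32 = 0,
    end: u32,
    step: u32 = 1,

    pub fn next(self: *Range) ?u32 {
        if (self.start >= self.end) return null;
        const result = self.start;
        self.start += self.step;
        return result;
    }

    // Type-erased wrapper: casts the opaque pointer back and forwards the call.
    fn nextErased(ptr: *anyopaque) ?u32 {
        const self: *Range = @ptrCast(@alignCast(ptr));
        return self.next();
    }

    pub fn iterator(self: *Range) Iterator {
        return .{ .ptr = self, .nextFn = nextErased };
    }
};

// Hypothetical helper: the Range lives on the heap, so it outlives this call.
fn createRange(allocator: std.mem.Allocator, end: u32) !*Range {
    const range = try allocator.create(Range);
    range.* = .{ .end = end };
    return range;
}

test "heap-allocated implementor" {
    // The testing allocator reports a leak if we forget to destroy the Range.
    const allocator = std.testing.allocator;
    const range = try createRange(allocator, 2);
    defer allocator.destroy(range);

    const iter = range.iterator();
    try std.testing.expectEqual(@as(?u32, 0), iter.next());
    try std.testing.expectEqual(@as(?u32, 1), iter.next());
    try std.testing.expectEqual(@as(?u32, null), iter.next());
}
```

Returning the `*Range` rather than the `Iterator` keeps the typed pointer around, which makes freeing it straightforward.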
Conclusion
I hope you now have a better understanding of how to implement interfaces and how they work in the standard library. All the code is available on my GitHub. (post.zig contains the code from this post. main.zig contains a lot more, like generics and many more implementors such as map and filter.)
This is my first time writing, so any feedback on both the technical and the writing aspects is appreciated. Also, English is not my native language, so feel free to correct me anywhere.
Oldest comments (7)
Great explanation!
Really nice post, thanks @kilianvounckx !
Newb Question: the linked post "the-missing-zig-polymorphism-reference" says:
The zig language reference is a bit muddled about this, as it does say to use anyopaque for type erased pointers. But it also says anyopaque is for interop with C void pointers. (and it seems c_void was renamed to anyopaque at some time).
I would definitely lean towards simplify just using `usize` as in the linked post. But I'm probably not understanding the pros/cons?
Some examples of alternative solution w/ tagged unions here:
reddit.com/r/Zig/comments/st4ikn/r...
Great article @kilianvounckx .
Question about the `gen` struct. We need it because, as of now, function definitions are not expressions (so you can't write `return { .nextFn = fn () {} }`)?
And it looks like the `gen` struct is created once for every invocation of the Interface's `init`.
Where is then, the `nextImpl` fn located. Is it in the stack (as it is returned as a part of the stack) or is it part of the `.text` section (as it is executable) and in which case `gen` is a struct with an embedded function pointer?
The `gen` struct has size zero so it is just an abstraction in the source code ... it doesn't exist in the world anywhere and is never "created".
The `nextImpl` function is, like all code, located in the text section. Note that, since it is generic, there will be multiple copies of the code, each calling a different `ptr_info.Pointer.child.next` function.
No, structs don't contain pointers to their member functions, in Zig or any other language ... there's no need for that. The struct is just a namespace for its methods.
There is a function pointer to `nextImpl`, but that's the `nextFn` field of the `Iterator` struct:
`.nextFn = gen.nextImpl,`
Just noticed this when I looked at analytics. Turns out my Zig articles are a significant chunk of traffic recently. Anyway, I updated the page to send people here.
You say that but I did when I used Random for the first time.