Introduction
This post explores the use of anytype, a keyword that marks function parameters as generic. We'll look at how we can use it and whether there are alternatives.
Duck Typing
The first and most idiomatic use of anytype is so-called duck typing. This is used throughout Zig's standard library, especially with readers and writers. We'll use a toy example of a writer. Notice how each struct contains a function with the same signature, but executes different code.
const debug = @import("std").debug;

// Toy example of "writers" that share the same "contract".
const FileWriter = struct {
    pub fn writeAll(_: FileWriter, bytes: []const u8) void {
        debug.print("[FileWriter] {s};", .{bytes});
    }
};
const MultiWriter = struct {
    pub fn writeAll(_: MultiWriter, bytes: []const u8) void {
        debug.print("[MultiWriter] {s};", .{bytes});
    }
};
const NullWriter = struct {
    pub fn writeAll(_: NullWriter, _: []const u8) void {}
};
// This writer differs from the others, so could be said to have a different "contract".
const BadWriter = struct {
    pub fn writeAll(_: BadWriter, val: i32) void {
        debug.print("[BadWriter] {d};", .{val});
    }
};
To write a function that accepts the first three of these types, you would use anytype. That essentially defines the accepted type to be the set of all possible types.
const example_01_duck_typing = struct {
    fn save(writer: anytype, bytes: []const u8) void {
        writer.writeAll(bytes);
    }
    ...
The save function will accept any type that has a function declaration named writeAll which takes a parameter of type []const u8. This is all figured out at compile time too, so no virtual tables are needed here.
    pub fn main() void {
        var file_writer = FileWriter{};
        save(file_writer, "a");
        var multi_writer = MultiWriter{};
        save(multi_writer, "b");
        var null_writer = NullWriter{};
        save(null_writer, "c");
    }
This produces very sane assembly that ends up looking roughly like:
- call FileWriter.writeAll
- call MultiWriter.writeAll
Even the NullWriter.writeAll gets stripped out as it logically does no operation.
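The same save function also accepts a pointer, because Zig's method-call syntax dereferences single-item pointers automatically. Here is a minimal sketch of that; CountingWriter is a hypothetical type made up for this illustration and is not one of the writers above.

```zig
const std = @import("std");

// Hypothetical writer (an assumption for this sketch): it keeps state,
// so it has to be passed by pointer in order to be mutated.
const CountingWriter = struct {
    total: usize = 0,

    pub fn writeAll(self: *CountingWriter, bytes: []const u8) void {
        self.total += bytes.len;
    }
};

// Same duck-typed function as in the post.
fn save(writer: anytype, bytes: []const u8) void {
    writer.writeAll(bytes);
}

pub fn main() void {
    var counter = CountingWriter{};
    // @TypeOf(writer) is *CountingWriter here; the call still resolves at compile time.
    save(&counter, "abc");
    save(&counter, "de");
    std.debug.print("{d}\n", .{counter.total}); // prints 5
}
```

Nothing in save had to change: the compiler instantiates a separate copy of it for *CountingWriter, just as it does for each of the by-value writers.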
At this point it seems like similar behaviour to a dynamically typed language such as JavaScript. But if you're familiar with those you might also be familiar with the difficulty of debugging issues when you pass the wrong type into a generic function. Let's see what happens in Zig's case:
var bad_writer = BadWriter{};
save(bad_writer, "nope");
Zig won't even compile the code! We get a useful error message if we try:
error: expected type 'i32', found '[]const u8'
    writer.writeAll(bytes);
                    ^~~~~
note: parameter type declared here
    pub fn writeAll(_: BadWriter, val: i32) void {
                                       ^~~
The error and related note are telling us that the writeAll function in BadWriter doesn't satisfy the contract of a writer.
The downside to duck typing is that, given just the save() function, it is not immediately obvious what the valid parameter space is. We could read the function body to see how the parameter is used (which could be buried and easy to miss), rely on user documentation to express the requirements (which could easily become out of date), or use trial and error with the compiler. Each of these is a compromise.
Traits
We could improve the readability of a generic function by using traits. These are simply functions from the standard library that can give us information about a type at compile time.
Let's define a new contract for the 'writer' parameter in the save function:
- Must be a mutable pointer to a single struct.
- Must have a function named writeAll.
We can express that with traits.
const trait = @import("std").meta.trait;
fn save(writer: anytype, bytes: []const u8) void {
    comptime {
        if (!trait.isPtrTo(.Struct)(@TypeOf(writer))) @compileError("Expects writer to be pointer type.");
        if (!trait.hasFn("writeAll")(@TypeOf(writer.*))) @compileError("Expects writer.* to have fn 'writeAll'.");
    }
    writer.writeAll(bytes);
}
If we consistently put trait checks like these at the top of our generic functions, they read like part of the function signature. This all happens at compile time too, so it doesn't affect our runtime performance, but it will add to compile times.
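As far as I know, std.meta.trait is only available up to Zig 0.11 and was removed from the standard library in later releases. A similar check can be sketched with the @hasDecl builtin alone. The helper name assertWriter is hypothetical, and this version takes the writer by value rather than by pointer, so it is a looser contract than the one listed above.

```zig
const std = @import("std");

const FileWriter = struct {
    pub fn writeAll(_: FileWriter, bytes: []const u8) void {
        std.debug.print("[FileWriter] {s};", .{bytes});
    }
};

// Hypothetical helper (not part of the standard library): fails compilation
// with a readable message when T has no declaration named `writeAll`.
// @hasDecl only accepts container types (struct, enum, union, opaque),
// so passing a pointer or an integer type is itself a compile error.
fn assertWriter(comptime T: type) void {
    if (!@hasDecl(T, "writeAll")) {
        @compileError("Expected " ++ @typeName(T) ++ " to declare 'writeAll'.");
    }
}

fn save(writer: anytype, bytes: []const u8) void {
    comptime {
        assertWriter(@TypeOf(writer));
    }
    writer.writeAll(bytes);
}

pub fn main() void {
    const file_writer = FileWriter{};
    save(file_writer, "a");
}
```

This only checks that the name exists; a mismatched parameter type such as the one in BadWriter would still be reported from inside the function body.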
Tagged Unions
anytype is all well and good for a generic function, one that can be applied across all types, but in practice you often only need it to operate on a strict subset of types. The subset can be defined via a tagged union.
const Writer = union(enum) {
    FileWriter: FileWriter,
    MultiWriter: MultiWriter,
    // Purposefully leave out NullWriter.
};
Now the save function can be defined with a more explicit signature.
pub fn save(writer: Writer, bytes: []const u8) void {
    switch (writer) {
        inline else => |w| w.writeAll(bytes),
    }
}
In order to access the method on Writer we now need to switch on it. The inline else => prong is some nice syntax to reduce boilerplate; at compile time it expands into:
switch (writer) {
    .FileWriter => |w| w.writeAll(bytes),
    .MultiWriter => |w| w.writeAll(bytes),
}
If there were many more prongs than just the two here, the assembly would become a jump table. It is important to note that the tagged union approach executes a couple more machine instructions than duck typing. For reference, here is the assembly emitted (with the -O ReleaseFast flag) for duck typing (notice it has created a function for each type):
example.save_duck_typing__anon_987:
    mov edi, offset example.example_main__anon_986
    jmp example.FileWriter.writeAll
example.save_duck_typing__anon_989:
    mov edi, offset example.example_main__anon_988
    jmp example.MultiWriter.writeAll
Now for the tagged union version:
example.save_tagged_union:
    test dil, 1
    je .LBB3_1
    mov rdi, rsi
    jmp example.MultiWriter.writeAll
.LBB3_1:
    mov rdi, rsi
    jmp example.FileWriter.writeAll
The call site of the save function has become a bit more cumbersome: we end up having to write each type name in two places.
var file_writer = FileWriter{};
save(Writer{ .FileWriter = file_writer }, "a");
var multi_writer = MultiWriter{};
save(Writer{ .MultiWriter = multi_writer }, "b");
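As a smaller step, the union name itself can be dropped at the call site. Because the parameter type of save is the concrete Writer union, an anonymous literal coerces to it. The sketch below repeats the definitions from above so that it stands alone; the tag still has to be spelled out by hand.

```zig
const std = @import("std");

const FileWriter = struct {
    pub fn writeAll(_: FileWriter, bytes: []const u8) void {
        std.debug.print("[FileWriter] {s};", .{bytes});
    }
};

const MultiWriter = struct {
    pub fn writeAll(_: MultiWriter, bytes: []const u8) void {
        std.debug.print("[MultiWriter] {s};", .{bytes});
    }
};

const Writer = union(enum) {
    FileWriter: FileWriter,
    MultiWriter: MultiWriter,
};

fn save(writer: Writer, bytes: []const u8) void {
    switch (writer) {
        inline else => |w| w.writeAll(bytes),
    }
}

pub fn main() void {
    const file_writer = FileWriter{};
    // The result type is known to be Writer, so `Writer` can be omitted.
    save(.{ .FileWriter = file_writer }, "a");

    const multi_writer = MultiWriter{};
    save(.{ .MultiWriter = multi_writer }, "b");
}
```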
But we can solve that with some comptime magic!
Comptime Tagged Unions
This is where comptime tagged unions will get us:
const Writer = Typeset(.{FileWriter, MultiWriter});
var file_writer = FileWriter{};
save(typeset(Writer, file_writer), "a");
var multi_writer = MultiWriter{};
save(typeset(Writer, multi_writer), "b");
I've gone with the name typeset as that's basically what we're using unions for in this example: a set of types. To get there we first need to create a function that returns the Writer union type at compile time.
const std = @import("std");
const builtin = std.builtin;
const meta = std.meta;
const math = std.math;

// There's a lot going on here,
// but it takes a tuple of types and turns it into a tagged union.
fn Typeset(comptime types: anytype) type {
    // One enum field per type, named after the type.
    var enum_fields: [types.len]builtin.Type.EnumField = undefined;
    inline for (types, 0..) |T, i| {
        enum_fields[i] = .{ .name = @typeName(T), .value = i };
    }
    const Tag = @Type(.{
        .Enum = .{
            .tag_type = meta.Int(.unsigned, math.log2_int_ceil(u16, types.len)),
            .fields = &enum_fields,
            .decls = &[_]builtin.Type.Declaration{},
            .is_exhaustive = true,
        },
    });
    // One union field per type, using the same names as the enum tags.
    var union_fields: [types.len]builtin.Type.UnionField = undefined;
    inline for (types, 0..) |T, i| {
        union_fields[i] = .{ .name = @typeName(T), .type = T, .alignment = @alignOf(T) };
    }
    const U = @Type(.{
        .Union = .{
            .layout = .Auto,
            .tag_type = Tag,
            .fields = &union_fields,
            .decls = &[_]builtin.Type.Declaration{},
        },
    });
    return U;
}
It is important to note that with this change, the 'tags' inside the union are now the fully qualified names of the types, which would be something like anytype.FileWriter. But we don't have to worry about those as we can have a helper function to initialise them too.
// Takes our Typeset type (i.e. Writer) and some data
// (i.e. initialised FileWriter) and outputs an initialised
// union of that Typeset type.
fn typeset(comptime UnionT: type, data: anytype) UnionT {
    return @unionInit(UnionT, @typeName(@TypeOf(data)), data);
}
Thanks to the magic of comptime, the output assembly is identical! Also, we still get useful compile errors if we attempt to use the wrong types. Attempting to call save on a NullWriter would produce this compile error:
error: "no field named 'anytype.NullWriter' in union..."
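A variation on the same idea, sketched here under different assumptions: if the union is written by hand, a helper can look up the field by its payload type instead of by its type name. The helper name wrap is hypothetical, and the approach assumes that every payload type appears only once in the union. This is a different technique from Typeset above, not a replacement for it.

```zig
const std = @import("std");

const FileWriter = struct {
    pub fn writeAll(_: FileWriter, bytes: []const u8) void {
        std.debug.print("[FileWriter] {s};", .{bytes});
    }
};

const MultiWriter = struct {
    pub fn writeAll(_: MultiWriter, bytes: []const u8) void {
        std.debug.print("[MultiWriter] {s};", .{bytes});
    }
};

const Writer = union(enum) {
    FileWriter: FileWriter,
    MultiWriter: MultiWriter,
};

// Hypothetical helper: finds the union field whose payload type matches
// the type of `data` and initialises the union with it.
fn wrap(comptime UnionT: type, data: anytype) UnionT {
    const T = @TypeOf(data);
    const field_name = comptime blk: {
        for (std.meta.fields(UnionT)) |field| {
            if (field.type == T) break :blk field.name;
        }
        // Only reached when no field has payload type T.
        @compileError(@typeName(T) ++ " is not a member of " ++ @typeName(UnionT));
    };
    return @unionInit(UnionT, field_name, data);
}

fn save(writer: Writer, bytes: []const u8) void {
    switch (writer) {
        inline else => |w| w.writeAll(bytes),
    }
}

pub fn main() void {
    const file_writer = FileWriter{};
    save(wrap(Writer, file_writer), "a");

    const multi_writer = MultiWriter{};
    save(wrap(Writer, multi_writer), "b");
}
```

The trade-off is that the union still has to be declared manually, but the tags keep their short names and the error for a non-member type can say exactly which type was rejected.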
Conclusion
We've seen various ways to handle generic functions, and the pros and cons of each. In a future post we'll go over ways to handle generic functions dynamically at runtime.
Comments
Beautiful!!!
Using first solution, love second and third ones. Last one seems a little overkill to me though. Still, thanks for sharing!
I honestly love how you can build types with imperative/structured code. It's so easy to learn and implement. This is in contrast to TypeScript, where you have to do a lot of weird custom syntax to glue types together and create derivative types from existing ones. TypeScript's system works well, in that the tooling is really mature around it, but if/once ZLS is able to handle these more complicated use cases like your Typeset (🙏), then Zig will be just better in all respects for fancy types.
I don't know whether zig should use the trait syntax, which is a higher-level syntax than the type, and only describes the syntax of the interface, because the disadvantage is that the language becomes complicated. But the anytype function can actually be optimized. It should be able to generate the corresponding syntax and describe what interface functions are needed. The error prompt should not directly enter the function body, but should prompt the cause of the error and the prompt for correction at the client code.
Thank you Luke! It's awesome publication. No water, only useful info.