Zig NEWS

Luke

Anytype Antics

Introduction

This post explores the use of anytype, a keyword that marks function parameters as generic. We'll look at how we can use it and whether there are alternatives.

Duck Typing

The first and most idiomatic use of anytype is so-called duck typing. This is used throughout Zig's standard library, especially with readers and writers. We'll use a toy example of a writer. Notice how each struct contains a function with the same signature, but each executes different code.

const std = @import("std");
const debug = std.debug;

// Toy example of "writers" that share the same "contract".
const FileWriter = struct {
    pub fn writeAll(_: FileWriter, bytes: []const u8) void { 
        debug.print("[FileWriter] {s};", .{bytes});
    }
};
const MultiWriter = struct {
    pub fn writeAll(_: MultiWriter, bytes: []const u8) void {
        debug.print("[MultiWriter] {s};", .{bytes});
    }
};
const NullWriter = struct {
    pub fn writeAll(_: NullWriter, _: []const u8) void {}
};
// This writer differs from the other, so could be said to have a different "contract".
const BadWriter = struct {
    pub fn writeAll(_: BadWriter, val: i32) void {
        debug.print("[BadWriter] {d};", .{val});
    }
};

To write a function that accepts all three of these types, you would use anytype. That is essentially defining the accepted type to be the superset of all possible types.

const example_01_duck_typing = struct {
    fn save(writer: anytype, bytes: []const u8) void {
        writer.writeAll(bytes);
    }
...

The save function will accept any type that has a function declaration named writeAll taking a []const u8 parameter. This is all figured out at compile time, so no virtual tables are needed.

pub fn main() void {
    var file_writer = FileWriter{};
    save(file_writer, "a");
    var multi_writer = MultiWriter{};
    save(multi_writer, "b");
    var null_writer = NullWriter{};
    save(null_writer, "c");
}

This outputs very sane assembly that ends up looking roughly like:

  1. call FileWriter.writeAll
  2. call MultiWriter.writeAll

Even the NullWriter.writeAll gets stripped out as it logically does no operation.
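
The writers above are stateless and passed by value. As a small sketch, the same duck-typed save also accepts a pointer to a writer that keeps state. CountingWriter is a hypothetical type made up for this illustration, not one of the writers from this post:

```zig
const std = @import("std");

// Hypothetical writer (an assumption for this sketch) that keeps state,
// so it has to be handed to save() by pointer.
const CountingWriter = struct {
    bytes_written: usize = 0,

    pub fn writeAll(self: *CountingWriter, bytes: []const u8) void {
        self.bytes_written += bytes.len;
    }
};

// Same duck-typed save() as in the post: anything with a compatible
// writeAll is accepted.
fn save(writer: anytype, bytes: []const u8) void {
    writer.writeAll(bytes);
}

pub fn main() void {
    var counter = CountingWriter{};
    // `writer` is inferred as *CountingWriter; method call syntax
    // works through the single-item pointer.
    save(&counter, "abc");
    save(&counter, "de");
    std.debug.print("{d} bytes written\n", .{counter.bytes_written});
}
```

Here anytype resolves to *CountingWriter rather than CountingWriter, so the function body stays identical whether the caller passes a value or a pointer.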

At this point it seems like similar behaviour to a dynamically typed language such as JavaScript. But if you're familiar with those you might also be familiar with the difficulty of debugging issues when you pass the wrong type into a generic function. Let's see what happens in Zig's case:

var bad_writer = BadWriter{};
save(bad_writer, "nope");

Zig won't even compile the code! We get a useful error message if we try:

error: expected type 'i32', found '[]const u8'
        writer.writeAll(bytes);
                        ^~~~~
note: parameter type declared here
    pub fn writeAll(_: BadWriter, val: i32) void {
                                       ^~~

The error and related note are telling us the writeAll function in BadWriter isn't satisfying the contract of a Writer.

The downside to duck typing is that, given just the save() function, it is not immediately obvious what the valid parameter space is. We could read the function body to see how the parameter is used (a use that could be buried and easy to miss), rely on user documentation to express the requirements (which could easily become out of date), or use trial and error with the compiler. Each of these is a compromise.

Traits

We could improve the readability of a generic function by using traits. These are simply functions from the standard library that can give us information about a type at compile time.

Let's define a new contract for the 'writer' parameter in the save function:

  • Must be a mutable pointer to a single struct.
  • Must have a function named writeAll.

We can express that with traits.

const trait = @import("std").meta.trait;

fn save(writer: anytype, bytes: []const u8) void {
    comptime {
        // Reject anything that is not a single-item pointer to a struct.
        if (!trait.isPtrTo(.Struct)(@TypeOf(writer))) @compileError("Expects writer to be pointer type.");
        // Reject pointees that have no writeAll function.
        if (!trait.hasFn("writeAll")(@TypeOf(writer.*))) @compileError("Expects writer.* to have fn 'writeAll'.");
    }
    writer.writeAll(bytes);
}

If we consistently put trait checks like these at the top of our generic functions, they read like part of the function signature. All of this happens at compile time, so it doesn't affect runtime performance, although it does add to compile time.
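
The checks above rely on std.meta.trait, which, to my knowledge, is no longer part of the standard library in newer Zig releases. As a rough sketch, the second check can be written with the @hasDecl builtin alone. The hasWriteAll helper is a hypothetical name introduced here. Note the assumptions: this version takes the writer by value, skips the pointer check entirely, and @hasDecl only confirms that a declaration with that name exists, not that it is a function with the right signature.

```zig
const std = @import("std");

const FileWriter = struct {
    pub fn writeAll(_: FileWriter, bytes: []const u8) void {
        std.debug.print("[FileWriter] {s};", .{bytes});
    }
};

// Hypothetical helper (an assumption for this sketch): reports whether
// T declares anything named writeAll. T must be a container type.
fn hasWriteAll(comptime T: type) bool {
    return @hasDecl(T, "writeAll");
}

fn save(writer: anytype, bytes: []const u8) void {
    comptime {
        if (!hasWriteAll(@TypeOf(writer))) @compileError("Expects writer to declare 'writeAll'.");
    }
    writer.writeAll(bytes);
}

pub fn main() void {
    save(FileWriter{}, "a");
}
```

A mismatched parameter type, as with BadWriter, would still be caught by the compiler at the writer.writeAll(bytes) call, so the guard mainly improves the error message for types that lack the function altogether.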

Tagged Unions

Anytype is all well and good for a generic function – a function that can be applied generically across all types – but often in reality you only need it to operate on a strict subset of types. The subset can be defined via a tagged union.

const Writer = union(enum) {
    FileWriter: FileWriter,
    MultiWriter: MultiWriter,
    // Purposefully leave out NullWriter.
};

Now the save function can be defined with a more explicit signature.

pub fn save(writer: Writer, bytes: []const u8) void {
    switch (writer) {
        inline else => |w| w.writeAll(bytes)
    }
}

In order to access the method on Writer we now need to switch on it. The inline else => prong is some nice syntax to reduce boilerplate. At compile time it expands into:

    switch (writer) {
        .FileWriter => |w| w.writeAll(bytes),
        .MultiWriter => |w| w.writeAll(bytes)
    }

If there were many more prongs than the two here, the assembly would become a jump table. Note that the tagged union approach executes a couple more machine instructions than duck typing. For reference, here is the assembly emitted (with the -O ReleaseFast flag) for duck typing (notice that it has created a function for each type):

example.save_duck_typing__anon_987:
        mov     edi, offset example.example_main__anon_986
        jmp     example.FileWriter.writeAll

example.save_duck_typing__anon_989:
        mov     edi, offset example.example_main__anon_988
        jmp     example.MultiWriter.writeAll

Now for the tagged union version:

example.save_tagged_union:
        test    dil, 1
        je      .LBB3_1
        mov     rdi, rsi
        jmp     example.MultiWriter.writeAll
.LBB3_1:
        mov     rdi, rsi
        jmp     example.FileWriter.writeAll
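
To check the dispatch itself rather than the assembly, here is a self-contained sketch of the tagged union approach. It assumes simplified stand-ins for the writers that return a label instead of printing, which is a change made purely so the result can be inspected:

```zig
const std = @import("std");

// Simplified stand-ins (an assumption for this sketch): each writer
// returns a label instead of printing.
const FileWriter = struct {
    pub fn writeAll(_: FileWriter, _: []const u8) []const u8 {
        return "file";
    }
};
const MultiWriter = struct {
    pub fn writeAll(_: MultiWriter, _: []const u8) []const u8 {
        return "multi";
    }
};

const Writer = union(enum) {
    FileWriter: FileWriter,
    MultiWriter: MultiWriter,
};

fn save(writer: Writer, bytes: []const u8) []const u8 {
    // One prong is generated per union field; `w` has the payload type
    // of whichever field is active.
    return switch (writer) {
        inline else => |w| w.writeAll(bytes),
    };
}

pub fn main() void {
    const a = save(Writer{ .FileWriter = FileWriter{} }, "a");
    const b = save(Writer{ .MultiWriter = MultiWriter{} }, "b");
    std.debug.print("{s} {s}\n", .{ a, b });
}
```

Unlike the anytype version, there is a single save function here, and the choice of writer is made at runtime by inspecting the tag.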

The callsite of the save function has become a bit more cumbersome, as we end up having to duplicate type names in two places.

var file_writer = FileWriter{};
save(Writer{ .FileWriter = file_writer }, "a");
var multi_writer = MultiWriter{};
save(Writer{ .MultiWriter = multi_writer }, "b");

But we can solve that with some comptime magic!

Comptime Tagged Unions

This is where comptime tagged unions will get us:

const Writer = Typeset(.{FileWriter, MultiWriter});
var file_writer = FileWriter{};
save(typeset(Writer, file_writer), "a");
var multi_writer = MultiWriter{};
save(typeset(Writer, multi_writer), "b");

I've gone with the name typeset as it's basically what we're using unions for in this example, a set of types. To get there we first need to create a function that returns the Writer union type at compile time.

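const std = @import("std");
const builtin = std.builtin;
const meta = std.meta;
const math = std.math;

// There's a lot going on here,
// but it takes a tuple of types and turns it into a tagged union.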
fn Typeset(comptime types: anytype) type {
    var enum_fields: [types.len]builtin.Type.EnumField = undefined;
    inline for (types, 0..) |T, i| {
        enum_fields[i] = .{ .name = @typeName(T), .value = i };
    }
    const Tag = @Type(.{
        .Enum = .{
            .tag_type = meta.Int(.unsigned, math.log2_int_ceil(u16, types.len)),
            .fields = &enum_fields,
            .decls = &[_]builtin.Type.Declaration{},
            .is_exhaustive = true,
        }
    });
    var union_fields: [types.len]builtin.Type.UnionField = undefined;
    inline for (types, 0..) |T, i| {
        union_fields[i] = .{ .name = @typeName(T), .type = T, .alignment = @alignOf(T) };
    }
    const U = @Type(.{
        .Union = .{
            .layout = .Auto,
            .tag_type = Tag,
            .fields = &union_fields,
            .decls = &[_]builtin.Type.Declaration{}
        }
    });
    return U;
}

It is important to note that with this change, the 'tags' inside the union are now the full namespace of the types, which would be something like anytype.FileWriter. But we don't have to worry about those as we can have a helper function to initialise them too.

// Takes our Typeset type (i.e. Writer) and some data 
// (i.e. initialised FileWriter) and outputs an initialised
// union of that Typeset type.
fn typeset(comptime UnionT: type, data: anytype) UnionT {
    return @unionInit(UnionT, @typeName(@TypeOf(data)), data);
}

Thanks to the magic of comptime, the output assembly is exactly the same! We also still get useful compile errors if we attempt to use the wrong types. Attempting to call save on a NullWriter produces this compile error:

error: "no field named 'anytype.NullWriter' in union..."
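
The typeset helper finds the union field through @typeName. As a variation, and only a sketch, the field can also be found by comparing payload types, which works with a hand-written union whose tags have short names. The names fieldNameFor and wrap, and the empty writer structs, are hypothetical and exist only for this illustration:

```zig
const std = @import("std");

// Empty stand-ins (an assumption for this sketch).
const FileWriter = struct {};
const MultiWriter = struct {};

const Writer = union(enum) {
    file: FileWriter,
    multi: MultiWriter,
};

// Comptime search for the union field whose payload type is T.
// Fails compilation if no field matches.
fn fieldNameFor(comptime U: type, comptime T: type) []const u8 {
    for (std.meta.fields(U)) |f| {
        if (f.type == T) return f.name;
    }
    @compileError("no field of type " ++ @typeName(T) ++ " in " ++ @typeName(U));
}

// Wraps `data` in the union U, picking the field by payload type.
fn wrap(comptime U: type, data: anytype) U {
    return @unionInit(U, comptime fieldNameFor(U, @TypeOf(data)), data);
}

pub fn main() void {
    const w = wrap(Writer, MultiWriter{});
    switch (w) {
        .file => std.debug.print("file\n", .{}),
        .multi => std.debug.print("multi\n", .{}),
    }
}
```

One trade-off of this variation is that it picks the first matching field, so it only makes sense when every payload type appears once in the union.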

Conclusion

We've seen various ways to handle generic functions, along with the pros and cons of each. In a future post we'll go over ways to handle generic functions dynamically at runtime.

Oldest comments (5)

LuizZzZzz

Beautiful!!!

jack

Using first solution, love second and third ones. Last one seems a little overkill to me though. Still, thanks for sharing!

Nathan Franck

I honestly love how you can build types with imperative/structured code. It's so easy to learn and implement - this is in contrast to in Typescript having to do a lot of weird custom syntax to glue types together and create derivative types from existing - Typescript's system works well, in that the tooling is really mature around it, but if/once ZLS is able to handle these more complicated usecases like your Typeset (🙏), then zig will be just better in all respects for fancy types.

thanks you see

I don't know whether zig should use the trait syntax, which is a higher-level syntax than the type, and only describes the syntax of the interface, because the disadvantage is that the language becomes complicated. But the anytype function can actually be optimized. It should be able to generate the corresponding syntax and describe what interface functions are needed. The error prompt should not directly enter the function body, but should prompt the cause of the error and the prompt for correction at the client code.

Volodymyr Melnyk

Thank you Luke! It's awesome publication. No water, only useful info.