Zig NEWS

Ed Yu


Zig Union(Enum) -- WTF is switch(union(enum))

The power and complexity of Union(Enum) in Zig


Ed Yu (@edyu on GitHub and @edyu on Twitter)
Jun.13.2023



Introduction

Zig is a modern systems programming language, and although it claims to be a better C, many people who initially didn't need systems programming were attracted to it due to the simplicity of its syntax compared to alternatives such as C++ or Rust.

However, due to the power of the language, some of the syntax is not obvious for those first coming into the language. I was actually one such person.

One of my favorite languages is Haskell and if you ever thought that you prefer a typed language you owe it to yourself to learn Haskell at least once so you can appreciate how many other languages "borrowed" their type systems from it. I can promise you that you'll come out a better programmer.

ADT

One of the most widely used features and the underlying foundation of the Haskell type system is the ADT or Algebraic Data Types (not to be confused with Abstract Data Types).

You can look up the difference on StackOverflow.

However, as programmers, we can just think of Abstract Data Types as either a struct or a simple class (simple as in not nested).

For ADT or Algebraic Data Types, we need access to a union, which you may have encountered before in languages that provide such a construct, such as C or, in our case, Zig.

Note: In order for ADT to be called Algebraic, it needs to support both sum and product.
Sum means that the type needs to support A or B but not both together, whereas product means the type needs to support A and B together.
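
As a rough preview in Zig (the constructs are explained in the sections below), you can think of a struct as a product type and a tagged union as a sum type. The names Point and Shape are hypothetical and only here for illustration.

```zig
const std = @import("std");

// product: a Point holds an x AND a y at the same time
const Point = struct {
    x: i32,
    y: i32,
};

// sum: a Shape is a circle OR a square, never both
const Shape = union(enum) {
    circle: u32, // radius
    square: u32, // side length
};

pub fn main() void {
    const p = Point{ .x = 1, .y = 2 };
    const s = Shape{ .circle = 3 };
    std.debug.print("point=({d}, {d})\n", .{ p.x, p.y });
    switch (s) {
        .circle => |r| std.debug.print("circle with radius {d}\n", .{r}),
        .square => |side| std.debug.print("square with side {d}\n", .{side}),
    }
}
```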

Why do we care?

The main reason for ADT to exist is so that you can express the concept of a type that can be in multiple states or forms. In other words, you can say that an object of that type can be either this or that, or something else.

For example, for a typical tree structure, you can say a tree node is either a leaf or a node that contains either other nodes or a leaf.

Another example would be a linked list: you can say that the list is formed recursively by a node that points to either another node or the end of the list.

However, to show how we can use ADT in Zig, we have to explain some other concepts first.

Zig Struct

The foundation of data types in Zig is the struct.
In fact, it's pretty much everywhere in Zig.

The struct in Zig is probably the closest thing to a class in most object-oriented programming languages.

Here is the basic idea:

// if you want to try it yourself, you must import `std`
const std = @import("std");

// let's construct a binary tree node
const BinaryTree = struct {
    // a binary tree has a left subtree and a right subtree
    left: ?*BinaryTree,
    // for simplicity, let's just say we have an unsigned 32-bit integer value
    value: u32,
    right: ?*BinaryTree,
};

const tree = BinaryTree{ .left = null, .value = 42, .right = null };

There are several things of note here in the code above:

  1. If you are not familiar with ?, you are welcome to look over Zig If - WTF.
    It basically means that the variable either holds a value of the type after ? or, if it doesn't, holds null.

  2. We are referring to the BinaryTree type inside the BinaryTree type definition as a tree is a recursive structure.

  3. However, you must use the * to denote that left and right are pointers to another BinaryTree struct. If you leave out the pointer, the compiler will complain because the struct would then contain itself, so its size could not be determined at compile time.

The following code will show a slightly more complex tree structure.
Note that we have to use & in order to get the pointer of the BinaryTree struct.

    var left = BinaryTree{ .left = null, .value = 21, .right = null };
    var far_right = BinaryTree{ .left = null, .value = 168, .right = null };
    var right = BinaryTree{ .left = null, .value = 84, .right = &far_right };

    const tree2 = BinaryTree{ .left = &left, .value = 42, .right = &right };
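
To see the optional pointers in action, here is a minimal sketch that walks such a tree. The sum function is a hypothetical helper I made up for this illustration; it uses if with a capture to unwrap each ?*BinaryTree.

```zig
const std = @import("std");

const BinaryTree = struct {
    left: ?*BinaryTree,
    value: u32,
    right: ?*BinaryTree,
};

// hypothetical helper: adds up every value in the tree
fn sum(tree: *const BinaryTree) u32 {
    var total: u32 = tree.value;
    // `if` with a capture unwraps the optional pointer when it is not null
    if (tree.left) |left| total += sum(left);
    if (tree.right) |right| total += sum(right);
    return total;
}

pub fn main() void {
    var left = BinaryTree{ .left = null, .value = 21, .right = null };
    var far_right = BinaryTree{ .left = null, .value = 168, .right = null };
    var right = BinaryTree{ .left = null, .value = 84, .right = &far_right };
    const tree2 = BinaryTree{ .left = &left, .value = 42, .right = &right };

    // 21 + 42 + 84 + 168
    std.debug.print("sum={d}\n", .{sum(&tree2)});
}
```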

Zig Enum

Sometimes, a struct is overkill if you just want to have a set of possible values for a variable to take and restrict the variable to take only a value from the set. Usually, we would use enum for such a use case.

// sorry if I left out your favorite pet
const Pet = enum { Dog, Cat, Fish, Iguana, Platypus };

const fav: Pet = .Cat;

// Each value of an enum is called a tag
std.debug.print("Ed's favorite pet is {s}.\n", .{@tagName(fav)});

// you can specify what type and what value the enum takes
const Binary = enum(u1) { Zero = 0, One = 1 };

std.debug.print("There are {d}{d} types of people in this world, those who understand binary and those who don't.\n", .{
    @intFromEnum(Binary.One),
    @intFromEnum(Binary.Zero)
});
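
Going the other way, from an integer back to an enum, can be sketched as follows. This assumes Zig 0.11 or later, where @enumFromInt takes only the integer and gets the enum type from the annotation on the left.

```zig
const std = @import("std");

const Binary = enum(u1) { Zero = 0, One = 1 };

pub fn main() void {
    // enum -> integer
    const n: u1 = @intFromEnum(Binary.One);

    // integer -> enum; the result type comes from the annotation
    const b: Binary = @enumFromInt(n);

    std.debug.print("{d} maps back to {s}\n", .{ n, @tagName(b) });
}
```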

Switch on Enum

One of the most convenient constructs for an enum is the switch expression.
In Haskell, the reason ADT is so useful is the ability to pattern match on it, much like a switch expression. In fact, in Haskell, a function definition is basically a super-charged switch statement.

So how do we use a switch statement in Zig?

const fav: Pet = .Cat;

std.debug.print("{s} is ", .{@tagName(fav)});
switch (fav) {
    .Dog => std.debug.print("needy!\n", .{}),
    .Cat => std.debug.print("perfect!\n", .{}),
    .Fish => std.debug.print("so much work!\n", .{}),
    .Iguana => std.debug.print("not tasty!\n", .{}),
    else => std.debug.print("legal?\n", .{}),
}

const score = switch (fav) {
    .Dog => 50,
    .Cat => 100,
    .Fish => 25,
    .Iguana => 15,
    else => 75,
};
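
If you leave out else, the compiler makes you list every tag, which is handy when the enum grows later. Here is a small sketch; the score function is a hypothetical helper that wraps the switch above so it also works on a runtime value.

```zig
const std = @import("std");

const Pet = enum { Dog, Cat, Fish, Iguana, Platypus };

// hypothetical helper: no `else`, so every tag must be listed explicitly
fn score(pet: Pet) u8 {
    return switch (pet) {
        .Dog => 50,
        .Cat => 100,
        .Fish => 25,
        .Iguana => 15,
        .Platypus => 75,
    };
}

pub fn main() void {
    std.debug.print("Cat scores {d}\n", .{score(.Cat)});
}
```

If you add a new tag to Pet and forget to handle it here, the code no longer compiles, which is usually what you want.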

Union

In C and in Zig, a union is similar to a struct, except that instead of the structure having all the fields, only one of the fields of the union is active at a time. For those familiar with C unions, please be aware that a plain Zig union cannot be used to reinterpret memory. In other words, you cannot use one field of the union to cast the value stored through another field. Accessing a field that is not active is illegal behavior, which the safety-checked build modes catch at runtime.

const Value = union {
    int: i32,
    float: f64,
    string: []const u8,
};

var value = Value{ .int = 42 };
// you can't do this
var fval = value.float;
std.debug.print("{d}\n", .{fval});

// you can't do this, either
var bval = value.string;
std.debug.print("{c}\n", .{bval[0]});
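
What you can do instead is read only the active field, and assign a whole new union value when you want a different field to become active. A minimal sketch:

```zig
const std = @import("std");

const Value = union {
    int: i32,
    float: f64,
    string: []const u8,
};

pub fn main() void {
    var value = Value{ .int = 42 };
    // reading the active field is fine
    std.debug.print("{d}\n", .{value.int});

    // to switch fields, assign a whole new union value
    value = Value{ .float = 42.21 };
    // now .float is the active field, so this read is fine
    std.debug.print("{d}\n", .{value.float});
}
```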

Switch on Union

Well, you cannot use switch on a union; at least not on a simple (bare) union.

// won't compile
switch (value) {
    .int => std.debug.print("value is int={d}\n", .{value.int}),
    .float => std.debug.print("value is float={d}\n", .{value.float}),
    .string => std.debug.print("value is string={s}!\n", .{value.string}),
}

Union(Enum) is Tagged Union

The error message on the previous example will actually say:
note: consider 'union(enum)' here.

In Zig nomenclature, union(enum) is called a tagged union.
As we mentioned earlier, the individual values of an enum are called tags.

Tagged unions were created so that they can be used in switch expressions.

// first define the tags
const ValueType = enum {
    int,
    float,
    string,
    unknown,
};

// not too different from simple union
const Value = union(ValueType) {
    int: i32,
    float: f64,
    string: []const u8,
    unknown: void,
};

// just like the simple union
var value = Value{ .float = 42.21 };

switch (value) {
    .int => std.debug.print("value is int={d}\n", .{value.int}),
    .float => std.debug.print("value is float={d}\n", .{value.float}),
    .string => std.debug.print("value is string={s}\n", .{value.string}),
    else => std.debug.print("value is unknown!\n", .{}),
}
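
If you don't need the tag enum as a separate type, you can write union(enum) literally and let the compiler create the tags for you. The sketch below assumes that form and uses std.meta.activeTag from the standard library to ask which field is active.

```zig
const std = @import("std");

// `union(enum)` asks the compiler to create the tag enum for us
const Value = union(enum) {
    int: i32,
    float: f64,
    string: []const u8,
};

pub fn main() void {
    const value = Value{ .float = 42.21 };

    // std.meta.activeTag returns the enum tag of the active field
    const tag = std.meta.activeTag(value);
    std.debug.print("active tag is {s}\n", .{@tagName(tag)});

    // with every tag listed, no `else` is needed
    switch (value) {
        .int => std.debug.print("int\n", .{}),
        .float => std.debug.print("float\n", .{}),
        .string => std.debug.print("string\n", .{}),
    }
}
```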

Capture Tagged Union Value

You can use a capture in the switch expression if you need to access the value.

switch (value) {
    .int => |v| std.debug.print("value is int={d}\n", .{v}),
    .float => |v| std.debug.print("value is float={d}\n", .{v}),
    .string => |v| std.debug.print("value is string={s}\n", .{v}),
    else => std.debug.print("value is unknown!\n", .{}),
}

Modify Tagged Union

If you need to modify the value, you have to capture it as a pointer using *.

switch (value) {
    .int => |*v| v.* += 1,
    .float => |*v| v.* *= 2,
    .string => |*v| v.* = "I'm not Ed",
    else => std.debug.print("value is unknown!\n", .{}),
}
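
To convince yourself that the pointer capture really changes the union in place, here is a small sketch. The bump function is a hypothetical helper, and the union is declared with union(enum) to keep the example short.

```zig
const std = @import("std");

const Value = union(enum) {
    int: i32,
    float: f64,
    string: []const u8,
};

// hypothetical helper: changes the payload in place through pointer captures
fn bump(value: *Value) void {
    switch (value.*) {
        .int => |*v| v.* += 1,
        .float => |*v| v.* *= 2,
        .string => |*v| v.* = "changed",
    }
}

pub fn main() void {
    var value = Value{ .int = 41 };
    bump(&value);
    std.debug.print("int is now {d}\n", .{value.int});
}
```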

Tagged Union as ADT

We now have everything we need to implement a Zig version of ADT.
What makes ADT useful is that it tells you not only the state but also the context of the state.

Using Zig for instance, the active tag in a union will tell you the state, and if the tag is a type that has a value, then the value is the context.

// this example is fairly involved, please see the full code on github
// You can find the code at https://github.com/edyu/wtf-zig-adt/blob/master/testadt.zig
const NodeType = enum {
    tip,
    node,
};

const Tip = struct {};

const Node = struct {
    left: *const Tree,
    value: u32,
    right: *const Tree,
};

const Tree = union(NodeType) {
    tip: Tip,
    node: *const Node,
};

const leaf = Tip{};

// this is meant to reimplement the binary tree example on https://wiki.haskell.org/Algebraic_data_type
// if you call tree.toString(), it will print out:
// Node (Node (Node (Tip 1 Tip) 3 Node (Tip 4 Tip)) 5 Node (Tip 7 Tip))
const tree = Tree{ .node = &Node{
    .left = &Tree{ .node = &Node{
        .left = &Tree{ .node = &Node{
            .left = &Tree{ .tip = leaf },
            .value = 1,
            .right = &Tree{ .tip = leaf } } },
        .value = 3,
        .right = &Tree{ .node = &Node{
            .left = &Tree{ .tip = leaf },
            .value = 4,
            .right = &Tree{ .tip = leaf } } } } },
    .value = 5,
    .right = &Tree{ .node = &Node{
        .left = &Tree{ .tip = leaf },
        .value = 7,
        .right = &Tree{ .tip = leaf } } } } };
// see the full example on github
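
Once the tree is a tagged union, walking it is just a switch with a capture. The sketch below is not the toString from the GitHub code; sum is a hypothetical helper and small is a cut-down tree I made up for this illustration.

```zig
const std = @import("std");

const NodeType = enum {
    tip,
    node,
};

const Tip = struct {};

const Node = struct {
    left: *const Tree,
    value: u32,
    right: *const Tree,
};

const Tree = union(NodeType) {
    tip: Tip,
    node: *const Node,
};

// hypothetical helper: a tip adds 0, a node adds its value plus both subtrees
fn sum(tree: *const Tree) u32 {
    return switch (tree.*) {
        .tip => 0,
        .node => |n| sum(n.left) + n.value + sum(n.right),
    };
}

const leaf = Tip{};

// a small tree: Node (Node (Tip 1 Tip) 3 Tip)
const small = Tree{ .node = &Node{
    .left = &Tree{ .node = &Node{
        .left = &Tree{ .tip = leaf },
        .value = 1,
        .right = &Tree{ .tip = leaf },
    } },
    .value = 3,
    .right = &Tree{ .tip = leaf },
} };

pub fn main() void {
    // 1 + 3
    std.debug.print("sum={d}\n", .{sum(&small)});
}
```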

Bonus

In Zig, there is also something called a non-exhaustive enum.

A non-exhaustive enum must be defined with an integer tag type in the ().
You then put _ as the last tag in the enum definition.

In a switch expression, you can use _ instead of else to make sure you handled all the named cases.

const Eds = enum(u8) {
    Ed,
    Edward,
    Edmond,
    Eduardo,
    Edwin,
    Eddy,
    Eddie,
    _,
};

const ed = Eds.Ed;

std.debug.print("All your code are belong to ", .{});
switch (ed) {
    // Zig switch uses , not | for multiple options
    .Ed, .Edward => std.debug.print("{s}!\n", .{@tagName(ed)}),
    // can use capture
    .Edmond, .Eduardo, .Edwin, .Eddy, .Eddie => |name| std.debug.print("this {s}!\n", .{@tagName(name)}),
    // else works but look at the code below for _ vs else
    else => std.debug.print("us\n", .{}),
}

// obviously no such enum predefined
const not_ed = @as(Eds, @enumFromInt(241));
std.debug.print("All your base are belong to ", .{});
switch (not_ed) {
    .Ed, .Edward => std.debug.print("{s}!\n", .{@tagName(not_ed)}),
    .Edmond, .Eduardo, .Edwin, .Eddy, .Eddie => |name| std.debug.print("this {s}!\n", .{@tagName(name)}),
    // _ will force you to handle all defined cases
    // if any of the previous .Ed, .Edward ... .Eddie is missing, this won't compile
    // for example, if you forgot .Eduardo
    // and wrote: .Edmond, .Edwin, .Eddy, .Eddie => ...
    // the code won't compile
    _ => std.debug.print("us\n", .{}),
}
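
Here is a shorter sketch of the same idea that you can run on its own. The enum is trimmed to two names and describe is a hypothetical helper; it assumes Zig 0.11 or later for @enumFromInt.

```zig
const std = @import("std");

const Eds = enum(u8) {
    Ed,
    Edward,
    _,
};

// hypothetical helper: `_` covers every value that has no name
fn describe(e: Eds) []const u8 {
    return switch (e) {
        .Ed => "Ed",
        .Edward => "Edward",
        _ => "not an Ed",
    };
}

pub fn main() void {
    // 241 has no tag, but it is still a valid value of a non-exhaustive enum
    const stranger: Eds = @enumFromInt(241);
    std.debug.print("{s} ({d})\n", .{ describe(stranger), @intFromEnum(stranger) });
    std.debug.print("{s}\n", .{describe(.Ed)});
}
```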

Btw, you can add functions to enum, union, and union(enum) just like you can in struct.
You can see examples of that in the code below.
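
As a quick taste before you open the full code, here is a minimal sketch. The method names isNumber and sound are hypothetical and are not taken from the GitHub code.

```zig
const std = @import("std");

const Value = union(enum) {
    int: i32,
    float: f64,
    string: []const u8,

    // hypothetical method: reports whether the active field holds a number
    pub fn isNumber(self: Value) bool {
        return switch (self) {
            .int, .float => true,
            .string => false,
        };
    }
};

const Pet = enum {
    Dog,
    Cat,

    // hypothetical method on a plain enum
    pub fn sound(self: Pet) []const u8 {
        return switch (self) {
            .Dog => "woof",
            .Cat => "meow",
        };
    }
};

pub fn main() void {
    const v = Value{ .int = 42 };
    std.debug.print("is number? {}\n", .{v.isNumber()});
    std.debug.print("{s}\n", .{Pet.Cat.sound()});
}
```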

The End

You can find the code here.


Top comments (4)

Jason Nordwick

"union is similar to struct"

if by similar you mean pretty close to the exact opposite then yeah you're right

Haskellers always seem to be trying to win the old war that they lost and now they want every language to be as bad as theirs, with the least elucidating descriptions they could possibly create.

As somebody who's been doing this for more than 20 years, with my degree from Berkeley, I still look at that and go huh?

More practically, I wonder: can you have the same type twice? That's useful for auto-generated code, or just so you don't have to care about what you're unioning. Do tagged unions stacked with tagged unions wind up wasting a lot of space with the tag plus padding for each level? Do arrays of tagged unions waste a ton of space for the tag plus padding? Can you set the size of the tag? You generally don't want it to be a u8 because that will have loop-carried dependencies on some of the x86 registers. Rust made that mistake.

I was writing a db connection once and I ended up having a third of my space taken up in tags and padding for atoms when I had three or four levels of tagged unions, so I ended up having to rip all that out and just duplicate all my code, which really sucked.

Ed Yu

FYI, the new 0.11 changed @intToEnum() to @enumFromInt(), and instead of @intToEnum(), you need to use @as(your_enum, @enumFromInt(some_int)).

Sam

When you go from the union example to the tagged union example, why did you add the unknown: void field? Was this necessary for the example of tagged union, or could it be omitted?

Ed Yu

It was not necessary. It was meant as a way to show how you can also use void as a type for the union. You can certainly omit it.

The idea is to show that union can take a list of different types as it's a union of all the types defined in the union.