The power and complexity of Union(Enum) in Zig
Ed Yu (@edyu on GitHub and @edyu on Twitter)
Jun.13.2023
Introduction
Zig is a modern systems programming language, and although it claims to be a better C, many people who initially didn't need systems programming were attracted to it by the simplicity of its syntax compared to alternatives such as C++ or Rust.
However, due to the power of the language, some of the syntax is not obvious to those first coming to the language. I was one such person.
One of my favorite languages is Haskell, and if you have ever thought that you prefer a typed language, you owe it to yourself to learn Haskell at least once so you can appreciate how many other languages "borrowed" their type systems from it. I can promise you that you'll come out a better programmer.
ADT
One of the most widely used features and the underlying foundation of the Haskell type system is the ADT or Algebraic Data Types (not to be confused with Abstract Data Types).
You can look up the difference on StackOverflow.
However, for us programmers, you can just think of Abstract Data Types as either a struct or a simple class (simple as in not nested).
For ADT or Algebraic Data Types, we need access to a union, which you may have encountered before in languages that provide such a construct, such as C or, in our case, Zig.
Note: In order for an ADT to be called Algebraic, it needs to support both sum and product types.
Sum means that a value of the type is either an A or a B but not both at once, whereas product means that a value of the type holds an A and a B together.
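To make the sum/product distinction concrete, here is a minimal sketch in Zig. The names Point and Shape are made up for illustration, and the sum type uses a tagged union, which this article only introduces further down.

```zig
const std = @import("std");

// Product type: a Point holds an x AND a y at the same time.
const Point = struct {
    x: i32,
    y: i32,
};

// Sum type: a Shape is a circle OR a rectangle, never both.
const Shape = union(enum) {
    circle: u32, // radius
    rectangle: Point, // width and height, reusing the product type
};

pub fn main() void {
    const s = Shape{ .rectangle = Point{ .x = 3, .y = 4 } };
    switch (s) {
        .circle => |r| std.debug.print("circle with radius {d}\n", .{r}),
        .rectangle => |dim| std.debug.print("rectangle {d} x {d}\n", .{ dim.x, dim.y }),
    }
}
```

A struct and a tagged union together are enough to cover both halves of "algebraic".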
Why do we care?
The main reason for ADT to exist is so that you can express the concept of a type that can be in multiple states or forms. In other words, you can say that an object of that type can be either this or that, or something else.
For example, for a typical tree structure, you can say a tree node is either a leaf or a node that contains either other nodes or a leaf.
Another example would be that for a linked list, you can say that the list is formed recursively by a node that points to either another node or by the end of the list.
However, to show how we can use ADT in Zig, we have to explain some other concepts first.
Zig Struct
The foundation of data types in Zig is the struct.
In fact, it's pretty much everywhere in Zig.
The struct in Zig is probably the closest thing to a class in most object-oriented programming languages.
Here is the basic idea:
// if you want to try yourself, you must import `std`
const std = @import("std");
// let's construct a binary tree node
const BinaryTree = struct {
// a binary tree has a left subtree and a right subtree
left: ?*BinaryTree,
// for simplicity, let's just say we have an unsigned 32-bit integer value
value: u32,
right: ?*BinaryTree,
};
const tree = BinaryTree{ .left = null, .value = 42, .right = null };
There are several things of note in the code above:
If you are not familiar with ?, you are welcome to look over Zig If - WTF. It basically means that the variable can either have a value of the type after the ? or, if it doesn't, take on the value null.
We are referring to the BinaryTree type inside the BinaryTree type definition because a tree is a recursive structure.
However, you must use * to denote that left and right are pointers to another BinaryTree struct. If you leave out the pointer, the compiler will complain because a struct that directly contains itself would have no finite size, whereas a pointer always has a fixed size.
The following code will show a slightly more complex tree structure.
Note that we have to use & in order to get a pointer to the BinaryTree struct.
var left = BinaryTree{ .left = null, .value = 21, .right = null };
var far_right = BinaryTree{ .left = null, .value = 168, .right = null };
var right = BinaryTree{ .left = null, .value = 84, .right = &far_right };
const tree2 = BinaryTree{ .left = &left, .value = 42, .right = &right };
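As a rough sketch of how the optional pointers are consumed, the following walks the same tree and adds up its values. The sum function is a hypothetical helper, not something from the original code.

```zig
const std = @import("std");

const BinaryTree = struct {
    left: ?*BinaryTree,
    value: u32,
    right: ?*BinaryTree,
};

// Hypothetical helper: adds up every value reachable from `node`.
fn sum(node: *const BinaryTree) u32 {
    var total: u32 = node.value;
    // `if` with a capture unwraps the optional; the body runs only when it is not null
    if (node.left) |l| total += sum(l);
    if (node.right) |r| total += sum(r);
    return total;
}

pub fn main() void {
    var left = BinaryTree{ .left = null, .value = 21, .right = null };
    var far_right = BinaryTree{ .left = null, .value = 168, .right = null };
    var right = BinaryTree{ .left = null, .value = 84, .right = &far_right };
    const tree2 = BinaryTree{ .left = &left, .value = 42, .right = &right };
    std.debug.print("sum = {d}\n", .{sum(&tree2)}); // prints sum = 315
}
```

The null case needs no explicit branch here because a missing subtree simply contributes nothing.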
Zig Enum
Sometimes, a struct is overkill if you just want a variable to take one value from a fixed set of possible values. Usually, we would use an enum for such a use case.
// sorry if I left out your favorite pet
const Pet = enum { Dog, Cat, Fish, Iguana, Platypus };
const fav: Pet = .Cat;
// each value of an enum is called a tag
std.debug.print("Ed's favorite pet is {s}.\n", .{@tagName(Pet.Cat)});
// you can specify what type and what value the enum takes
const Binary = enum(u1) { Zero = 0, One = 1 };
std.debug.print("There are {d}{d} types of people in this world, those who understand binary and those who don't.\n", .{
@intFromEnum(Binary.One),
@intFromEnum(Binary.Zero)
});
Switch on Enum
One of the most convenient constructs for an enum is the switch expression.
In Haskell, the reason ADT is so useful is the ability to pattern match on the value. In fact, in Haskell, a function definition is basically a super-charged switch statement.
So how do we use a switch statement in Zig?
const fav: Pet = .Cat;
std.debug.print("{s} is ", .{@tagName(fav)});
switch (fav) {
.Dog => std.debug.print("needy!\n", .{}),
.Cat => std.debug.print("perfect!\n", .{}),
.Fish => std.debug.print("so much work!\n", .{}),
.Iguana => std.debug.print("not tasty!\n", .{}),
else => std.debug.print("legal?\n", .{}),
}
const score = switch (fav) {
.Dog => 50,
.Cat => 100,
.Fish => 25,
.Iguana => 15,
else => 75,
};
Union
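In C and in Zig, a union is similar to a struct, except that instead of the structure having all the fields, only one of the fields of the union is active at a time. For those familiar with C unions, please be aware that a plain Zig union cannot be used to reinterpret memory. In other words, you cannot write a value through one field and then read it back through a field of another type. If you really need to reinterpret memory, Zig offers casts such as @bitCast and @ptrCast, as well as extern and packed unions, which have a guaranteed memory layout.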
const Value = union {
int: i32,
float: f64,
string: []const u8,
};
var value = Value{ .int = 42 };
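// you can't do this: int is the active field, so reading float is
// safety-checked undefined behavior (safe build modes panic here)
const fval = value.float;
std.debug.print("{d}\n", .{fval});
// you can't do this, either
const bval = value.string;
std.debug.print("{c}\n", .{bval[0]});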
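If what you actually want is the C-style trick of viewing the same bits as another type, one option is the @bitCast builtin, sketched below with the Zig 0.11 single-argument form. The floatBits helper is hypothetical and only works because f32 and u32 have the same size.

```zig
const std = @import("std");

// Hypothetical helper: returns the raw bits of a 32-bit float.
fn floatBits(f: f32) u32 {
    // @bitCast reinterprets the bits; the result type comes from the return type
    return @bitCast(f);
}

pub fn main() void {
    // 1.0 in IEEE-754 single precision is 0x3f800000
    std.debug.print("0x{x}\n", .{floatBits(1.0)});
}
```

This keeps the reinterpretation explicit at the call site instead of hiding it behind a union field access.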
Switch on Union
Well, you cannot use switch on a union; at least not on a simple (untagged) union.
// won't compile
switch (value) {
.int => std.debug.print("value is int={d}\n", .{value.int}),
.float => std.debug.print("value is float={d}\n", .{value.float}),
.string => std.debug.print("value is string={s}!\n", .{value.string}),
}
Union(Enum) is Tagged Union
The error message for the previous example will actually say: note: consider 'union(enum)' here.
In Zig nomenclature, union(enum) is called a tagged union.
As we mentioned earlier, the individual values of an enum are called tags.
A tagged union stores a tag that records which field is active, which is what allows it to be used in switch expressions.
// first define the tags
const ValueType = enum {
int,
float,
string,
unknown,
};
// not too different from simple union
const Value = union(ValueType) {
int: i32,
float: f64,
string: []const u8,
unknown: void,
};
// just like the simple union
var value = Value{ .float = 42.21 };
switch (value) {
.int => std.debug.print("value is int={d}\n", .{value.int}),
.float => std.debug.print("value is float={d}\n", .{value.float}),
.string => std.debug.print("value is string={s}\n", .{value.string}),
else => std.debug.print("value is unknown!\n", .{}),
}
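You do not have to declare the tag enum yourself. The sketch below assumes you are happy to let the compiler generate it with union(enum), and uses std.meta.activeTag from the standard library to read the tag back.

```zig
const std = @import("std");

// `union(enum)` asks the compiler to generate the tag enum from the field names
const Value = union(enum) {
    int: i32,
    float: f64,
    string: []const u8,
};

pub fn main() void {
    const value = Value{ .string = "hello" };
    // a tagged union can be compared directly against one of its tags
    if (value == .string) {
        std.debug.print("active tag: {s}\n", .{@tagName(value)});
    }
    // std.meta.activeTag returns the tag as a value of the generated enum
    const tag = std.meta.activeTag(value);
    std.debug.print("still a string? {}\n", .{tag == .string});
}
```

Declaring the enum separately, as in the example above, is mostly useful when you want to reuse the tags elsewhere.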
Capture Tagged Union Value
You can use the capture in the switch expression if you need to access the value.
switch (value) {
.int => |v| std.debug.print("value is int={d}\n", .{v}),
.float => |v| std.debug.print("value is float={d}\n", .{v}),
.string => |v| std.debug.print("value is string={s}\n", .{v}),
else => std.debug.print("value is unknown!\n", .{}),
}
Modify Tagged Union
If you need to modify the value, you have to capture it as a pointer by using * in the capture.
switch (value) {
.int => |*v| v.* += 1,
// ^ is bitwise XOR and is not defined for floats, so multiply instead
.float => |*v| v.* *= 2,
.string => |*v| v.* = "I'm not Ed",
else => std.debug.print("value is unknown!\n", .{}),
}
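Here is a small self-contained sketch of the same idea that you can run to see the change stick. The bump function is a hypothetical helper, and the union is redeclared without the unknown field to keep it short.

```zig
const std = @import("std");

const Value = union(enum) {
    int: i32,
    float: f64,
    string: []const u8,
};

// Hypothetical helper: updates the payload in place through a pointer capture.
fn bump(value: *Value) void {
    switch (value.*) {
        .int => |*v| v.* += 1,
        .float => |*v| v.* *= 2,
        .string => |*v| v.* = "I'm not Ed",
    }
}

pub fn main() void {
    var value = Value{ .int = 41 };
    bump(&value);
    std.debug.print("{d}\n", .{value.int}); // prints 42

    // a pointer capture only changes the payload of the active field;
    // to switch to another field, assign a whole new union value
    value = Value{ .string = "Ed" };
    bump(&value);
    std.debug.print("{s}\n", .{value.string}); // prints I'm not Ed
}
```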
Tagged Union as ADT
We now have everything we need to implement the Zig version of ADT.
What makes ADT useful is that it tells you not only the state but also the context of the state.
In Zig, for instance, the active tag of a tagged union tells you the state, and if the field for that tag carries a value, then that value is the context.
// this example is fairly involved, please see the full code on github
// You can find the code at https://github.com/edyu/wtf-zig-adt/blob/master/testadt.zig
const NodeType = enum {
tip,
node,
};
const Tip = struct {};
const Node = struct {
left: *const Tree,
value: u32,
right: *const Tree,
};
const Tree = union(NodeType) {
tip: Tip,
node: *const Node,
};
const leaf = Tip{};
// this is meant to reimplement the binary tree example on https://wiki.haskell.org/Algebraic_data_type
// if you call tree.toString(), it will print out:
// Node (Node (Node (Tip 1 Tip) 3 Node (Tip 4 Tip)) 5 Node (Tip 7 Tip))
const tree = Tree{ .node = &Node{
.left = &Tree{ .node = &Node{
.left = &Tree{ .node = &Node{
.left = &Tree{ .tip = leaf },
.value = 1,
.right = &Tree{ .tip = leaf } } },
.value = 3,
.right = &Tree{ .node = &Node{
.left = &Tree{ .tip = leaf },
.value = 4,
.right = &Tree{ .tip = leaf } } } } },
.value = 5,
.right = &Tree{ .node = &Node{
.left = &Tree{ .tip = leaf },
.value = 7,
.right = &Tree{ .tip = leaf } } } } };
// see the full example on github
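To show how such a tree is consumed, here is a minimal sketch that adds a sum method to the union. The method is hypothetical and is not the toString from the full example; the tree here is also much smaller than the one above.

```zig
const std = @import("std");

const NodeType = enum { tip, node };
const Tip = struct {};
const Node = struct {
    left: *const Tree,
    value: u32,
    right: *const Tree,
};
const Tree = union(NodeType) {
    tip: Tip,
    node: *const Node,

    // Hypothetical method: a tip contributes 0, a node adds its value to both subtrees
    fn sum(self: Tree) u32 {
        return switch (self) {
            .tip => 0,
            .node => |n| n.left.sum() + n.value + n.right.sum(),
        };
    }
};

pub fn main() void {
    const leaf = Tree{ .tip = Tip{} };
    const small = Tree{ .node = &Node{ .left = &leaf, .value = 1, .right = &leaf } };
    const root = Tree{ .node = &Node{ .left = &small, .value = 3, .right = &leaf } };
    std.debug.print("sum = {d}\n", .{root.sum()}); // prints sum = 4
}
```

The switch plays the role that pattern matching plays in Haskell: each prong handles one shape of the data and captures its context.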
Bonus
In Zig, there is also something called a non-exhaustive enum.
A non-exhaustive enum must be defined with an integer tag type in the ().
You then put _ as the last tag in the enum definition.
In a switch on such an enum, you can use _ instead of else to ensure that you have handled all the named cases.
const Eds = enum(u8) {
Ed,
Edward,
Edmond,
Eduardo,
Edwin,
Eddy,
Eddie,
_,
};
const ed = Eds.Ed;
std.debug.print("All your code are belong to ", .{});
switch (ed) {
// Zig switch uses , not | for multiple options
.Ed, .Edward => std.debug.print("{s}!\n", .{@tagName(ed)}),
// can use capture
.Edmond, .Eduardo, .Edwin, .Eddy, .Eddie => |name| std.debug.print("this {s}!\n", .{@tagName(name)}),
// else works but look at the code below for _ vs else
else => std.debug.print("us\n", .{}),
}
// obviously no tag is defined for this value
// since Zig 0.11, @enumFromInt takes only the integer; the enum type comes from @as
const not_ed = @as(Eds, @enumFromInt(241));
std.debug.print("All your base are belong to ", .{});
switch (not_ed) {
.Ed, .Edward => std.debug.print("{s}!\n", .{@tagName(not_ed)}),
.Edmond, .Eduardo, .Edwin, .Eddy, .Eddie => |name| std.debug.print("this {s}!\n", .{@tagName(name)}),
// _ will force you to handle all defined cases
// if any of the previous .Ed, .Edward ... .Eddie is missing, this won't compile
// for example, if you forgot .Eduardo
// and wrote: .Edmond, .Edwin, .Eddy, .Eddie => ...
// the code won't compile
_ => std.debug.print("us\n", .{}),
}
Btw, you can add functions to an enum, union, or union(enum) just like you can to a struct.
You can see examples of that in the code below.
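As a quick illustration, here is a minimal sketch of a function attached to an enum. The score method is made up for this example and reuses some of the numbers from the switch section; the Pet enum is trimmed to three tags.

```zig
const std = @import("std");

const Pet = enum {
    Dog,
    Cat,
    Fish,

    // Hypothetical method: called as fav.score(), just like a struct method
    pub fn score(self: Pet) u8 {
        // no else prong, so the compiler checks that every tag is handled
        return switch (self) {
            .Dog => 50,
            .Cat => 100,
            .Fish => 25,
        };
    }
};

pub fn main() void {
    const fav = Pet.Cat;
    std.debug.print("{s} scores {d}\n", .{ @tagName(fav), fav.score() }); // prints Cat scores 100
}
```

The same pattern works inside a union or union(enum) body: fields first, then declarations.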
The End
You can find the code here.
Top comments (4)
"union is similar to struct"
if by similar you mean pretty close to the exact opposite then yeah you're right
Haskellers always seem to be trying to win the old war that they lost, and now they want every language to be as bad as theirs, with the least elucidating descriptions they could possibly create.
As somebody who's been doing this for more than 20 years, with a degree from Berkeley, I still look at that and go huh?
More practically, I wonder: can you have the same type twice? That is useful for auto-generated code, or just so you don't have to care about what you're unioning. Do tagged unions stacked with tagged unions wind up wasting a lot of space with the tag plus padding for each level? Do arrays of tagged unions waste a ton of space for the tag plus padding? Can you set the size of the tag? You generally don't want it to be a u8 because that will have loop-carried dependencies on some of the x86 registers. Rust made that mistake.
I was writing a db connection once and ended up having a third of my space taken up by tags and padding for atoms when I had three or four levels of tagged unions, so I had to rip all that out and just duplicate all my code, which really sucked.
FYI, the new 0.11 changed @intToEnum() to @enumFromInt(), and instead of @intToEnum(), you need to use @as(your_enum, @enumFromInt(some_int)).
When you go from the union example to the tagged union example, why did you add the unknown: void field? Was this necessary for the example of tagged union, or could it be omitted?
It was not necessary. It was meant as a way to show how you can also use void as a type for the union. You can certainly omit it.
The idea is to show that union can take a list of different types as it's a union of all the types defined in the union.