Andres

Posted on Sep 27, 2021

Crafting an Interpreter in Zig - part 4

This is the fourth post of Crafting an Interpreter in Zig, the blog series where I read the book Crafting Interpreters, implement the III part using Zig and highlight some C feature used in the book and compare it to how I did it using Zig. You can find the code here, and if you haven't read the first two parts go read them!

In this chapter, chapter 17, we implement the first version of our compiler-parser. At this point it can only parse numbers and the basic arithmetic operations, but this is the first time we finally see our interpreter working end to end.

This was a long chapter and a lot of things happen in it. We implement a Vaughan Pratt’s Parser and we implement error handling in our compiler.

Let's start with error handling. If something differitates C and Zig is error handling. Error handling in Zig is first class, errors are values that can be returned from a function and must be handled by the caller of the function. In C, errors are handled a bit different, they are not first class values and usually are represented by a specific value of the type a function returns, for example, in C if you return other than 0 from the main function it means there was an error, or if you try to alloc some memory and get NULL back it means an error happened. Let's see it with an example in the book.

InterpretResult interpret(const char* source) {
  Chunk chunk;
  initChunk(&chunk);

  if (!compile(source, &chunk)) {
    freeChunk(&chunk);
    return INTERPRET_COMPILE_ERROR;
  }

  vm.chunk = &chunk;
  vm.ip = vm.chunk->code;

  InterpretResult result = run();

  freeChunk(&chunk);
  return result;
}

In this code we are trying to compile the Lox code. The compile function returns false if there was an error compiling the code or true otherwise. In case there was an error compiling, we return INTERPRET_COMPILE_ERROR from the interpret function, INTERPRET_COMPILE_ERROR is one of the values of the InterpretResult enum.

This way of handling errors is error prone (no pun intended). You have to know which values of the set of possible return values represent errors and which not, and the compiler will not warn you or force you to handle such error. Not only that but resource management becomes complex. See that we have to call freeChunk on error as well as when there is no error.

In Zig the story is a bit different, let's take a look at the equivalent code and see what is going on.

pub fn interpret(self: *Self, source: []const u8, allocator: *Allocator) InterpretError!void {
    var chunk = Chunk.init(allocator);
    defer chunk.deinit();

    compiler.compile(source, &chunk) catch return InterpretError.CompileError;
    self.chunk = &chunk;
    self.ip = 0;

    try self.run();
}

Our function now returns a InterpretError!void, which is an error union, it means that we can either have and error or success, we are telling the caller right away that calling this function might cause an error that you must handle, and not only that, but the compiler will force you to do it. When we call the compile function, instead of it return true or false when there is an error, it returns an error union as well and we have to handle it. In this case we want to convert whatever error we have at our compile stage and covert it in a InterpretError.CompileError. Also it is important to note that to free our resources we only had to call defer chunk.deinit() right after initializing our chunk and it will be called if we return either an error or if we get to the end successfully.

How does our compile function looks?

pub fn compile(source: []const u8, chunk: *Chunk) !void {
    var scanner = Scanner.init(source);
    var parser = Parser.init(&scanner, chunk);
    try parser.advance();
    try parser.expression();

    // Make sure there are no tokens left. if there are, it is an error.
    if (scanner.nextToken()) |_| {
        parser.errAtCurrent("Expect end of expression.");
        return CompileError.CompileError;
    }
    parser.endCompiler();
}

You can see it has a bunch of trys which means we care about the error but don't or can't handle it right now, so we forward it to the caller. Also we can return an error anywhere in our function. In this case if we still have unconsumed tokens at the end of our process we want to return an error. See that parser.errAtCurrent that is one of the things I will say I missed when handling errors in Zig. I missed being able to add context to the errors. I which I could have write something like

return CompileError.CompileError("Expect end of expression.");

but in this case I had to print and "handle" the error even before returning it and probably not in the best possible place. However, as you can see there are work arounds, for example, I could have more specific errors, instead of returning CompileError.CompileError I could have returned CompileError.NoEndOfExpression (very bad name) and handle it accordingly in the caller.

Other interesting thing we did in this chapter was implementing a Pratt Parser. I am not going to try to explain what a Pratt Parser is, I don't fully understand it myself yet, please go read it directly from Crafting Interpreters itself. What I can tell you is that this kind of parser requires a mapping from token types to a struct that we are going to call ParseRule this struct holds two function pointers and a number that indicates the precedence of that rule, very important when parsing infix operations. This is our struct defined in Zig.

const ParseFn = fn (parser: *Parser) anyerror!void;
const ParseRule = struct {
    prefix: ?ParseFn,
    infix: ?ParseFn,
    precedence: Precedence,
}

I really want to note how good this syntax for function types is. It is very clear and resembles how we normally define functions in other contexts.

For comparison this is how the definition of ParseFn is donde in C.

typedef void (*ParseFn)();

I don't know about you, but I couldn't understand what was going on.
Then for the mapping we use a simple switch expression.

    return switch (ty) {
        .LEFT_PAREN => ParseRule.init(Parser.grouping, null, .precNone),
...
        .MINUS => ParseRule.init(Parser.unary, Parser.binary, .precTerm),
        .PLUS => ParseRule.init(null, Parser.binary, .precTerm),
        .SEMICOLON => ParseRule.init(null, null, .precNone),
        .SLASH => ParseRule.init(null, Parser.binary, .precFactor),
        .STAR => ParseRule.init(null, Parser.binary, .precFactor),
        .BANG => ParseRule.init(Parse.unary, null, .precNone),
...
}

I am omitting a lot of the tokens, but what I want to highlight is how simple are the switchs in Zig. I've found myself using switch even in situations where I would normally use an if in any other language, it is very powerful and can be use as a statement and as an expression which is a great feature that feels very modern. Also note that passing a function to be use as a function Pointer is as simple as writing its name. In this case the functions I wanted to use were defined inside of the Parser struct, and to use them as function pointers I just had to think about Parser as a namespace.

Zig truly feels like a modern language, the error handling features are very powerful and really makes us thing twice about when and where errors can happen in our code, making it robust. Zig's syntax while very close to C's syntax improves in some key concepts over C, in this entry of the series we noted two significant improvements, function pointers and switch expressions/statements. We will talk about other features that improve over C syntax in feature posts, spoiler alert tagged unions.

This is all for today, hope you liked it and see you in the next one.

Cover Photo by Ann H from Pexels

Top comments (3)

storyfeet • Oct 18 '21

Thanks to this series, I was inspired to follow in your footsteps and try this myself.

I've just come to the part where we make the table of rules in the compiler :

ParseRule rules[] = {
  [TOKEN_LEFT_PAREN]    = {grouping, NULL,   PREC_NONE},
  [TOKEN_RIGHT_PAREN]   = {NULL,     NULL,   PREC_NONE},
  // ...
}

When I realized the enums were the location in the array, my head just went ouch, as I while I can see this is fast it's so precarious if you get your array locations wrong somehow.

I couldn't work out how to do this in zig, I considered a comptime hashmap, or just a function. Took a sneak peek at your solution, I think it's a good choice, "just follow the switch"

Andres • Oct 19 '21

One of the things I've enjoyed the most about Zig is the switch expression/statement. it has a very modern syntax as well as compose really good with other constructs on the language like try, return, etc

Antonio Patriarca • Jan 23 '22

I got the impression the book author is sometimes trying to show some C feature instead of taking the easier route. Even in C a switch would have been easier to write and understand IMHO. Each row has in fact only a few entries that are different from the default.

It is possible to obtain something similar in Zig as well anyway. You create an array big enough to contain all the tokens and you then initialize it at compile time with those values.

Zig NEWS

Crafting an Interpreter in Zig - part 4

Top comments (3)

Read next

Creating safer zig bindings for liburing

Yet another parser combinators library

Caching API responses with Redis in Zig

A Zig Easing/Tweening Library for Game Dev