This post was inspired by this thread and my frustrations with the notoriously sparse Zig documentation (rant towards the end).
Imagine a contrived function that sets every alternate character in a string to 'a'. We don't know (and don't care) how our string is created, just that it is X chars long and that each char is a u8. Fortunately, there is a perfect type that satisfies our needs: a slice.
Our function therefore takes a slice as input.
fn alternate_a(input: []u8) void {
    var i: usize = 0;
    while (i < input.len) : (i += 2) {
        input[i] = 'a';
    }
}
Now you try to call this from main:
const std = @import("std");
pub fn main() void {
    const input = "Hello Zig";
    alternate_a(input);
    std.debug.print("Updated string is {s}\n", .{input});
}
What happens? Kaboom! You run into the first error!
./src/main5.zig:12:17: error: expected type '[]u8', found '*const [9:0]u8'
alternate_a(input);
You think, okay, perhaps it's because I have declared input as a const. No worries, let me change it to var (var input = "Hello Zig";)
./src/main5.zig:12:17: error: expected type '[]u8', found '*const [9:0]u8'
alternate_a(input);
WTH? It still shows the same error!
You wonder: shouldn't var make things mutable (or at least, shouldn't the compiler complain if you try to make a var point to a string literal that isn't mutable)?
Let us look at the documentation for string literals:
String literals are constant single-item Pointers to null-terminated byte arrays. The type of string literals encodes both the length, and the fact that they are null-terminated, and thus they can be coerced to both Slices and Null-Terminated Pointers. Dereferencing string literals converts them to Arrays.
Let us break this sentence down into the dialectic style of the Upanishads:
Q: What is a string literal?
A: It is a constant pointer
Q: What does it point to?
A: It points to a null-terminated byte array
Q: What is a null-terminated byte array?
A: It is an array with a null value at the end. The type info contains the size of the array and the type of the element
The type of "Hello Zig" is therefore *const [9:0]u8
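A minimal sketch to check that claim yourself, assuming a reasonably recent Zig compiler (the variable name literal is mine, not from the documentation). It asserts the type at compile time and prints the length and the sentinel byte:

```zig
const std = @import("std");

pub fn main() void {
    const literal = "Hello Zig";

    // Compile-time check: the literal is a pointer to a 9-byte,
    // 0-terminated array of constant bytes.
    comptime std.debug.assert(@TypeOf(literal) == *const [9:0]u8);

    // len counts only the 9 visible bytes; the sentinel (0) sits at index len.
    std.debug.print("len = {d}, sentinel = {d}\n", .{ literal.len, literal[literal.len] });
    // prints "len = 9, sentinel = 0"
}
```

If the type were anything else, the comptime assert would fail the build rather than the run.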
Like any pointer, you can also dereference (.*) a string literal (crazy, I know!), since string literals are pointers according to the documentation. Let us see what we get when we dereference our string literal.
var input = "Hello Zig".*;
./src/main5.zig:12:17: error: expected type '[]u8', found '[9:0]u8'
alternate_a(input);
Close. The type of an array also encodes its length.
When we pass input (an array) as an argument, the compiler complains that it is expecting a slice but is getting an array instead.
How do we coerce an array into a slice?
The Zig documentation has a section on type coercion between arrays and slices, but doesn't have a good, illustrative example for coercing [N:0]u8 to []u8.
A little bit of experimentation and I figured out that you can turn an array into a slice by taking its address (&). The resulting pointer to the array, *[9:0]u8, coerces to []u8.
var input = "Hello Zig".*;
alternate_a(&input);
In this case, the program prints the expected output:
Updated string is aeala aia
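Putting the pieces together, here is a complete sketch of the working program, assuming a reasonably recent Zig compiler. The second half, which slices the array explicitly with [0..], is my own addition to show an equivalent spelling; it is not something the documentation singles out.

```zig
const std = @import("std");

fn alternate_a(input: []u8) void {
    var i: usize = 0;
    while (i < input.len) : (i += 2) {
        input[i] = 'a';
    }
}

pub fn main() void {
    // Dereferencing copies the literal's bytes into a mutable local array.
    var input = "Hello Zig".*;

    // &input has type *[9:0]u8, which coerces to the []u8 parameter.
    const slice: []u8 = &input;
    alternate_a(slice);
    std.debug.print("Updated string is {s}\n", .{slice});
    // prints "Updated string is aeala aia"

    // Equivalent spelling: slice the whole array explicitly.
    var other = "Hello Zig".*;
    alternate_a(other[0..]);
    std.debug.print("Updated string is {s}\n", .{other[0..]});
    // prints "Updated string is aeala aia"
}
```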
What is my frustration with Zig? Hard-to-grok documentation aside, here is something that seems paradoxical to me:
const g = 44;
var gp = &g;
This fails with the following error:
./src/main5.zig:16:5: error: variable of type '*const comptime_int' must be const or comptime
var gp = &g;
But this compiles
var input = "Hello Zig";
Why is 1) (var gp = &g;) an error, but 2) (var input = "Hello Zig";) isn't?
In both cases, I am trying to make a var point to *const X, where X is comptime_int in 1) and [9:0]u8 in 2).
Maybe the comptime_int is compiled into an immediate value in assembly, in which case it doesn't make sense to be able to create an address for it, but what is the case with string literals then? Is it stored in the .bss section, or in the .rodata section of an ELF? And what is meant by *const? My understanding from a rudimentary C background is that a const pointer cannot be made to point to other things once it is initialized to point to something. If so, var input = "Hello zig" should be illegal like case 1). But that is clearly not the case, as I can do something like input = "Yolo swag"; in the next line.
Or does it have to do with the fact that string literals are immutable? input[0] = 'x' fails, but shouldn't the type then be *[N:0]const u8, indicating that it is the data that is const and not the pointer itself?
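For what it's worth, here is a small sketch of how I now read *const T in Zig: the const applies to the thing pointed at, while var or const on the declaration decides whether the pointer itself can be re-pointed. The names a, b and p are purely illustrative, and I annotate input as []const u8 so that the reassignment does not depend on both literals having the same length.

```zig
const std = @import("std");

pub fn main() void {
    const a: u8 = 1;
    const b: u8 = 2;

    // *const u8: the pointee is read-only, but p itself is a var.
    var p: *const u8 = &a;
    p = &b; // fine: re-pointing the pointer
    // p.* = 3; // compile error: the pointee is const

    // The same split applies to strings.
    var input: []const u8 = "Hello Zig";
    input = "Yolo swag"; // fine: the slice now refers to other bytes
    // input[0] = 'x'; // compile error: the bytes are const

    std.debug.print("{d} {s}\n", .{ p.*, input });
    // prints "2 Yolo swag"
}
```

Giving a and b an explicit runtime type (u8) is what avoids the comptime_int error from case 1).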
This behaviour feels very inconsistent and makes it hard for beginners to grasp the basics of the language. Two of the core tenets of Zig are "Communicate intent precisely" and "Reduce the amount one must remember", i.e. be explainable, but the fact that I need to write such a big blog post to understand something as basic as string literals implies, to me, that these tenets are being violated.
As a long-time Zig follower, I understand that most of Zig is free labour from volunteers and I cannot thank them enough for contributing to this language. That said, jumping into Discord at 3 AM every time I need something rudimentary explained is not the kind of ergonomics that I was looking for. It is no better than the "anything can happen" world of C or the "so complicated that a mere mortal cannot understand it in a lifetime" complexity of Rust.
For Zig to be more widely used, a more concerted effort must be made to make the mechanics of the language easier to understand.
Top comments (7)
The reason why strings behave like this is because of interning. The variable is allowed to be mutable because it contains a pointer, so, in theory, you could make it point to another string (but not modify the string itself, since multiple vars might refer to the same bytes).
In this case Zig is mainly giving you insight into a process that is present in pretty much all languages.
This one is Zig-specific, I believe. Since you're not specifying the type of g, you get comptime_int, which pulls you into the semantics of comptime evaluation. I believe that in stage2 you will be able to take a pointer from a comptime_int, but I'm actually not sure. In other words, this second case is the result of admittedly not-well-defined semantics in the language, but also you most probably want to specify a runtime type for any value that you want the pointer of. Note that during compilation comptime_ints are BigInts internally, meaning that taking their pointer is not a basic operation and the compiler must come up with its own design to avoid leaking that implementation detail.
Thanks for the reply!
The Zig-specific behaviour of comptime_int makes more sense, but for strings I can still do var input = "Hello Zig".* and then modify it with something like input[0] = 'a', and input actually changes to aello Zig. What happens in this case? Does Zig create a copy of Hello Zig on the stack, or does it create a completely new string and then modify that?
"Hello Zig" is a chunk of memory inside the .rodata of your executable (or something of that sort, I believe different architecture-specific backends can decide where this stuff goes). So when you assign it to a variable, you get a pointer to those bytes.
When you dereference the pointer you get the full array contents which, yes, get copied to stack memory (assuming we're inside a function) and which you can then modify, since that memory is yours. If you look at the types it's very clear and consistent (if you know about string interning).
Ok, this is the context I was missing (basically, dereferencing with "x".* creates a copy on the stack that I can then modify). Thanks!
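A minimal sketch of the copy behaviour described in this exchange, assuming a reasonably recent Zig compiler (the names literal and copy are illustrative). Modifying the dereferenced copy leaves the original literal untouched:

```zig
const std = @import("std");

pub fn main() void {
    // A pointer to the read-only bytes of the literal.
    const literal = "Hello Zig";

    // .* copies those bytes into a local array that we own.
    var copy = literal.*;
    copy[0] = 'a';

    std.debug.print("{s} / {s}\n", .{ literal, copy[0..] });
    // prints "Hello Zig / aello Zig"
}
```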
You have done a good job. For ordinary programmers, even in languages with complete and clear documentation, there will still be a lot of confusion, and Zig is even more lacking in documentation. Learning resources about Zig need everyone to work together to enrich them and make the language easier to understand and learn. I have to say, Zig still has a long way to go.
Thanks. I will be trying to write more such content about the internals of PLs and Zig to help people jump from the beginner to the learned-person level.
Despite how much I enjoy discovering Zig, I do have trouble wrapping my head around certain concepts, string literals being one of them.
I just spent some serious time trying to pass a string literal as a function argument. Despite this great article (zig.news/kristoff/what-s-a-string-...), which almost gives the answer, I kept banging my head against the wall.
It finally clicked after reading the post above, so I figured I might as well write it down; maybe it will help others:
In other words, you want to pass a slice of immutable u8 values
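A sketch of what that can look like, assuming a reasonably recent Zig compiler. The function count_spaces is a hypothetical example of mine; the point is only the parameter type []const u8, which accepts both string literals and mutable buffers:

```zig
const std = @import("std");

// Takes a slice of immutable u8 values: the function can read the
// bytes but not modify them.
fn count_spaces(text: []const u8) usize {
    var n: usize = 0;
    for (text) |c| {
        if (c == ' ') n += 1;
    }
    return n;
}

pub fn main() void {
    // A literal (*const [9:0]u8) coerces to []const u8.
    std.debug.print("{d}\n", .{count_spaces("Hello Zig")});
    // prints "1"

    // So does a pointer to a mutable array.
    var buf = "a b c".*;
    buf[0] = 'x';
    std.debug.print("{d}\n", .{count_spaces(&buf)});
    // prints "2"
}
```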