Zig NEWS

Loris Cro
What's a String Literal in Zig?

A string literal is something like "hello world": a hard-coded string in source code. It looks like one of the most basic parts of Zig, but there are a couple of non-obvious things worth knowing.

Where's the memory?

String literals are stored in the executable's constant data section rather than on the stack, and identical literals get de-duplicated. This process is generally called string interning.

This means that when you're working with string literals, you're not using stack memory for the string itself and instead you're always dealing with pointers. This can be easily confirmed:

pub fn main() void {
   const foo = "banana";
   @compileLog(@TypeOf(foo));
}

Compiling the code above prints the type of foo in the compile log: *const [6:0]u8, a pointer to a null-terminated array of six bytes. A pointer indeed.
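Since @compileLog deliberately stops the build, the same facts can also be checked with assertions in a program that still compiles. This is a minimal sketch, assuming a reasonably recent Zig toolchain; the variable name foo is taken from the example above and everything else is illustrative.

```zig
const std = @import("std");

pub fn main() void {
    const foo = "banana";

    // The literal is a pointer to a null-terminated array of 6 bytes.
    comptime std.debug.assert(@TypeOf(foo) == *const [6:0]u8);

    // Dereferencing the pointer gives the array type itself.
    comptime std.debug.assert(@TypeOf(foo.*) == [6:0]u8);

    // len counts only the 6 bytes of "banana"; the 0 sentinel sits just past them.
    std.debug.assert(foo.len == 6);
    std.debug.assert(foo[6] == 0);

    std.debug.print("{s} has type {s}\n", .{ foo, @typeName(@TypeOf(foo)) });
}
```

The length is part of the type, which is why "banana" and "apple" are values of two different pointer types until they coerce to a slice.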

This also has one interesting implication: you can return pointers to string literals defined inside a function, something that would be wrong for a pointer to an ordinary stack-allocated local variable.

Broken code:

pub fn main() void {
   _ = ohno(); // ohno returns a pointer to garbage memory
}

fn ohno() []const u8 {
   // `var` keeps the array in ohno's stack frame; a comptime-known
   // `const` would be placed in static memory like a string literal
   var foo = [4]u8{'o', 'h', 'n', 'o'};
   return &foo; // oh no, the memory where foo is stored
                // will be reclaimed as soon as we return!
}

Perfectly fine:

pub fn main() void {
   _ = ohyes(); // everything is fine
}

fn ohyes() []const u8 {
   const foo = "ohyes";
   return foo; // foo is already a pointer, or rather
               // foo was a pointer all along 
}

As I mentioned above, string interning is the reason why that works: even though the string literal is declared inside ohyes, the bytes are somewhere else.

The most common newbie mistake

So while you're learning Zig you come up with the following program:

const std = @import("std");

pub fn main() void {
   funnyPrint("banana");
}

fn funnyPrint(msg: []u8) void {
   std.debug.print("*farts*, {s}", .{msg});
}

Everything seems correct except that the compiler gives you an error:

./example.zig:4:15: error: expected type '[]u8', found '*const [6:0]u8'
   funnyPrint("banana");

Reading the error message, you start asking yourself: how do I make a slice from a pointer to an array? Also, shouldn't the language do that automatically for me?
And you would be right. Well, almost right.

The problem here has to do with constness: string literals are constant pointers and you are asking in funnyPrint for a []u8 while you should have been asking for a []const u8.

Going back to the question about automatic conversion: yes, pointers to arrays coerce to slices, but constness is a one-way road, meaning that you can pass a mutable pointer/slice to a function that expects a const pointer/slice, but not vice versa.
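Both coercions can be seen side by side in a small program. This is a sketch under the assumption of a reasonably recent Zig toolchain; countA is a hypothetical helper introduced only for this illustration.

```zig
const std = @import("std");

// Hypothetical helper: it only reads the bytes, so it asks for []const u8
// and therefore accepts string literals and mutable buffers alike.
fn countA(msg: []const u8) usize {
    var n: usize = 0;
    for (msg) |c| {
        if (c == 'a') n += 1;
    }
    return n;
}

pub fn main() void {
    // *const [6:0]u8 coerces to []const u8.
    const from_literal = countA("banana");

    // &buf is *[6]u8, which coerces to []u8, which in turn coerces to []const u8.
    var buf = [_]u8{ 'b', 'a', 'n', 'a', 'n', 'a' };
    const mutable: []u8 = &buf;
    mutable[0] = 'c';
    const from_mutable = countA(mutable);

    // The reverse direction is rejected by the compiler:
    // const bad: []u8 = "banana";

    std.debug.print("{d} {d}\n", .{ from_literal, from_mutable }); // prints "3 3"
}
```

Asking for []const u8 whenever a function only reads its argument keeps it usable with both kinds of callers.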

As for why string literals are const, that has to do with string interning, and if you think about it, it's kinda obvious: string literals get de-duplicated, so there is no way of changing one instance without affecting the others. That makes immutability the saner design decision by far.

When learning Zig this is usually the moment where the student thinks that this constness factor can be removed by assigning the string literal to a variable, like so:

var msg = "banana";
funnyPrint(msg);

Given what we have just learned, we know this doesn't work: msg itself might be mutable, but that doesn't change the fact that it holds a const pointer. If you're not familiar with C or other systems programming languages, this concept may be new to you: pointer constness is a separate thing from whether a variable is declared var or const.
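The two kinds of constness can be told apart in a few lines. The sketch below assumes funnyPrint has been corrected to take []const u8 and a reasonably recent Zig toolchain; buf and view are names made up for the illustration.

```zig
const std = @import("std");

// funnyPrint with the corrected parameter type.
fn funnyPrint(msg: []const u8) void {
    std.debug.print("*farts*, {s}\n", .{msg});
}

pub fn main() void {
    // `var` makes the variable reassignable; the bytes it points to stay const.
    var msg: []const u8 = "banana";
    funnyPrint(msg);

    msg = "apple"; // fine: the variable now points at a different literal
    funnyPrint(msg);

    // msg[0] = 'A'; // rejected: the pointee is const

    // The opposite combination: a const variable holding a pointer to mutable bytes.
    var buf = "banana".*;
    const view: []u8 = &buf;
    view[0] = 'B'; // fine: the bytes are mutable, only `view` itself is fixed
    funnyPrint(view);
}
```

In other words, var and const describe the variable, while the const inside []const u8 describes the memory the slice points to.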

Another reason why I think this is a fairly common mistake when learning Zig is that you are already busy learning the language itself, so you're tempted to be sloppy and just leave out const specifiers. Unfortunately, string literals are both a very common placeholder value and the one thing that will punish you if you don't understand const pointers.

Watch the talk

If you want to watch me give a full talk on this topic, here it is :^)

Also you might want to use this post as a cheat-sheet:

Bonus credit

How do you make a mutable string from a string literal then? Simple: dereference the pointer and you get an array, which can be assigned to local memory, which can then be used freely.

pub fn main() void {
   var foo = "hello".*;
   foo[0] = 'j'; // foo now equals "jello"
}
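A slightly longer sketch shows that the dereference produces an independent copy that can be handed to functions expecting []u8. It assumes a reasonably recent Zig toolchain; shout is a hypothetical helper written for this example.

```zig
const std = @import("std");

// Hypothetical helper: uppercases ASCII letters in place, so it needs []u8.
fn shout(buf: []u8) void {
    for (buf) |*c| {
        c.* = std.ascii.toUpper(c.*);
    }
}

pub fn main() void {
    const literal = "hello";

    // Dereferencing copies the bytes into a local [5:0]u8 array.
    var copy = literal.*;
    copy[0] = 'j';
    shout(&copy);

    // The literal is untouched; only the local copy changed.
    std.debug.print("{s} {s}\n", .{ literal, &copy }); // prints "hello JELLO"
}
```

The copy lives in main's stack frame, so the usual lifetime rules apply to it: unlike the literal, a pointer to it must not outlive the function.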

Top comments (11)

David Vanderson

This is good - string literals are still something I'm stumbling over.

var foo: [50]u8 = undefined;
foo = "bar"; // compile error here

I think I can do this in a function that uses comptime to introspect the size of the string literal:

fn assignStr(out: []u8, str: [:0]const u8) void {
    for (str) |c, i|
        out[i] = c;
    out[str.len] = 0;
}

assignStr(&foo, "bar"); // works

But this doesn't help in struct initializers:
const A = struct {
    foo: [10]u8,
};

var a = A{.foo = "bar"}; // Is there a way to make this work?

mrkishi • Edited

I don't think there's a way to directly assign a smaller array to a larger one, but you can always create a helper function to do so:

fn array(comptime T: type, comptime size: usize, items: ?[]const T) [size]T {
    var output = std.mem.zeroes([size]T);
    if (items) |slice| std.mem.copy(T, &output, slice);
    return output;
}

const A = struct {
    foo: [10]u8,
};

// fully initialize
var foo: [3]u8 = .{'f', 'o', 'o'};

// zero initialize then copy desired contents
var bar: [50]u8 = [_]u8{0} ** 50;
std.mem.copy(u8, &bar, "bar");

var baz: A = A{.foo = [_]u8{0} ** 10};
std.mem.copy(u8, &baz.foo, "baz");

// fully initialize with comptime concatenation
var bar_full: [50]u8 = "bar".* ++ [_]u8{0} ** 47;
var baz_full: A = A{.foo = "baz".* ++ [_]u8{0} ** 7};

// with helper function
var bar_alt = comptime array(u8, 50, "bar");
var baz_alt = A{.foo = comptime array(u8, 10, "baz")};
haydenridd

Yep, tried a couple ways but can't seem to get around having to manually specify the type in some way rather than being able to infer it. Slight variant I came up with specifically for string literals only:

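const std = @import("std");
const assert = std.debug.assert;

// Both parameters are comptime: a parameter of type `type` has to be.
fn stringLiteralToArray(comptime literal: []const u8, comptime ArrayType: type) ArrayType {
    const ti = @typeInfo(ArrayType);
    comptime {
        assert(ti == .Array);
        assert(ti.Array.len >= literal.len);
    }
    // Copy the literal's bytes and pad the rest of the array with zeroes.
    return literal[0..literal.len].* ++ .{0} ** (ti.Array.len - literal.len);
}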

// Non-field variables can infer the type
var string_buffer = stringLiteralToArray("A default value", [64]u8);

// Struct fields need to specify both sadly
const SomeStruct = struct {
    field: [32]u8 = stringLiteralToArray("Another default", [32]u8),
};

David Vanderson

Thanks - great to see different options for how to do it!

Can you think of a way to use the helper function and somehow infer the size of the struct member so it doesn't have to be repeated? Not a huge deal but I can't figure out how to do it.

mrkishi

I don't know how to do that exactly. The closest I could come up with would be to leave the array undefined and use a helper function to fill it in, but that has slightly different semantics since the memcpy is a little more explicit:

fn init_array(target: anytype, items: ?[]const std.meta.Elem(@TypeOf(target))) void {
    const T = std.meta.Elem(@TypeOf(target));
    if (items) |slice| {
        std.mem.copy(T, target, slice);
        std.mem.set(T, target[slice.len..], std.mem.zeroes(T));
    } else {
        std.mem.set(T, target, std.mem.zeroes(T));
    }
}

var foo: [50]u8 = undefined;
init_array(&foo, "foo");

var a = A{.foo = undefined};
init_array(&a.foo, "foo");
David Vanderson

Good solution. I haven't seen std.meta.Elem before so I'm going to go read up on that. Thanks!

mrkishi

I believe all consts end up on static data sections, not only strings.

Check it out on Godbolt.

Loris Cro

Uh, interesting, I'll investigate a little bit and edit the wording. Thanks for pointing that out!

mrkishi

Cheers!

In case anyone is following along, I updated the example to better demonstrate where consts end up in memory: godbolt.org/z/jM75hf5b9.

I don't know whether Zig provides a way to check what kind of memory a pointer points to, or whether it exposes section addresses (e.g. gcc's end, edata and etext), but note how get_ptr_static basically returns the address of an unnamed label, which sits before the heap (and way before the stack, which grows downwards).

Emmanuel Oga • Edited

Q: "dereference the pointer and you get an array"

... if this array is now mutable, does this mean that a copy was made? Otherwise you would be mutating the string literal, which before you explained is not possible.

I suspect dereferencing the string literal is making a copy of it to the stack, although that is a bit surprising to me.

Loris Cro

The array is a value type, so assigning it to a local variable copies the entire value into that variable. So yes, a copy is being made. Whenever you dereference a pointer to a value and assign the result somewhere, you're making a copy.