
So where is my stuff stored? - Part 2

This is a continuation of my previous post, where I explored how Zig returns structs via the stack.
We saw that for a small struct, the calling function allocates some space in its own frame and passes a pointer to it, and the function returning the struct fills that space in.
What if the returned struct is very big? Something like the following:

const std = @import("std");
const X = struct { x: u32, y: u64, r: [32000]u32 };
fn Xmaker() X {
    return X{
        .x = 455,
        .y = 497,
        .r = [_]u32{0} ** 32000,
    };
}
pub fn main() void {
    var q = Xmaker();
    std.debug.print("{}", .{q});
}

We are now storing an array of 32,000 unsigned integers instead of the 8 integers we had previously. A single struct has a size of 16 + (32,000 * 4) = 128,016 bytes (about 128 kB)!
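As a quick sanity check of that arithmetic, here is a minimal sketch that asks the compiler for the size directly. It assumes an x86_64 target, where u64 is 8-byte aligned, so the 4 + 8 + 128,000 = 128,012 bytes of fields are rounded up to 128,016. On other targets the number may differ.

```zig
const std = @import("std");

const X = struct { x: u32, y: u64, r: [32000]u32 };

pub fn main() void {
    // Assumption: x86_64 target, where u64 has 8-byte alignment.
    // Field bytes: 4 (x) + 8 (y) + 32000 * 4 (r) = 128012,
    // rounded up to a multiple of 8 gives 128016.
    std.debug.assert(@sizeOf(X) == 128016);

    // Two copies of the struct, which is the amount of stack the
    // article's main ends up reserving.
    std.debug.assert(2 * @sizeOf(X) == 256032);

    std.debug.print("sizeof(X) = {d}\n", .{@sizeOf(X)});
}
```

The 16 in the formula above is the 4-byte x and the 8-byte y plus 4 bytes of padding.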

What does Zig do in this case? Let us take a look at the generated assembly.

000000000024c280 <main>:
  24c280:   55                      push   rbp
  24c281:   48 89 e5                mov    rbp,rsp
  24c284:   b8 20 e8 03 00          mov    eax,0x3e820
  24c289:   e8 92 a0 00 00          call   256320 <__zig_probe_stack>
  24c28e:   48 29 c4                sub    rsp,rax
  24c291:   48 8d bd f0 0b fe ff    lea    rdi,[rbp-0x1f410]
  24c298:   e8 c3 72 00 00          call   253560 <Xmaker>
  24c29d:   48 8d bd e0 17 fc ff    lea    rdi,[rbp-0x3e820]
  24c2a4:   48 8d b5 f0 0b fe ff    lea    rsi,[rbp-0x1f410]
  24c2ab:   ba 10 f4 01 00          mov    edx,0x1f410
  24c2b0:   e8 1b 9e 00 00          call   2560d0 <memcpy>

Notice that in line 4 of main we have a call to __zig_probe_stack. We did not call this function ourselves, so it looks like the Zig compiler injected the call into our code. What does __zig_probe_stack do?

0000000000256320 <__zig_probe_stack>:
  256320:   51                      push   rcx
  256321:   48 89 c1                mov    rcx,rax
  256324:   48 81 f9 00 10 00 00    cmp    rcx,0x1000
  25632b:   72 1c                   jb     256349 <__zig_probe_stack+0x29>
  25632d:   48 81 ec 00 10 00 00    sub    rsp,0x1000
  256334:   83 4c 24 10 00          or     DWORD PTR [rsp+0x10],0x0
  256339:   48 81 e9 00 10 00 00    sub    rcx,0x1000
  256340:   48 81 f9 00 10 00 00    cmp    rcx,0x1000
  256347:   77 e4                   ja     25632d <__zig_probe_stack+0xd>
  256349:   48 29 cc                sub    rsp,rcx
  25634c:   83 4c 24 10 00          or     DWORD PTR [rsp+0x10],0x0
  256351:   48 01 c4                add    rsp,rax
  256354:   59                      pop    rcx
  256355:   c3                      ret

main calls __zig_probe_stack with a value of 0x3e820 (decimal 256032 = 2 * sizeof(X)). This argument is passed to __zig_probe_stack via the rax register.
__zig_probe_stack lowers rsp by this value in steps of at most 4096 bytes (0x1000). The jb skips the loop entirely when the requested size is below 4096, the ja keeps looping while more than 4096 bytes remain, and a final sub takes care of whatever is left.

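After each sub, __zig_probe_stack accesses the value 16 bytes above rsp and ors it with 0x0, which leaves the value unchanged. At the end it adds rax back to rsp, restores rcx and returns, so the real allocation is the sub rsp,rax that follows in main.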

This seems weird. The or looks useless and is done at a seemingly random location. Why?
The source of __zig_probe_stack doesn't shed a lot of light, except for noting that any access below rsp will cause a segfault on Linux with kernel versions below 5.1.

Some further Stack Overflow-ing later, I found a plausible explanation: sub rsp is a way to extend the stack of a process lazily (until it hits the limit set on the process by the kernel). Think of it like malloc, but for the stack.
Once rsp has been lowered, we access a location just above it in order to trigger the stack expansion if necessary.
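To make the probing loop easier to follow, here is a minimal sketch that models the listing as plain arithmetic. It touches no real memory. simulateProbe and ProbeResult are hypothetical names of my own, not part of Zig. One observation from the listing: when the or executes, the stack holds the return address and the saved rcx (8 bytes each), so [rsp+0x10] is exactly the caller's rsp minus the amount subtracted so far. The sketch tracks that amount.

```zig
const std = @import("std");

const page_size: usize = 0x1000;

// Hypothetical result type: how many `or` probes ran, and how far below
// the caller's rsp the last probed address was.
const ProbeResult = struct { probes: usize, deepest: usize };

// Hypothetical model of the __zig_probe_stack listing; arithmetic only.
fn simulateProbe(size: usize) ProbeResult {
    var remaining = size; // plays the role of rcx
    var moved: usize = 0; // total amount subtracted from rsp so far
    var probes: usize = 0;

    // jb: skip the loop when fewer than 0x1000 bytes are requested.
    if (remaining >= page_size) {
        while (true) {
            moved += page_size; // sub rsp,0x1000
            probes += 1; // or DWORD PTR [rsp+0x10],0x0
            remaining -= page_size; // sub rcx,0x1000
            // ja: loop again only while rcx > 0x1000.
            if (remaining <= page_size) break;
        }
    }

    moved += remaining; // sub rsp,rcx
    probes += 1; // final or DWORD PTR [rsp+0x10],0x0
    return ProbeResult{ .probes = probes, .deepest = moved };
}

pub fn main() void {
    // 0x3e820 is the value main passes in rax.
    const r = simulateProbe(0x3e820);
    std.debug.assert(r.probes == 63); // 62 full pages + 1 partial page
    std.debug.assert(r.deepest == 0x3e820);
    std.debug.print("probes = {d}, deepest offset = {d}\n", .{ r.probes, r.deepest });
}
```

Under this model, the request for 0x3e820 bytes is probed once per 4096-byte step, ending exactly 0x3e820 bytes below the caller's rsp.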

What Zig is doing here is making sure that there is enough space on the stack to allocate 2 instances of our large struct. If there isn't, then this lazy allocation of stack will segfault, causing our program to crash early. A pretty elegant solution, I must say!

Once the stack has been extended, main proceeds: Xmaker stores the large struct on the stack through the pointer passed in rdi, and memcpy then makes a copy of it, as in the previous example.
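In Zig terms, the sequence in main could be sketched roughly as below. This is an illustration of the pattern, not what the compiler literally emits: xmakerInto is a hypothetical out-parameter version of Xmaker, and the plain assignment stands in for the memcpy call.

```zig
const std = @import("std");

const X = struct { x: u32, y: u64, r: [32000]u32 };

// Hypothetical helper: the caller owns the storage and passes its address,
// mirroring the pointer that main loads into rdi before calling Xmaker.
fn xmakerInto(out: *X) void {
    out.* = X{
        .x = 455,
        .y = 497,
        .r = [_]u32{0} ** 32000,
    };
}

pub fn main() void {
    // First slot in main's frame ([rbp-0x1f410] in the listing).
    var tmp: X = undefined;
    xmakerInto(&tmp);

    // Second slot ([rbp-0x3e820]); the assignment copies the whole
    // struct, standing in for the memcpy call.
    var q: X = undefined;
    q = tmp;

    std.debug.assert(q.x == 455 and q.y == 497 and q.r[31999] == 0);
}
```

Both tmp and q live in main's frame here, which matches the 2 * sizeof(X) that main requests from __zig_probe_stack.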

Discussion (4)

Loris Cro

Thank you for the article!

If you want, consider putting part 1 and part 2 into a series (edit the first article, then click the button with a gear icon to create a new series). Also, since you mentioned that the first article was originally posted in your blog, you can also add a setting for that (in the same menu opened by the button with a gear icon).

Govind Author

Hi Loris,
Thanks for the tip! I updated my bio, linked to the articles and made them into a series.
I am learning more Zig and plan to write more such intro articles. Maybe this will get more beginners interested in exploring Zig and operating systems / assembly :)

cassepipe

Will Zig systematically panic in case of a memory allocation failure?
Or is there a way to handle the error?

Govind Author

I don't think this can be handled. Unlike malloc, which internally uses a system call such as sbrk, there is no system call here whose return value you can check. It is up to the OS to decide whether a process can have a larger stack, and the process can crash if the OS decides that it cannot have more stack storage.