This article was originally published on my personal site.
This is a small investigation into how Zig creates structs on the stack.
Programs normally store data in 3 different places:
- In the executable, as .data or .rodata. These sections are loaded into the memory space of the process when it is created.
- The heap. This is memory that the process requests from the operating system (usually through an allocator) while it runs.
- The stack. The stack is a part of the memory space of a process. As the process executes functions, the stack is manipulated to create space for their data.
Zig is a language with no implicit heap allocation, so allocating memory on the heap involves creating and configuring an Allocator and using it to get pointers to objects allocated in the heap. (Note: an allocator can be configured to use the stack as well, but let's ignore that for now.)
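To make the three places concrete, here is a minimal sketch. The names greeting, counter, on_stack and on_heap are made up for illustration, and std.heap.page_allocator is just one allocator choice among several. Where the compiler actually places each value can vary with the target and optimization level, so treat the comments as the typical case rather than a guarantee.

```zig
const std = @import("std");

// Constant data: typically placed in a read-only section such as .rodata.
const greeting = "hello";
// Mutable, initialized global: typically placed in .data.
var counter: u32 = 1;

pub fn main() !void {
    // Local variable: lives in main's stack frame (or a register).
    var on_stack: u32 = 2;
    on_stack += counter;

    // Heap memory has to be requested explicitly through an allocator.
    const allocator = std.heap.page_allocator;
    const on_heap = try allocator.create(u32);
    defer allocator.destroy(on_heap);
    on_heap.* = on_stack + 1;

    counter += 1;
    std.debug.print("{s} {d} {d} {d}\n", .{ greeting, counter, on_stack, on_heap.* });
}
```

Nothing here touches the heap until allocator.create is called, which is the "no implicit heap allocation" property in practice.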
When you don't use an allocator and still return structs or primitives (ints, floats, etc.) from functions, where are they stored? Most platforms/OSes have a calling convention that returns primitives such as ints or floats in a register (e.g. on x86-64, a function returning an int puts the return value in the rax register).
What about more complicated data types, such as structs?
const std = @import("std");
const X = struct { x: u32, y: u64, r: [8]u32 };
fn Xmaker() X {
    return X{
        .x = 455,
        .y = 497,
        .r = [_]u32{0} ** 8,
    };
}
pub fn main() void {
    var q = Xmaker();
    std.debug.print("{}", .{q});
}
To investigate this, I compiled this code into an executable and disassembled the binary.
Let's take a look at it to see what happens. (Note: this might change slightly depending on OS/platform, but I suspect the mechanism is more or less the same.)
000000000022cea0 <main>:
  22cea0:   55                      push   rbp
  22cea1:   48 89 e5                mov    rbp,rsp
  22cea4:   48 83 ec 60             sub    rsp,0x60
  22cea8:   48 8d 7d d0             lea    rdi,[rbp-0x30]
  22ceac:   e8 bf 72 00 00          call   234170 <Xmaker>
  ... more follows
Trying to grok this, it looks like the following happens:
- main first creates 96 bytes of space on the stack (sub rsp, 0x60). This corresponds to 2 x sizeof(X), where sizeof(X) is 48 bytes: 9 x u32 (36 bytes) + 1 x u64 (8 bytes) + 4 bytes of padding.
- It then sets register rdi to the address rbp - 48. Each address addresses 1 byte, and a function can store its local variables at addresses below rbp (the stack grows from high to low addresses on x86).
- lea rdi, [rbp-0x30] loads the value rbp-48 into rdi. In the x86-64 System V calling convention (the one used on Linux and macOS), the first integer or pointer argument to a function is passed in rdi (more info about calling conventions in [[From Source Code to Hello World/X86 calling convention]]).
- main calls our Xmaker function.
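The 48-byte figure can be checked from Zig itself with the @sizeOf, @alignOf and @offsetOf builtins. This is only a sketch and it makes one assumption that differs from the article's code: it declares X as an extern struct, because Zig only guarantees C-compatible field order and padding for extern structs, while a plain struct may be laid out differently by the compiler.

```zig
const std = @import("std");

// Assumption: `extern struct` instead of the article's plain `struct`,
// so that the field order and padding follow the C ABI.
const X = extern struct { x: u32, y: u64, r: [8]u32 };

pub fn main() void {
    // On x86-64: 4 (x) + 4 (padding) + 8 (y) + 32 (r) = 48 bytes.
    std.debug.print("size={d} align={d}\n", .{ @sizeOf(X), @alignOf(X) });
    // Byte offset of each field from the start of the struct.
    std.debug.print("x={d} y={d} r={d}\n", .{
        @offsetOf(X, "x"),
        @offsetOf(X, "y"),
        @offsetOf(X, "r"),
    });
}
```

On x86-64 the offsets should come out as 0, 8 and 16, which lines up with the [rdi], [rdi+0x8] and [rdi+0x10] stores in the disassembly of Xmaker.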
How does Xmaker return a struct?
  234170:   55                      push   rbp
  234171:   48 89 e5                mov    rbp,rsp
  234174:   48 89 f8                mov    rax,rdi
  234177:   c7 07 c7 01 00 00       mov    DWORD PTR [rdi],0x1c7
  23417d:   48 c7 47 08 f1 01 00    mov    QWORD PTR [rdi+0x8],0x1f1
  234184:   00
  234185:   48 8b 0c 25 00 32 20    mov    rcx,QWORD PTR ds:0x203200
  23418c:   00
  23418d:   48 89 4f 10             mov    QWORD PTR [rdi+0x10],rcx
  234191:   48 8b 0c 25 08 32 20    mov    rcx,QWORD PTR ds:0x203208
  234198:   00
  234199:   48 89 4f 18             mov    QWORD PTR [rdi+0x18],rcx
  23419d:   48 8b 0c 25 10 32 20    mov    rcx,QWORD PTR ds:0x203210
  2341a4:   00
  2341a5:   48 89 4f 20             mov    QWORD PTR [rdi+0x20],rcx
  2341a9:   48 8b 0c 25 18 32 20    mov    rcx,QWORD PTR ds:0x203218
  2341b0:   00
  2341b1:   48 89 4f 28             mov    QWORD PTR [rdi+0x28],rcx
  2341b5:   5d                      pop    rbp
  2341b6:   c3                      ret
Xmaker first copies the value in rdi (the address where our struct will be stored) into rax, so the same pointer is also handed back to the caller as the return value.
The first mov, DWORD PTR [rdi],0x1c7, writes the value 0x1c7 (decimal 455) into the first 4 bytes of the struct. This corresponds to the Zig code in our Xmaker fn: return X{ .x = 455, ... }.
Next we store 0x1f1 (decimal 497) at [rdi+0x8]. x only occupies bytes 0 to 3, but y is of type u64 and therefore needs 8-byte alignment, so bytes 4 to 7 are padding.
The next instruction, mov rcx,QWORD PTR ds:0x203200, is interesting. In 64-bit mode the segment registers cs, ds, es and ss are treated as having a base of 0, so the ds: prefix has no effect on the address (fs and gs are the exception; they are typically used for thread-local variables when you have multiple threads).
I do not know why the compiler generated this code, but when debugging using gdb, the value of ds was 0, and the values at addresses 0x203200 to 0x20321f (the 32 bytes read by the four loads), which lie in the .rodata (read-only) section of the process, were also 0. In other words, the array is initialized by copying 32 zero bytes out of .rodata.
There is a bit of optimization going on. Since our array has a nice size of 8 elements, the compiler can copy it with 4 load/store pairs, each moving two 4-byte values at once (x86-64 instructions can move 64 bits, i.e. 8 bytes, at a time). Each pair sets .r[n] and .r[n+1]. For example, mov QWORD PTR [rdi+0x20],rcx sets the 8 bytes from rdi+32 to rdi+39 to 0.
Xmaker "returns" a value in the Zig code. In the generated assembly, however, we passed a pointer, via rdi, to a location inside main's stack frame in which to store our return value.
This is how a struct return is translated into assembly.
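Written by hand, the generated code behaves roughly like a function that takes the destination as a pointer argument. The sketch below is only an illustration of that idea, not what the compiler literally produces: xmakerInto is a hypothetical name, and the explicit out parameter stands in for the hidden pointer passed in rdi.

```zig
const std = @import("std");

const X = struct { x: u32, y: u64, r: [8]u32 };

// Hypothetical hand-written equivalent of the compiled Xmaker:
// the caller supplies the destination, and the same pointer is
// handed back (mirroring `mov rax, rdi`).
fn xmakerInto(out: *X) *X {
    out.* = X{
        .x = 455,
        .y = 497,
        .r = [_]u32{0} ** 8,
    };
    return out;
}

pub fn main() void {
    // Space for the result is reserved in main's stack frame.
    var q: X = undefined;
    const p = xmakerInto(&q);
    // The returned pointer is the one we passed in.
    std.debug.assert(p == &q);
    std.debug.print("{d} {d} {d}\n", .{ q.x, q.y, q.r[7] });
}
```

The difference from the original Xmaker is only in who writes the pointer passing: here it is spelled out in the source, whereas for a by-value struct return the compiler inserts it for us.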
Looking at the code of main after the call to Xmaker:
  22ceb1:   48 8d 7d a0             lea    rdi,[rbp-0x60]
  22ceb5:   48 8d 75 d0             lea    rsi,[rbp-0x30]
  22ceb9:   ba 30 00 00 00          mov    edx,0x30
  22cebe:   e8 dd 9d 00 00          call   236ca0 <memcpy>
  22cec3:   48 8d 7d a0             lea    rdi,[rbp-0x60]
  22cec7:   e8 f4 72 00 00          call   2341c0 <std.debug.print.157>
  22cecc:   48 83 c4 60             add    rsp,0x60
  22ced0:   5d                      pop    rbp
  22ced1:   c3                      ret
The return value of Xmaker is stored starting at rbp-0x30. Here we call memcpy to copy this entire struct (0x30 = 48 bytes, passed in edx) into a location starting at rbp-0x60. (A little confusing, but the copy occupies rbp-0x60 to rbp-0x31: the stack grows downward, while the bytes of the struct are laid out at increasing addresses.)
Again, I do not know why a memcpy is needed here. We could continue using the initial struct at rbp-0x30, as we are not modifying it and only pass it as a read-only value to std.debug.print.
When we want to print this struct with std.debug.print, we pass its address as the first argument to the fn, via the rdi register. (Usually, the address of the format string "{}" is passed to functions like printf in C. In Zig, it seems to have been optimized away. Interesting.)
Once print returns, we restore our stack pointer to the state it was in at the beginning of our function, pop the base pointer, and then return.
Now, our struct is small enough that we can store it on the stack. What happens when we have a very big struct with a big member, for example:
const X = struct { x: u32, y: u64, r: [20000]u32 };
Would this still be allocatable on the stack? Wouldn't our program crash, as the struct is too big? Zig seems to have an interesting technique to address this (or at least crash cleanly). Let us explore this in Part 2.
 