This article was originally published on my personal site.
This is a small investigation into how Zig creates structs on the stack.
Programs normally store data in 3 different places:
- In the executable, in sections such as .data or .rodata. These sections are loaded into the memory space of the process when it is created.
- The heap. These are new memory regions created as the process requests them from the operating system.
- The stack. The stack is a part of the memory space of a process. As the process calls functions, the stack is manipulated to create space for their data.
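A minimal sketch of where different kinds of values end up, assuming a recent Zig (0.11 or later) on a typical ELF target. The names greeting, call_count and bump are hypothetical, and the exact section the toolchain picks is up to the compiler and linker.

```zig
const std = @import("std");

// Hypothetical globals and function, for illustration only.

// The bytes of this string literal live in a read-only section such as .rodata.
const greeting: []const u8 = "hello";

// A mutable global with a non-zero initial value is typically placed in .data.
var call_count: u32 = 1;

fn bump(step: u32) u32 {
    // `doubled` is local to this call: it lives in the stack frame, or only
    // in a register if the optimizer decides it never needs memory.
    const doubled = step * 2;
    call_count += doubled;
    return call_count;
}

pub fn main() void {
    std.debug.print("{s} {d}\n", .{ greeting, bump(2) });
}
```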
Zig has no implicit heap allocation, so allocating memory on the heap involves creating and configuring an Allocator and using it to get pointers to objects allocated on the heap. (Note: an allocator can be configured to use the stack as well, but let's ignore that for now.)
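A minimal sketch of both cases, assuming Zig 0.11 or later; Point and makePoint are hypothetical. The same function works with a heap-backed allocator (std.heap.page_allocator) and with a FixedBufferAllocator whose buffer is a local array, which is one way an allocator can be made to hand out stack memory.

```zig
const std = @import("std");

const Point = struct { x: u32, y: u32 };

// Hypothetical helper: allocates one Point through whatever allocator it is given.
fn makePoint(allocator: std.mem.Allocator, x: u32, y: u32) !*Point {
    const p = try allocator.create(Point);
    p.* = .{ .x = x, .y = y };
    return p;
}

pub fn main() !void {
    // Heap: page_allocator requests memory from the operating system.
    const heap_point = try makePoint(std.heap.page_allocator, 1, 2);
    defer std.heap.page_allocator.destroy(heap_point);

    // Stack: a FixedBufferAllocator hands out memory from a buffer
    // that lives in main's stack frame.
    var buffer: [64]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&buffer);
    const stack_point = try makePoint(fba.allocator(), 3, 4);

    std.debug.print("{d} {d}\n", .{ heap_point.x, stack_point.y });
}
```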
When you don't use an allocator and still return structs or primitives (ints, floats, etc.) from functions, where is the value stored? Most platforms/OSes have a calling convention for returning primitives such as ints or floats in a register (e.g. on x86-64, a function returning an int puts the return value in the rax register, and one returning a float uses xmm0).
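To see the register convention for a primitive, one could compile and disassemble a tiny function like the one below (addOne is hypothetical). Marking it export gives it the C calling convention, so on x86-64 Linux the argument should arrive in edi and the result should be left in eax (the low 32 bits of rax) just before ret.

```zig
const std = @import("std");

// Hypothetical example. `export` gives this function the C calling
// convention: on x86-64 Linux (System V ABI), `a` arrives in edi and the
// u32 result is returned in eax, the low half of rax.
export fn addOne(a: u32) u32 {
    return a +% 1; // wrapping add, so no overflow check is emitted
}

pub fn main() void {
    std.debug.print("{d}\n", .{addOne(41)});
}
```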
What about more complicated data types, such as structs?
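const std = @import("std");
// One u32, one u64 and an array of eight u32s.
const X = struct { x: u32, y: u64, r: [8]u32 };
// Returns an X by value; no allocator is involved.
fn Xmaker() X {
    return X{
        .x = 455,
        .y = 497,
        .r = [_]u32{0} ** 8, // eight zeros
    };
}
pub fn main() void {
    // const rather than var: newer Zig versions reject a var that is never mutated.
    const q = Xmaker();
    std.debug.print("{}", .{q});
}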
To investigate this, I compiled this code into an executable and disassembled the binary.
Let's take a look at it to see what happens. (Note: this might change slightly depending on the OS/platform, but I suspect the mechanism is more or less the same.)
000000000022cea0 <main>:
22cea0: 55 push rbp
22cea1: 48 89 e5 mov rbp,rsp
22cea4: 48 83 ec 60 sub rsp,0x60
22cea8: 48 8d 7d d0 lea rdi,[rbp-0x30]
22ceac: e8 bf 72 00 00 call 234170 <Xmaker>
... more follows
Trying to grok this, it looks like the following happens:
- main first reserves 96 bytes on the stack (sub rsp, 0x60). This corresponds to 2 x sizeof(X): 9 u32 values (36 bytes) + 1 u64 (8 bytes) + 4 bytes of padding = 48 bytes.
- It then sets register rdi to the address rbp - 48: lea rdi, [rbp-0x30] loads the value rbp - 0x30 into rdi. Each address refers to 1 byte, and a function stores its local variables at addresses below rbp, because on x86 the stack grows from high to low addresses. On x86-64 Linux (the System V calling convention), the first integer or pointer argument to a function is passed in rdi (more info about calling conventions in [[From Source Code to Hello World/X86 calling convention]]).
- main calls our Xmaker function.
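The size arithmetic above can be checked at compile time. This is a sketch assuming a 64-bit target such as x86-64, where u64 is 8-byte aligned. Zig may reorder the fields of a plain struct, but with these field sizes every ordering still comes to 48 bytes.

```zig
const std = @import("std");

const X = struct { x: u32, y: u64, r: [8]u32 };

comptime {
    // 9 u32 values (x plus the 8 in r) = 36 bytes, plus one u64 = 44 bytes.
    // The size is rounded up to a multiple of X's alignment (8, from the u64),
    // which adds 4 bytes of padding: 48 bytes.
    std.debug.assert(@alignOf(X) == 8);
    std.debug.assert(@sizeOf(X) == 48);
    // main reserves room for two X values: 2 * 48 = 96 = 0x60.
    std.debug.assert(2 * @sizeOf(X) == 0x60);
}

pub fn main() void {
    std.debug.print("sizeof(X) = {d}, reserved by main = {d}\n", .{ @sizeOf(X), 2 * @sizeOf(X) });
}
```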
How does Xmaker return a struct?
234170: 55 push rbp
234171: 48 89 e5 mov rbp,rsp
234174: 48 89 f8 mov rax,rdi
234177: c7 07 c7 01 00 00 mov DWORD PTR [rdi],0x1c7
23417d: 48 c7 47 08 f1 01 00 mov QWORD PTR [rdi+0x8],0x1f1
234184: 00
234185: 48 8b 0c 25 00 32 20 mov rcx,QWORD PTR ds:0x203200
23418c: 00
23418d: 48 89 4f 10 mov QWORD PTR [rdi+0x10],rcx
234191: 48 8b 0c 25 08 32 20 mov rcx,QWORD PTR ds:0x203208
234198: 00
234199: 48 89 4f 18 mov QWORD PTR [rdi+0x18],rcx
23419d: 48 8b 0c 25 10 32 20 mov rcx,QWORD PTR ds:0x203210
2341a4: 00
2341a5: 48 89 4f 20 mov QWORD PTR [rdi+0x20],rcx
2341a9: 48 8b 0c 25 18 32 20 mov rcx,QWORD PTR ds:0x203218
2341b0: 00
2341b1: 48 89 4f 28 mov QWORD PTR [rdi+0x28],rcx
2341b5: 5d pop rbp
2341b6: c3 ret
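Xmaker first copies the value in rdi (the address where our struct will be stored) into rax. This matches the System V x86-64 convention, where a function that returns a struct through memory also hands that address back in rax.
The first store, mov DWORD PTR [rdi], 0x1c7, writes the decimal value 455 into the first 4 bytes of the struct. This corresponds to the Zig code in our Xmaker fn: return X{ .x = 455, ... }.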
Next we store 0x1f1 (decimal 497) at [rdi + 0x8]. This is because y is of type u64 and therefore needs 8-byte alignment: x occupies bytes 0 to 3, and bytes 4 to 7 are padding.
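The offsets used by Xmaker's stores can be checked too, under one assumption worth stating loudly: a plain Zig struct does not guarantee field order, so this sketch uses an extern struct (C ABI layout, declaration order kept), named XC here for illustration. Its offsets line up with the disassembly: x at 0, y at 8 after 4 bytes of padding, r at 0x10.

```zig
const std = @import("std");

// Same fields as X, but `extern` requests C ABI layout, which keeps
// declaration order. (A plain Zig struct may reorder its fields.)
const XC = extern struct { x: u32, y: u64, r: [8]u32 };

comptime {
    std.debug.assert(@offsetOf(XC, "x") == 0x0); // mov DWORD PTR [rdi], 0x1c7
    std.debug.assert(@offsetOf(XC, "y") == 0x8); // mov QWORD PTR [rdi+0x8], 0x1f1
    std.debug.assert(@offsetOf(XC, "r") == 0x10); // first QWORD store of the array
}

pub fn main() void {
    std.debug.print("x@{d} y@{d} r@{d}\n", .{ @offsetOf(XC, "x"), @offsetOf(XC, "y"), @offsetOf(XC, "r") });
}
```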
The next instruction, mov rcx, QWORD PTR ds:0x203200, is interesting. The ds: prefix is just how the disassembler writes an absolute memory address. In 64-bit mode the segment registers cs, ds, es and ss are treated as having a base of 0; only fs and gs still matter, and on Linux fs is used for thread-local variables (each thread gets its own copy).
I do not know for sure why the compiler generated this code, but when debugging with gdb, the value of ds was 0, and the values at addresses 0x203200-0x20321f, which lie in the .rodata (read-only) section of the process, were also 0. So it looks like the compiler stored the zero-filled array literal [_]u32{0} ** 8 as a constant in .rodata and copies it into the struct from there.
There is a bit of optimization going on. Since our array holds 8 u32 values (32 bytes), it can be copied with 4 instructions that each move 2 4-byte values (x86-64 instructions can move 64 bits, i.e. 8 bytes, at a time). Each instruction sets .r[n] and .r[n+1]. For example, mov QWORD PTR [rdi+0x20], rcx sets bytes rdi+32 through rdi+39 to 0, which hold .r[4] and .r[5] because the array starts at rdi+0x10.
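The four QWORD moves can be mimicked in Zig by viewing the 32-byte array as [4]u64 and copying one u64 at a time. copyAsQwords is a hypothetical helper written for illustration, assuming Zig 0.11 or later; the compiler emits its own moves directly and does not call anything like it. With r starting at offset 0x10, the store to [rdi+0x20] lands on element (0x20 - 0x10) / 4 = 4, so it covers .r[4] and .r[5].

```zig
const std = @import("std");

comptime {
    // 8 u32 values occupy exactly 4 QWORDs.
    std.debug.assert(@sizeOf([8]u32) == 4 * @sizeOf(u64));
}

// Hypothetical helper mirroring the four `mov QWORD PTR` stores: reinterpret
// both [8]u32 arrays as [4]u64 and copy one u64 per iteration, so each move
// carries two adjacent u32 elements.
fn copyAsQwords(dst: *[8]u32, src: *const [8]u32) void {
    const d = std.mem.bytesAsValue([4]u64, std.mem.asBytes(dst));
    const s = std.mem.bytesAsValue([4]u64, std.mem.asBytes(src));
    var i: usize = 0;
    while (i < 4) : (i += 1) {
        d[i] = s[i]; // covers elements 2*i and 2*i + 1
    }
}

pub fn main() void {
    const zeros = [_]u32{0} ** 8;
    var r = [_]u32{1} ** 8;
    copyAsQwords(&r, &zeros);
    std.debug.print("r[4] = {d}, r[5] = {d}\n", .{ r[4], r[5] });
}
```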
Xmaker "returns" a value in the Zig code. In the generated assembly, however, main passed a pointer (via rdi) to a location inside main's stack frame, and Xmaker stored the return value there.
This is how a struct return is translated into assembly.
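Written by hand, the same idea looks roughly like an out-parameter. This is a sketch, not what the compiler literally generates: xmakerInto is a hypothetical function in which the caller owns the storage and passes a pointer to it, just as main passed rbp-0x30 in rdi.

```zig
const std = @import("std");

const X = struct { x: u32, y: u64, r: [8]u32 };

// The article's value-returning version.
fn Xmaker() X {
    return X{ .x = 455, .y = 497, .r = [_]u32{0} ** 8 };
}

// Hypothetical hand-written equivalent: the caller provides the memory,
// and the callee fills it in through the pointer.
fn xmakerInto(out: *X) void {
    out.* = X{ .x = 455, .y = 497, .r = [_]u32{0} ** 8 };
}

pub fn main() void {
    var slot: X = undefined; // space in main's stack frame, like rbp-0x30
    xmakerInto(&slot);
    std.debug.print("{d} {d}\n", .{ slot.x, slot.y });
}
```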
Looking at the code of main after the call to Xmaker:
22ceb1: 48 8d 7d a0 lea rdi,[rbp-0x60]
22ceb5: 48 8d 75 d0 lea rsi,[rbp-0x30]
22ceb9: ba 30 00 00 00 mov edx,0x30
22cebe: e8 dd 9d 00 00 call 236ca0 <memcpy>
22cec3: 48 8d 7d a0 lea rdi,[rbp-0x60]
22cec7: e8 f4 72 00 00 call 2341c0 <std.debug.print.157>
22cecc: 48 83 c4 60 add rsp,0x60
22ced0: 5d pop rbp
22ced1: c3 ret
The return value of Xmaker is stored starting at rbp-0x30. Here we call memcpy to copy this entire struct (edx = 0x30, i.e. 48 bytes) into a location starting at rbp-0x60. (This is a little confusing: a struct always occupies increasing addresses from its start, so the copy spans rbp-0x60 through rbp-0x31, while the original spans rbp-0x30 through rbp-0x1, just below rbp.)
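Again, I do not know for sure why a memcpy is needed here. We could keep using the initial struct at rbp-0x30, since we are not modifying it and only pass it as a read-only value to std.debug.print. One plausible explanation is that .{q} builds a new anonymous tuple holding a copy of q, and the memcpy fills in that tuple.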
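One plausible reason for the memcpy is that .{q} builds a new anonymous tuple holding a copy of q. This sketch probes that idea by building the tuple by hand (tupleHoldsSeparateCopy is hypothetical, assuming Zig 0.11 or later). It only shows that .{q} creates a separate object with the same 48-byte size as X, which matches the 0x30 passed to memcpy; it does not prove what the compiler did in the author's binary.

```zig
const std = @import("std");

const X = struct { x: u32, y: u64, r: [8]u32 };

fn Xmaker() X {
    return X{ .x = 455, .y = 497, .r = [_]u32{0} ** 8 };
}

// Hypothetical helper: builds the same kind of argument tuple as `.{q}`
// in the article's call to std.debug.print.
fn tupleHoldsSeparateCopy(q: *const X) bool {
    const args = .{q.*}; // anonymous tuple; its only field is a copy of q.*
    // A single X field and nothing else: the tuple is as big as X (48 bytes).
    comptime std.debug.assert(@sizeOf(@TypeOf(args)) == @sizeOf(X));
    // Different address, same contents.
    return &args[0] != q and std.meta.eql(args[0], q.*);
}

pub fn main() void {
    const q = Xmaker();
    if (tupleHoldsSeparateCopy(&q)) {
        std.debug.print("the tuple holds its own copy of q\n", .{});
    }
}
```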
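Again, when we print this struct with std.debug.print, we pass the address of the copy (rbp-0x60) as the first argument of the function, via the rdi register. (In C, the address of the format string "{}" is usually passed to functions like printf. In Zig, std.debug.print takes its format string as a comptime parameter, so the string is fixed at compile time inside the specialized function being called rather than passed at runtime; the .157 suffix on std.debug.print.157 suggests one such instance. Interesting.)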
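A sketch of the comptime-parameter mechanism, assuming Zig 0.11 or later (render is hypothetical). Because fmt is declared comptime, like the format string of std.debug.print, each distinct format string produces its own instance of the function, and the string is never passed as a runtime argument; only buf and value are.

```zig
const std = @import("std");

// Hypothetical helper: formats one value with a caller-chosen format string.
// `fmt` is a comptime parameter, so every distinct string gets its own
// specialized copy of this function.
fn render(buf: []u8, comptime fmt: []const u8, value: u32) ![]u8 {
    return std.fmt.bufPrint(buf, fmt, .{value});
}

pub fn main() !void {
    var buf: [32]u8 = undefined;
    const a = try render(&buf, "x={d}", 455); // one instance
    std.debug.print("{s}\n", .{a});
    const b = try render(&buf, "0x{x}", 455); // a second, separate instance
    std.debug.print("{s}\n", .{b});
}
```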
Once print returns, we restore the stack pointer to its state at the beginning of the function (add rsp, 0x60), pop the base pointer and then return.
Our struct is small enough that we can store it on the stack. What happens when we have a very big struct, for example one with a big member like:
const X = struct { x: u32, y: u64, r: [20000]u32 };
Would this still fit on the stack? Wouldn't our program crash because the struct is too big? Zig seems to have an interesting technique to address this (or at least crash cleanly). Let us explore this in Part 2.
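For a sense of scale, the bigger X (called BigX below) is about 80 KB. A sketch, assuming Zig 0.11 or later, that checks the size and shows one common way to keep such a value off the stack entirely: allocate it through an allocator. makeBigOnHeap is hypothetical, and this is not necessarily the technique Part 2 explores.

```zig
const std = @import("std");

const BigX = struct { x: u32, y: u64, r: [20000]u32 };

comptime {
    // 4 + 8 + 20000 * 4 = 80012 bytes of fields, rounded up to a multiple of 8.
    std.debug.assert(@sizeOf(BigX) == 80016);
}

// Hypothetical helper: build the value directly in heap memory, so no
// 80 KB copy needs to live in the caller's stack frame.
fn makeBigOnHeap(allocator: std.mem.Allocator) !*BigX {
    const p = try allocator.create(BigX);
    p.x = 455;
    p.y = 497;
    for (&p.r) |*e| e.* = 0;
    return p;
}

pub fn main() !void {
    const p = try makeBigOnHeap(std.heap.page_allocator);
    defer std.heap.page_allocator.destroy(p);
    std.debug.print("{d} bytes, x = {d}\n", .{ @sizeOf(BigX), p.x });
}
```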