Loris Cro

Posted on Aug 5, 2021 • Updated on Aug 12, 2021

How to Add Buffering to a Reader / Writer in Zig

#beginners #tutorial

Once you get access to an open file descriptor, you can start reading from it or writing to it. In this example we're going to use stdin and stdout, but the same applies to sockets, files, and any stream that offers a reader/writer interface.

Why buffer reads and writes?

Long story short: for performance.

Every time you issue an unbuffered write/read, the program executes a syscall to make the OS perform the corresponding operation. Unfortunately, syscalls are slow because they have to navigate through lots of abstraction layers in the system.

On top of that, many situations require issuing lots of small reads/writes. For example, many parsers try to read one token at a time. Buffering lets you batch those small reads/writes, resulting in a much lower number of syscalls.
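
To make the batching effect concrete, here is a minimal sketch that counts how many times the underlying write function gets called with and without buffering. It assumes a Zig release that still ships the std.io API used in this article (std.io.Writer as a generic type and std.io.bufferedWriter). CountingSink is a hypothetical helper written for this illustration, not part of the standard library, and it stands in for a real file descriptor so that each call to its write function plays the role of one syscall.

```zig
const std = @import("std");

// Hypothetical sink (not in std): it discards the bytes and only counts
// how many times the underlying write function is invoked.
const CountingSink = struct {
    calls: usize = 0,

    const Writer = std.io.Writer(*CountingSink, error{}, write);

    fn write(self: *CountingSink, bytes: []const u8) error{}!usize {
        self.calls += 1;
        return bytes.len;
    }

    fn writer(self: *CountingSink) Writer {
        return .{ .context = self };
    }
};

pub fn main() !void {
    // 1000 one-byte writes straight into the sink.
    var direct = CountingSink{};
    var i: usize = 0;
    while (i < 1000) : (i += 1) {
        try direct.writer().writeAll("x");
    }

    // The same 1000 one-byte writes, but through a BufferedWriter.
    var wrapped = CountingSink{};
    var buf = std.io.bufferedWriter(wrapped.writer());
    i = 0;
    while (i < 1000) : (i += 1) {
        try buf.writer().writeAll("x");
    }
    try buf.flush();

    std.debug.print(
        "unbuffered: {d} writes, buffered: {d} writes\n",
        .{ direct.calls, wrapped.calls },
    );
}
```

Since 1000 bytes fit inside the default buffer, the buffered variant should reach the sink only once, when flush is called, while the unbuffered variant reaches it once per byte.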

While buffering is just a pattern like many others, it's a very useful one to know: it lets you keep things clean at a high level (e.g., parser code) while drastically improving performance at a lower level, with very little complexity added between the two layers.

Buffering stdout

First we need to get a handle to stdout.

const std = @import("std");

pub fn main() !void {
   const out = std.io.getStdOut();
}

out is one specific type of thing that can be written to. You can obtain a unified writer interface by calling its writer() method. This is what gives you access to (unbuffered) print and other similar methods (as exposed by the generic writer interface).

const w = out.writer();
try w.print("Hello {s}!", .{"World"});

Obtain a BufferedWriter

Buffering is implemented as a wrapper around Writer and, like Writer itself, it's a generic type: among other things, it needs to know the error set of the underlying Writer so that those errors can be included in the error set exposed by its implementation of print and similar methods.

BufferedWriter is implemented in std.io and its (type) constructor has the following signature:

pub fn BufferedWriter(comptime buffer_size: usize, comptime WriterType: type) type

Here you can see that it needs WriterType, as mentioned above, but it also needs a buffer_size. This has to be a comptime parameter because it sets the size of the array embedded in the BufferedWriter, and therefore the amount of stack memory used to buffer the writes when the BufferedWriter is a local variable. This is an important detail to know:

the buffer is an array on the stack
the BufferedWriter doesn't do dynamic allocations

At this point you could use that function directly to obtain a buffered writer, but if you look at the same file where it's implemented, near the bottom you will see a very nice helper function that does all this work for us. Here is the full implementation:

pub fn bufferedWriter(underlying_stream: anytype) BufferedWriter(4096, @TypeOf(underlying_stream)) {
    return .{ .unbuffered_writer = underlying_stream };
}

As you can see, it automatically wires in the generic parameter and defaults to a 4096-byte (4 KiB) buffer. Pretty handy!
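
If the default size doesn't suit your workload, you can call the type constructor yourself. The following is a rough sketch, assuming the same std.io API as the rest of this article. The 16 KiB size and the BigBufferedWriter name are arbitrary choices made for illustration.

```zig
const std = @import("std");

pub fn main() !void {
    const out = std.io.getStdOut();
    const unbuffered = out.writer();

    // Hypothetical choice: a 16 KiB buffer instead of the helper's 4096 bytes.
    const BigBufferedWriter = std.io.BufferedWriter(16 * 1024, @TypeOf(unbuffered));
    var buf = BigBufferedWriter{ .unbuffered_writer = unbuffered };

    const w = buf.writer();

    // The buffer is a field of the struct, so the type itself is at least
    // as large as the requested buffer size.
    try w.print("BigBufferedWriter occupies {d} bytes\n", .{@sizeOf(BigBufferedWriter)});

    try buf.flush();
}
```

Printing @sizeOf is just a way to see that the buffer lives inside the value itself: declaring buf as a local variable is what places those bytes on the stack.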

Flushing

The BufferedWriter will automatically flush (i.e., issue a write syscall with the content of its buffer and empty it) when full, but it has no way of knowing when you intend to issue the last write. For this reason you need to conclude your writing session with a call to its flush method.
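
To see why the final flush matters, here is a small sketch that swaps stdout for an in-memory sink (std.io.fixedBufferStream) so that we can inspect what has actually arrived. It assumes the same std.io API as the rest of this article. The storage and sink names are made up for the example.

```zig
const std = @import("std");

pub fn main() !void {
    // In-memory stand-in for stdout, so we can check what reached it.
    var storage: [64]u8 = undefined;
    var sink = std.io.fixedBufferStream(&storage);

    var buf = std.io.bufferedWriter(sink.writer());
    try buf.writer().writeAll("hello");

    // The five bytes are still sitting in the BufferedWriter's buffer.
    std.debug.print("before flush: {d} bytes\n", .{sink.getWritten().len});

    try buf.flush();

    // Only now has the underlying writer received them.
    std.debug.print("after flush: {d} bytes\n", .{sink.getWritten().len});
}
```

This should report 0 bytes before the flush and 5 bytes after it. With a real stdout, forgetting the flush means the tail end of your output silently never shows up.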

Writer Writer

BufferedWriter, despite its name, is not itself a Writer. Instead, it just implements write() and exposes a Writer interface through its writer() method to give you access to the usual functionality (e.g., print). This way it can reuse the same implementation that all other writers share.

From an architectural perspective, the full "abstraction cake" looks like this:

[Writer]
   ▽
[BufferedWriter]
   ▽
[Writer]
   ▽
[Stdout]

The final code should look like this:

const std = @import("std");

pub fn main() !void {
   const out = std.io.getStdOut();
   var buf = std.io.bufferedWriter(out.writer());

   // Get the Writer interface from BufferedWriter
   const w = buf.writer();

   try w.print("Hello {s}!", .{"World"});

   // Don't forget to flush!
   try buf.flush();
}

About Readers

The same thing that we've seen for BufferedWriter also applies to readers: there's a Reader interface and a BufferedReader generic type implemented in std.io. The only difference is that you don't have to flush() readers.

Here's some sample code just for the sake of clarity:

const std = @import("std");

pub fn main() !void {
   const in = std.io.getStdIn();
   var buf = std.io.bufferedReader(in.reader());

   // Get the Reader interface from BufferedReader
   const r = buf.reader();

   std.debug.print("Write something: ", .{});
   // Ideally we would want to issue more than one read
   // otherwise there is no point in buffering.
   var msg_buf: [4096]u8 = undefined;
   const msg = try r.readUntilDelimiterOrEof(&msg_buf, '\n');

   if (msg) |m| {
      std.debug.print("msg: {s}\n", .{m});
   }
}
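
Buffering only pays off once you issue many reads, so here is a sketch of the more realistic case: consuming a stream line by line until EOF. It assumes the same std.io API as the rest of this article. countLines is a hypothetical helper written for this example, and the 4096-byte line limit is an arbitrary choice (a longer line makes readUntilDelimiterOrEof return an error).

```zig
const std = @import("std");

// Hypothetical helper: counts the lines available from any reader that
// exposes the generic Reader interface.
fn countLines(reader: anytype) !usize {
    var line_buf: [4096]u8 = undefined;
    var count: usize = 0;
    // Each iteration is one small read. With a BufferedReader underneath,
    // most of them are served from memory instead of hitting the OS.
    while (try reader.readUntilDelimiterOrEof(&line_buf, '\n')) |_| {
        count += 1;
    }
    return count;
}

pub fn main() !void {
    const in = std.io.getStdIn();
    var buf = std.io.bufferedReader(in.reader());

    const n = try countLines(buf.reader());
    std.debug.print("lines: {d}\n", .{n});
}
```

Because countLines accepts anytype, you can pass it in.reader() directly and compare the two variants under a tool like time or strace.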

Oldest comments (9)

David Vanderson • Aug 6 '21

Great explanation and thanks for the code examples!

Dmitry Matveyev • Aug 12 '21

I think there's a typo reader()->writer()

out is one specific type of thing that can be written to. You can obtain a unified writer interface by calling its reader() method. This is what gives you access to (unbuffered) print and other similar methods (as exposed by the generic writer inferface).

Loris Cro • Aug 12 '21

Fixed, thank you very much!

user673679 • Nov 20 '21

Isn't stdout usually line buffered (or fully buffered)? Does Zig do something to turn this buffering off, or do we now have two sets of buffering? If so, how does one flush the underlying stream?

jlombera • Feb 13 '22 • Edited

That buffering is done by libc, not the OS. So there is no buffering if you write directly to stdout's OS file descriptor. In C, you'll observe the buffered IO if you use printf(3)/fprintf(3)/fwrite(3), but not if you use write(2) directly.

btree • May 30 '22 • Edited

Thanks, Loris.
I wonder what the scenarios for using a buffered writer / reader are. It looks like it shouldn't always be recommended. E.g., when the underlying Writer / Reader already provides a buffer argument at creation time [1], it seems there is zero performance gain from wrapping a buffered writer / reader on top of it. In fact, it adds an additional memory copy, which slows things down. Is my understanding correct? Thanks.

[1] github.com/fengb/zig-https-example...

Loris Cro • May 31 '22

If the underlying stream is already buffered then of course adding more buffering will not improve things. One practical example that I've encountered today where buffering made a big difference is when serializing a struct into JSON.

I'm working on a piece of the Zig compiler that reads source code and automatically generates docs from it. The job mostly consists of collecting type information and then serializing it into a JSON file. The original code serialized (using std.json) directly into a File.Writer and took about 10s to do the job when run on the Zig standard library. After the change the serialization step took less than 1 second.

Jiacai Liu • Sep 20 '22 • Edited

Thanks for sharing.

I'm implementing a loc CLI program. After wrapping file.reader() inside bufferedReader, sys time dropped from 0m1.944s to 0m0.055s.

Before

time ./zig-out/bin/loc ~/code/rust-analyzer/
Language   Files Lines  Code   Comment Blank 
---------- ----- ------ ------ ------- ----- 
Rust       1053  267266 221082 20476   25708 
Markdown   15    4119   2979   3       1137  
HTML       14    1247   1108   0       139   
JavaScript 2     184    160    9       15    

real    0m2.918s
user    0m0.863s
sys     0m1.944s

After

time ./zig-out/bin/loc ~/code/rust-analyzer/
Language   Files Lines  Code   Comment Blank 
---------- ----- ------ ------ ------- ----- 
Rust       1053  267266 221082 20476   25708 
Markdown   15    4119   2979   3       1137  
HTML       14    1247   1108   0       139   
JavaScript 2     184    160    9       15    

real    0m0.719s
user    0m0.119s
sys     0m0.055s

Will open source this cli after some tidy-up

Antoine Gagniere • Dec 2 '23

I was wondering about that and stumbled upon your article after having written code snippets to compare the number of calls to read across various implementations: gist.github.com/agagniere/41b26032...

Couldn't an alias to buffered stdin be added? Like std.io.getStdInBuffered() or something. It would help promote the usage of this good practice.

Zig NEWS
