Zig NEWS

Loris Cro


How to Add Buffering to a Reader / Writer in Zig

Once you get access to an open file descriptor, you can start reading or writing to it. In this example we're going to use stdin and stdout, but the same applies to sockets, files, and any stream that offers a reader/writer interface.

Why buffer reads and writes?

Long story short: for performance.

Every time you issue an unbuffered read or write, the program executes a syscall to have the OS perform the corresponding operation. Unfortunately, syscalls are slow compared to ordinary function calls, because each one has to cross from your program into the kernel and pass through several layers of abstraction in the system.

On top of that, many situations require issuing lots of small reads or writes. For example, many parsers try to read one token at a time. Buffering lets you batch those small reads and writes, resulting in a much lower number of syscalls.

While buffering is just a pattern like many others, it's a very useful one to know: it lets you keep things clean at a high level (e.g., parser code) while drastically improving performance at a lower level, with very little complexity added between the two layers.
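To make the batching effect concrete, here is a minimal sketch that counts how many times the underlying write primitive is invoked with and without buffering. CountingSink and manySmallWrites are hypothetical names invented for this example; each call to the sink's write stands in for one syscall. The sketch assumes the std.io API used throughout this article (newer Zig releases have reworked these interfaces) and uses std.io.bufferedWriter, the helper covered later in the article.

```zig
const std = @import("std");

// Hypothetical stand-in for a file: every call to `write` plays the role of
// one syscall, and we only count the calls.
const CountingSink = struct {
    calls: usize = 0,

    const Self = @This();
    pub const Writer = std.io.Writer(*Self, error{}, write);

    fn write(self: *Self, bytes: []const u8) error{}!usize {
        self.calls += 1;
        return bytes.len;
    }

    pub fn writer(self: *Self) Writer {
        return .{ .context = self };
    }
};

// Issues 1000 one-byte writes, the kind of pattern a tokenizer might produce.
fn manySmallWrites(w: anytype) !void {
    var i: usize = 0;
    while (i < 1000) : (i += 1) {
        try w.writeByte('x');
    }
}

pub fn main() !void {
    // Unbuffered: every small write reaches the sink.
    var direct = CountingSink{};
    try manySmallWrites(direct.writer());

    // Buffered: small writes accumulate in memory until flush.
    var wrapped = CountingSink{};
    var buf = std.io.bufferedWriter(wrapped.writer());
    try manySmallWrites(buf.writer());
    try buf.flush();

    std.debug.print(
        "unbuffered: {d} calls, buffered: {d} calls\n",
        .{ direct.calls, wrapped.calls },
    );
}
```

Since the 1000 bytes fit comfortably in the default 4096-byte buffer, the buffered run should reach the sink only once, at flush time, against 1000 calls for the unbuffered run.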

Buffering stdout

First we need to get a handle to stdout.

const std = @import("std");

pub fn main() !void {
   const out = std.io.getStdOut();
}

out is one specific type of thing that can be written to. You can obtain a unified writer interface by calling its writer() method. This is what gives you access to (unbuffered) print and other similar methods (as exposed by the generic writer interface).

const w = out.writer();
try w.print("Hello {s}!", .{"World"});

More on writer interfaces
If you look at how Writer is defined in the standard library, you will notice that it's a generic struct. Why is that?

There are multiple ways of implementing interfaces in Zig, with different degrees of runtime dynamism.

This specific implementation is concerned with two main things. The first is knowing the set of possible errors that the stream can produce, so that they can be included in the error set of print and friends. The second is being able to pass a (correctly typed) reference to the original stream to the write function, which is the only primitive that a stream has to expose for print and all the other goodies already implemented in Writer to work.
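As an illustration of those two concerns, here is a minimal sketch of a custom stream that plugs into the generic Writer. LimitedSink and its LimitExceeded error are hypothetical, invented for this example; the sketch assumes the std.io.Writer(Context, Error, writeFn) signature from the Zig versions this article targets (newer releases have reworked these interfaces).

```zig
const std = @import("std");

// Hypothetical stream: accepts bytes until a limit is reached, then fails.
const LimitedSink = struct {
    limit: usize,
    written: usize = 0,

    const Self = @This();

    // The set of errors this stream can produce.
    pub const Error = error{LimitExceeded};

    // The generic Writer, specialized with a typed reference to the stream,
    // its error set, and its `write` primitive.
    pub const Writer = std.io.Writer(*Self, Error, write);

    // The only primitive the stream has to provide.
    fn write(self: *Self, bytes: []const u8) Error!usize {
        if (self.written + bytes.len > self.limit) return error.LimitExceeded;
        self.written += bytes.len;
        return bytes.len;
    }

    pub fn writer(self: *Self) Writer {
        return .{ .context = self };
    }
};

pub fn main() !void {
    var sink = LimitedSink{ .limit = 8 };
    const w = sink.writer();

    // print comes for free from the generic Writer.
    try w.print("{s}", .{"Hello"});

    // The error set of print is the one declared by the stream, so the
    // failure can be handled by name.
    w.print("{s}", .{"World"}) catch |err| switch (err) {
        error.LimitExceeded => std.debug.print(
            "sink refused the write after {d} bytes\n",
            .{sink.written},
        ),
    };
}
```

Because the error set is part of the Writer type, the switch above can be exhaustive with a single prong: the compiler knows that LimitExceeded is the only error print can return for this stream.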


Obtain a BufferedWriter

Buffering is implemented as a sort of wrapper around Writer and, similarly to Writer itself, it's a generic type because it needs, among other things, to know the error set of the underlying Writer so that those errors can be added to the error set exposed by its implementation of print etc.

BufferedWriter is implemented in std.io and its (type) constructor has the following signature:

pub fn BufferedWriter(comptime buffer_size: usize, comptime WriterType: type) type

Here you can see that it needs WriterType, as mentioned above, but it also needs a buffer_size. This has to be a comptime parameter because it determines the size of the array, stored inside the BufferedWriter itself, that will be used to buffer the writes. When the BufferedWriter is a local variable, as in the examples in this article, that array lives on the stack. This is an important detail to know:

  1. the buffer is an array on the stack
  2. the BufferedWriter doesn't do dynamic allocations

At this point you could use that function directly to obtain a buffered writer, but if you look at the same file where it's implemented, near the bottom you will see a very nice helper function that does all this work for us. I'll report here the full implementation:

pub fn bufferedWriter(underlying_stream: anytype) BufferedWriter(4096, @TypeOf(underlying_stream)) {
    return .{ .unbuffered_writer = underlying_stream };
}

As you can see, it automatically wires in the generic parameter and defaults to a 4 KiB buffer. Pretty handy!
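If the default size doesn't suit your use case, the type constructor shown earlier can still be called directly. The following is a minimal sketch that mirrors what the helper does, only with a different buffer_size; the 16 KiB value is an arbitrary choice made for this example, and the code assumes the same std.io API as the rest of the article.

```zig
const std = @import("std");

pub fn main() !void {
    const out = std.io.getStdOut();
    const unbuffered = out.writer();

    // Same initialization as the bufferedWriter helper, but with a
    // 16 KiB buffer (arbitrary size, picked for illustration).
    var buf = std.io.BufferedWriter(16 * 1024, @TypeOf(unbuffered)){
        .unbuffered_writer = unbuffered,
    };
    const w = buf.writer();

    try w.print("Hello {s}!\n", .{"World"});
    try buf.flush();
}
```

Keep in mind that the whole buffer is part of the BufferedWriter value, so a larger buffer_size directly increases the stack space used when the value is a local variable.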

Flushing

The BufferedWriter will automatically flush (i.e., issue a write syscall with the content of its buffer and empty it) when full, but it has no way of knowing when you intend to issue the last write. For this reason you need to conclude your writing session with a call to its flush method.
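The effect of a missing flush is easy to observe with an in-memory stream in place of a file. This is a minimal sketch, assuming the std.io API used in this article; std.io.fixedBufferStream is used here only so that we can inspect what actually reached the underlying writer.

```zig
const std = @import("std");

pub fn main() !void {
    // In-memory stand-in for the real output stream.
    var storage: [64]u8 = undefined;
    var stream = std.io.fixedBufferStream(&storage);

    var buf = std.io.bufferedWriter(stream.writer());
    const w = buf.writer();

    try w.print("Hello {s}!", .{"World"});

    // The message is much smaller than the buffer, so it is still pending
    // inside the BufferedWriter at this point.
    std.debug.print("before flush: {d} bytes\n", .{stream.getWritten().len});

    try buf.flush();

    // Only now has the message been handed to the underlying stream.
    std.debug.print("after flush: {d} bytes\n", .{stream.getWritten().len});
}
```

If the program returned before the call to flush, the pending bytes would simply be lost, which is why forgetting it usually shows up as truncated or missing output.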

Writer Writer

BufferedWriter, despite its name, is not a proper Writer. Instead, it just implements write() and exposes a Writer interface (through its writer() method) to give access to the usual functionality (e.g., print). This way it can reuse the same implementation that all other writers share.

From an architectural perspective, the full "abstraction cake" looks like this:

[Writer]
   ▽
[BufferedWriter]
   ▽
[Writer]
   ▽
[Stdout]

The final code should look like this:

const std = @import("std");

pub fn main() !void {
   const out = std.io.getStdOut();
   var buf = std.io.bufferedWriter(out.writer());

   // Get the Writer interface from BufferedWriter
   const w = buf.writer();

   try w.print("Hello {s}!", .{"World"});

   // Don't forget to flush!
   try buf.flush();
}

About Readers

The same thing that we've seen for BufferedWriter also applies to readers: there's a Reader interface and a BufferedReader generic type implemented in std.io. The only difference is that you don't have to flush() readers.

Here's some sample code just for the sake of clarity:

const std = @import("std");

pub fn main() !void {
   const in = std.io.getStdIn();
   var buf = std.io.bufferedReader(in.reader());

   // Get the Reader interface from BufferedReader
   const r = buf.reader();

   std.debug.print("Write something: ", .{});
   // Ideally we would want to issue more than one read
   // otherwise there is no point in buffering.
   var msg_buf: [4096]u8 = undefined;
   const msg = try r.readUntilDelimiterOrEof(&msg_buf, '\n');

   if (msg) |m| {
       std.debug.print("msg: {s}\n", .{m});
   }
}
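As the comment in the sample notes, buffering only pays off when many reads are issued. Here is a minimal sketch of that situation: reading stdin line by line until the end of the stream. countLines is a hypothetical helper written for this example, and the code assumes the same std.io API as the sample above; lines longer than the 4096-byte scratch buffer would make readUntilDelimiterOrEof return an error.

```zig
const std = @import("std");

// Hypothetical helper: counts the lines available from any reader that
// exposes the generic Reader interface.
fn countLines(reader: anytype) !usize {
    var line_buf: [4096]u8 = undefined;
    var count: usize = 0;
    // Each iteration is a small read served from the buffer when possible.
    while (try reader.readUntilDelimiterOrEof(&line_buf, '\n')) |_| {
        count += 1;
    }
    return count;
}

pub fn main() !void {
    const in = std.io.getStdIn();
    var buf = std.io.bufferedReader(in.reader());

    const n = try countLines(buf.reader());
    std.debug.print("lines: {d}\n", .{n});
}
```

Passing in.reader() to countLines instead of buf.reader() would produce the same count, but with far more read syscalls, since the delimiter search would then pull data from the OS in tiny pieces.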

Oldest comments (9)

David Vanderson

Great explanation and thanks for the code examples!

Dmitry Matveyev

I think there's a typo reader()->writer()

out is one specific type of thing that can be written to. You can obtain a unified writer interface by calling its reader() method. This is what gives you access to (unbuffered) print and other similar methods (as exposed by the generic writer inferface).

Loris Cro

Fixed, thank you very much!

user673679

Isn't stdout usually line buffered (or fully buffered)? Does Zig do something to turn this buffering off, or do we now have two sets of buffering? If so, how does one flush the underlying stream?

jlombera • Edited

That buffering is done by libc, not the OS. So there is no buffering if you write directly to stdout's OS file descriptor. In C, you'll observe the buffered IO if you use printf(3)/fprintf(3)/fwrite(3), but not if you use write(2) directly.

btree • Edited

Thanks, Loris.
I wonder what the scenarios for using a buffered writer / reader are. It looks like it shouldn't always be recommended. E.g., when the underlying Writer / Reader already takes a buffer argument at creation time [1], it seems there is 0 performance gain from wrapping a buffered writer / reader on top of it. In fact, it adds an additional memory copy, which slows things down. Is my understanding correct? Thanks.

[1] github.com/fengb/zig-https-example...

Loris Cro

If the underlying stream is already buffered then of course adding more buffering will not improve things. One practical example that I've encountered today where buffering made a big difference is when serializing a struct into JSON.

I'm working on a piece of the Zig compiler that reads source code and automatically generates docs from it. The job mostly consists of collecting type information and then serializing it into a JSON file. The original code serialized (using std.json) directly into a File.Writer and took about 10s to do the job when run on the Zig standard library. After the change, the serialization step took less than 1 second.

Jiacai Liu • Edited

Thanks for sharing.

I'm implementing a loc CLI program. After wrapping file.reader() inside bufferedReader, sys time dropped from 0m1.944s to 0m0.055s.

Before

time ./zig-out/bin/loc ~/code/rust-analyzer/
Language   Files Lines  Code   Comment Blank 
---------- ----- ------ ------ ------- ----- 
Rust       1053  267266 221082 20476   25708 
Markdown   15    4119   2979   3       1137  
HTML       14    1247   1108   0       139   
JavaScript 2     184    160    9       15    

real    0m2.918s
user    0m0.863s
sys 0m1.944s

After

time ./zig-out/bin/loc ~/code/rust-analyzer/
Language   Files Lines  Code   Comment Blank 
---------- ----- ------ ------ ------- ----- 
Rust       1053  267266 221082 20476   25708 
Markdown   15    4119   2979   3       1137  
HTML       14    1247   1108   0       139   
JavaScript 2     184    160    9       15    

real    0m0.719s
user    0m0.119s
sys 0m0.055s

I will open source this CLI after some tidy-up.

Antoine Gagniere

I was wondering about that and stumbled upon your article after having written code snippets to compare the number of calls to read made by various implementations: gist.github.com/agagniere/41b26032...

Couldn't an alias to buffered stdin be added? Like std.io.getStdInBuffered() or something. It would help promote the usage of this good practice.