Once you get access to an open file descriptor, you can start reading from or writing to it. In this example we're going to use stdin and stdout, but the same applies to sockets, files, and any stream that offers a reader/writer interface.
Why buffer reads and writes?
Long story short: for performance.
Every time you issue an unbuffered read or write, the program executes a syscall to ask the OS to perform the corresponding operation. Unfortunately, syscalls are slow because they have to navigate through lots of abstraction layers in the system.
On top of that, many situations require issuing many small reads/writes. For example, many parsers try to read one token at a time. Buffering lets you batch those small reads/writes, resulting in a much lower number of syscalls.
While buffering is just a pattern like many others, it's a very useful one to know, as it lets you keep things clean at a high level (e.g., parser code) while drastically improving performance at a lower level, with very little complexity added between the two layers.
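To make the difference concrete, here's a small sketch contrasting the two approaches. This is illustrative only: the loop count and the message format are made up, but the API calls (`std.io.getStdOut`, `std.io.bufferedWriter`) are the ones used later in this article.

```zig
const std = @import("std");

pub fn main() !void {
    const out = std.io.getStdOut();

    // Unbuffered: each print() below would go straight to a write()
    // syscall, so 1000 iterations would mean roughly 1000 syscalls:
    //
    //     try out.writer().print("{d}\n", .{i});

    // Buffered: writes accumulate in a 4096-byte buffer and are
    // flushed to the OS in large chunks, so the same 1000 prints
    // end up as only a handful of syscalls.
    var buffered = std.io.bufferedWriter(out.writer());
    const w = buffered.writer();

    var i: usize = 0;
    while (i < 1000) : (i += 1) {
        try w.print("{d}\n", .{i});
    }
    try buffered.flush();
}
```

The high-level code (the loop calling `print`) looks identical in both cases; only the plumbing underneath changes.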
Buffering stdout
First we need to get a handle to stdout.
const std = @import("std");

pub fn main() !void {
    const out = std.io.getStdOut();
    _ = out; // silences the "unused local constant" error for now
}
`out` is one specific type of thing that can be written to. You can obtain a unified writer interface by calling its `writer()` method. This is what gives you access to (unbuffered) `print` and other similar methods (as exposed by the generic writer interface).
var w = out.writer();
try w.print("Hello {s}!", .{"World"});
More on writer interfaces
If you clicked the link above you should have noticed that `Writer` is a generic struct. Why is that?
There are multiple ways of implementing interfaces in Zig with different degrees of runtime dynamism. This specific implementation is concerned with two main things: knowing the set of possible errors that the stream can produce (so that they can be included in the error set of `print` etc.), and the ability to pass a (correctly typed) reference to the original stream to the `write` function, which is the only primitive a stream has to expose for `print` and all the other goodies already implemented in `Writer` to work.
Obtain a BufferedWriter
Buffering is implemented as a sort of wrapper around `Writer` and, similarly to `Writer` itself, it's a generic type because it needs, among other things, to know the error set of the underlying `Writer`, so that those errors can be added to the error set exposed by its implementation of `print` etc.
`BufferedWriter` is implemented here and its (type) constructor has the following signature:
pub fn BufferedWriter(comptime buffer_size: usize, comptime WriterType: type) type
Here you can see that it needs `WriterType`, as mentioned above, but it also needs a `buffer_size`. This has to be a comptime parameter because it determines the amount of stack memory that will be used to buffer the writes. This is an important detail to know:
- the buffer is an array on the stack
- the `BufferedWriter` doesn't do dynamic allocations
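Because the buffer size is a comptime parameter, you can also call the type constructor directly to pick a custom size instead of the helper's default. A sketch; the 64 KiB size here is an arbitrary illustrative choice:

```zig
const std = @import("std");

pub fn main() !void {
    const out = std.io.getStdOut();

    // BufferedWriter(comptime buffer_size, comptime WriterType):
    // choosing 64 KiB instead of the default 4096 bytes.
    // The buffer is part of the struct, so this lives on the stack.
    var buffered = std.io.BufferedWriter(64 * 1024, @TypeOf(out.writer())){
        .unbuffered_writer = out.writer(),
    };

    const w = buffered.writer();
    try w.print("Hello {s}!\n", .{"World"});
    try buffered.flush();
}
```

Larger buffers mean fewer syscalls but more stack usage, so the default is a reasonable middle ground for most programs.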
At this point you could use that function directly to obtain a buffered writer, but if you look near the bottom of the same file where it's implemented, you will see a very nice helper function that does all this work for us. I'll report here the full implementation:
pub fn bufferedWriter(underlying_stream: anytype) BufferedWriter(4096, @TypeOf(underlying_stream)) {
    return .{ .unbuffered_writer = underlying_stream };
}
As you can see, it automatically wires in the generic parameter and defaults to a 4 KiB buffer. Pretty handy!
Flushing
The `BufferedWriter` will automatically flush (i.e., issue a write syscall with the contents of its buffer and then empty it) when full, but it has no way of knowing when you intend to issue the last write. For this reason you need to conclude your writing session with a call to its `flush` method.
BufferedWriter vs Writer
`BufferedWriter`, despite its name, is not a proper `Writer`; instead it just implements `write()` and exposes a `Writer` interface to gain access to the usual functionality (e.g., `print`). This way it can reuse the same implementation that all other writers share.
From an architectural perspective, the full "abstraction cake" looks like this:
[Writer]
▽
[BufferedWriter]
▽
[Writer]
▽
[Stdout]
The final code should look like this:
const std = @import("std");

pub fn main() !void {
    const out = std.io.getStdOut();
    var buf = std.io.bufferedWriter(out.writer());

    // Get the Writer interface from BufferedWriter
    var w = buf.writer();
    try w.print("Hello {s}!", .{"World"});

    // Don't forget to flush!
    try buf.flush();
}
About Readers
The same thing that we've seen for `BufferedWriter` also applies to readers: there's a `Reader` interface and a `BufferedReader` generic type implemented in `std.io`. The only difference is that you don't have to `flush()` readers.
Here's some sample code just for the sake of clarity:
const std = @import("std");

pub fn main() !void {
    const in = std.io.getStdIn();
    var buf = std.io.bufferedReader(in.reader());

    // Get the Reader interface from BufferedReader
    var r = buf.reader();

    std.debug.print("Write something: ", .{});

    // Ideally we would want to issue more than one read,
    // otherwise there is no point in buffering.
    var msg_buf: [4096]u8 = undefined;
    const msg = try r.readUntilDelimiterOrEof(&msg_buf, '\n');
    if (msg) |m| {
        std.debug.print("msg: {s}\n", .{m});
    }
}
Top comments (9)
Thanks for sharing.
I'm implementing a loc CLI program; after wrapping file.reader() inside bufferedReader, sys time dropped from 0m1.944s to 0m0.055s.
Will open-source this CLI after some tidy-up.
Thanks, Loris.
I wonder what the scenarios are for using a buffered writer/reader. It looks like it shouldn't always be recommended. E.g., when the underlying Writer/Reader already provides a `buffer` argument at creation time [1], it seems there is zero performance gain from wrapping a buffered writer/reader on top of it. In fact, it adds an additional memory copy which slows things down. Is my understanding correct? Thanks. [1] github.com/fengb/zig-https-example...
If the underlying stream is already buffered then of course adding more buffering will not improve things. One practical example that I've encountered today where buffering made a big difference is when serializing a struct into JSON.
I'm working on a piece of the Zig compiler that reads source code and automatically generates docs from it. The job mostly consists of collecting type information and then serializing it into a JSON file. The original code serialized (using std.json) directly into a File.Writer and took about 10s to do the job when run on the Zig standard library. After the change, the serialization step took less than 1 second.
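The pattern described in that reply can be sketched like this. The file name and the `Doc` struct are made up for illustration; the point is the placement of `bufferedWriter` between `std.json` and the file:

```zig
const std = @import("std");

// Hypothetical payload standing in for the collected type information.
const Doc = struct { name: []const u8, decls: usize };

pub fn main() !void {
    const file = try std.fs.cwd().createFile("docs.json", .{});
    defer file.close();

    // Buffer the File.Writer so the serializer's many tiny writes
    // (one per token of JSON output) don't each become a syscall.
    var buffered = std.io.bufferedWriter(file.writer());

    const doc = Doc{ .name = "std", .decls = 1234 };
    try std.json.stringify(doc, .{}, buffered.writer());

    // Flush the tail of the buffer before the file is closed.
    try buffered.flush();
}
```

The serializer code is unchanged; only the writer handed to it differs, which is exactly the "clean high level, fast low level" property described earlier in the article.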
I think there's a typo reader()->writer()
out is one specific type of thing that can be written to. You can obtain a unified writer interface by calling its reader() method. This is what gives you access to (unbuffered) print and other similar methods (as exposed by the generic writer inferface).
Fixed, thank you very much!
I was wondering about that and stumbled upon your article after having written code snippets to compare the number of calls to read of various implementations: gist.github.com/agagniere/41b26032...
Couldn't an alias to buffered stdin be added? Something like `std.io.getStdInBuffered()`. It would help promote the usage of this good practice.
Great explanation and thanks for the code examples!
Isn't stdout usually line buffered (or fully buffered)? Does Zig do something to turn this buffering off, or do we now have two sets of buffering? If so, how does one flush the underlying stream?
That buffering is done by libc, not the OS, so there's no buffering if you write directly to stdout's OS file descriptor. In C, you'll observe the buffered IO if you use `printf(3)`/`fprintf(3)`/`fwrite(3)`, but not if you use `write(2)` directly.