Zig NEWS

Pyrolistical

New way to split and iterate over strings

std.mem.window has been merged!

std.mem.split is very useful when there is a known delimiter, but there has been no easy way to split a buffer into chunks of N items.

Implementing this by hand for chunks of 3 items looks like this:

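const std = @import("std");

pub fn main() void {
    const buffer = "abcdefg";
    const size = 3;
    var i: usize = 0;
    while (i < buffer.len) : (i += size) {
        // Clamp the end so the last chunk may be shorter than `size`.
        const end = @min(i + size, buffer.len);
        const slice = buffer[i..end];
        std.debug.print("{s}\n", .{slice}); // slice is "abc", "def", "g"
    }
}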

std.mem.window simplifies that to:

const std = @import("std");

pub fn main() void {
    const buffer = "abcdefg";
    var it = std.mem.window(u8, buffer, 3, 3);
    while (it.next()) |slice| {
        std.debug.print("{s}\n", .{slice}); // slice is "abc", "def", "g"
    }
}

Figure: a width-3 window covering "abc", "def", "g"

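But there's more! It isn't named splitEvery because std.mem.window is more general: it takes both a size and an advance parameter. Here size is the length of each window, and advance is how many items the window moves forward after each step. When the two are equal, it behaves exactly like a splitEvery would.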
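
To see the equivalence concretely, here is a minimal sketch that runs the manual chunking loop and std.mem.window with size equal to advance side by side and compares every chunk. The helper sameAsSplitEvery is hypothetical (it is not part of std), and the sketch assumes a Zig version that ships std.mem.window (0.11 or later, as far as we know), n > 0, and a non-empty buffer.

```zig
const std = @import("std");

/// Hypothetical helper: returns true when std.mem.window(T, buffer, n, n)
/// yields exactly the chunks of a hand-written "split every n" loop.
/// Assumes n > 0 and a non-empty buffer.
fn sameAsSplitEvery(comptime T: type, buffer: []const T, n: usize) bool {
    var it = std.mem.window(T, buffer, n, n);
    var i: usize = 0;
    while (i < buffer.len) : (i += n) {
        // The manual chunk: clamp the end so the last one may be shorter.
        const expected = buffer[i..@min(i + n, buffer.len)];
        const actual = it.next() orelse return false;
        if (!std.mem.eql(T, expected, actual)) return false;
    }
    // The window iterator must be exhausted at the same point.
    return it.next() == null;
}

pub fn main() void {
    const ok = sameAsSplitEvery(u8, "abcdefg", 3);
    std.debug.print("window(3, 3) matches manual chunking: {s}\n", .{if (ok) "yes" else "no"});
}
```

Because each window starts exactly where the previous one ended, the comparison should also hold for other lengths and element types, such as a []const u32.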

By choosing an advance smaller than size, consecutive windows overlap and we get a sliding window:

const std = @import("std");

pub fn main() void {
    const buffer = "abcdefg";
    var it = std.mem.window(u8, buffer, 3, 1);
    while (it.next()) |slice| {
        std.debug.print("{s}\n", .{slice}); // slice is "abc", "bcd", "cde", "def", "efg"
    }
}

Figure: a width-3 sliding window covering "abc", "bcd", "cde", "def", "efg"
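
Since window is generic over the element type T, the same sliding window works on numbers, for example to compute moving sums (divide each by the width for a moving average). A minimal sketch, assuming Zig 0.11 or later; movingSums is a hypothetical helper, and it assumes 0 < width <= data.len so that every window is full length.

```zig
const std = @import("std");

/// Hypothetical helper: writes the sum of every `width`-long window of `data`
/// into `out` and returns how many sums were written. Assumes
/// 0 < width <= data.len and that `out` has room for data.len - width + 1 sums.
fn movingSums(data: []const i32, width: usize, out: []i64) usize {
    // advance = 1: each window starts one element after the previous one.
    var it = std.mem.window(i32, data, width, 1);
    var count: usize = 0;
    while (it.next()) |win| {
        var sum: i64 = 0;
        for (win) |x| sum += x;
        out[count] = sum;
        count += 1;
    }
    return count;
}

pub fn main() void {
    const data = [_]i32{ 1, 2, 3, 4, 5 };
    var sums: [data.len]i64 = undefined;
    const n = movingSums(&data, 3, &sums);
    for (sums[0..n]) |s| std.debug.print("{d}\n", .{s}); // 6, 9, 12
}
```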

Going the other way, an advance larger than size skips items, so we can pick out every Nth element. For example, to keep only the items at even indices:

const std = @import("std");

pub fn main() void {
    const buffer = "abcdefg";
    var it = std.mem.window(u8, buffer, 1, 2);
    while (it.next()) |slice| {
        std.debug.print("{s}\n", .{slice}); // slice is "a", "c", "e", "g"
    }
}

Figure: a width-1 window covering "a", "c", "e", "g"
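
Because window steps over elements of T, a string passed as []const u8 is windowed byte by byte, not character by character. The sketch below, assuming Zig 0.11 or later, contrasts the two by counting one-byte windows and UTF-8 code points (decoded with std.unicode.Utf8View) for the same string; countByteWindows and countCodepoints are hypothetical helpers.

```zig
const std = @import("std");

/// Hypothetical helper: counts the windows produced by std.mem.window with
/// size 1 and advance 1. Each window is a single byte, not a character.
fn countByteWindows(s: []const u8) usize {
    var it = std.mem.window(u8, s, 1, 1);
    var n: usize = 0;
    while (it.next()) |_| n += 1;
    return n;
}

/// Hypothetical helper: counts Unicode code points by decoding UTF-8.
fn countCodepoints(s: []const u8) !usize {
    const view = try std.unicode.Utf8View.init(s);
    var it = view.iterator();
    var n: usize = 0;
    while (it.nextCodepointSlice()) |_| n += 1;
    return n;
}

pub fn main() !void {
    const s = "\u{e0}\u{e9}\u{e7}"; // "àéç": 3 code points, 2 bytes each in UTF-8
    std.debug.print("byte windows: {d}\n", .{countByteWindows(s)}); // 6
    std.debug.print("code points: {d}\n", .{try countCodepoints(s)}); // 3
}
```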

Latest comments (4)

Jean-Pierre

On Linux with Zig 0.10.0 I get:
error: root struct of file 'mem' has no member named 'window'
var it = std.mem.window(u8, buffer, 1, 1);

const buffer = "àéç";
var it = std.mem.window(u8, buffer, 1, 1);
while (it.next()) |slice| {
    std.debug.print("value:{any}", .{slice});
}

It only works with the master version, and it does not support UTF-8, only US-ASCII (128 characters). Too bad, because we are not far from nim-lang's Rune.

Lisael

It's a low-level memory operation: just a sliding window along an array in memory. It's presented in the examples as a string tool (because []const u8 values are the easiest arrays to create in a small example snippet), but it really isn't one. Low-level memory operations are useless when dealing with real-world strings (except as building blocks for higher-level operations).

What you want is github.com/JakubSzark/zig-string, which does exactly that.

Jean-Pierre

zig-string isn't bad, but it's missing some stuff.

Jean-Pierre

Hello, it works with UTF-8, e.g. here: éçà...???