I'd like to share a handy string lib I created: jstring.zig
liyu1981 / jstring.zig
jstring.zig
Target: create a reusable string lib for myself, with all the familiar methods found in JavaScript strings.
Reason:
- Strings are important, as we all know, so a good string lib will be very useful.
- The JavaScript string is (in my opinion) the most battle-tested string library out there, and it strikes a good balance between features and complexity.
The JavaScript string specs and methods this file uses as reference can be found at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String
All methods except those marked as deprecated (such as anchor, big, blink etc) are implemented, in the zig way.
As the name suggests, it is a string lib inspired by JavaScript (ECMAScript, to be precise). The goal is to implement all the methods specified at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String, except those marked as deprecated (such as anchor, big, blink etc).
Highlight: excellent regex support with help from PCRE2
jstring.zig integrates with PCRE2 to provide match, match_all and more, with the familiar feel of JavaScript strings. Here are some examples of how it is supported:
var str1 = try JStringUnmanaged.newFromSlice(arena.allocator(), "hello,hello,world");
var results = try str1.splitByRegex(arena.allocator(), "l+", 0, 0);
try testing.expectEqual(results.len, 1);
try testing.expect(results[0].eqlSlice("hello,hello,world"));
results = try str1.splitByRegex(arena.allocator(), "l+", 0, -1);
try testing.expectEqual(results.len, 4);
try testing.expect(results[0].eqlSlice("he"));
try testing.expect(results[1].eqlSlice("o,he"));
try testing.expect(results[2].eqlSlice("o,wor"));
try testing.expect(results[3].eqlSlice("d"));
Or, using RegexUnmanaged directly:
var re = try RegexUnmanaged.init(arena.allocator(), "(hi,)(?<h>hel+o?)", 0);
try re.match(arena.allocator(), "hi,hello", 0, true, 0);
try re.reset(arena.allocator());
try re.match(arena.allocator(), "hi,hello", 0, true, 0);
const match_results = re.getResults();
const group_results = re.getGroupResults();
_ = group_results;
if (match_results) |mrs| {
    try testing.expectEqual(mrs[0].start, 0);
    try testing.expectEqual(mrs[0].len, 8);
}
var it = re.getGroupResultsIterator("hi,hello");
var maybe_r = it.nextResult();
try testing.expect(maybe_r != null);
if (maybe_r) |r| {
    try testing.expect(std.mem.eql(u8, r.name, ""));
    try testing.expectEqual(r.start, 0);
}
maybe_r = it.nextResult();
try testing.expect(maybe_r != null);
if (maybe_r) |r| {
    try testing.expect(std.mem.eql(u8, r.name, "h"));
    try testing.expectEqual(r.start, 3);
}
Use it in your project
jstring.zig can be used with the zig package manager, like below:
zig fetch --save https://github.com/liyu1981/jstring.zig/archive/refs/tags/0.1.0.tar.gz
Because it integrates PCRE2, you will need to enable PCRE2 linkage when building with it. jstring.zig also provides a build-time module that makes this part really easy, like below:
// in your build.zig
const jstring_build = @import("jstring");
...
const jstring_dep = b.dependency("jstring", .{});
exe.addModule("jstring", jstring_dep.module("jstring"));
jstring_build.linkPCRE(exe, jstring_dep);
How about the performance?
jstring.zig is designed with performance in mind, and it should approach bare []const u8 as closely as it can. The benchmark part is still a work in progress, but my early test shows that jstring.zig outperforms C++'s std::string, finishing in roughly 70% less time (about 3.4x faster on average in the run below).
benchmark % ./zig-out/bin/benchmark
|zig create/release: | [ooooo] | avg= 16464000ns | min= 14400000ns | max= 20975000ns |
|cpp create/release: | [ooooo] | avg= 56735400ns | min= 56137000ns | max= 57090000ns |
(The test randomly allocates and releases 1M short and long strings.)
But I want to give the credit to zig, not jstring.zig, because zig is a really cache-friendly language that makes it very easy to get your program fast!
Coverage
One of my goals when building this lib (as a practice) was to push the coverage to the max. In jstring.zig, the coverage is 100% on both the zig and c code. Take a look at the report here.
It can be used as a single file lib too
Just copying jstring.zig into your project will do the job too. Regex support can be turned off by modifying the comptime var enable_pcre at the beginning of the file. Even without PCRE support it still has excellent performance, and it has a built-in KMP fast search algorithm. KMP is very useful if you need to use jstring to search highly repetitive strings (like scientific data): it turns the worst-case O(n^2) search time into O(n) time (more precisely, O(n*m) into O(n + m) for a haystack of length n and a needle of length m).
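To make the KMP point concrete, here is a minimal sketch of the algorithm in zig. It is not the implementation inside jstring.zig: the names buildFailureTable and kmpIndexOf are made up for this illustration, and it assumes a plain byte-wise search over []const u8. The idea is that the failure table lets the search fall back inside the needle on a mismatch, so the haystack index never moves backwards.

```zig
const std = @import("std");

/// Fills `table` so that table[i] is the length of the longest proper prefix
/// of needle[0 .. i + 1] that is also a suffix of it (the KMP failure table).
fn buildFailureTable(needle: []const u8, table: []usize) void {
    std.debug.assert(table.len == needle.len);
    if (needle.len == 0) return;
    table[0] = 0;
    var k: usize = 0; // length of the currently matched prefix
    var i: usize = 1;
    while (i < needle.len) : (i += 1) {
        // On a mismatch, fall back to the next shorter prefix that still matches.
        while (k > 0 and needle[i] != needle[k]) k = table[k - 1];
        if (needle[i] == needle[k]) k += 1;
        table[i] = k;
    }
}

/// Returns the index of the first occurrence of `needle` in `haystack`,
/// or null if there is none. Hypothetical helper, not jstring.zig's API.
fn kmpIndexOf(allocator: std.mem.Allocator, haystack: []const u8, needle: []const u8) !?usize {
    if (needle.len == 0) return 0;
    if (needle.len > haystack.len) return null;

    const table = try allocator.alloc(usize, needle.len);
    defer allocator.free(table);
    buildFailureTable(needle, table);

    var k: usize = 0; // number of needle bytes matched so far
    var i: usize = 0;
    while (i < haystack.len) : (i += 1) {
        // The haystack index never moves backwards; only k falls back.
        while (k > 0 and haystack[i] != needle[k]) k = table[k - 1];
        if (haystack[i] == needle[k]) k += 1;
        if (k == needle.len) return i + 1 - needle.len;
    }
    return null;
}
```

With a highly repetitive input such as searching "aab" in "aaaaaaaaab", a naive search re-compares the run of a's at every start position, while this sketch visits each haystack byte once and returns index 7.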
zig doc
Browse the zig doc of jstring here.
Top comments (2)
Thank you for sharing.
One point for you to consider: don't use anyerror for the return value of your functions as that loses information about which errors can be returned. For example some functions return an index out of bounds error, while others allocate so can return error.OutOfMemory. With anyerror all this information is lost.
Personally I like dealing with strings in Zig as byte arrays and use ziglyph / zigstr when I need full unicode processing (ie dealing with grapheme clusters, etc).
codeberg.org/dude_the_builder/ziglyph
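The anyerror point in the comment above can be sketched as follows. This is only an illustration with hypothetical names (IndexError, charAt, duplicate, charAtOr); it is not code from jstring.zig, and the real error names there may differ. Each signature states exactly which errors a caller has to handle.

```zig
const std = @import("std");

// Hypothetical error set for illustration; jstring.zig's real error names may differ.
const IndexError = error{IndexOutOfBounds};

// The signature says this can only fail with IndexOutOfBounds.
fn charAt(s: []const u8, index: usize) IndexError!u8 {
    if (index >= s.len) return error.IndexOutOfBounds;
    return s[index];
}

// The signature says this can only fail with OutOfMemory.
fn duplicate(allocator: std.mem.Allocator, s: []const u8) std.mem.Allocator.Error![]u8 {
    return allocator.dupe(u8, s);
}

// With an explicit error set the switch is exhaustive without an `else` branch;
// with anyerror the compiler would require one.
fn charAtOr(s: []const u8, index: usize, fallback: u8) u8 {
    return charAt(s, index) catch |err| switch (err) {
        error.IndexOutOfBounds => fallback,
    };
}
```

If charAt later gains a new error, the switch in charAtOr stops compiling until the new case is handled, which is the information anyerror throws away.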
Thanks for the suggestion. I will definitely work on that. As I wrote in another blog post, anyerror is a last resort, which I only learned after writing this lib.
As for unicode, thanks to the excellent underlying support from zig, it is not a hard thing, so I believe my lib can deal with it too. It can also handle unicode regex match or search; there are some examples in the tests in the jstring.zig source.