Zig NEWS

LI Yu
`jstring.zig`, my JavaScript-inspired string lib with excellent Regex support

I'd like to share a handy string lib I created: jstring.zig.

GitHub repo: liyu1981 / jstring.zig

jstring.zig

Target: create a reusable string lib for myself with all the familiar methods found in JavaScript's String.

Reason:

  1. Strings are important, as we all know, so a good string lib is very useful.
  2. JavaScript's string is (in my opinion) the most battle-tested string library out there, and it strikes a good balance between features and complexity.

The JavaScript string specs and methods that this file uses as a reference can be found at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String

All methods except those marked as deprecated (such as anchor, big, blink, etc.) are implemented, the Zig way.

Integration with PCRE2 regex

One highlight of jstring.zig is that it integrates with PCRE2 to provide match, match_all and more, with the familiar feel of JavaScript strings.


As the name suggests, it is a string lib inspired by JavaScript (ECMAScript, to be precise). The goal is to implement all methods specified at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String, except those marked as deprecated (such as anchor, big, blink, etc.).

Highlight: excellent Regex support with help from PCRE2

Here are some examples of how it is supported:

var str1 = try JStringUnmanaged.newFromSlice(arena.allocator(), "hello,hello,world");
var results = try str1.splitByRegex(arena.allocator(), "l+", 0, 0);
try testing.expectEqual(results.len, 1);
try testing.expect(results[0].eqlSlice("hello,hello,world"));
results = try str1.splitByRegex(arena.allocator(), "l+", 0, -1);
try testing.expectEqual(results.len, 4);
try testing.expect(results[0].eqlSlice("he"));
try testing.expect(results[1].eqlSlice("o,he"));
try testing.expect(results[2].eqlSlice("o,wor"));
try testing.expect(results[3].eqlSlice("d"));

or

var re = try RegexUnmanaged.init(arena.allocator(), "(hi,)(?<h>hel+o?)", 0);
try re.match(arena.allocator(), "hi,hello", 0, true, 0);
try re.reset(arena.allocator());
try re.match(arena.allocator(), "hi,hello", 0, true, 0);
const match_results = re.getResults();
const group_results = re.getGroupResults();
_ = group_results;
if (match_results) |mrs| {
    // the match covers all 8 bytes of "hi,hello"
    try testing.expectEqual(mrs[0].start, 0);
    try testing.expectEqual(mrs[0].len, 8);
}
var it = re.getGroupResultsIterator("hi,hello");
var maybe_r = it.nextResult();
try testing.expect(maybe_r != null);
if (maybe_r) |r| {
    // first group result: unnamed, starts at offset 0
    try testing.expect(std.mem.eql(u8, r.name, ""));
    try testing.expectEqual(r.start, 0);
}
maybe_r = it.nextResult();
try testing.expect(maybe_r != null);
if (maybe_r) |r| {
    // second group result: the named group `h`, starts at offset 3
    try testing.expect(std.mem.eql(u8, r.name, "h"));
    try testing.expectEqual(r.start, 3);
}

Use it in your project

jstring.zig can be fetched with the Zig package manager like below

zig fetch --save https://github.com/liyu1981/jstring.zig/archive/refs/tags/0.1.0.tar.gz

Because it integrates PCRE2, you will need to enable PCRE2 linkage when building with it. jstring.zig also provides a build-time module that makes this part really easy, like below

// in your build.zig
const jstring_build = @import("jstring");
...
const jstring_dep = b.dependency("jstring", .{});
exe.addModule("jstring", jstring_dep.module("jstring"));
jstring_build.linkPCRE(exe, jstring_dep);

How about the performance?

jstring.zig is designed with performance in mind, and it should get as close to a bare []const u8 as it can. The benchmark part is still a work in progress, but my early test shows that jstring.zig outperforms C++'s std::string: on average, create/release takes about 70% less time (roughly 3.4x faster).

benchmark % ./zig-out/bin/benchmark
|zig create/release: | [ooooo] | avg=    16464000ns | min=    14400000ns | max=    20975000ns |
|cpp create/release: | [ooooo] | avg=    56735400ns | min=    56137000ns | max=    57090000ns |

(The test randomly allocates and releases 1M short and long strings.)

But I want to give the credit to Zig, not jstring.zig, because Zig is a really cache-friendly language that makes it very easy to get your program fast!

Coverage

One of my goals when building this lib (as a practice) was to push the coverage to the max. In jstring.zig, the coverage is 100% on both the Zig and C code. Take a look at the report here.

It can be used as a single-file lib too

Just copying jstring.zig into your project will do the job too. Regex support can be turned off by modifying the comptime var enable_pcre at the beginning of the file. Even without PCRE support, it still has excellent performance, and it has a built-in KMP fast search algorithm. KMP is very useful if you need jstring to search highly repetitive strings (like scientific data): it reduces the worst case from O(n*m) for a naive search to O(n+m), where n is the length of the text and m is the length of the pattern.
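To show the idea behind KMP, here is a minimal, self-contained sketch in Zig. It is not the implementation inside jstring.zig; the names `buildFailureTable` and `kmpIndexOf` are hypothetical and exist only for this example.

```zig
const std = @import("std");

/// Builds the KMP failure table for `pattern`. table[i] is the length of the
/// longest proper prefix of pattern[0..i+1] that is also a suffix of it.
/// Hypothetical helper for illustration; caller owns the returned slice.
fn buildFailureTable(allocator: std.mem.Allocator, pattern: []const u8) ![]usize {
    const table = try allocator.alloc(usize, pattern.len);
    if (pattern.len == 0) return table;
    table[0] = 0;
    var k: usize = 0; // length of the currently matched prefix
    var i: usize = 1;
    while (i < pattern.len) : (i += 1) {
        // On mismatch, fall back to the next shorter prefix that is also a suffix.
        while (k > 0 and pattern[i] != pattern[k]) k = table[k - 1];
        if (pattern[i] == pattern[k]) k += 1;
        table[i] = k;
    }
    return table;
}

/// Returns the index of the first occurrence of `needle` in `haystack`,
/// or null if there is none. The haystack index never moves backwards,
/// which is what gives the O(n+m) bound.
fn kmpIndexOf(allocator: std.mem.Allocator, haystack: []const u8, needle: []const u8) !?usize {
    if (needle.len == 0) return 0;
    const table = try buildFailureTable(allocator, needle);
    defer allocator.free(table);

    var k: usize = 0; // number of needle bytes matched so far
    var i: usize = 0;
    while (i < haystack.len) : (i += 1) {
        const c = haystack[i];
        while (k > 0 and c != needle[k]) k = table[k - 1];
        if (c == needle[k]) k += 1;
        if (k == needle.len) return i + 1 - needle.len;
    }
    return null;
}

test "kmpIndexOf finds the first occurrence" {
    const a = std.testing.allocator;
    try std.testing.expectEqual(@as(?usize, 3), try kmpIndexOf(a, "aaaaab", "aab"));
    try std.testing.expectEqual(@as(?usize, null), try kmpIndexOf(a, "hello", "xyz"));
}
```

On repetitive input such as "aaaaab", a naive search re-compares the same bytes after every failed attempt, while the failure table lets the search resume from the longest prefix already known to match.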

Zig doc

Browse the Zig doc of jstring here.

Top comments (2)

Loris Cro (kristoff)

Thank you for sharing.

One point for you to consider: don't use anyerror for the return value of your functions as that loses information about which errors can be returned. For example some functions return an index out of bounds error, while others allocate so can return error.OutOfMemory. With anyerror all this information is lost.
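A minimal sketch of the difference, assuming nothing about jstring.zig's actual signatures: `charAtLoose`, `charAt`, `charAtOrSpace` and `copyString` are hypothetical names used only for illustration.

```zig
const std = @import("std");

// Hypothetical helpers for illustration only; not jstring.zig's API.

// With anyerror, the signature says nothing about what can actually fail.
fn charAtLoose(s: []const u8, index: usize) anyerror!u8 {
    if (index >= s.len) return error.IndexOutOfBounds;
    return s[index];
}

// An explicit error set documents the single failure mode.
const IndexError = error{IndexOutOfBounds};

fn charAt(s: []const u8, index: usize) IndexError!u8 {
    if (index >= s.len) return error.IndexOutOfBounds;
    return s[index];
}

// Because the set is known, the compiler checks that this switch is exhaustive.
fn charAtOrSpace(s: []const u8, index: usize) u8 {
    return charAt(s, index) catch |err| switch (err) {
        error.IndexOutOfBounds => ' ',
    };
}

// A function that only allocates can only fail with error.OutOfMemory,
// which is exactly what std.mem.Allocator.Error contains.
fn copyString(allocator: std.mem.Allocator, s: []const u8) std.mem.Allocator.Error![]u8 {
    return allocator.dupe(u8, s);
}
```

A caller of `charAtLoose` would need an `else` branch in any switch over the error, since the compiler cannot tell which errors are possible.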

Personally I like dealing with strings in Zig as byte arrays and use ziglyph / zigstr when I need full unicode processing (ie dealing with grapheme clusters, etc).

codeberg.org/dude_the_builder/ziglyph

LI Yu (liyu1981)

Thanks for the suggestion. I will definitely work on that. As I wrote in another blog post, anyerror should be a last resort, which I only learned after writing this one.

As for Unicode, thanks to the excellent underlying support from Zig, it is not a hard thing, so I believe my lib can deal with it too. It can also handle Unicode regex match and search; there are some examples in the tests in the jstring.zig source.