I've been working on a parser for djot, a light markup language similar to CommonMark. The parser is written in Zig, so I've named it djot.zig. In this series of posts I'll share some of the thoughts I've had while writing it.
Note that djot.zig is not yet finished, and the code in this post is for example only.
Posts in my Thoughts on Parsing series:
- Part 1 - Parsed Representation
- Part 2 - Read Cursors
- Part 3 - Write Cursors
I'm designing djot.zig to have a small in-memory representation once parsed. At the moment my design looks something like this:
const Document = struct {
    source: []const u8,
    events: []Event,
    /// Where each event starts in source
    event_source: []u32,
};

const Event = enum(u8) {
    text,
    start_paragraph,
    close_paragraph,
    start_list,
    close_list,
    start_list_item,
    close_list_item,
};
This design is very much inspired by the Zig compiler's internals and by data-oriented design.
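To make the data-oriented angle concrete, here is a minimal sketch of why keeping the event tags and their source offsets in two parallel slices can be smaller than one struct per event. `EventWithSource` and `parallel_bytes_per_event` are hypothetical names used only for this comparison and are not part of djot.zig; the exact struct size depends on the target.

```zig
const std = @import("std");

const Event = enum(u8) { text, start_paragraph, close_paragraph, start_list, close_list, start_list_item, close_list_item };

/// Hypothetical alternative, for comparison only: tag and offset in one struct.
/// The u32 field forces 4-byte alignment, so the struct is padded.
const EventWithSource = struct {
    event: Event,
    source_start: u32,
};

/// Bytes per event when tags and offsets live in two separate slices.
const parallel_bytes_per_event = @sizeOf(Event) + @sizeOf(u32);

comptime {
    // One byte per tag, four per offset, no padding between elements.
    std.debug.assert(@sizeOf(Event) == 1);
    std.debug.assert(parallel_bytes_per_event == 5);
    // The combined struct is larger (8 bytes on typical targets).
    std.debug.assert(@sizeOf(EventWithSource) > parallel_bytes_per_event);
}

pub fn main() void {
    std.debug.print("parallel slices: {d} bytes per event\n", .{parallel_bytes_per_event});
    std.debug.print("array of structs: {d} bytes per event\n", .{@sizeOf(EventWithSource)});
}
```

The saving per event is small, but it applies to every event in the document.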
The following markup would then be parsed like so:
Hello world!

- a bullet
- another bullet
- more bullet
idx | event | src | source text |
---|---|---|---|
0 | .start_paragraph | 0 | "" |
1 | .text | 0 | "Hello world!" |
2 | .close_paragraph | 12 | "\n\n" |
3 | .start_list | 14 | "" |
4 | .start_list_item | 14 | "- " |
5 | .text | 16 | "a bullet\n" |
6 | .close_list_item | 25 | "" |
7 | .start_list_item | 25 | "- " |
8 | .text | 27 | "another bullet\n" |
9 | .close_list_item | 42 | "" |
10 | .start_list_item | 42 | "- " |
11 | .text | 44 | "more bullet" |
12 | .close_list_item | 55 | "" |
13 | .close_list | 55 | "" |
That comes to 14 bytes for the events list, 56 bytes for the event_source list (14 offsets of 4 bytes each), and 55 bytes for source itself.
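The source text column does not need to be stored: each event's text appears to run from its own offset up to the next event's offset (or to the end of the source for the last event). The sketch below shows one way to recover it. The `text` method is a guess at what such a helper could look like, not djot.zig's actual code. The slices are declared `const` here only so the example can be built from literals, and the `for` syntax assumes a recent Zig (0.11 or later).

```zig
const std = @import("std");

const Event = enum(u8) { text, start_paragraph, close_paragraph, start_list, close_list, start_list_item, close_list_item };

const Document = struct {
    source: []const u8,
    events: []const Event,
    /// Where each event starts in source
    event_source: []const u32,

    /// Hypothetical helper: the slice of `source` covered by event `i`.
    pub fn text(doc: Document, i: usize) []const u8 {
        const start: usize = doc.event_source[i];
        // The next event's offset marks the end; the last event runs to the end of the source.
        const end: usize = if (i + 1 < doc.event_source.len)
            doc.event_source[i + 1]
        else
            doc.source.len;
        return doc.source[start..end];
    }
};

/// A tiny hand-built document: one paragraph followed by a one-item list.
const example = Document{
    .source = "Hi\n\n- a",
    .events = &.{ .start_paragraph, .text, .close_paragraph, .start_list, .start_list_item, .text, .close_list_item, .close_list },
    .event_source = &.{ 0, 0, 2, 4, 4, 6, 7, 7 },
};

pub fn main() void {
    for (example.events, 0..) |event, i| {
        std.debug.print("{d} .{s} covers {d} bytes\n", .{ i, @tagName(event), example.text(i).len });
    }
}
```

Events that only mark structure, such as `.start_list`, end up with an empty slice because they share an offset with the event that follows them.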
We can then turn this abstract representation into HTML by looping over the list of events:
pub fn toHtml(writer: anytype, doc: Document) !void {
    // Indexed iteration; this is the `for` syntax used by Zig 0.11 and later.
    for (doc.events, 0..) |event, i| {
        switch (event) {
            // `doc.text(i)` is assumed to return the slice of `source` covered by event `i`.
            .text => try writer.writeAll(doc.text(i)),
            .start_paragraph => try writer.writeAll("<p>"),
            .close_paragraph => try writer.writeAll("</p>"),
            .start_list => try writer.writeAll("<ul>"),
            .close_list => try writer.writeAll("</ul>"),
            .start_list_item => try writer.writeAll("<li>"),
            .close_list_item => try writer.writeAll("</li>"),
        }
    }
}
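To see the loop run end to end, here is a self-contained sketch that feeds a tiny hand-built document through `toHtml`. Several pieces are assumptions made for the example: `BufferWriter` is a hypothetical stand-in for a real writer (anything with a `writeAll` method satisfies `anytype` here), `Document.text` is a guessed helper that slices from one event's offset to the next, and the slices are `const` so the document can be written as literals. It assumes Zig 0.11 or later.

```zig
const std = @import("std");

const Event = enum(u8) { text, start_paragraph, close_paragraph, start_list, close_list, start_list_item, close_list_item };

const Document = struct {
    source: []const u8,
    events: []const Event,
    event_source: []const u32,

    /// Hypothetical helper: the slice of `source` covered by event `i`.
    pub fn text(doc: Document, i: usize) []const u8 {
        const start: usize = doc.event_source[i];
        const end: usize = if (i + 1 < doc.event_source.len) doc.event_source[i + 1] else doc.source.len;
        return doc.source[start..end];
    }
};

/// Hypothetical minimal writer: appends bytes into a caller-provided buffer.
const BufferWriter = struct {
    buffer: []u8,
    len: usize = 0,

    pub fn writeAll(self: *BufferWriter, bytes: []const u8) error{NoSpaceLeft}!void {
        if (bytes.len > self.buffer.len - self.len) return error.NoSpaceLeft;
        @memcpy(self.buffer[self.len..][0..bytes.len], bytes);
        self.len += bytes.len;
    }

    pub fn written(self: BufferWriter) []const u8 {
        return self.buffer[0..self.len];
    }
};

pub fn toHtml(writer: anytype, doc: Document) !void {
    for (doc.events, 0..) |event, i| {
        switch (event) {
            .text => try writer.writeAll(doc.text(i)),
            .start_paragraph => try writer.writeAll("<p>"),
            .close_paragraph => try writer.writeAll("</p>"),
            .start_list => try writer.writeAll("<ul>"),
            .close_list => try writer.writeAll("</ul>"),
            .start_list_item => try writer.writeAll("<li>"),
            .close_list_item => try writer.writeAll("</li>"),
        }
    }
}

/// One paragraph ("Hi") followed by a one-item list ("a").
const example = Document{
    .source = "Hi\n\n- a",
    .events = &.{ .start_paragraph, .text, .close_paragraph, .start_list, .start_list_item, .text, .close_list_item, .close_list },
    .event_source = &.{ 0, 0, 2, 4, 4, 6, 7, 7 },
};

pub fn main() !void {
    var storage: [128]u8 = undefined;
    var out = BufferWriter{ .buffer = &storage };
    try toHtml(&out, example);
    // Prints: <p>Hi</p><ul><li>a</li></ul>
    std.debug.print("{s}\n", .{out.written()});
}
```

Because `toHtml` only ever appends, the renderer itself needs no allocation; the only memory involved is whatever the writer chooses to use.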
In part 2, I'll describe a pattern I've been using while parsing, which I am calling the Cursor pattern.