Zig NEWS

David Sugar


zbor - a CBOR en-/decoder

I've been toying around with WebAuthn for a couple of months. For those who have never heard of it, it's part of the FIDO2 standard and is used for registering and authenticating users to a Relying Party (server) using public key cryptography. Some say it's the next big thing and will eradicate passwords once and for all. I'd say you trade one problem for another, but maybe the new one is better from an IT security perspective. But that's not what I want to talk about.

Some data, like the attestationObject, which is part of the credential object returned from the create() function call, is encoded using the Concise Binary Object Representation (CBOR). If you want to read more about WebAuthn, I recommend the WebAuthn guide by Suby Raman. During my tests, I found that the C++ JSON library with CBOR support I was using had issues decoding captured credential objects, so I had three options:

  1. Read through the source code and find the problem.
  2. Try out another library.
  3. Use Zig, the new programming language I had wanted to try for a while, to implement a CBOR en-/decoder that fits my needs.

As you might have guessed, I chose the third option. You can find the project on GitHub.

zbor

In general, CBOR encodes data in so-called data items. A data item can contain zero, one, or more nested data items, and each belongs to one of eight major types (unsigned integer, negative integer, byte string, text string, array of data items, map of (key, value) pairs, tagged data item, or float/simple value). The library reflects this structure using a tagged union (DataItem) in which each field is associated with a major type: int covers major types 0 and 1, while float and simple share major type 7.

/// A single piece of CBOR data.
///
/// The structure of a DataItem may contain zero, one, or more nested DataItems.
pub const DataItem = union(DataItemTag) {
    /// Major type 0 and 1: An integer in the range -2^64..2^64-1
    int: i128,
    /// Major type 2: A byte string.
    bytes: []u8,
    /// Major type 3: A text string encoded as utf-8.
    text: []u8,
    /// Major type 4: An array of data items.
    array: []DataItem,
    /// Major type 5: A map of pairs of data items.
    map: []Pair,
    /// Major type 6: A tagged data item.
    tag: Tag,
    /// Major type 7: IEEE 754 Half-, Single-, or Double-Precision float.
    float: Float,
    /// Major type 7: Simple value [false, true, null]
    simple: SimpleValue,

    // more code...
};
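Why an i128 for int? Major type 0 stores a non-negative integer n directly, while major type 1 stores an argument n that stands for the value -1 - n. Together they span -2^64..2^64-1, which no 64-bit integer type can hold. A minimal sketch of that mapping follows; it is not zbor's actual code, and the Head struct and intHead function are hypothetical names chosen for illustration.

```zig
const std = @import("std");

/// Hypothetical (not part of zbor): the major type plus the unsigned
/// argument that is actually written to the wire.
const Head = struct {
    major_type: u3,
    argument: u64,
};

/// Hypothetical helper: maps an integer in -2^64..2^64-1 to major
/// type 0 (non-negative) or major type 1 (negative).
fn intHead(value: i128) error{Overflow}!Head {
    if (value >= 0) {
        // Major type 0: the argument is the value itself.
        if (value > std.math.maxInt(u64)) return error.Overflow;
        return .{ .major_type = 0, .argument = @intCast(value) };
    } else {
        // Major type 1: the argument n represents the value -1 - n.
        const n = -1 - value;
        if (n > std.math.maxInt(u64)) return error.Overflow;
        return .{ .major_type = 1, .argument = @intCast(n) };
    }
}
```

For example, -500 maps to major type 1 with argument 499, and -2^64 maps to the largest u64 argument.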

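The three most significant bits of the first byte of a data item encode the major type, and the remaining five bits give additional information, usually how to load an unsigned integer argument that represents a value (major types 0, 1, and 7), a length (major types 2, 3, 4, and 5), or a tag number (major type 6). Arguments from 0 to 23 are stored directly in the additional information, while 24, 25, 26, and 27 mean that the argument follows in the next 1, 2, 4, or 8 bytes.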
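As a minimal sketch of that bit split (splitInitialByte is a hypothetical helper, not part of zbor), the major type is the byte shifted right by five, and the additional information is the byte truncated to its lower five bits:

```zig
const std = @import("std");

/// Hypothetical helper: splits the first byte of a data item into its
/// major type (upper 3 bits) and additional information (lower 5 bits).
fn splitInitialByte(byte: u8) struct { major_type: u3, additional_info: u5 } {
    return .{
        // Shifting right by 5 leaves at most 3 bits, so the cast is safe.
        .major_type = @intCast(byte >> 5),
        // Truncating to u5 keeps only the lower 5 bits.
        .additional_info = @truncate(byte),
    };
}
```

Applied to 0x1a, this yields major type 0 and additional information 26.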

For example, the unsigned integer 65536 is encoded as 1a 00 01 00 00. 1a = 0001 1010 can be split into major type 000 = 0 and additional information 11010 = 26, i.e. the next four bytes encode an unsigned integer (see CBOR encoding). Note that CTAP2 canonical CBOR encoding requires integers to be encoded as compactly as possible; since 65536 is one more than the largest two-byte value (65535), the four-byte form is the shortest option here.
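The shortest-form rule can be sketched as a small head encoder. This is only an illustration of how such a function might look, not zbor's implementation; encodeHead and its fixed 9-byte buffer are hypothetical. It picks the smallest of the 0-, 1-, 2-, 4-, or 8-byte argument forms and writes the argument in big-endian order:

```zig
const std = @import("std");

/// Hypothetical helper: writes the initial byte plus the big-endian
/// argument into `buf` using the shortest possible form and returns
/// the written bytes.
fn encodeHead(buf: *[9]u8, major_type: u3, argument: u64) []const u8 {
    const mt: u8 = @as(u8, major_type) << 5;
    // Number of argument bytes that follow the initial byte.
    const len: usize = if (argument < 24) 0
        else if (argument <= 0xff) 1
        else if (argument <= 0xffff) 2
        else if (argument <= 0xffff_ffff) 4
        else 8;
    // Additional information: the value itself, or 24..27 for 1..8 bytes.
    buf[0] = mt | switch (len) {
        0 => @as(u8, @intCast(argument)),
        1 => 24,
        2 => 25,
        4 => 26,
        else => 27,
    };
    // Write the argument most significant byte first.
    var i: usize = 0;
    while (i < len) : (i += 1) {
        const shift: u6 = @intCast(8 * (len - 1 - i));
        buf[1 + i] = @truncate(argument >> shift);
    }
    return buf[0 .. 1 + len];
}
```

With this helper, encodeHead(&buf, 0, 65536) yields 1a 00 01 00 00, and encodeHead(&buf, 1, 499) yields 39 01 f3, the CBOR encoding of -500.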

If you want to encode 65536 using the library, you can define a DataItem and pass it to the encode() function. On success, this function returns a std.ArrayList(u8) containing the CBOR byte string; otherwise it returns a CborError.

var gpa = std.heap.GeneralPurposeAllocator(.{}){};
const allocator = gpa.allocator();

const data_item = DataItem.int(65536);
const cbor = try encode(allocator, &data_item);
defer cbor.deinit();

try std.testing.expectEqualSlices(u8, &.{ 0x1a, 0x00, 0x01, 0x00, 0x00 }, cbor.items);

To decode a CBOR byte string, use the decode() function, which returns a (nested) DataItem on success.

const di = try decode(allocator, cbor.items);
// Always use deinit() to free the allocated memory of all
// (nested) data items.
defer di.deinit(allocator);
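Decoding starts with the reverse step: reading the head of a data item. The sketch below uses a hypothetical helper (decodeHead is not zbor's API) that returns the major type, the argument, and the number of bytes the head occupies. It rejects additional information 28 to 30, which RFC 8949 reserves, and 31, which is used for indefinite-length items and the "break" stop code and is not allowed in CTAP2 canonical CBOR. It does not check the shortest-form rule.

```zig
const std = @import("std");

const Head = struct { major_type: u3, argument: u64, size: usize };

/// Hypothetical helper: reads the head of the data item at the start
/// of `data`.
fn decodeHead(data: []const u8) error{ EndOfInput, Unsupported }!Head {
    if (data.len == 0) return error.EndOfInput;
    const major_type: u3 = @intCast(data[0] >> 5);
    const info: u5 = @truncate(data[0]);
    // Number of argument bytes following the initial byte.
    const len: usize = switch (info) {
        0...23 => return .{ .major_type = major_type, .argument = info, .size = 1 },
        24 => 1,
        25 => 2,
        26 => 4,
        27 => 8,
        // Reserved values and indefinite length are rejected.
        28...31 => return error.Unsupported,
    };
    if (data.len < 1 + len) return error.EndOfInput;
    // Assemble the big-endian argument byte by byte.
    var argument: u64 = 0;
    for (data[1 .. 1 + len]) |b| argument = (argument << 8) | b;
    return .{ .major_type = major_type, .argument = argument, .size = 1 + len };
}
```

Decoding 1a 00 01 00 00 this way yields major type 0, argument 65536, and a head size of five bytes.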

This is just a trivial example, but you can also decode more complex CBOR data such as an attestationObject. If you're interested, please take a look at the zbor repo.

current state

The encoder and decoder support most of the types defined by RFC 8949. The next thing I want to add is serialization from and to JSON. If you have any suggestions, feel free to open an issue or just leave a comment :).

Edit - 29.07.2022

Loris Cro suggested using slices instead of std.ArrayList. He also pointed out the approach of asking the user to provide an Allocator to any function call that might need it. Because the same Allocator is usually used for every (nested) DataItem, this approach fits my use case well, so I adopted it.
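Here is a minimal sketch of that pattern, assuming a trimmed-down, hypothetical Item union in place of the real DataItem. No allocator is stored in the union; the caller passes the same one to the function that copies the bytes and to deinit(), which frees nested items recursively. Copying with allocator.dupe() also avoids the dangling-pointer problem discussed in the comments below, because the item never points into the input buffer.

```zig
const std = @import("std");

/// Hypothetical stand-in for DataItem that never stores an allocator.
const Item = union(enum) {
    int: i128,
    bytes: []u8,
    array: []Item,

    /// Copies `data`, so the item stays valid even if the input buffer
    /// is freed or modified first.
    fn initBytes(allocator: std.mem.Allocator, data: []const u8) !Item {
        return .{ .bytes = try allocator.dupe(u8, data) };
    }

    /// Recursively frees all memory owned by this item, using the same
    /// allocator that created it.
    fn deinit(self: Item, allocator: std.mem.Allocator) void {
        switch (self) {
            .int => {},
            .bytes => |b| allocator.free(b),
            .array => |items| {
                for (items) |item| item.deinit(allocator);
                allocator.free(items);
            },
        }
    }
};
```

Calling root.deinit(allocator) on an array item releases both the copied bytes and the array itself; as noted in the comments, this only works as long as every nested item uses the same allocator.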

Top comments (6)

Loris Cro

Thank you for the write-up. Something for your consideration: maybe some of those types represented as ArrayLists could be slices instead. Generally speaking, that would be more appropriate for a "dumb" data type.

David Sugar

Hi, thanks for your comment. I've thought about that too (at least for the u8 ArrayLists).

The one thing I want to avoid is dangling pointers, e.g. a byte string (mt 2) u8 slice that references some section of a CBOR byte string which gets freed before the referencing DataItem does. I'd argue that's not an unlikely scenario.

To avoid this, I need to allocate an unknown amount of memory at runtime, copy the bytes, and reference the allocated memory with my slice. If DataItem were a struct, I could just add an allocator member with a default value (e.g. GPA), but DataItem is a tagged union, so I would need to store the Allocator with each union field that requires one. Because a tagged union feels quite 'natural' as a type for DataItem, I'm reluctant to deviate from it.

So I would need to store an Allocator for each field, allocate the memory myself, and copy the data from one place to another. An ArrayList already does all of that.

I'm new to Zig and have read through ziglearn.org and the Zig Language Reference, but couldn't find anything that would point me to a satisfying solution. If you know of an example or blog post that deals with something similar, I'd be very grateful if you could comment with a link below.

Sorry for the lengthy answer, and thanks for this awesome platform :)

Loris Cro

One approach adopted by some stdlib APIs is to never store the allocator anywhere and instead ask the user to provide it to any function call that might need it. This way you also avoid paying the (admittedly small) price of storing multiple copies of the same allocator interface.

Anyway I just wanted to point out some potentially non-obvious design choices that can make sense in Zig.

David Sugar

I really like that idea. It falls apart as soon as one uses more than one allocator for a nested data structure, but it should fit my use case. Thanks for the suggestion.

Lisael

I've been working on a CBOR encoder/decoder too, and it's a fun project. Yours seems more advanced; I may help (if I find time to work on the project that needed CBOR in the first place :D).

One pain point of CBOR is extensions. There are a lot of them, and some are so widely used that any implementation that doesn't support them is almost never usable. The most problematic one is #25 (reference the nth previously seen string), which is:

  1. useful (it compresses the data, sometimes a lot)
  2. used everywhere
  3. forces you to switch to a context-aware parser (unbounded space and time, bleh!) :(

Another question I had at the time is which encoding/decoding API the lib should expose. Doesn't Zig need a common interface for serialization/deserialization à la Rust's serde?

(EDIT: #29 -> #25)

David Sugar

Hi, thanks for your comment :)

To be honest, I haven't thought about adding extensions so far, because my main goal was to work with the CBOR data used in CTAP2/WebAuthn (FIDO2). Due to resource limitations, data used with FIDO2 may only use a subset of the RFC specification (e.g. items of indefinite length are not permitted).

To your question: I stuck with the decode(cbor)/encode(data) API most implementations use. Another approach would be to mimic the JSON API. I want to add serialization from and to structs in the future, so I'll probably implement something like cbor.stringify(my_struct, .{options}, string.writer()) and cbor.parse(my_struct, &stream, .{}). I don't think there is an official interface pattern at the moment, but xyz.stringify and xyz.parse are probably the closest thing. I'd ask on the Discord server to be sure, though.

Feel free to open a pull request if you want to contribute.