Deciphering Many-Item Types. The skeleton.

Abstract

In the Zig language, when working with a sequence of objects in memory of the same type, one may encounter various built-in types. It is quite difficult to understand how these types are related when you read the language documentation. This article discusses the results of the author's exploration of types in Zig.

Types

Taking u8 as the single-item type, Zig offers the following many-item types:

Array [n]u8. This type corresponds to arrays seen in other languages.
Sentinel-terminated array [n:0]u8. An array with a sentinel element of value 0 at index len (0 can be replaced with another u8 value).
Many-item pointer [*]u8. A pointer to an unknown number of items; it carries no length.
Sentinel-terminated pointer [*:0]u8. Such a pointer points to a sequence that ends with a specific value (0 can be replaced with another u8 value).
Slice []u8. A slice acts like a fat pointer: it also holds the length of the sequence it refers to.
Sentinel-terminated slice [:0]u8. This type of slice points to a sequence that ends with a specific value (0 can be replaced with another u8 value).
Pointer to array *[n]u8. It acts like a single-item pointer, but the compiler knows that it refers to an array of a known size.
Pointer to sentinel-terminated array *[n:0]u8. Similar to a pointer to array, but the array it refers to is sentinel-terminated.
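
To make the list above concrete, here is a minimal sketch that declares one value of each type and checks a property that distinguishes it. The helper sentinelOf and the variable names are hypothetical and used only for illustration; the sketch assumes a reasonably recent Zig compiler.

```zig
const std = @import("std");
const expect = std.testing.expect;

// Hypothetical helper: a sentinel-terminated slice may be indexed at len,
// and the element found there is the sentinel.
fn sentinelOf(s: [:0]const u8) u8 {
    return s[s.len];
}

test "one value of each many-item type" {
    // Backing storage: a plain array and a sentinel-terminated array.
    var array: [3]u8 = .{ 1, 2, 3 };
    var sentinel_array: [3:0]u8 = .{ 1, 2, 3 };
    try expect(sentinel_array.len == 3); // the sentinel is not counted
    try expect(sentinel_array[3] == 0); // but it is stored at index len

    // Pointers to arrays: the length is part of the type.
    const pointer_to_array: *[3]u8 = &array;
    const pointer_to_sentinel_array: *[3:0]u8 = &sentinel_array;
    try expect(pointer_to_array.len == 3);
    try expect(pointer_to_sentinel_array.len == 3);

    // Slices: a pointer plus a runtime length.
    const slice: []u8 = &array;
    const sentinel_slice: [:0]u8 = &sentinel_array;
    try expect(slice.len == 3);
    try expect(sentinelOf(sentinel_slice) == 0);

    // Many-item pointers: no length; the sentinel-terminated variant
    // can recover it by scanning for the sentinel.
    const pointer: [*]u8 = &array;
    const sentinel_pointer: [*:0]u8 = &sentinel_array;
    try expect(pointer[2] == 3);
    try expect(std.mem.len(sentinel_pointer) == 3);
}
```

The block can be run with zig test.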

Representing Types Through Multiple Inheritance

Based on the documentation and various discussions on the Zig Telegram channel, the idea arose to organize these types in a "skeleton" chart.

In the chart, green squares represent types and are labeled variable_name:type. Yellow squares denote the operations required to transform a variable of one type into another and are labeled with Zig expressions.
The chart demonstrates four operations:

var array: [3]u8 = pointer_to_array.*;
var sentinel_array: [3:0]u8 = pointer_to_sentinel_array.*;
var pointer = slice.ptr;
var sentinel_pointer: [*:0]u8 = sentinel_slice.ptr;
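
A runnable sketch of these four operations may help. It highlights one detail that is easy to miss: dereferencing a pointer to an array copies the array, while .ptr only drops the length and keeps the address. The names storage, sentinel_storage and copyOf are hypothetical and introduced for this example only.

```zig
const std = @import("std");
const expect = std.testing.expect;

// Hypothetical helper: dereferencing a pointer to an array yields a copy.
fn copyOf(p: *const [3]u8) [3]u8 {
    return p.*;
}

test "the four explicit operations" {
    var storage: [3]u8 = .{ 1, 2, 3 };
    var sentinel_storage: [3:0]u8 = .{ 1, 2, 3 };
    const pointer_to_array: *[3]u8 = &storage;
    const pointer_to_sentinel_array: *[3:0]u8 = &sentinel_storage;
    const slice: []u8 = pointer_to_array;
    const sentinel_slice: [:0]u8 = pointer_to_sentinel_array;

    // Operations 1 and 2: dereference. The results are independent copies.
    var array: [3]u8 = copyOf(pointer_to_array);
    var sentinel_array: [3:0]u8 = pointer_to_sentinel_array.*;
    array[0] = 9;
    sentinel_array[0] = 9;
    try expect(array[0] == 9 and sentinel_array[0] == 9);
    try expect(storage[0] == 1); // the originals are untouched
    try expect(sentinel_storage[0] == 1);

    // Operations 3 and 4: .ptr keeps the address and drops the length.
    const pointer: [*]u8 = slice.ptr;
    const sentinel_pointer: [*:0]u8 = sentinel_slice.ptr;
    try expect(pointer[1] == 2);
    try expect(sentinel_pointer[3] == 0); // the sentinel is still reachable
}
```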

If there is no yellow square between two green squares, a variable of one type can be assigned directly to a variable of the other type without any additional operation. We say that the type of the value on the right side of the equals sign is coerced to the type of the variable on the left side.

var pointer_to_array: *[3]u8 = pointer_to_sentinel_array;
var sentinel_slice: [:0]u8 = pointer_to_sentinel_array;
var slice: []u8 = sentinel_slice;
slice = pointer_to_array;
var sentinel_pointer: [*:0]u8 = pointer_to_sentinel_array;
var pointer: [*]u8 = sentinel_pointer;
var array: [3]u8 = sentinel_array;
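
The pointer and slice coercions from this list can be checked in one place. The following is a sketch rather than a complete reproduction of the chart: it covers only the pointer-like types, and dropSentinel is a hypothetical helper that shows the same coercion happening at a return statement.

```zig
const std = @import("std");
const expect = std.testing.expect;

// Hypothetical helper: [:0]u8 coerces to []u8 at the return statement.
fn dropSentinel(s: [:0]u8) []u8 {
    return s;
}

test "coercions that need no explicit operation" {
    var sentinel_array: [3:0]u8 = .{ 1, 2, 3 };
    const pointer_to_sentinel_array: *[3:0]u8 = &sentinel_array;

    // *[3:0]u8 -> *[3]u8: the sentinel guarantee is dropped.
    const pointer_to_array: *[3]u8 = pointer_to_sentinel_array;
    // *[3:0]u8 -> [:0]u8: the length moves from the type to a runtime field.
    const sentinel_slice: [:0]u8 = pointer_to_sentinel_array;
    // [:0]u8 -> []u8
    var slice: []u8 = dropSentinel(sentinel_slice);
    try expect(slice.len == 3);
    // *[3]u8 -> []u8
    slice = pointer_to_array;
    try expect(slice.len == 3);
    // *[3:0]u8 -> [*:0]u8
    const sentinel_pointer: [*:0]u8 = pointer_to_sentinel_array;
    // [*:0]u8 -> [*]u8
    const pointer: [*]u8 = sentinel_pointer;

    // All of these are views of the same memory.
    try expect(slice.ptr == pointer);
    pointer[0] = 42;
    try expect(sentinel_array[0] == 42);
}
```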

Recall the inheritance rule from other languages: if Type2 extends Type1, then var_type1 = var_type2 is valid. Under this rule, the green squares resemble a multiple-inheritance hierarchy. Inheritance is also transitive: if Type3 extends Type2 and Type2 extends Type1, then Type3 extends Type1. The same holds for our types: since *[3:0]u8 extends *[3]u8 and *[3]u8 extends []u8, it follows that *[3:0]u8 also extends []u8. This allows us to write:

var slice: []u8 = pointer_to_sentinel_array;

By reasoning in a similar way we can write:

var pointer: [*]u8 = pointer_to_sentinel_array;
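
In practice this transitivity matters most at function boundaries: a function that takes a slice accepts every type that "extends" it. Below is a small sketch of that idea; sum is a hypothetical helper, not something from the chart.

```zig
const std = @import("std");
const expect = std.testing.expect;

// Hypothetical helper: takes the most general length-carrying view.
fn sum(items: []const u8) u32 {
    var total: u32 = 0;
    for (items) |item| total += item;
    return total;
}

test "one slice parameter accepts several many-item types" {
    var sentinel_array: [3:0]u8 = .{ 1, 2, 3 };
    const pointer_to_sentinel_array: *[3:0]u8 = &sentinel_array;
    const pointer_to_array: *[3]u8 = pointer_to_sentinel_array;
    const sentinel_slice: [:0]u8 = pointer_to_sentinel_array;

    // Each argument is coerced to []const u8; the sentinel is not summed.
    try expect(sum(pointer_to_sentinel_array) == 6);
    try expect(sum(pointer_to_array) == 6);
    try expect(sum(sentinel_slice) == 6);

    // The same reasoning gives a direct coercion to a many-item pointer.
    const pointer: [*]u8 = pointer_to_sentinel_array;
    try expect(pointer[0] == 1);
}
```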

To transform a variable of a source type into a variable of a destination type, we can combine operations and inheritance rules. Suppose the source type is a pointer to a sentinel-terminated array, *[3:0]u8, and the destination type is an array, [3]u8. According to the chart, this transformation is possible by combining the dereference operation with a coercion that drops the sentinel:

var array: [3]u8 = pointer_to_sentinel_array.*;

In Depth

In fact, a sentinel slice [:0]u8 can be coerced to a sentinel pointer [*:0]u8, whereas a slice []u8 cannot be coerced to a pointer [*]u8.

var sentinel_array: [3:0]u8 = .{ 1, 2, 3 };
var array: [3]u8 = .{ 1, 2, 3 };
var sentinel_slice: [:0]u8 = &sentinel_array;
var slice: []u8 = &array;

// Ok
var sentinel_pointer: [*:0]u8 = sentinel_slice;
// Error: expected type '[*]u8', found '[]u8' 
var pointer: [*]u8 = slice;

This looks inconsistent, so I recommend the explicit .ptr syntax in both cases. It is also closer in spirit to the "no hidden control flow" principle declared on Zig's homepage.

// Ok
var sentinel_pointer: [*:0]u8 = sentinel_slice.ptr;
// Ok
var pointer: [*]u8 = slice.ptr;
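
The explicit form reads the same way at call sites. The sketch below passes slices to functions that expect many-item pointers; byteCount and firstByte are hypothetical helpers written for this example.

```zig
const std = @import("std");
const expect = std.testing.expect;

// Hypothetical helper: counts the bytes before the 0 sentinel.
fn byteCount(s: [*:0]const u8) usize {
    return std.mem.len(s);
}

// Hypothetical helper: a plain many-item pointer carries no length,
// so the caller must guarantee that at least one byte is readable.
fn firstByte(p: [*]const u8) u8 {
    return p[0];
}

test "explicit .ptr at call sites" {
    var sentinel_array: [3:0]u8 = .{ 1, 2, 3 };
    var array: [3]u8 = .{ 4, 5, 6 };
    const sentinel_slice: [:0]u8 = &sentinel_array;
    const slice: []u8 = &array;

    // The same spelling works for both kinds of slice.
    try expect(byteCount(sentinel_slice.ptr) == 3);
    try expect(firstByte(slice.ptr) == 4);
    // A sentinel pointer is also accepted where a plain pointer is expected.
    try expect(firstByte(sentinel_slice.ptr) == 1);
}
```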

However, Andrew disagrees. Therefore, let's leave this decision up to each individual programmer. For simplicity, only the explicit .ptr syntax is depicted in the chart.

While a sentinel array [3:0]u8 can be assigned to an array [3]u8, the reverse is unfortunately not allowed.

var array: [3]u8 = .{ 1, 2, 3};
var sentinel_array: [3:0]u8 = .{ 1, 2, 3 };

// Ok
var array1: [3]u8 = sentinel_array;
// Error
var sentinel_array1: [3:0]u8 = array;

This is generally regarded as a good proposal.

To be continued...

Note that all of the above reasoning is true for runtime, but the language has the famous comptime capabilities. The differences will be covered in the next article.
In the upcoming article, we will look at a key operation for many-item type conversions: slicing, variable[a..b]. We will also cover the @ptrCast built-in function. Together, the "skeleton" and that analysis will let us convert any many-item type into another and deepen our understanding of the Zig type system.