Indexing every Zig for great justice

A few months ago, I was on the lookout for a new Zig tooling project for zigtools. I wanted a change of scenery from zls, a language server that provides advanced editor features like completions and go-to-definition for Zig. I also wanted something familiar enough to put to use what I've picked up over the years about static analysis and developer tooling.

Clearly I didn't look too far, because I ran into LSIF, the Language Server Index Format: the "static" indexing twin of the Language Server Protocol. I asked around and eventually reached out to Stephen Gutekanst for his opinion on LSIF. He is a tooling developer by day, a game engine developer by night, and a familiar face in the Zig space. He pointed me to SCIP, Sourcegraph's response to LSIF, which Sourcegraph boldly (but, to be honest, quite accurately) bills as "a better code indexing format than LSIF."

So, I implemented a SCIP indexer for Zig: scip-zig!

So... what is an indexer?

Let's start with an example: you're at a coworker's desk and you want to show them pieces of scip-zig's source code. Installing Zig and zls on their machine would take a little too long, so you open Sourcegraph's Code Search and land in src/analysis/Analyzer.zig.

You notice a reference to a variable named dwa and wonder where it's defined. You click on it, choose "Go to definition," and Sourcegraph takes you to the definition site. Then you use "Find references" to see every place the variable is used. These features, and more, are all powered by an indexer, which generated an index from the source code the last time it changed.
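The following is a minimal sketch of the kind of lookups a code viewer could run against an index to answer "Go to definition" and "Find references." It uses a heavily simplified occurrence record, and `Occurrence`, `findDefinition`, `countReferences`, and the sample data are all hypothetical. None of this is scip-zig's or Sourcegraph's code. Only the `0x1` definition flag mirrors SCIP's `SymbolRole.Definition`. The sketch assumes Zig 0.11 or later.

```zig
const std = @import("std");

/// Bit flag marking an occurrence as a definition; mirrors the value of
/// SymbolRole.Definition (0x1) in the SCIP schema.
const definition_role: u32 = 0x1;

/// Hypothetical, simplified occurrence: where a symbol appears in a document.
/// (Real SCIP occurrences store a packed `range` array instead of these fields.)
const Occurrence = struct {
    symbol: []const u8,
    line: u32,
    start_col: u32,
    end_col: u32,
    symbol_roles: u32,
};

/// "Go to definition": return the occurrence that defines `symbol`, if any.
fn findDefinition(occurrences: []const Occurrence, symbol: []const u8) ?Occurrence {
    for (occurrences) |occ| {
        if ((occ.symbol_roles & definition_role) != 0 and std.mem.eql(u8, occ.symbol, symbol))
            return occ;
    }
    return null;
}

/// "Find references": count every occurrence of `symbol`, definition included.
fn countReferences(occurrences: []const Occurrence, symbol: []const u8) usize {
    var count: usize = 0;
    for (occurrences) |occ| {
        if (std.mem.eql(u8, occ.symbol, symbol)) count += 1;
    }
    return count;
}

// Hypothetical sample data: one definition and two references of a local.
const sample = [_]Occurrence{
    .{ .symbol = "local 0", .line = 10, .start_col = 8, .end_col = 11, .symbol_roles = definition_role },
    .{ .symbol = "local 0", .line = 14, .start_col = 4, .end_col = 7, .symbol_roles = 0 },
    .{ .symbol = "local 0", .line = 20, .start_col = 12, .end_col = 15, .symbol_roles = 0 },
};
```

With the sample data, `findDefinition(&sample, "local 0")` returns the occurrence on line 10, and `countReferences` finds three occurrences in total. The expensive work of figuring out which identifier refers to which symbol happens once, at indexing time.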

The index records every symbol in each document, what kind of symbol it is, and how it relates to other symbols. It also stores documentation, modification sites, and other information that may be useful to you.

  const Analyzer = @This();
//      ^^^^^^^^ definition file . root unversioned src/analysis/Analyzer.zig/Analyzer#

Sample symbol (src/analysis/Analyzer.zig/Analyzer#) from a snapshot, a debugging-friendly format that lets you test your index generation by projecting an index back onto its source code.
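The symbol string in that snapshot follows SCIP's symbol grammar. It has a scheme, a package made of a manager, a name, and a version, and then a chain of descriptors. The descriptor suffix encodes the kind: `/` for a namespace, `#` for a type, and so on. `.` is SCIP's placeholder for an empty field. Reading the sample as scheme `file`, manager `.`, package `root`, and version `unversioned` is my interpretation of the snapshot, not something scip-zig documents.

Below is a minimal sketch of splitting a symbol into those parts. `ParsedSymbol` and `parseSymbol` are hypothetical names, and the sketch assumes Zig 0.11 or later.

```zig
const std = @import("std");

/// Hypothetical: the parts of a non-local SCIP symbol string,
/// `<scheme> <manager> <package-name> <version> <descriptors>`.
const ParsedSymbol = struct {
    scheme: []const u8,
    manager: []const u8,
    package_name: []const u8,
    version: []const u8,
    descriptors: []const u8,
};

/// Splits off the four space-separated header fields; whatever remains is the
/// descriptor chain. Returns null for local symbols such as "local 0".
/// Simplification: SCIP escapes spaces inside fields by doubling them, which
/// this sketch does not handle.
fn parseSymbol(symbol: []const u8) ?ParsedSymbol {
    var fields: [4][]const u8 = undefined;
    var rest = symbol;
    for (&fields) |*field| {
        const space = std.mem.indexOfScalar(u8, rest, ' ') orelse return null;
        field.* = rest[0..space];
        rest = rest[space + 1 ..];
    }
    return ParsedSymbol{
        .scheme = fields[0],
        .manager = fields[1],
        .package_name = fields[2],
        .version = fields[3],
        .descriptors = rest,
    };
}
```

Applied to the sample, this yields the descriptor chain `src/analysis/Analyzer.zig/Analyzer#`, whose trailing `#` marks `Analyzer` as a type.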

The two main "standard" formats in this space are Microsoft's LSIF, derived from its Language Server Protocol, and Sourcegraph's SCIP. SCIP grew out of Sourcegraph's need for an indexing format that is simpler and faster to implement than LSIF, as well as more powerful and more flexible.

Why use SCIP over LSIF?

Though Sourcegraph's article on the matter already summarizes SCIP vs LSIF quite well, here are a few quick reasons I prefer SCIP to LSIF:

- SCIP has no graph structures we're forced to handle! LSIF output is a graph of vertices and edges that every indexer has to build. A SCIP index, by contrast, is organized per document as flat lists of occurrences and symbol information. This makes implementing a SCIP indexer a million times easier than an LSIF one.
- SCIP uses Protobuf rather than JSON like LSIF. Protobuf is simpler, more space-efficient, and much more fun to use from a systems language; I implemented the Protobuf encoding spec and the SCIP Protobuf schema in very few lines of code.
- SCIP actually works: LSIF is so experimental that finding real-world uses of it, other than (ironically enough) Sourcegraph's code search, is quite difficult.
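Part of why Protobuf is pleasant from a systems language is how small its wire format is. Below is a minimal sketch of the core pieces: base-128 varints, field keys, and length-delimited fields. It follows the public Protobuf encoding spec. The function names are hypothetical, and this is not scip-zig's actual encoder. The sketch assumes Zig 0.11 or later.

```zig
const std = @import("std");

/// Protobuf wire type for length-delimited fields (strings, bytes, messages).
const wire_len: u3 = 2;

/// Hypothetical helper: writes `value` as a base-128 varint (7 bits per byte,
/// least significant group first, high bit set on all but the last byte).
/// The caller must supply a large enough buffer; 10 bytes fits any u64.
fn encodeVarint(value: u64, out: []u8) usize {
    var v = value;
    var i: usize = 0;
    while (v >= 0x80) : (i += 1) {
        out[i] = @as(u8, @truncate(v)) | 0x80;
        v >>= 7;
    }
    out[i] = @as(u8, @truncate(v));
    return i + 1;
}

/// Hypothetical helper: writes a field key, (field_number << 3) | wire_type,
/// which is itself encoded as a varint.
fn encodeKey(field_number: u32, wire_type: u3, out: []u8) usize {
    const key = (@as(u64, field_number) << 3) | wire_type;
    return encodeVarint(key, out);
}

/// Hypothetical helper: writes a string field as key, varint length, raw bytes.
fn encodeString(field_number: u32, s: []const u8, out: []u8) usize {
    var n = encodeKey(field_number, wire_len, out);
    n += encodeVarint(s.len, out[n..]);
    @memcpy(out[n..][0..s.len], s);
    return n + s.len;
}
```

For example, `encodeVarint(300, &buf)` writes the bytes `0xAC 0x02`, and `encodeString(1, "hi", &buf)` writes `0x0A 0x02 'h' 'i'`. Nested messages such as a SCIP document are also length-delimited fields, so an encoder built from these pieces must know a nested message's encoded size before writing it.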

Progress so far

So far, my SCIP Zig indexer is quite rudimentary; much of the work over the last month has gone into getting the fundamental features down, mostly derived from zls' architecture:

- Generating valid code indices (this took a good few hours of tinkering)
- Various scope kinds
  - Structs, enums, and unions
  - Functions
  - Anonymous blocks
- Local and global declarations (though type inference is still a WIP)
- Imports across files and libraries
- Successfully indexing std(!)
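Supporting scope kinds and local and global declarations comes down to scope-based name resolution. Below is a minimal sketch using hypothetical names (`Scope`, `Decl`, `lookup`). It is not scip-zig's actual data structure, which per the list above is mostly derived from zls. Each scope keeps its declarations and a pointer to its enclosing scope, and resolving a name walks outward until a match is found. The sketch assumes Zig 0.11 or later.

```zig
const std = @import("std");

/// Hypothetical scope kinds, loosely matching the list above.
const ScopeKind = enum { container, function, block };

/// Hypothetical declaration: a source name and the SCIP symbol it maps to.
const Decl = struct {
    name: []const u8,
    symbol: []const u8,
};

/// Each scope points at its enclosing scope; the file's root container has none.
const Scope = struct {
    kind: ScopeKind,
    parent: ?*const Scope,
    decls: []const Decl,

    /// Resolves `name` by searching this scope first, then each enclosing
    /// scope in turn, returning the symbol of the first match.
    fn lookup(self: *const Scope, name: []const u8) ?[]const u8 {
        var current: ?*const Scope = self;
        while (current) |scope| : (current = scope.parent) {
            for (scope.decls) |decl| {
                if (std.mem.eql(u8, decl.name, name)) return decl.symbol;
            }
        }
        return null;
    }
};
```

A real indexer also has to respect Zig's visibility rules. For example, a local is only visible after its declaration, while container-level declarations are order-independent. This sketch ignores that distinction.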

Future goals

- Improving usability and performance to the point where Sourcegraph can reliably deploy it as an official indexer
- Computing variables' inferred types
- Properly tracking external libraries
- build.zig integration
- comptime evaluation

More broadly, I'll have to investigate hacking around stage2 (Zig's self-hosted compiler) or a Zig interpreter to get highly accurate type information. This is a zls goal as well, but scip-zig has many more constraints that make it an ideal testing ground for these kinds of experiments.

If you haven't already, you can check out scip-zig here!