tree-sitter/docs/section-8-code-navigation-systems.md
Patrick Thomson 1fbace136d Add examples.
2022-02-17 17:20:21 -05:00

Code Navigation Systems

Tree-sitter can be used in conjunction with its tree query language as part of code navigation systems. An example of such a system is the tree-sitter tags command, which emits a textual dump of the interesting syntactic nodes in its file argument. A notable application of this is GitHub's support for search-based code navigation. This document describes how to extend that support to a language by writing tagging queries for its grammar.

Tagging and captures

Tagging is the act of identifying the entities that can be named in a program. Tree-sitter queries find those entities; once an entity is found, a syntax capture labels the entity and its name.

You can use the tree-sitter tags command to test out a given set of tags.

The essence of a given tag lies in two pieces of data: the kind of entity that is matched (usually a definition or a reference) and the role of that entity, which describes how the entity is used (i.e. whether it's a class definition, function call, variable reference, and so on). Our convention is to use a syntax capture following the @kind.role capture name format, and another inner capture, always called @name, that pulls out the name of a given identifier.

You may optionally include a capture named @doc to bind a docstring. The tagging system provides two built-in functions, #select-adjacent! and #strip!, that are convenient for removing comment syntax from a docstring. #strip! takes a capture as its first argument and a regular expression, expressed as a quoted string, as its second. Any text matched by the regular expression is removed from the text associated with the passed capture. #select-adjacent!, when passed two capture names, filters the text associated with the first capture so that only text adjacent to the second capture is preserved. This is useful when a query would otherwise include too much information in matched comments.
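To make the described behaviour of #strip! concrete, here is a minimal Python sketch of its semantics: every match of the regular expression is removed from the captured text. This is an illustration only, not Tree-sitter's implementation. The helper name strip_capture is hypothetical, and Python's re syntax is assumed to behave like the tagging system's regex dialect for this simple pattern.

```python
import re


def strip_capture(text: str, pattern: str) -> str:
    """Remove every match of `pattern` from `text` (hypothetical helper)."""
    return re.sub(pattern, "", text)


# Text of a single Ruby comment node, as it might be bound to @doc.
comment = "#   Represents a bank account."

# The pattern removes a leading "#" and any whitespace that follows it.
print(strip_capture(comment, r"^#\s*"))
```

The pattern is anchored with ^, so a # that appears later in the comment text is left alone.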

Examples

An example query follows, one that recognizes Python function definitions and captures their declared name. The function_definition syntax node is defined in the Python Tree-sitter grammar.

(function_definition
  name: (identifier) @name) @definition.function

A more sophisticated query can be found in the JavaScript Tree-sitter repository:

(assignment_expression
  left: [
    (identifier) @name
    (member_expression
      property: (property_identifier) @name)
  ]
  right: [(arrow_function) (function)]
) @definition.function

An even more sophisticated query is in the Ruby Tree-sitter repository, which uses built-in functions to strip the Ruby comment character (#) from the docstrings associated with a class or singleton-class declaration, then selects only the docstrings adjacent to the node matched as @definition.class.

(
  (comment)* @doc
  .
  [
    (class
      name: [
        (constant) @name
        (scope_resolution
          name: (_) @name)
      ]) @definition.class
    (singleton_class
      value: [
        (constant) @name
        (scope_resolution
          name: (_) @name)
      ]) @definition.class
  ]
  (#strip! @doc "^#\\s*")
  (#select-adjacent! @doc @definition.class)
)
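The effect of combining the two functions can be sketched in Python. The sketch below assumes that "adjacent" means "on consecutive lines ending directly above the definition"; that reading, the select_adjacent helper, and the sample data are all assumptions made for illustration and are not taken from Tree-sitter's source.

```python
import re
from typing import List, Tuple

# A comment is modelled as (row, text), where row is a 0-based line number.
Comment = Tuple[int, str]


def select_adjacent(comments: List[Comment], target_row: int) -> List[Comment]:
    """Keep the unbroken run of comments that ends on the line above target_row."""
    kept: List[Comment] = []
    expected_row = target_row - 1
    # Walk upwards from the definition and stop at the first gap.
    for row, text in sorted(comments, reverse=True):
        if row != expected_row:
            break
        kept.append((row, text))
        expected_row -= 1
    kept.reverse()  # restore top-to-bottom order
    return kept


# Hypothetical comments preceding a `class Account` line on row 4.
comments = [
    (0, "# frozen_string_literal: true"),
    (2, "# Represents a bank account."),
    (3, "# Balances are stored in cents."),
]
class_row = 4

# Strip the Ruby comment marker from each surviving comment.
doc_lines = [re.sub(r"^#\s*", "", text) for _, text in select_adjacent(comments, class_row)]
print("\n".join(doc_lines))
```

Under these assumptions the magic comment on row 0 is dropped because a blank line separates it from the others, and only the two lines directly above the class end up in the docstring.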

The table below describes a standard vocabulary for kinds and roles during the tagging process. User applications may extend (or only recognize a subset of) these capture names, but it is desirable to standardize on the names below when supported by a given system or language. Language communities that write tagging rules using these names can work out-of-the-box with a steadily increasing set of analysis tools.

| Category | Tag |
| --- | --- |
| Class definitions | @definition.class |
| Function definitions | @definition.function |
| Interface definitions | @definition.interface |
| Method definitions | @definition.method |
| Module definitions | @definition.module |
| Function/method calls | @reference.call |
| Class reference | @reference.class |
| Interface implementation | @reference.implementation |
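A tool that consumes tags might split each capture name into its kind and role and check it against this vocabulary. The following Python sketch shows one way to do that. The names STANDARD_TAGS, split_tag, and is_standard are hypothetical and are not part of any Tree-sitter API.

```python
from typing import Tuple

# Capture names from the standard vocabulary, without the leading "@".
STANDARD_TAGS = {
    "definition.class",
    "definition.function",
    "definition.interface",
    "definition.method",
    "definition.module",
    "reference.call",
    "reference.class",
    "reference.implementation",
}


def split_tag(capture_name: str) -> Tuple[str, str]:
    """Split a '@kind.role' capture name into (kind, role)."""
    kind, sep, role = capture_name.lstrip("@").partition(".")
    if not sep or not kind or not role:
        raise ValueError(f"not a @kind.role capture name: {capture_name!r}")
    return kind, role


def is_standard(capture_name: str) -> bool:
    """Report whether the capture name belongs to the standard vocabulary."""
    return capture_name.lstrip("@") in STANDARD_TAGS


print(split_tag("@definition.function"))
print(is_standard("@reference.call"))
```

Because applications may extend the vocabulary, a consumer built this way would probably treat an unknown name as something to skip or pass through, not as an error.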

By convention, tags for a given language are made available in a queries/tags.scm file in that language's repository.