docs: add more information on supertype nodes for grammars and queries

Amaan Qureshi 2025-09-14 05:40:19 -04:00 committed by Amaan Qureshi
parent 63f48afaeb
commit 3a911d578c
3 changed files with 89 additions and 18 deletions

@@ -129,8 +129,11 @@ than globally. Can only be used with parse precedence, not lexical precedence.
- **`word`** — the name of a token that will match keywords to the
[keyword extraction][keyword-extraction] optimization.
- **`supertypes`** — an array of hidden rule names which should be considered to be 'supertypes' in the generated
[*node types* file][static-node-types].
- **`supertypes`** — an array of rule names which should be considered to be 'supertypes' in the generated
[*node types* file][static-node-types-supertypes]. Supertype rules are automatically hidden from the parse tree, regardless
of whether their names start with an underscore. The main use case for supertypes is to group together multiple different
kinds of nodes under a single abstract category, such as "expression" or "declaration". See the section on [`using supertypes`][supertypes]
for more details.
- **`reserved`** — similar in structure to the main `rules` property, an object of reserved word sets associated with an
array of reserved rules. The reserved rule in the array must be a terminal token meaning it must be a string, regex, token,
@@ -144,11 +147,13 @@ empty array, signifying *no* keywords are reserved.
[bison-dprec]: https://www.gnu.org/software/bison/manual/html_node/Generalized-LR-Parsing.html
[ebnf]: https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form
[external-scanners]: ./4-external-scanners.md
[extras]: ./3-writing-the-grammar.html#using-extras
[extras]: ./3-writing-the-grammar.md#using-extras
[keyword-extraction]: ./3-writing-the-grammar.md#keyword-extraction
[lexical vs parse]: ./3-writing-the-grammar.md#lexical-precedence-vs-parse-precedence
[lr-conflict]: https://en.wikipedia.org/wiki/LR_parser#Conflicts_in_the_constructed_tables
[named-vs-anonymous-nodes]: ../using-parsers/2-basic-parsing.md#named-vs-anonymous-nodes
[rust regex]: https://docs.rs/regex/1.1.8/regex/#grouping-and-flags
[static-node-types]: ../using-parsers/6-static-node-types.md
[static-node-types-supertypes]: ../using-parsers/6-static-node-types.md#supertype-nodes
[supertypes]: ./3-writing-the-grammar.md#using-supertypes
[yacc-prec]: https://docs.oracle.com/cd/E19504-01/802-5880/6i9k05dh3/index.html
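The `supertypes` property feeds the generated *node types* file, so it can help to see how supertypes show up there. The sketch below assumes parsed `node-types.json` data in which each supertype entry carries a `subtypes` array of `{ type, named }` objects. `listSupertypes` and the sample data are hypothetical and are not part of Tree-sitter's API.

```js
// Hypothetical helper: map each supertype to its direct subtypes.
// Supertype entries are assumed to be the ones carrying a `subtypes` array.
function listSupertypes(nodeTypes) {
  const result = new Map();
  for (const entry of nodeTypes) {
    if (Array.isArray(entry.subtypes)) {
      result.set(entry.type, entry.subtypes.map((sub) => sub.type));
    }
  }
  return result;
}

// Illustrative sample shaped like a small slice of node-types.json.
const sampleNodeTypes = [
  {
    type: "expression",
    named: true,
    subtypes: [
      { type: "binary_expression", named: true },
      { type: "identifier", named: true },
      { type: "unary_expression", named: true },
    ],
  },
  { type: "identifier", named: true },
  { type: "binary_expression", named: true, fields: {} },
];

for (const [supertype, subtypes] of listSupertypes(sampleNodeTypes)) {
  // prints "expression: binary_expression, identifier, unary_expression"
  console.log(`${supertype}: ${subtypes.join(", ")}`);
}
```

To try it on a real grammar, replace `sampleNodeTypes` with the parsed generated file, for example `JSON.parse(require("fs").readFileSync("src/node-types.json", "utf8"))`, assuming the default output location of `tree-sitter generate`.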

@@ -74,11 +74,11 @@ you might start with something like this:
return_statement: $ => seq(
'return',
$._expression,
$.expression,
';'
),
_expression: $ => choice(
expression: $ => choice(
$.identifier,
$.number
// TODO: other kinds of expressions
@@ -202,7 +202,7 @@ To produce a readable syntax tree, we'd like to model JavaScript expressions usi
{
// ...
_expression: $ => choice(
expression: $ => choice(
$.identifier,
$.unary_expression,
$.binary_expression,
@@ -210,14 +210,14 @@ To produce a readable syntax tree, we'd like to model JavaScript expressions usi
),
unary_expression: $ => choice(
seq('-', $._expression),
seq('!', $._expression),
seq('-', $.expression),
seq('!', $.expression),
// ...
),
binary_expression: $ => choice(
seq($._expression, '*', $._expression),
seq($._expression, '+', $._expression),
seq($.expression, '*', $.expression),
seq($.expression, '+', $.expression),
// ...
),
}
@@ -252,7 +252,7 @@ ambiguity.
For an expression like `-a * b`, it's not clear whether the `-` operator applies to the `a * b` or just to the `a`. This
is where the `prec` function [described in the previous page][grammar dsl] comes into play. By wrapping a rule with `prec`,
we can indicate that certain sequence of symbols should _bind to each other more tightly_ than others. For example, the
`'-', $._expression` sequence in `unary_expression` should bind more tightly than the `$._expression, '+', $._expression`
`'-', $.expression` sequence in `unary_expression` should bind more tightly than the `$.expression, '+', $.expression`
sequence in `binary_expression`:
```js
@@ -263,8 +263,8 @@ sequence in `binary_expression`:
prec(
2,
choice(
seq("-", $._expression),
seq("!", $._expression),
seq("-", $.expression),
seq("!", $.expression),
// ...
),
);
@@ -299,8 +299,8 @@ This is where `prec.left` and `prec.right` come into use. We want to select the
// ...
binary_expression: $ => choice(
prec.left(2, seq($._expression, '*', $._expression)),
prec.left(1, seq($._expression, '+', $._expression)),
prec.left(2, seq($.expression, '*', $.expression)),
prec.left(1, seq($.expression, '+', $.expression)),
// ...
),
}
@@ -476,6 +476,51 @@ typically in ways that don't affect the meaning of the pattern. For example, `\w
to `[ \t\n\r]`, and `\d` to `[0-9]`. If you need more complex behavior, you can always use a more explicit regex.
```
## Using Supertypes
Some rules in your grammar will represent abstract categories of syntax nodes, such as "expression", "type", or "declaration".
These rules are often defined as simple choices between several other rules. For example, in the JavaScript grammar, the
`expression` rule is defined as a choice between many different kinds of expressions:
```js
expression: $ => choice(
$.identifier,
$.unary_expression,
$.binary_expression,
$.call_expression,
$.member_expression,
// ...
),
```
By default, Tree-sitter will generate a visible node type for each of these abstract category rules, which can lead to
unnecessarily deep and complex syntax trees. To avoid this, you can add these abstract category rules to the grammar's `supertypes`
definition. Tree-sitter will then treat these rules as _supertypes_, and will not generate visible node types for them in
the syntax tree.
```js
module.exports = grammar({
name: "javascript",
supertypes: $ => [
$.expression,
],
rules: {
expression: $ => choice(
$.identifier,
// ...
),
// ...
},
});
```
Although supertype rules are hidden from the syntax tree, they can still be used in queries. See the chapter on
[Query Syntax][query syntax] for more information.
# Lexical Analysis
Tree-sitter's parsing process is divided into two phases: parsing (which is described above) and [lexing][lexing] — the
@@ -554,7 +599,7 @@ grammar({
word: $ => $.identifier,
rules: {
_expression: $ =>
expression: $ =>
choice(
$.identifier,
$.unary_expression,
@@ -564,13 +609,13 @@ grammar({
binary_expression: $ =>
choice(
prec.left(1, seq($._expression, "instanceof", $._expression)),
prec.left(1, seq($.expression, "instanceof", $.expression)),
// ...
),
unary_expression: $ =>
choice(
prec.left(2, seq("typeof", $._expression)),
prec.left(2, seq("typeof", $.expression)),
// ...
),
@@ -607,5 +652,6 @@ rule that's called something else, you should just alias the word token instead,
[field-names-section]: ../using-parsers/2-basic-parsing.md#node-field-names
[non-terminal]: https://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols
[peg]: https://en.wikipedia.org/wiki/Parsing_expression_grammar
[query syntax]: ../using-parsers/queries/1-syntax.md#supertype-nodes
[tree-sitter-javascript]: https://github.com/tree-sitter/tree-sitter-javascript
[yacc]: https://en.wikipedia.org/wiki/Yacc
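Hiding a supertype can be modeled as a small tree transformation. This is only a conceptual sketch under stated assumptions. Trees are plain `{ type, children }` objects holding named nodes only, so anonymous tokens such as `'return'` and `';'` are omitted. `hideSupertypes` and `toSexp` are hypothetical helpers and do not reflect how Tree-sitter stores or prints trees internally. The sketch shows that the visible children of a hidden `expression` node end up attached to the nearest visible ancestor.

```js
// Conceptual model only: return the tree a user would see once every
// node whose type is in `supertypes` is hidden.
function hideSupertypes(node, supertypes) {
  const visibleChildren = [];
  for (const child of node.children) {
    const shown = hideSupertypes(child, supertypes);
    if (supertypes.has(child.type)) {
      // A hidden node contributes its (already flattened) children
      // directly to the nearest visible ancestor.
      visibleChildren.push(...shown.children);
    } else {
      visibleChildren.push(shown);
    }
  }
  return { type: node.type, children: visibleChildren };
}

// Render a tree as an S-expression of named nodes.
function toSexp(node) {
  if (node.children.length === 0) return `(${node.type})`;
  return `(${node.type} ${node.children.map(toSexp).join(" ")})`;
}

// Hypothetical full tree for `return -a;` with `expression` still present.
const leaf = (type) => ({ type, children: [] });
const fullTree = {
  type: "return_statement",
  children: [
    {
      type: "expression",
      children: [
        {
          type: "unary_expression",
          children: [{ type: "expression", children: [leaf("identifier")] }],
        },
      ],
    },
  ],
};

// prints "(return_statement (expression (unary_expression (expression (identifier)))))"
console.log(toSexp(fullTree));
// prints "(return_statement (unary_expression (identifier)))"
console.log(toSexp(hideSupertypes(fullTree, new Set(["expression"]))));
```

The second tree has one less level of nesting at each point where `expression` appeared. That is the reduction in depth that listing a rule under `supertypes` is meant to achieve.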

@@ -96,6 +96,26 @@ by `(ERROR)` queries. Specific missing node types can also be queried:
(MISSING ";") @missing-semicolon
```
### Supertype Nodes
Some node types are marked as _supertypes_ in a grammar. A supertype is a node type that contains multiple
subtypes. For example, in the [JavaScript grammar example][grammar], `expression` is a supertype that can represent any kind
of expression, such as a `binary_expression`, `call_expression`, or `identifier`. You can use supertypes in queries to match
any of their subtypes, rather than having to list out each subtype individually. For example, this pattern would match any
kind of expression, even though it's not a visible node in the syntax tree:
```query
(expression) @any-expression
```
To query specific subtypes of a supertype, you can use the syntax `supertype/subtype`. For example, this pattern would
match a `binary_expression` only if it is a child of `expression`:
```query
(expression/binary_expression) @binary-expression
```
[grammar]: ../../creating-parsers/3-writing-the-grammar.md#structuring-rules-well
[node-field-names]: ../2-basic-parsing.md#node-field-names
[named-vs-anonymous-nodes]: ../2-basic-parsing.md#named-vs-anonymous-nodes
[s-exp]: https://en.wikipedia.org/wiki/S-expression
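Without supertypes, a query meant to match any expression would have to list every subtype in an alternation. The sketch below builds that hand-written alternation by expanding nested supertypes. It assumes parsed `node-types.json` data where supertype entries carry a `subtypes` array and every subtype is named. `expandSupertype` and `alternationFor` are hypothetical helpers, and the sample data is invented for illustration.

```js
// Hypothetical helper: expand a supertype into the concrete node types it
// can stand for, following nested supertypes. Assumes all subtypes are named.
function expandSupertype(name, nodeTypes, seen = new Set()) {
  const entry = nodeTypes.find((e) => e.type === name && e.named);
  if (!entry || !Array.isArray(entry.subtypes)) return [name]; // concrete type
  if (seen.has(name)) return []; // guard against cycles
  seen.add(name);
  const result = [];
  for (const sub of entry.subtypes) {
    for (const t of expandSupertype(sub.type, nodeTypes, seen)) {
      if (!result.includes(t)) result.push(t);
    }
  }
  return result;
}

// Build the alternation a query author would otherwise write by hand.
function alternationFor(name, nodeTypes, capture) {
  const parts = expandSupertype(name, nodeTypes).map((t) => `(${t})`);
  return `[${parts.join(" ")}] @${capture}`;
}

const nodeTypes = [
  {
    type: "expression",
    named: true,
    subtypes: [
      { type: "binary_expression", named: true },
      { type: "primary_expression", named: true },
    ],
  },
  {
    type: "primary_expression",
    named: true,
    subtypes: [
      { type: "identifier", named: true },
      { type: "number", named: true },
    ],
  },
];

// prints "[(binary_expression) (identifier) (number)] @any-expression"
console.log(alternationFor("expression", nodeTypes, "any-expression"));
```

The generated alternation is likely broader than `(expression)`. A bare `(identifier)` pattern matches identifiers in any position, while the supertype pattern only matches nodes that the parser produced as an `expression`.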