docs: add more information on supertype nodes for grammars and queries
parent
63f48afaeb
commit
3a911d578c
3 changed files with 89 additions and 18 deletions
|
|
@ -129,8 +129,11 @@ than globally. Can only be used with parse precedence, not lexical precedence.
|
|||
- **`word`** — the name of a token that will match keywords to the
|
||||
[keyword extraction][keyword-extraction] optimization.
|
||||
|
||||
- **`supertypes`** — an array of hidden rule names which should be considered to be 'supertypes' in the generated
|
||||
[*node types* file][static-node-types].
|
||||
- **`supertypes`** — an array of rule names which should be considered to be 'supertypes' in the generated
|
||||
[*node types* file][static-node-types-supertypes]. Supertype rules are automatically hidden from the parse tree, regardless
|
||||
of whether their names start with an underscore. The main use case for supertypes is to group together multiple different
|
||||
kinds of nodes under a single abstract category, such as "expression" or "declaration". See the section on [`using supertypes`][supertypes]
|
||||
for more details.
|
||||
|
||||
- **`reserved`** — similar in structure to the main `rules` property, an object of reserved word sets associated with an
|
||||
array of reserved rules. The reserved rule in the array must be a terminal token meaning it must be a string, regex, token,
|
||||
|
|
@ -144,11 +147,13 @@ empty array, signifying *no* keywords are reserved.
|
|||
[bison-dprec]: https://www.gnu.org/software/bison/manual/html_node/Generalized-LR-Parsing.html
|
||||
[ebnf]: https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form
|
||||
[external-scanners]: ./4-external-scanners.md
|
||||
[extras]: ./3-writing-the-grammar.html#using-extras
|
||||
[extras]: ./3-writing-the-grammar.md#using-extras
|
||||
[keyword-extraction]: ./3-writing-the-grammar.md#keyword-extraction
|
||||
[lexical vs parse]: ./3-writing-the-grammar.md#lexical-precedence-vs-parse-precedence
|
||||
[lr-conflict]: https://en.wikipedia.org/wiki/LR_parser#Conflicts_in_the_constructed_tables
|
||||
[named-vs-anonymous-nodes]: ../using-parsers/2-basic-parsing.md#named-vs-anonymous-nodes
|
||||
[rust regex]: https://docs.rs/regex/1.1.8/regex/#grouping-and-flags
|
||||
[static-node-types]: ../using-parsers/6-static-node-types.md
|
||||
[static-node-types-supertypes]: ../using-parsers/6-static-node-types.md#supertype-nodes
|
||||
[supertypes]: ./3-writing-the-grammar.md#using-supertypes
|
||||
[yacc-prec]: https://docs.oracle.com/cd/E19504-01/802-5880/6i9k05dh3/index.html
|
||||
|
|
|
|||
|
|
@ -74,11 +74,11 @@ you might start with something like this:
|
|||
|
||||
return_statement: $ => seq(
|
||||
'return',
|
||||
$._expression,
|
||||
$.expression,
|
||||
';'
|
||||
),
|
||||
|
||||
_expression: $ => choice(
|
||||
expression: $ => choice(
|
||||
$.identifier,
|
||||
$.number
|
||||
// TODO: other kinds of expressions
|
||||
|
|
@ -202,7 +202,7 @@ To produce a readable syntax tree, we'd like to model JavaScript expressions usi
|
|||
{
|
||||
// ...
|
||||
|
||||
_expression: $ => choice(
|
||||
expression: $ => choice(
|
||||
$.identifier,
|
||||
$.unary_expression,
|
||||
$.binary_expression,
|
||||
|
|
@ -210,14 +210,14 @@ To produce a readable syntax tree, we'd like to model JavaScript expressions usi
|
|||
),
|
||||
|
||||
unary_expression: $ => choice(
|
||||
seq('-', $._expression),
|
||||
seq('!', $._expression),
|
||||
seq('-', $.expression),
|
||||
seq('!', $.expression),
|
||||
// ...
|
||||
),
|
||||
|
||||
binary_expression: $ => choice(
|
||||
seq($._expression, '*', $._expression),
|
||||
seq($._expression, '+', $._expression),
|
||||
seq($.expression, '*', $.expression),
|
||||
seq($.expression, '+', $.expression),
|
||||
// ...
|
||||
),
|
||||
}
|
||||
|
|
@ -252,7 +252,7 @@ ambiguity.
|
|||
For an expression like `-a * b`, it's not clear whether the `-` operator applies to the `a * b` or just to the `a`. This
|
||||
is where the `prec` function [described in the previous page][grammar dsl] comes into play. By wrapping a rule with `prec`,
|
||||
we can indicate that certain sequences of symbols should _bind to each other more tightly_ than others. For example, the
|
||||
`'-', $._expression` sequence in `unary_expression` should bind more tightly than the `$._expression, '+', $._expression`
|
||||
`'-', $.expression` sequence in `unary_expression` should bind more tightly than the `$.expression, '+', $.expression`
|
||||
sequence in `binary_expression`:
|
||||
|
||||
```js
|
||||
|
|
@ -263,8 +263,8 @@ sequence in `binary_expression`:
|
|||
prec(
|
||||
2,
|
||||
choice(
|
||||
seq("-", $._expression),
|
||||
seq("!", $._expression),
|
||||
seq("-", $.expression),
|
||||
seq("!", $.expression),
|
||||
// ...
|
||||
),
|
||||
);
|
||||
|
|
@ -299,8 +299,8 @@ This is where `prec.left` and `prec.right` come into use. We want to select the
|
|||
// ...
|
||||
|
||||
binary_expression: $ => choice(
|
||||
prec.left(2, seq($._expression, '*', $._expression)),
|
||||
prec.left(1, seq($._expression, '+', $._expression)),
|
||||
prec.left(2, seq($.expression, '*', $.expression)),
|
||||
prec.left(1, seq($.expression, '+', $.expression)),
|
||||
// ...
|
||||
),
|
||||
}
|
||||
|
|
@ -476,6 +476,51 @@ typically in ways that don't affect the meaning of the pattern. For example, `\w
|
|||
to `[ \t\n\r]`, and `\d` to `[0-9]`. If you need more complex behavior, you can always use a more explicit regex.
|
||||
```
|
||||
|
||||
## Using Supertypes
|
||||
|
||||
Some rules in your grammar will represent abstract categories of syntax nodes, such as "expression", "type", or "declaration".
|
||||
These rules are often defined as simple choices between several other rules. For example, in the JavaScript grammar, the
|
||||
`expression` rule is defined as a choice between many different kinds of expressions:
|
||||
|
||||
```js
|
||||
expression: $ => choice(
|
||||
$.identifier,
|
||||
$.unary_expression,
|
||||
$.binary_expression,
|
||||
$.call_expression,
|
||||
$.member_expression,
|
||||
// ...
|
||||
),
|
||||
```
|
||||
|
||||
By default, Tree-sitter will generate a visible node type for each of these abstract category rules, which can lead to
|
||||
unnecessarily deep and complex syntax trees. To avoid this, you can add these abstract category rules to the grammar's `supertypes`
|
||||
definition. Tree-sitter will then treat these rules as _supertypes_, and will not generate visible node types for them in
|
||||
the syntax tree.
|
||||
|
||||
```js
|
||||
module.exports = grammar({
|
||||
name: "javascript",
|
||||
|
||||
supertypes: $ => [
|
||||
$.expression,
|
||||
],
|
||||
|
||||
rules: {
|
||||
expression: $ => choice(
|
||||
$.identifier,
|
||||
// ...
|
||||
),
|
||||
|
||||
// ...
|
||||
},
|
||||
});
|
||||
|
||||
```
|
||||
|
||||
Although supertype rules are hidden from the syntax tree, they can still be used in queries. See the chapter on
|
||||
[Query Syntax][query syntax] for more information.
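
As a rough illustration of what marking `expression` as a supertype produces, the sketch below models the kind of entries the generated node types file would contain and walks them to list the concrete subtypes. The `nodeTypes` array is a hand-written, abbreviated stand-in rather than real generator output, and `subtypesOf` is a hypothetical helper; only the `type`, `named`, and `subtypes` keys are taken from the node types file format.

```javascript
// Hand-written, abbreviated stand-in for entries in a generated node-types.json.
// A supertype entry lists its alternatives under `subtypes`.
const nodeTypes = [
  {
    type: "expression",
    named: true,
    subtypes: [
      { type: "identifier", named: true },
      { type: "unary_expression", named: true },
      { type: "binary_expression", named: true },
    ],
  },
  // Concrete node types (their `fields`/`children` are omitted here for brevity).
  { type: "identifier", named: true },
  { type: "unary_expression", named: true },
  { type: "binary_expression", named: true },
];

// Hypothetical helper: returns the names of the concrete node types grouped
// under a supertype, following nested supertypes if there are any.
// Returns [] for unknown types and for types that are not supertypes.
function subtypesOf(name, types = nodeTypes, seen = new Set()) {
  if (seen.has(name)) return []; // guard against cyclic definitions
  seen.add(name);
  const entry = types.find((t) => t.type === name && t.named);
  if (!entry || !entry.subtypes) return [];
  return entry.subtypes.flatMap((sub) => {
    const nested = subtypesOf(sub.type, types, seen);
    return nested.length > 0 ? nested : [sub.type];
  });
}

console.log(subtypesOf("expression"));
```

A tool that consumes the node types file could use a lookup like this to expand an abstract category into the node types that actually appear in the tree.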
|
||||
|
||||
# Lexical Analysis
|
||||
|
||||
Tree-sitter's parsing process is divided into two phases: parsing (which is described above) and [lexing][lexing] — the
|
||||
|
|
@ -554,7 +599,7 @@ grammar({
|
|||
word: $ => $.identifier,
|
||||
|
||||
rules: {
|
||||
_expression: $ =>
|
||||
expression: $ =>
|
||||
choice(
|
||||
$.identifier,
|
||||
$.unary_expression,
|
||||
|
|
@ -564,13 +609,13 @@ grammar({
|
|||
|
||||
binary_expression: $ =>
|
||||
choice(
|
||||
prec.left(1, seq($._expression, "instanceof", $._expression)),
|
||||
prec.left(1, seq($.expression, "instanceof", $.expression)),
|
||||
// ...
|
||||
),
|
||||
|
||||
unary_expression: $ =>
|
||||
choice(
|
||||
prec.left(2, seq("typeof", $._expression)),
|
||||
prec.left(2, seq("typeof", $.expression)),
|
||||
// ...
|
||||
),
|
||||
|
||||
|
|
@ -607,5 +652,6 @@ rule that's called something else, you should just alias the word token instead,
|
|||
[field-names-section]: ../using-parsers/2-basic-parsing.md#node-field-names
|
||||
[non-terminal]: https://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols
|
||||
[peg]: https://en.wikipedia.org/wiki/Parsing_expression_grammar
|
||||
[query syntax]: ../using-parsers/queries/1-syntax.md#supertype-nodes
|
||||
[tree-sitter-javascript]: https://github.com/tree-sitter/tree-sitter-javascript
|
||||
[yacc]: https://en.wikipedia.org/wiki/Yacc
|
||||
|
|
|
|||
|
|
@ -96,6 +96,26 @@ by `(ERROR)` queries. Specific missing node types can also be queried:
|
|||
(MISSING ";") @missing-semicolon
|
||||
```
|
||||
|
||||
### Supertype Nodes
|
||||
|
||||
Some node types are marked as _supertypes_ in a grammar. A supertype is a node type that contains multiple
|
||||
subtypes. For example, in the [JavaScript grammar example][grammar], `expression` is a supertype that can represent any kind
|
||||
of expression, such as a `binary_expression`, `call_expression`, or `identifier`. You can use supertypes in queries to match
|
||||
any of their subtypes, rather than having to list out each subtype individually. For example, this pattern would match any
|
||||
kind of expression, even though it's not a visible node in the syntax tree:
|
||||
|
||||
```query
|
||||
(expression) @any-expression
|
||||
```
|
||||
|
||||
To query specific subtypes of a supertype, you can use the syntax `supertype/subtype`. For example, this pattern would
|
||||
match a `binary_expression` only if it occurs as a subtype of `expression`:
|
||||
|
||||
```query
|
||||
(expression/binary_expression) @binary-expression
|
||||
```
|
||||
|
||||
[grammar]: ../../creating-parsers/3-writing-the-grammar.md#structuring-rules-well
|
||||
[node-field-names]: ../2-basic-parsing.md#node-field-names
|
||||
[named-vs-anonymous-nodes]: ../2-basic-parsing.md#named-vs-anonymous-nodes
|
||||
[s-exp]: https://en.wikipedia.org/wiki/S-expression
|
||||
|
|
|
|||