Document supertypes and the node-types file

References #542
References #524
Closes #393
Max Brunsfeld 2020-02-24 11:12:42 -08:00
parent 96c060fc6d
commit f1e4104d47
3 changed files with 154 additions and 1 deletions


@ -580,3 +580,154 @@ bool ts_query_cursor_next_match(TSQueryCursor *, TSQueryMatch *match);
```
This function will return `false` when there are no more matches. Otherwise, it will populate the `match` with data about which pattern matched and which nodes were captured.
## Static Node Types
In languages with static typing, it can be helpful for syntax trees to provide specific type information about individual syntax nodes. Tree-sitter makes this information available via a generated file called `node-types.json`. This *node types* file provides structured data about every possible syntax node in a grammar. You can use this data to generate type declarations in a statically-typed programming language.
The node types file contains an array of objects, each of which describes a particular type of syntax node using the following entries:
#### Basic Info
Every object in this array has these two entries:
* `"type"` - A string that indicates which grammar rule the node represents. This corresponds to the value returned by the `ts_node_type` function described [above](#syntax-nodes).
* `"named"` - A boolean that indicates whether this kind of node corresponds to a rule name in the grammar or just a string literal. See [above](#named-vs-anonymous-nodes) for more info.
Examples:
```json
{
  "type": "string_literal",
  "named": true
}

{
  "type": "+",
  "named": false
}
```
Together, these two entries constitute a unique identifier for a node type; no two top-level objects in `node-types.json` should have the same values for both `"type"` and `"named"`.
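As a minimal sketch of how a consumer might read these entries, the Python snippet below parses a small hand-written array (standing in for a real `node-types.json`) and checks the uniqueness property described above. The helper name `find_duplicate_ids` is hypothetical and is not part of Tree-sitter.

```python
import json
from collections import Counter

# Hand-written sample standing in for the contents of a node-types.json file.
NODE_TYPES_JSON = """
[
  {"type": "string_literal", "named": true},
  {"type": "+", "named": false}
]
"""


def find_duplicate_ids(node_types):
    """Return the (type, named) pairs that occur more than once."""
    counts = Counter((entry["type"], entry["named"]) for entry in node_types)
    return [key for key, count in counts.items() if count > 1]


node_types = json.loads(NODE_TYPES_JSON)
duplicates = find_duplicate_ids(node_types)  # empty when every identifier is unique
```

Keying on the pair rather than on `"type"` alone matters because an anonymous token and a named rule may share the same string.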
#### Internal Nodes
Many syntax nodes can have *children*. The node type object describes the possible children that a node can have using the following entries:
* `"fields"` - An object that describes the possible [fields](#node-field-names) that the node can have. The keys of this object are field names, and the values are *child type* objects, described below.
* `"children"` - Another *child type* object that describes all of the node's possible *named* children *without* fields.
A *child type* object describes a set of child nodes using the following entries:
* `"required"` - A boolean indicating whether there is always *at least one* node in this set.
* `"multiple"` - A boolean indicating whether there can be *multiple* nodes in this set.
* `"types"` - An array of objects that represent the possible types of nodes in this set. Each object has two keys: `"type"` and `"named"`, whose meanings are described above.
Example with fields:
```json
{
  "type": "method_definition",
  "named": true,
  "fields": {
    "body": {
      "multiple": false,
      "required": true,
      "types": [
        {"type": "statement_block", "named": true}
      ]
    },
    "decorator": {
      "multiple": true,
      "required": false,
      "types": [
        {"type": "decorator", "named": true}
      ]
    },
    "name": {
      "multiple": false,
      "required": true,
      "types": [
        {"type": "computed_property_name", "named": true},
        {"type": "property_identifier", "named": true}
      ]
    },
    "parameters": {
      "multiple": false,
      "required": true,
      "types": [
        {"type": "formal_parameters", "named": true}
      ]
    }
  }
}
```
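The `"required"` and `"multiple"` flags together determine how many nodes a field can hold. The Python sketch below is one possible way to turn them into a cardinality label for each field, using an abbreviated copy of the `method_definition` entry. The function names `describe_child_type` and `describe_fields` are illustrative assumptions, not part of any Tree-sitter API.

```python
import json

# Abbreviated copy of the "method_definition" entry shown above.
METHOD_DEFINITION_JSON = """
{
  "type": "method_definition",
  "named": true,
  "fields": {
    "body": {
      "multiple": false,
      "required": true,
      "types": [{"type": "statement_block", "named": true}]
    },
    "decorator": {
      "multiple": true,
      "required": false,
      "types": [{"type": "decorator", "named": true}]
    },
    "name": {
      "multiple": false,
      "required": true,
      "types": [
        {"type": "computed_property_name", "named": true},
        {"type": "property_identifier", "named": true}
      ]
    }
  }
}
"""


def describe_child_type(child_type):
    """Map the required/multiple flags of a child type object to a label."""
    if child_type["multiple"]:
        return "one or more" if child_type["required"] else "zero or more"
    return "exactly one" if child_type["required"] else "zero or one"


def describe_fields(node_type):
    """Return {field name: (cardinality label, possible type names)}."""
    summary = {}
    for name, child_type in node_type.get("fields", {}).items():
        type_names = [t["type"] for t in child_type["types"]]
        summary[name] = (describe_child_type(child_type), type_names)
    return summary


summary = describe_fields(json.loads(METHOD_DEFINITION_JSON))
```

A code generator for a statically typed language could use the same four cases to choose between a plain value, an optional value, and a list.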
Example with children:
```json
{
  "type": "array",
  "named": true,
  "fields": {},
  "children": {
    "multiple": true,
    "required": false,
    "types": [
      {"type": "_expression", "named": true},
      {"type": "spread_element", "named": true}
    ]
  }
}
```
#### Supertype Nodes
In Tree-sitter grammars, there are usually certain rules that represent abstract *categories* of syntax nodes (e.g. "expression", "type", "declaration"). In the `grammar.js` file, these are often written as [hidden rules](./creating-parsers#hiding-rules) whose definition is a simple [`choice`](./creating-parsers#the-grammar-dsl) where each member is just a single symbol.
Normally, hidden rules are not mentioned in the node types file, since they don't appear in the syntax tree. But if you add a hidden rule to the grammar's [`supertypes` list](./creating-parsers#the-grammar-dsl), then it *will* show up in the node types file, with the following special entry:
* `"subtypes"` - An array of objects that specify the *types* of nodes that this 'supertype' node can wrap.
Example:
```json
{
  "type": "_declaration",
  "named": true,
  "subtypes": [
    {"type": "class_declaration", "named": true},
    {"type": "function_declaration", "named": true},
    {"type": "generator_function_declaration", "named": true},
    {"type": "lexical_declaration", "named": true},
    {"type": "variable_declaration", "named": true}
  ]
}
```
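A supertype entry maps naturally onto a union type in a statically typed language. As a rough sketch, the Python snippet below renders a shortened version of the `_declaration` entry as a `Union` alias string. The naming scheme in `to_class_name` and the function `union_declaration` are assumptions made for illustration; Tree-sitter does not prescribe how generated types are named.

```python
import json

# Shortened version of the "_declaration" entry shown above (two subtypes only).
DECLARATION_JSON = """
{
  "type": "_declaration",
  "named": true,
  "subtypes": [
    {"type": "class_declaration", "named": true},
    {"type": "function_declaration", "named": true}
  ]
}
"""


def to_class_name(type_name):
    """Convert a snake_case node type such as "_declaration" to CamelCase."""
    return "".join(part.capitalize() for part in type_name.split("_") if part)


def union_declaration(node_type):
    """Render a supertype entry as a type alias over its subtypes."""
    alias = to_class_name(node_type["type"])
    members = [to_class_name(sub["type"]) for sub in node_type["subtypes"]]
    return f"{alias} = Union[{', '.join(members)}]"


declaration = union_declaration(json.loads(DECLARATION_JSON))
```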
Supertype nodes will also appear elsewhere in the node types file, as children of other node types, in a way that corresponds to how the supertype rule was used in the grammar. This can make the node types much shorter and easier to read, because a single supertype takes the place of multiple subtypes.
Example:
```json
{
  "type": "export_statement",
  "named": true,
  "fields": {
    "declaration": {
      "multiple": false,
      "required": false,
      "types": [
        {"type": "_declaration", "named": true}
      ]
    },
    "source": {
      "multiple": false,
      "required": false,
      "types": [
        {"type": "string", "named": true}
      ]
    }
  }
}
```
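A tool that needs the concrete node types for a field can expand each supertype reference back into its subtypes. The Python sketch below shows one way to do that on a small hand-written sample. The function `expand_types` is hypothetical, and the recursion assumes that a supertype's subtypes may themselves be supertypes.

```python
import json

# Hand-written sample: one supertype entry and one node type that refers to it.
NODE_TYPES_JSON = """
[
  {
    "type": "_declaration",
    "named": true,
    "subtypes": [
      {"type": "class_declaration", "named": true},
      {"type": "function_declaration", "named": true}
    ]
  },
  {
    "type": "export_statement",
    "named": true,
    "fields": {
      "declaration": {
        "multiple": false,
        "required": false,
        "types": [{"type": "_declaration", "named": true}]
      },
      "source": {
        "multiple": false,
        "required": false,
        "types": [{"type": "string", "named": true}]
      }
    }
  }
]
"""


def expand_types(types, supertypes):
    """Replace each supertype reference with its concrete subtype names."""
    concrete = []
    for t in types:
        subtypes = supertypes.get(t["type"]) if t["named"] else None
        if subtypes is None:
            concrete.append(t["type"])
        else:
            # Assumption: supertypes can nest, so expand recursively.
            concrete.extend(expand_types(subtypes, supertypes))
    return concrete


node_types = json.loads(NODE_TYPES_JSON)
# Index the supertype entries by name; only supertypes carry "subtypes".
supertypes = {e["type"]: e["subtypes"] for e in node_types if "subtypes" in e}
export_fields = node_types[1]["fields"]
expanded = {
    name: expand_types(child["types"], supertypes)
    for name, child in export_fields.items()
}
```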


@ -220,6 +220,7 @@ In addition to the `name` and `rules` fields, grammars have a few other optional
* **`conflicts`** - an array of arrays of rule names. Each inner array represents a set of rules that's involved in an *LR(1) conflict* that is *intended to exist* in the grammar. When these conflicts occur at runtime, Tree-sitter will use the GLR algorithm to explore all of the possible interpretations. If *multiple* parses end up succeeding, Tree-sitter will pick the subtree whose corresponding rule has the highest total *dynamic precedence*.
* **`externals`** - an array of token names which can be returned by an [*external scanner*](#external-scanners). External scanners allow you to write custom C code which runs during the lexing process in order to handle lexical rules (e.g. Python's indentation tokens) that cannot be described by regular expressions.
* **`word`** - the name of a token that will match keywords for the purpose of the [keyword extraction](#keyword-extraction) optimization.
* **`supertypes`** - an array of hidden rule names that should be treated as 'supertypes' in the generated [*node types* file][static-node-types].
## Writing the Grammar
@ -715,6 +716,7 @@ if (valid_symbols[INDENT] || valid_symbol[DEDENT]) {
[nan]: https://github.com/nodejs/nan
[node-module]: https://www.npmjs.com/package/tree-sitter-cli
[node.js]: https://nodejs.org
[static-node-types]: ./using-parsers#static-node-types
[non-terminal]: https://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols
[npm]: https://docs.npmjs.com
[path-env]: https://en.wikipedia.org/wiki/PATH_(variable)


@ -81,7 +81,7 @@ These keys specify basic information about the parser:
* `scope` (required) - A string like `"source.js"` that identifies the language. Currently, we strive to match the scope names used by popular [TextMate grammars](https://macromates.com/manual/en/language_grammars) and by the [Linguist](https://github.com/github/linguist) library.
* `path` (optional) - A relative path from the directory containing `package.json` to another directory containing the `src/` folder, which contains the actual generated parser. The default value is `"."` (so that `src/` is in the same folder as `package.json`), and this very rarely needs to be overridden.
### Language Detection