Document supertypes and the node-types file
References #542 References #524 Closes #393
parent 96c060fc6d, commit f1e4104d47

3 changed files with 154 additions and 1 deletion

@@ -580,3 +580,154 @@ bool ts_query_cursor_next_match(TSQueryCursor *, TSQueryMatch *match);
```

This function will return `false` when there are no more matches. Otherwise, it will populate the `match` with data about which pattern matched and which nodes were captured.

## Static Node Types

In languages with static typing, it can be helpful for syntax trees to provide specific type information about individual syntax nodes. Tree-sitter makes this information available via a generated file called `node-types.json`. This *node types* file provides structured data about every possible syntax node in a grammar. You can use this data to generate type declarations in a statically-typed programming language.

The node types file contains an array of objects, each of which describes a particular type of syntax node using the following entries:

### Basic Info

Every object in this array has these two entries:

* `"type"` - A string that indicates which grammar rule the node represents. This corresponds to the `ts_node_type` function described [above](#syntax-nodes).
* `"named"` - A boolean that indicates whether this kind of node corresponds to a rule name in the grammar or just a string literal. See [above](#named-vs-anonymous-nodes) for more info.

Examples:

```json
{
  "type": "string_literal",
  "named": true
}
{
  "type": "+",
  "named": false
}
```

Together, these two fields constitute a unique identifier for a node type; no two top-level objects in `node-types.json` should have the same values for both `"type"` and `"named"`.
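
Because the `(type, named)` pair is unique, it works well as a lookup key. The following is a minimal sketch in Python, assuming the contents of `node-types.json` have already been read into a string; the helper name `index_node_types` is hypothetical and not part of any Tree-sitter API.

```python
import json


def index_node_types(json_text):
    """Map each (type, named) pair in a node types file to its object.

    Raises ValueError if two top-level objects share the same pair,
    which a well-formed node types file should never contain.
    """
    index = {}
    for entry in json.loads(json_text):
        key = (entry["type"], entry["named"])
        if key in index:
            raise ValueError(f"duplicate node type: {key!r}")
        index[key] = entry
    return index


# The two example entries above, wrapped in a JSON array.
example = """[
  {"type": "string_literal", "named": true},
  {"type": "+", "named": false}
]"""
types = index_node_types(example)
print(sorted(types))  # [('+', False), ('string_literal', True)]
```

Keying on `"type"` alone is not sufficient in general, since uniqueness is only guaranteed for the pair.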

### Internal Nodes

Many syntax nodes can have *children*. The node type object describes the possible children that a node can have using the following entries:

* `"fields"` - An object that describes the possible [fields](#node-field-names) that the node can have. The keys of this object are field names, and the values are *child type* objects, described below.
* `"children"` - Another *child type* object that describes all of the node's possible *named* children *without* fields.

A *child type* object describes a set of child nodes using the following entries:

* `"required"` - A boolean indicating whether there is always *at least one* node in this set.
* `"multiple"` - A boolean indicating whether there can be *multiple* nodes in this set.
* `"types"` - An array of objects that represent the possible types of nodes in this set. Each object has two keys: `"type"` and `"named"`, whose meanings are described above.
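
When generating type declarations, `"required"` and `"multiple"` together determine how many nodes a child set can contain. A minimal Python sketch of that mapping follows; the function names and the cardinality labels are illustrative choices, not part of Tree-sitter.

```python
def cardinality(child_type):
    """Describe how many nodes a child type object allows."""
    if child_type["multiple"]:
        return "one or more" if child_type["required"] else "zero or more"
    # Not multiple: at most one node, and "required" means at least one.
    return "exactly one" if child_type["required"] else "zero or one"


def type_names(child_type):
    """List the allowed types, quoting anonymous ones to mark them as literals."""
    return [t["type"] if t["named"] else repr(t["type"]) for t in child_type["types"]]


# The "decorator" field from the method_definition example in this section.
decorator = {
    "multiple": True,
    "required": False,
    "types": [{"type": "decorator", "named": True}],
}
print(cardinality(decorator), type_names(decorator))  # zero or more ['decorator']
```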

Example with fields:

```json
{
  "type": "method_definition",
  "named": true,
  "fields": {
    "body": {
      "multiple": false,
      "required": true,
      "types": [
        {"type": "statement_block", "named": true}
      ]
    },
    "decorator": {
      "multiple": true,
      "required": false,
      "types": [
        {"type": "decorator", "named": true}
      ]
    },
    "name": {
      "multiple": false,
      "required": true,
      "types": [
        {"type": "computed_property_name", "named": true},
        {"type": "property_identifier", "named": true}
      ]
    },
    "parameters": {
      "multiple": false,
      "required": true,
      "types": [
        {"type": "formal_parameters", "named": true}
      ]
    }
  }
}
```

Example with children:

```json
{
  "type": "array",
  "named": true,
  "fields": {},
  "children": {
    "multiple": true,
    "required": false,
    "types": [
      {"type": "_expression", "named": true},
      {"type": "spread_element", "named": true}
    ]
  }
}
```
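
To make the `"fields"` and `"children"` entries concrete, here is a minimal sketch that turns one node type object into a Python-style class declaration. The output format, the use of `Optional[...]` and `list[...]`, and the `children` attribute name are assumptions made for illustration; Tree-sitter does not prescribe any particular generated form.

```python
def annotation(child_type):
    """Render a child type object as a type-hint string."""
    # Anonymous node types are quoted so they read as literals.
    names = [t["type"] if t["named"] else repr(t["type"]) for t in child_type["types"]]
    union = " | ".join(names)
    if child_type["multiple"]:
        # A list cannot express "non-empty", so "required" is dropped here.
        return f"list[{union}]"
    if not child_type["required"]:
        return f"Optional[{union}]"
    return union


def declaration(node_type):
    """Render a node type object as a class with one attribute per field."""
    lines = [f"class {node_type['type']}:"]
    for field, child_type in sorted(node_type.get("fields", {}).items()):
        lines.append(f"    {field}: {annotation(child_type)}")
    if "children" in node_type:
        lines.append(f"    children: {annotation(node_type['children'])}")
    if len(lines) == 1:
        lines.append("    pass")
    return "\n".join(lines)


# The "array" example above.
array = {
    "type": "array",
    "named": True,
    "fields": {},
    "children": {
        "multiple": True,
        "required": False,
        "types": [
            {"type": "_expression", "named": True},
            {"type": "spread_element", "named": True},
        ],
    },
}
print(declaration(array))
# class array:
#     children: list[_expression | spread_element]
```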

### Supertype Nodes

In Tree-sitter grammars, there are usually certain rules that represent abstract *categories* of syntax nodes (e.g. "expression", "type", "declaration"). In the `grammar.js` file, these are often written as [hidden rules](./creating-parsers#hiding-rules) whose definition is a simple [`choice`](./creating-parsers#the-grammar-dsl) where each member is just a single symbol.

Normally, hidden rules are not mentioned in the node types file, since they don't appear in the syntax tree. But if you add a hidden rule to the grammar's [`supertypes` list](./creating-parsers#the-grammar-dsl), then it *will* show up in the node types file, with the following special entry:

* `"subtypes"` - An array of objects that specify the *types* of nodes that this 'supertype' node can wrap.

Example:

```json
{
  "type": "_declaration",
  "named": true,
  "subtypes": [
    {"type": "class_declaration", "named": true},
    {"type": "function_declaration", "named": true},
    {"type": "generator_function_declaration", "named": true},
    {"type": "lexical_declaration", "named": true},
    {"type": "variable_declaration", "named": true}
  ]
}
```
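
Code that consumes the node types file often needs the concrete node types behind a supertype. Below is a minimal Python sketch, where `node_types` is the parsed JSON array and `concrete_types` is a hypothetical helper. It expands subtypes recursively as a precaution, in case a grammar lists one supertype inside another; the example above does not show such nesting.

```python
def concrete_types(node_types, type_name, named=True):
    """Return the (type, named) pairs that a node of the given type can be.

    Supertypes are replaced by their subtypes, recursively; every other
    type stands for itself.
    """
    subtypes = {
        (t["type"], t["named"]): t["subtypes"] for t in node_types if "subtypes" in t
    }
    result, seen = set(), set()
    pending = [(type_name, named)]
    while pending:
        key = pending.pop()
        if key in seen:
            continue
        seen.add(key)
        if key in subtypes:
            pending.extend((s["type"], s["named"]) for s in subtypes[key])
        else:
            result.add(key)
    return result


# The "_declaration" example above.
declaration_supertype = {
    "type": "_declaration",
    "named": True,
    "subtypes": [
        {"type": "class_declaration", "named": True},
        {"type": "function_declaration", "named": True},
        {"type": "generator_function_declaration", "named": True},
        {"type": "lexical_declaration", "named": True},
        {"type": "variable_declaration", "named": True},
    ],
}
print(sorted(name for name, _ in concrete_types([declaration_supertype], "_declaration")))
```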

Supertype nodes will also appear elsewhere in the node types file, as children of other node types, in a way that corresponds with how the supertype rule was used in the grammar. This can make the node types much shorter and easier to read, because a single supertype will take the place of multiple subtypes.

Example:

```json
{
  "type": "export_statement",
  "named": true,
  "fields": {
    "declaration": {
      "multiple": false,
      "required": false,
      "types": [
        {"type": "_declaration", "named": true}
      ]
    },
    "source": {
      "multiple": false,
      "required": false,
      "types": [
        {"type": "string", "named": true}
      ]
    }
  }
}
```
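
Since the hidden `_declaration` node does not appear in the syntax tree itself, a tool that checks a parsed tree against this description has to look through the supertype: the node found in the `declaration` field will be one of its subtypes, such as `class_declaration`. A minimal Python sketch of that check, where `node_types` is the parsed JSON array and `allows` is a hypothetical helper:

```python
def allows(node_types, child_type, type_name, named=True):
    """True if a node of the given type may appear in this child type set.

    Supertypes listed in child_type["types"] are looked through, so their
    subtypes (and any nested subtypes) are accepted as well.
    """
    subtypes = {
        (t["type"], t["named"]): t["subtypes"] for t in node_types if "subtypes" in t
    }
    target = (type_name, named)
    pending = [(t["type"], t["named"]) for t in child_type["types"]]
    seen = set()
    while pending:
        key = pending.pop()
        if key == target:
            return True
        if key in seen:
            continue
        seen.add(key)
        pending.extend((s["type"], s["named"]) for s in subtypes.get(key, []))
    return False


# "_declaration" trimmed to two of its subtypes for brevity.
node_types = [
    {"type": "_declaration", "named": True, "subtypes": [
        {"type": "class_declaration", "named": True},
        {"type": "lexical_declaration", "named": True},
    ]},
]
# The "declaration" field of export_statement above.
declaration_field = {
    "multiple": False,
    "required": False,
    "types": [{"type": "_declaration", "named": True}],
}
print(allows(node_types, declaration_field, "class_declaration"))  # True
print(allows(node_types, declaration_field, "string"))  # False
```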

@@ -220,6 +220,7 @@ In addition to the `name` and `rules` fields, grammars have a few other optional

* **`conflicts`** - an array of arrays of rule names. Each inner array represents a set of rules that's involved in an *LR(1) conflict* that is *intended to exist* in the grammar. When these conflicts occur at runtime, Tree-sitter will use the GLR algorithm to explore all of the possible interpretations. If *multiple* parses end up succeeding, Tree-sitter will pick the subtree whose corresponding rule has the highest total *dynamic precedence*.
* **`externals`** - an array of token names which can be returned by an [*external scanner*](#external-scanners). External scanners allow you to write custom C code which runs during the lexing process in order to handle lexical rules (e.g. Python's indentation tokens) that cannot be described by regular expressions.
* **`word`** - the name of a token that will match keywords for the purpose of the [keyword extraction](#keyword-extraction) optimization.
* **`supertypes`** - an array of hidden rule names which should be considered to be 'supertypes' in the generated [*node types* file][static-node-types].

## Writing the Grammar

@@ -715,6 +716,7 @@ if (valid_symbols[INDENT] || valid_symbol[DEDENT]) {

[nan]: https://github.com/nodejs/nan
[node-module]: https://www.npmjs.com/package/tree-sitter-cli
[node.js]: https://nodejs.org
[static-node-types]: ./using-parsers#static-node-types
[non-terminal]: https://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols
[npm]: https://docs.npmjs.com
[path-env]: https://en.wikipedia.org/wiki/PATH_(variable)

@@ -81,7 +81,7 @@ These keys specify basic information about the parser:

* `scope` (required) - A string like `"source.js"` that identifies the language. Currently, we strive to match the scope names used by popular [TextMate grammars](https://macromates.com/manual/en/language_grammars) and by the [Linguist](https://github.com/github/linguist) library.
* `path` (optional) - A relative path from the directory containig `package.json` to another directory containing the `src/` folder, which contains the actual generated parser. The default value is `"."` (so that `src/` is in the same folder as `package.json`), and this very rarely needs to be overridden.
* `path` (optional) - A relative path from the directory containing `package.json` to another directory containing the `src/` folder, which contains the actual generated parser. The default value is `"."` (so that `src/` is in the same folder as `package.json`), and this very rarely needs to be overridden.
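
To illustrate how the `path` default works, here is a minimal Python sketch that resolves the `src/` directory for a single parser entry. It assumes the entry has already been extracted from `package.json` as a dict with the keys listed above; the function name `parser_src_dir` is hypothetical, and the code is not taken from the Tree-sitter CLI.

```python
import os


def parser_src_dir(package_dir, entry):
    """Return the directory holding the generated parser for one entry.

    package_dir is the directory containing package.json; entry is one
    parser's configuration, whose optional "path" defaults to ".".
    """
    return os.path.normpath(os.path.join(package_dir, entry.get("path", "."), "src"))


# With no "path", src/ is expected next to package.json.
print(parser_src_dir("tree-sitter-javascript", {"scope": "source.js"}))
```

Normalizing the joined path removes the `.` segment left over from the default value.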

### Language Detection