diff --git a/docs/section-2-using-parsers.md b/docs/section-2-using-parsers.md
index 2dc87a11..589e07b9 100644
--- a/docs/section-2-using-parsers.md
+++ b/docs/section-2-using-parsers.md
@@ -580,3 +580,154 @@ bool ts_query_cursor_next_match(TSQueryCursor *, TSQueryMatch *match);
 ```
 
 This function will return `false` when there are no more matches. Otherwise, it will populate the `match` with data about which pattern matched and which nodes were captured.
+
+## Static Node Types
+
+In languages with static typing, it can be helpful for syntax trees to provide specific type information about individual syntax nodes. Tree-sitter makes this information available via a generated file called `node-types.json`. This *node types* file provides structured data about every possible syntax node in a grammar. You can use this data to generate type declarations in a statically-typed programming language.
+
+The node types file contains an array of objects, each of which describes a particular type of syntax node using the following entries:
+
+#### Basic Info
+
+Every object in this array has these two entries:
+
+* `"type"` - A string that indicates which grammar rule the node represents. This corresponds to the `ts_node_type` function described [above](#syntax-nodes).
+* `"named"` - A boolean that indicates whether this kind of node corresponds to a rule name in the grammar or just a string literal. See [above](#named-vs-anonymous-nodes) for more info.
+
+
+Examples:
+
+```json
+{
+  "type": "string_literal",
+  "named": true
+}
+{
+  "type": "+",
+  "named": false
+}
+```
+
+Together, these two fields constitute a unique identifier for a node type; no two top-level objects in the `node-types.json` should have the same values for both `"type"` and `"named"`.
+
+#### Internal Nodes
+
+Many syntax nodes can have *children*.
The node type object describes the possible children that a node can have using the following entries:
+
+* `"fields"` - An object that describes the possible [fields](#node-field-names) that the node can have. The keys of this object are field names, and the values are *child type* objects, described below.
+* `"children"` - Another *child type* object that describes all of the node's possible *named* children *without* fields.
+
+A *child type* object describes a set of child nodes using the following entries:
+
+* `"required"` - A boolean indicating whether there is always *at least one* node in this set.
+* `"multiple"` - A boolean indicating whether there can be *multiple* nodes in this set.
+* `"types"` - An array of objects that represent the possible types of nodes in this set. Each object has two keys: `"type"` and `"named"`, whose meanings are described above.
+
+Example with fields:
+
+```json
+{
+  "type": "method_definition",
+  "named": true,
+  "fields": {
+    "body": {
+      "multiple": false,
+      "required": true,
+      "types": [
+        {"type": "statement_block", "named": true}
+      ]
+    },
+    "decorator": {
+      "multiple": true,
+      "required": false,
+      "types": [
+        {"type": "decorator", "named": true}
+      ]
+    },
+    "name": {
+      "multiple": false,
+      "required": true,
+      "types": [
+        {"type": "computed_property_name", "named": true},
+        {"type": "property_identifier", "named": true}
+      ]
+    },
+    "parameters": {
+      "multiple": false,
+      "required": true,
+      "types": [
+        {"type": "formal_parameters", "named": true}
+      ]
+    }
+  }
+}
+```
+
+Example with children:
+
+```json
+{
+  "type": "array",
+  "named": true,
+  "fields": {},
+  "children": {
+    "multiple": true,
+    "required": false,
+    "types": [
+      {"type": "_expression", "named": true},
+      {"type": "spread_element", "named": true}
+    ]
+  }
+}
+```
+
+#### Supertype Nodes
+
+In Tree-sitter grammars, there are usually certain rules that represent abstract *categories* of syntax nodes (e.g. "expression", "type", "declaration").
In the `grammar.js` file, these are often written as [hidden rules](./creating-parsers#hiding-rules) whose definition is a simple [`choice`](./creating-parsers#the-grammar-dsl) where each member is just a single symbol.
+
+Normally, hidden rules are not mentioned in the node types file, since they don't appear in the syntax tree. But if you add a hidden rule to the grammar's [`supertypes` list](./creating-parsers#the-grammar-dsl), then it *will* show up in the node types file, with the following special entry:
+
+* `"subtypes"` - An array of objects that specify the *types* of nodes that this 'supertype' node can wrap.
+
+Example:
+
+```json
+{
+  "type": "_declaration",
+  "named": true,
+  "subtypes": [
+    {"type": "class_declaration", "named": true},
+    {"type": "function_declaration", "named": true},
+    {"type": "generator_function_declaration", "named": true},
+    {"type": "lexical_declaration", "named": true},
+    {"type": "variable_declaration", "named": true}
+  ]
+}
+```
+
+Supertype nodes will also appear elsewhere in the node types file, as children of other node types, in a way that corresponds with how the supertype rule was used in the grammar. This can make the node types much shorter and easier to read, because a single supertype will take the place of multiple subtypes.
+
+Example:
+
+```json
+{
+  "type": "export_statement",
+  "named": true,
+  "fields": {
+    "declaration": {
+      "multiple": false,
+      "required": false,
+      "types": [
+        {"type": "_declaration", "named": true}
+      ]
+    },
+    "source": {
+      "multiple": false,
+      "required": false,
+      "types": [
+        {"type": "string", "named": true}
+      ]
+    }
+  }
+}
+```
diff --git a/docs/section-3-creating-parsers.md b/docs/section-3-creating-parsers.md
index 34076853..dc7285f5 100644
--- a/docs/section-3-creating-parsers.md
+++ b/docs/section-3-creating-parsers.md
@@ -220,6 +220,7 @@ In addition to the `name` and `rules` fields, grammars have a few other optional
 * **`conflicts`** - an array of arrays of rule names.
Each inner array represents a set of rules that's involved in an *LR(1) conflict* that is *intended to exist* in the grammar. When these conflicts occur at runtime, Tree-sitter will use the GLR algorithm to explore all of the possible interpretations. If *multiple* parses end up succeeding, Tree-sitter will pick the subtree whose corresponding rule has the highest total *dynamic precedence*.
 * **`externals`** - an array of token names which can be returned by an [*external scanner*](#external-scanners). External scanners allow you to write custom C code which runs during the lexing process in order to handle lexical rules (e.g. Python's indentation tokens) that cannot be described by regular expressions.
 * **`word`** - the name of a token that will match keywords for the purpose of the [keyword extraction](#keyword-extraction) optimization.
+* **`supertypes`** - an array of hidden rule names which should be considered to be 'supertypes' in the generated [*node types* file][static-node-types].
 
 ## Writing the Grammar
 
@@ -715,6 +716,7 @@ if (valid_symbols[INDENT] || valid_symbol[DEDENT]) {
 [nan]: https://github.com/nodejs/nan
 [node-module]: https://www.npmjs.com/package/tree-sitter-cli
 [node.js]: https://nodejs.org
+[static-node-types]: ./using-parsers#static-node-types
 [non-terminal]: https://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols
 [npm]: https://docs.npmjs.com
 [path-env]: https://en.wikipedia.org/wiki/PATH_(variable)
diff --git a/docs/section-4-syntax-highlighting.md b/docs/section-4-syntax-highlighting.md
index d67de287..84fb6abc 100644
--- a/docs/section-4-syntax-highlighting.md
+++ b/docs/section-4-syntax-highlighting.md
@@ -81,7 +81,7 @@ These keys specify basic information about the parser:
 
 * `scope` (required) - A string like `"source.js"` that identifies the language.
Currently, we strive to match the scope names used by popular [TextMate grammars](https://macromates.com/manual/en/language_grammars) and by the [Linguist](https://github.com/github/linguist) library.
 
-* `path` (optional) - A relative path from the directory containig `package.json` to another directory containing the `src/` folder, which contains the actual generated parser. The default value is `"."` (so that `src/` is in the same folder as `package.json`), and this very rarely needs to be overridden.
+* `path` (optional) - A relative path from the directory containing `package.json` to another directory containing the `src/` folder, which contains the actual generated parser. The default value is `"."` (so that `src/` is in the same folder as `package.json`), and this very rarely needs to be overridden.
 
 ### Language Detection
 