Document supertypes and the node-types file

References #542 References #524 Closes #393
2020-02-24 11:12:42 -08:00 · 2020-02-24 11:12:42 -08:00 · f1e4104d47
commit f1e4104d47
parent 96c060fc6d
3 changed files with 154 additions and 1 deletions
--- a/docs/section-2-using-parsers.md
+++ b/docs/section-2-using-parsers.md
@@ -580,3 +580,154 @@ bool ts_query_cursor_next_match(TSQueryCursor *, TSQueryMatch *match);
 ```

 This function will return `false` when there are no more matches. Otherwise, it will populate the `match` with data about which pattern matched and which nodes were captured.
+
+## Static Node Types
+
+In languages with static typing, it can be helpful for syntax trees to provide specific type information about individual syntax nodes. Tree-sitter makes this information available via a generated file called `node-types.json`. This *node types* file provides structured data about every possible syntax node in a grammar. You can use this data to generate type declarations in a statically typed programming language.
+
+The node types file contains an array of objects, each of which describes a particular type of syntax node using the following entries:
+
+#### Basic Info
+
+Every object in this array has these two entries:
+
+* `"type"` - A string that indicates which grammar rule the node represents. This corresponds to the `ts_node_type` function described [above](#syntax-nodes).
+* `"named"` - A boolean that indicates whether this kind of node corresponds to a rule name in the grammar or just a string literal. See [above](#named-vs-anonymous-nodes) for more info.
+
+
+Examples:
+
+```json
+{
+  "type": "string_literal",
+  "named": true
+}
+{
+  "type": "+",
+  "named": false
+}
+```
+
+Together, these two fields constitute a unique identifier for a node type; no two top-level objects in the `node-types.json` file should have the same values for both `"type"` and `"named"`.
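
A consumer can rely on this uniqueness when building a lookup table. The following minimal Python sketch, using only the standard `json` module, indexes node type objects by their `("type", "named")` pair and raises an error on duplicates. The inline sample stands in for a real `node-types.json`, and `index_node_types` is a hypothetical helper name, not part of Tree-sitter.

```python
import json

# A small inline sample standing in for the contents of a node-types.json file.
SAMPLE = """
[
  {"type": "string_literal", "named": true},
  {"type": "+", "named": false}
]
"""

def index_node_types(entries):
    """Map each (type, named) pair to its node type object, rejecting duplicates."""
    index = {}
    for entry in entries:
        key = (entry["type"], entry["named"])
        if key in index:
            raise ValueError(f"duplicate node type: {key!r}")
        index[key] = entry
    return index

node_types = index_node_types(json.loads(SAMPLE))
print(sorted(node_types))  # [('+', False), ('string_literal', True)]
```

Note that the same `"type"` string may appear once as a named node and once as an anonymous node; only the pair has to be unique.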
+
+#### Internal Nodes
+
+Many syntax nodes can have *children*. The node type object describes the possible children that a node can have using the following entries:
+
+* `"fields"` - An object that describes the possible [fields](#node-field-names) that the node can have. The keys of this object are field names, and the values are *child type* objects, described below.
+* `"children"` - Another *child type* object that describes all of the node's possible *named* children *without* fields.
+
+A *child type* object describes a set of child nodes using the following entries:
+
+* `"required"` - A boolean indicating whether there is always *at least one* node in this set.
+* `"multiple"` - A boolean indicating whether there can be *multiple* nodes in this set.
+* `"types"` - An array of objects that represent the possible types of nodes in this set. Each object has two keys: `"type"` and `"named"`, whose meanings are described above.
+
+Example with fields:
+
+```json
+{
+  "type": "method_definition",
+  "named": true,
+  "fields": {
+    "body": {
+      "multiple": false,
+      "required": true,
+      "types": [
+        {"type": "statement_block", "named": true}
+      ]
+    },
+    "decorator": {
+      "multiple": true,
+      "required": false,
+      "types": [
+        {"type": "decorator", "named": true}
+      ]
+    },
+    "name": {
+      "multiple": false,
+      "required": true,
+      "types": [
+        {"type": "computed_property_name", "named": true},
+        {"type": "property_identifier", "named": true},
+      ]
+    },
+    "parameters": {
+      "multiple": false,
+      "required": true,
+      "types": [
+        {"type": "formal_parameters", "named": true}
+      ]
+    }
+  }
+}
+```
+
+Example with children:
+
+```json
+{
+  "type": "array",
+  "named": true,
+  "fields": {},
+  "children": {
+    "multiple": true,
+    "required": false,
+    "types": [
+      {"type": "_expression", "named": true},
+      {"type": "spread_element", "named": true}
+    ]
+  }
+}
+```
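
To show how this data might drive the type declarations mentioned at the start of this section, the Python sketch below renders a node type object as a TypeScript-style interface, mapping `"required"` and `"multiple"` onto nullable, array, and non-empty tuple types. The naming scheme (drop a leading underscore, convert to PascalCase, append `Node`) and the helper names `type_name`, `child_annotation`, and `declaration` are assumptions made for this example; Tree-sitter itself does not generate these declarations.

```python
import json

def type_name(ref):
    """Map a {"type", "named"} reference to a TypeScript-style type name."""
    if not ref["named"]:
        # Anonymous nodes (string literals in the grammar) become string literal types.
        return json.dumps(ref["type"])
    # Hypothetical naming scheme: drop a leading underscore, PascalCase, add "Node".
    words = ref["type"].lstrip("_").split("_")
    return "".join(word.capitalize() for word in words) + "Node"

def child_annotation(child):
    """Translate a child type object's "required"/"multiple" flags into a type."""
    union = " | ".join(type_name(ref) for ref in child["types"])
    element = f"({union})" if len(child["types"]) > 1 else union
    if child["multiple"]:
        # "required" + "multiple" means at least one node: use a non-empty tuple type.
        return f"[{union}, ...{element}[]]" if child["required"] else f"{element}[]"
    return union if child["required"] else f"{union} | null"

def declaration(node_type):
    """Render a node type object (fields, then unnamed children) as an interface."""
    lines = [f"interface {type_name(node_type)} {{"]
    for field, child in sorted(node_type.get("fields", {}).items()):
        lines.append(f"  {field}: {child_annotation(child)};")
    if "children" in node_type:
        lines.append(f"  children: {child_annotation(node_type['children'])};")
    lines.append("}")
    return "\n".join(lines)

# The "array" example from above, as a Python dict.
ARRAY = {
    "type": "array",
    "named": True,
    "fields": {},
    "children": {
        "multiple": True,
        "required": False,
        "types": [
            {"type": "_expression", "named": True},
            {"type": "spread_element", "named": True},
        ],
    },
}

print(declaration(ARRAY))
# interface ArrayNode {
#   children: (ExpressionNode | SpreadElementNode)[];
# }
```

Encoding `"required": true, "multiple": true` as a non-empty tuple is only one possible choice; a simpler generator could emit a plain array and give up that guarantee.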
+
+#### Supertype Nodes
+
+In Tree-sitter grammars, there are usually certain rules that represent abstract *categories* of syntax nodes (e.g. "expression", "type", "declaration"). In the `grammar.js` file, these are often written as [hidden rules](./creating-parsers#hiding-rules) whose definition is a simple [`choice`](./creating-parsers#the-grammar-dsl) where each member is just a single symbol.
+
+Normally, hidden rules are not mentioned in the node types file, since they don't appear in the syntax tree. But if you add a hidden rule to the grammar's [`supertypes` list](./creating-parsers#the-grammar-dsl), then it *will* show up in the node types file, with the following special entry:
+
+* `"subtypes"` - An array of objects that specify the *types* of nodes that this 'supertype' node can wrap.
+
+Example:
+
+```json
+{
+  "type": "_declaration",
+  "named": true,
+  "subtypes": [
+    {"type": "class_declaration", "named": true},
+    {"type": "function_declaration", "named": true},
+    {"type": "generator_function_declaration", "named": true},
+    {"type": "lexical_declaration", "named": true},
+    {"type": "variable_declaration", "named": true}
+  ]
+}
+```
+
+Supertype nodes will also appear elsewhere in the node types file, as children of other node types, in a way that corresponds with how the supertype rule was used in the grammar. This can make the node types file much shorter and easier to read, because a single supertype takes the place of multiple subtypes.
+
+Example:
+
+```json
+{
+  "type": "export_statement",
+  "named": true,
+  "fields": {
+    "declaration": {
+      "multiple": false,
+      "required": false,
+      "types": [
+        {"type": "_declaration", "named": true}
+      ]
+    },
+    "source": {
+      "multiple": false,
+      "required": false,
+      "types": [
+        {"type": "string", "named": true}
+      ]
+    }
+  }
+}
+```
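
Because hidden rules never appear in the syntax tree, a tool that needs the concrete node types that can actually fill a slot (such as the `declaration` field above) has to expand supertypes back into their subtypes. A minimal Python sketch of that expansion follows. It assumes the node types have already been loaded with `json`, recurses in case a subtype is itself a supertype, and uses the hypothetical helper name `concrete_types`.

```python
import json

# Inline sample standing in for a loaded node-types.json (the "_declaration" example).
NODE_TYPES = json.loads("""
[
  {"type": "_declaration", "named": true, "subtypes": [
    {"type": "class_declaration", "named": true},
    {"type": "function_declaration", "named": true},
    {"type": "generator_function_declaration", "named": true},
    {"type": "lexical_declaration", "named": true},
    {"type": "variable_declaration", "named": true}
  ]}
]
""")

# Only supertype entries carry a "subtypes" key.
SUPERTYPES = {
    (entry["type"], entry["named"]): entry["subtypes"]
    for entry in NODE_TYPES
    if "subtypes" in entry
}

def concrete_types(refs, supertypes):
    """Expand supertype references into the concrete (type, named) pairs they wrap."""
    found = []
    for ref in refs:
        key = (ref["type"], ref["named"])
        if key in supertypes:
            # Recurse in case a subtype is itself a supertype.
            found.extend(concrete_types(supertypes[key], supertypes))
        else:
            found.append(key)
    return list(dict.fromkeys(found))  # drop duplicates, keep first-seen order

# The "types" array of export_statement's "declaration" field.
declaration_types = [{"type": "_declaration", "named": True}]
for node_type, named in concrete_types(declaration_types, SUPERTYPES):
    print(node_type)
```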
--- a/docs/section-3-creating-parsers.md
+++ b/docs/section-3-creating-parsers.md
@@ -220,6 +220,7 @@ In addition to the `name` and `rules` fields, grammars have a few other optional
 * **`conflicts`** - an array of arrays of rule names. Each inner array represents a set of rules that's involved in an *LR(1) conflict* that is *intended to exist* in the grammar. When these conflicts occur at runtime, Tree-sitter will use the GLR algorithm to explore all of the possible interpretations. If *multiple* parses end up succeeding, Tree-sitter will pick the subtree whose corresponding rule has the highest total *dynamic precedence*.
 * **`externals`** - an array of token names which can be returned by an [*external scanner*](#external-scanners). External scanners allow you to write custom C code which runs during the lexing process in order to handle lexical rules (e.g. Python's indentation tokens) that cannot be described by regular expressions.
 * **`word`** - the name of a token that will match keywords for the purpose of the [keyword extraction](#keyword-extraction) optimization.
+* **`supertypes`** - an array of hidden rule names that should be considered 'supertypes' in the generated [*node types* file][static-node-types].


 ## Writing the Grammar
@@ -715,6 +716,7 @@ if (valid_symbols[INDENT] || valid_symbol[DEDENT]) {
 [nan]: https://github.com/nodejs/nan
 [node-module]: https://www.npmjs.com/package/tree-sitter-cli
 [node.js]: https://nodejs.org
+[static-node-types]: ./using-parsers#static-node-types
 [non-terminal]: https://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols
 [npm]: https://docs.npmjs.com
 [path-env]: https://en.wikipedia.org/wiki/PATH_(variable)
--- a/docs/section-4-syntax-highlighting.md
+++ b/docs/section-4-syntax-highlighting.md
@@ -81,7 +81,7 @@ These keys specify basic information about the parser:

 * `scope` (required) - A string like `"source.js"` that identifies the language. Currently, we strive to match the scope names used by popular [TextMate grammars](https://macromates.com/manual/en/language_grammars) and by the [Linguist](https://github.com/github/linguist) library.

-* `path` (optional) - A relative path from the directory containig `package.json` to another directory containing the `src/` folder, which contains the actual generated parser. The default value is `"."` (so that `src/` is in the same folder as `package.json`), and this very rarely needs to be overridden.
+* `path` (optional) - A relative path from the directory containing `package.json` to another directory containing the `src/` folder, which contains the actual generated parser. The default value is `"."` (so that `src/` is in the same folder as `package.json`), and this very rarely needs to be overridden.

 ### Language Detection