Merge remote-tracking branch 'upstream/master' into fix/wasm32-malloc

2026-01-16 01:06:59 +08:00
commit 5fbb1b1ebd
parent e6ad0683ca 5d290a2a75
30 changed files with 146 additions and 118 deletions
--- a/crates/cli/README.md
+++ b/crates/cli/README.md
@@ -7,7 +7,8 @@
 [npmjs.com]: https://www.npmjs.org/package/tree-sitter-cli
 [npmjs.com badge]: https://img.shields.io/npm/v/tree-sitter-cli.svg?color=%23BF4A4A

-The Tree-sitter CLI allows you to develop, test, and use Tree-sitter grammars from the command line. It works on `MacOS`, `Linux`, and `Windows`.
+The Tree-sitter CLI allows you to develop, test, and use Tree-sitter grammars from the command line. It works on `MacOS`,
+`Linux`, and `Windows`.

 ### Installation

@@ -34,9 +35,11 @@ The `tree-sitter` binary itself has no dependencies, but specific commands have

 ### Commands

-* `generate` - The `tree-sitter generate` command will generate a Tree-sitter parser based on the grammar in the current working directory. See [the documentation] for more information.
+* `generate` - The `tree-sitter generate` command will generate a Tree-sitter parser based on the grammar in the current
+  working directory. See [the documentation] for more information.

-* `test` - The `tree-sitter test` command will run the unit tests for the Tree-sitter parser in the current working directory. See [the documentation] for more information.
+* `test` - The `tree-sitter test` command will run the unit tests for the Tree-sitter parser in the current working directory.
+  See [the documentation] for more information.

 * `parse` - The `tree-sitter parse` command will parse a file (or list of files) using Tree-sitter parsers.

--- a/crates/language/wasm/src/stdlib.c
+++ b/crates/language/wasm/src/stdlib.c
@@ -14,7 +14,6 @@ extern void tree_sitter_debug_message(const char *, size_t);

 #define PAGESIZE 0x10000
 #define MAX_HEAP_SIZE (4 * 1024 * 1024)
-#define MIN(a, b) ((a) < (b) ? (a) : (b))

 typedef struct {
  size_t size;
@@ -151,7 +150,7 @@ void *realloc(void *ptr, size_t new_size) {
    return NULL;
  }

-  size_t copy_size = MIN(region->size, new_size);
+  size_t copy_size = (region->size < new_size) ? region->size : new_size;
  memcpy(result, &region->data, copy_size);
  free(ptr);
  return result;
--- a/docs/src/3-syntax-highlighting.md
+++ b/docs/src/3-syntax-highlighting.md
@@ -73,9 +73,8 @@ The behaviors of these three files are described in the next section.

 ## Queries

-Tree-sitter's syntax highlighting system is based on *tree queries*, which are a general system for pattern-matching on Tree-sitter's
-syntax trees. See [this section][pattern matching] of the documentation for more information
-about tree queries.
+Tree-sitter's syntax highlighting system is based on *tree queries*, which are a general system for pattern-matching on
+Tree-sitter's syntax trees. See [this section][pattern matching] of the documentation for more information about tree queries.

 Syntax highlighting is controlled by *three* different types of query files that are usually included in the `queries` folder.
 The default names for the query files use the `.scm` file. We chose this extension because it commonly used for files written
--- a/docs/src/4-code-navigation.md
+++ b/docs/src/4-code-navigation.md
@@ -3,7 +3,8 @@
 Tree-sitter can be used in conjunction with its [query language][query language] as a part of code navigation systems.
 An example of such a system can be seen in the `tree-sitter tags` command, which emits a textual dump of the interesting
 syntactic nodes in its file argument. A notable application of this is GitHub's support for [search-based code navigation][gh search].
-This document exists to describe how to integrate with such systems, and how to extend this functionality to any language with a Tree-sitter grammar.
+This document exists to describe how to integrate with such systems, and how to extend this functionality to any language
+with a Tree-sitter grammar.

 ## Tagging and captures

@@ -12,9 +13,9 @@ entities. Having found them, you use a syntax capture to label the entity and it

 The essence of a given tag lies in two pieces of data: the _role_ of the entity that is matched
 (i.e. whether it is a definition or a reference) and the _kind_ of that entity, which describes how the entity is used
-(i.e. whether it's a class definition, function call, variable reference, and so on). Our convention is to use a syntax capture
-following the `@role.kind` capture name format, and another inner capture, always called `@name`, that pulls out the name
-of a given identifier.
+(i.e. whether it's a class definition, function call, variable reference, and so on). Our convention is to use a syntax
+capture following the `@role.kind` capture name format, and another inner capture, always called `@name`, that pulls out
+the name of a given identifier.

 You may optionally include a capture named `@doc` to bind a docstring. For convenience purposes, the tagging system provides
 two built-in functions, `#select-adjacent!` and `#strip!` that are convenient for removing comment syntax from a docstring.
--- a/docs/src/6-contributing.md
+++ b/docs/src/6-contributing.md
@@ -93,7 +93,8 @@ cargo xtask build-wasm-stdlib

 This command looks for the [Wasi SDK][wasi_sdk] indicated by the `TREE_SITTER_WASI_SDK_PATH`
 environment variable. If you don't have the binary, it can be downloaded from wasi-sdk's [releases][wasi-sdk-releases]
-page.
+page. Note that any changes to `crates/language/wasm/**` requires rebuilding the tree-sitter Wasm stdlib via
+`cargo xtask build-wasm-stdlib`.

 ### Debugging

--- a/docs/src/cli/build.md
+++ b/docs/src/cli/build.md
@@ -19,8 +19,8 @@ will attempt to build the parser in the current working directory.
 ### `-w/--wasm`

 Compile the parser as a Wasm module. This command looks for the [Wasi SDK][wasi_sdk] indicated by the `TREE_SITTER_WASI_SDK_PATH`
-environment variable. If you don't have the binary, the CLI will attempt to download it for you to `<CACHE_DIR>/tree-sitter/wasi-sdk/`, where
-`<CACHE_DIR>` is resolved according to the [XDG base directory][XDG] or Window's [Known_Folder_Locations][Known_Folder].
+environment variable. If you don't have the binary, the CLI will attempt to download it for you to `<CACHE_DIR>/tree-sitter/wasi-sdk/`,
+where `<CACHE_DIR>` is resolved according to the [XDG base directory][XDG] or Window's [Known_Folder_Locations][Known_Folder].

 ### `-o/--output`

@@ -37,7 +37,8 @@ in the external scanner does so using their allocator.

 ### `-0/--debug`

-Compile the parser with debug flags enabled. This is useful when debugging issues that require a debugger like `gdb` or `lldb`.
+Compile the parser with debug flags enabled. This is useful when debugging issues that require a debugger like `gdb` or
+`lldb`.

 [Known_Folder]: https://learn.microsoft.com/en-us/windows/win32/shell/knownfolderid
 [wasi_sdk]: https://github.com/WebAssembly/wasi-sdk
--- a/docs/src/cli/dump-languages.md
+++ b/docs/src/cli/dump-languages.md
@@ -1,6 +1,8 @@
 # `tree-sitter dump-languages`

-The `dump-languages` command prints out a list of all the languages that the CLI knows about. This can be useful for debugging purposes, or for scripting. The paths to search comes from the config file's [`parser-directories`][parser-directories] object.
+The `dump-languages` command prints out a list of all the languages that the CLI knows about. This can be useful for debugging
+purposes, or for scripting. The paths to search comes from the config file's [`parser-directories`][parser-directories]
+object.

 ```bash
 tree-sitter dump-languages [OPTIONS] # Aliases: langs
@@ -10,6 +12,7 @@ tree-sitter dump-languages [OPTIONS] # Aliases: langs

 ### `--config-path`

-The path to the configuration file. Ordinarily, the CLI will use the default location as explained in the [init-config](./init-config.md) command. This flag allows you to explicitly override that default, and use a config defined elsewhere.
+The path to the configuration file. Ordinarily, the CLI will use the default location as explained in the [init-config](./init-config.md)
+command. This flag allows you to explicitly override that default, and use a config defined elsewhere.

 [parser-directories]: ./init-config.md#parser-directories
--- a/docs/src/cli/generate.md
+++ b/docs/src/cli/generate.md
@@ -1,6 +1,7 @@
 # `tree-sitter generate`

-The most important command for grammar development is `tree-sitter generate`, which reads the grammar in structured form and outputs C files that can be compiled into a shared or static library (e.g., using the [`build`](./build.md) command).
+The most important command for grammar development is `tree-sitter generate`, which reads the grammar in structured form
+and outputs C files that can be compiled into a shared or static library (e.g., using the [`build`](./build.md) command).

 ```bash
 tree-sitter generate [OPTIONS] [GRAMMAR_PATH] # Aliases: gen, g
@@ -8,7 +9,8 @@ tree-sitter generate [OPTIONS] [GRAMMAR_PATH] # Aliases: gen, g

 The optional `GRAMMAR_PATH` argument should point to the structured grammar, in one of two forms:
 - `grammar.js` a (ESM or CJS) JavaScript file; if the argument is omitted, it defaults to `./grammar.js`.
-- `grammar.json` a structured representation of the grammar that is created as a byproduct of `generate`; this can be used to regenerate a missing `parser.c` without requiring a JavaScript runtime (useful when distributing parsers to consumers).
+- `grammar.json` a structured representation of the grammar that is created as a byproduct of `generate`; this can be used
+to regenerate a missing `parser.c` without requiring a JavaScript runtime (useful when distributing parsers to consumers).

 If there is an ambiguity or *local ambiguity* in your grammar, Tree-sitter will detect it during parser generation, and
 it will exit with a `Unresolved conflict` error message. To learn more about conflicts and how to handle them, see
@@ -21,7 +23,8 @@ in the user guide.
 - `src/tree_sitter/parser.h` provides basic C definitions that are used in the generated `parser.c` file.
 - `src/tree_sitter/alloc.h` provides memory allocation macros that can be used in an external scanner.
 - `src/tree_sitter/array.h` provides array macros that can be used in an external scanner.
-- `src/grammar.json` contains a structured representation of the grammar; can be used to regenerate the parser without having to re-evaluate the `grammar.js`.
+- `src/grammar.json` contains a structured representation of the grammar; can be used to regenerate the parser without having
+to re-evaluate the `grammar.js`.
 - `src/node-types.json` provides type information about individual syntax nodes; see the section on [`Static Node Types`](../using-parsers/6-static-node-types.md).


@@ -29,8 +32,8 @@ in the user guide.

 ### `-l/--log`

-Print the log of the parser generation process. This includes information such as what tokens are included in the error recovery state,
-what keywords were extracted, what states were split and why, and the entry point state.
+Print the log of the parser generation process. This includes information such as what tokens are included in the error
+recovery state, what keywords were extracted, what states were split and why, and the entry point state.

 ### `--abi <VERSION>`

@@ -60,7 +63,8 @@ The path to the JavaScript runtime executable to use when generating the parser.
 Note that you can also set this with `TREE_SITTER_JS_RUNTIME`. Starting from version 0.26, you can
 also pass in `native` to use the experimental native QuickJS runtime that comes bundled with the CLI.
 This avoids the dependency on a JavaScript runtime entirely. The native QuickJS runtime is compatible
-with ESM as well as with CommonJS in strict mode. If your grammar depends on `npm` to install dependencies such as base grammars, the native runtime can be used *after* running `npm install`.
+with ESM as well as with CommonJS in strict mode. If your grammar depends on `npm` to install dependencies such as base
+grammars, the native runtime can be used *after* running `npm install`.

 ### `--disable-optimization`

--- a/docs/src/cli/highlight.md
+++ b/docs/src/cli/highlight.md
@@ -52,7 +52,8 @@ The path to the directory containing the grammar.

 ### `--config-path <CONFIG_PATH>`

-The path to an alternative configuration (`config.json`) file. See [the init-config command](./init-config.md) for more information.
+The path to an alternative configuration (`config.json`) file. See [the init-config command](./init-config.md) for more
+information.

 ### `-n/--test-number <TEST_NUMBER>`

--- a/docs/src/cli/index.md
+++ b/docs/src/cli/index.md
@@ -1,6 +1,7 @@
 # CLI Overview

-The `tree-sitter` command-line interface is used to create, manage, test, and build tree-sitter parsers. It is controlled by
+The `tree-sitter` command-line interface is used to create, manage, test, and build tree-sitter parsers. It is controlled
+by

 - a personal `tree-sitter/config.json` config file generated by [`tree-sitter init-config`](./init-config.md)
 - a parser `tree-sitter.json` config file generated by [`tree-sitter init`](./init.md).
--- a/docs/src/cli/init.md
+++ b/docs/src/cli/init.md
@@ -14,8 +14,11 @@ tree-sitter init [OPTIONS] # Aliases: i

 The following required files are always created if missing:

-- `tree-sitter.json` - The main configuration file that determines how `tree-sitter` interacts with the grammar. If missing, the `init` command will prompt the user for the required fields. See [below](./init.md#structure-of-tree-sitterjson) for the full documentation of the structure of this file.
-- `package.json` - The `npm` manifest for the parser. This file is required for some `tree-sitter` subcommands, and if the grammar has dependencies (e.g., another published base grammar that this grammar extends).
+- `tree-sitter.json` - The main configuration file that determines how `tree-sitter` interacts with the grammar. If missing,
+the `init` command will prompt the user for the required fields. See [below](./init.md#structure-of-tree-sitterjson) for
+the full documentation of the structure of this file.
+- `package.json` - The `npm` manifest for the parser. This file is required for some `tree-sitter` subcommands, and if the
+grammar has dependencies (e.g., another published base grammar that this grammar extends).
 - `grammar.js` - An empty template for the main grammar file; see [the section on creating parsers](../2-creating-parser).

 ### Language bindings
@@ -130,8 +133,8 @@ be picked up by the cli.

 These keys help to decide whether the language applies to a given file:

-- `file-types` — An array of filename suffix strings (not including the dot). The grammar will be used for files whose names end with one of
-these suffixes. Note that the suffix may match an *entire* filename.
+- `file-types` — An array of filename suffix strings (not including the dot). The grammar will be used for files whose names
+end with one of these suffixes. Note that the suffix may match an *entire* filename.

 - `first-line-regex` — A regex pattern that will be tested against the first line of a file
 to determine whether this language applies to the file. If present, this regex will be used for any file whose
@@ -188,7 +191,8 @@ Each key is a language name, and the value is a boolean.

 Update outdated generated files, if possible.

-**Note:** Existing files that may have been edited manually are _not_ updated in general. To force an update to such files, remove them and call `tree-sitter init -u` again.
+**Note:** Existing files that may have been edited manually are _not_ updated in general. To force an update to such files,
+remove them and call `tree-sitter init -u` again.

 ### `-p/--grammar-path <PATH>`

--- a/docs/src/cli/parse.md
+++ b/docs/src/cli/parse.md
@@ -78,7 +78,8 @@ Suppress main output.

 ### `--edits <EDITS>...`

-Apply edits after parsing the file. Edits are in the form of `row,col|position delcount insert_text` where row and col, or position are 0-indexed.
+Apply edits after parsing the file. Edits are in the form of `row,col|position delcount insert_text` where row and col,
+or position are 0-indexed.

 ### `--encoding <ENCODING>`

@@ -95,7 +96,8 @@ Output parsing results in a JSON format.

 ### `--config-path <CONFIG_PATH>`

-The path to an alternative configuration (`config.json`) file. See [the init-config command](./init-config.md) for more information.
+The path to an alternative configuration (`config.json`) file. See [the init-config command](./init-config.md) for more
+information.

 ### `-n/--test-number <TEST_NUMBER>`

--- a/docs/src/cli/playground.md
+++ b/docs/src/cli/playground.md
@@ -7,8 +7,8 @@ tree-sitter playground [OPTIONS] # Aliases: play, pg, web-ui
 ```

 ```admonish note
-For this to work, you must have already built the parser as a Wasm module. This can be done with the [`build`](./build.md) subcommand
-(`tree-sitter build --wasm`).
+For this to work, you must have already built the parser as a Wasm module. This can be done with the [`build`](./build.md)
+subcommand (`tree-sitter build --wasm`).
 ```

 ## Options
--- a/docs/src/cli/query.md
+++ b/docs/src/cli/query.md
@@ -47,8 +47,8 @@ The range of rows in which the query will be executed. The format is `start_row:

 ### `--containing-row-range <ROW_RANGE>`

-The range of rows in which the query will be executed. Only the matches that are fully contained within the provided row range
-will be returned.
+The range of rows in which the query will be executed. Only the matches that are fully contained within the provided row
+range will be returned.

 ### `--scope <SCOPE>`

@@ -64,7 +64,8 @@ Whether to run query tests or not.

 ### `--config-path <CONFIG_PATH>`

-The path to an alternative configuration (`config.json`) file. See [the init-config command](./init-config.md) for more information.
+The path to an alternative configuration (`config.json`) file. See [the init-config command](./init-config.md) for more
+information.

 ### `-n/--test-number <TEST_NUMBER>`

--- a/docs/src/cli/tags.md
+++ b/docs/src/cli/tags.md
@@ -31,7 +31,8 @@ The path to the directory containing the grammar.

 ### `--config-path <CONFIG_PATH>`

-The path to an alternative configuration (`config.json`) file. See [the init-config command](./init-config.md) for more information.
+The path to an alternative configuration (`config.json`) file. See [the init-config command](./init-config.md) for more
+information.

 ### `-n/--test-number <TEST_NUMBER>`

--- a/docs/src/cli/test.md
+++ b/docs/src/cli/test.md
@@ -63,7 +63,8 @@ When using the `--debug-graph` option, open the log file in the default browser.

 ### `--config-path <CONFIG_PATH>`

-The path to an alternative configuration (`config.json`) file. See [the init-config command](./init-config.md) for more information.
+The path to an alternative configuration (`config.json`) file. See [the init-config command](./init-config.md) for more
+information.

 ### `--show-fields`

--- a/docs/src/cli/version.md
+++ b/docs/src/cli/version.md
@@ -25,11 +25,9 @@ tree-sitter version --bump minor # minor bump
 tree-sitter version --bump major # major bump
 ```

-As a grammar author, you should keep the version of your grammar in sync across
-different bindings. However, doing so manually is error-prone and tedious, so
-this command takes care of the burden. If you are using a version control system,
-it is recommended to commit the changes made by this command, and to tag the
-commit with the new version.
+As a grammar author, you should keep the version of your grammar in sync across different bindings. However, doing so manually
+is error-prone and tedious, so this command takes care of the burden. If you are using a version control system, it is recommended
+to commit the changes made by this command, and to tag the commit with the new version.

 To print the current version without bumping it, use:

--- a/docs/src/creating-parsers/2-the-grammar-dsl.md
+++ b/docs/src/creating-parsers/2-the-grammar-dsl.md
@@ -17,8 +17,8 @@ DSL through the `RustRegex` class. Simply pass your regex pattern as a string:
  ```

  Unlike JavaScript's builtin `RegExp` class, which takes a pattern and flags as separate arguments, `RustRegex` only
-  accepts a single pattern string. While it doesn't support separate flags, you can use inline flags within the pattern itself.
-  For more details about Rust's regex syntax and capabilities, check out the [Rust regex documentation][rust regex].
+  accepts a single pattern string. While it doesn't support separate flags, you can use inline flags within the pattern
+  itself. For more details about Rust's regex syntax and capabilities, check out the [Rust regex documentation][rust regex].

  ```admonish note 
  Only a subset of the Regex engine is actually supported. This is due to certain features like lookahead and lookaround
@@ -50,10 +50,10 @@ The previous `repeat` rule is implemented in `repeat1` but is included because i
 - **Options : `optional(rule)`** — This function creates a rule that matches *zero or one* occurrence of a given rule.
 It is analogous to the `[x]` (square bracket) syntax in EBNF notation.

-- **Precedence : `prec(number, rule)`** — This function marks the given rule with a numerical precedence, which will be used
-to resolve [*LR(1) Conflicts*][lr-conflict] at parser-generation time. When two rules overlap in a way that represents either
-a true ambiguity or a *local* ambiguity given one token of lookahead, Tree-sitter will try to resolve the conflict by matching
-the rule with the higher precedence. The default precedence of all rules is zero. This works similarly to the
+- **Precedence : `prec(number, rule)`** — This function marks the given rule with a numerical precedence, which will be
+used to resolve [*LR(1) Conflicts*][lr-conflict] at parser-generation time. When two rules overlap in a way that represents
+either a true ambiguity or a *local* ambiguity given one token of lookahead, Tree-sitter will try to resolve the conflict
+by matching the rule with the higher precedence. The default precedence of all rules is zero. This works similarly to the
 [precedence directives][yacc-prec] in Yacc grammars.

  This function can also be used to assign lexical precedence to a given
@@ -115,8 +115,8 @@ want to create syntax tree nodes at runtime.

 - **`conflicts`** — an array of arrays of rule names. Each inner array represents a set of rules that's involved in an
 *LR(1) conflict* that is *intended to exist* in the grammar. When these conflicts occur at runtime, Tree-sitter will use
-the GLR algorithm to explore all the possible interpretations. If *multiple* parses end up succeeding, Tree-sitter will pick
-the subtree whose corresponding rule has the highest total *dynamic precedence*.
+the GLR algorithm to explore all the possible interpretations. If *multiple* parses end up succeeding, Tree-sitter will
+pick the subtree whose corresponding rule has the highest total *dynamic precedence*.

 - **`externals`** — an array of token names which can be returned by an
 [*external scanner*][external-scanners]. External scanners allow you to write custom C code which runs during the lexing
@@ -139,10 +139,10 @@ for more details.
 array of reserved rules. The reserved rule in the array must be a terminal token meaning it must be a string, regex, token,
 or terminal rule. The reserved rule must also exist and be used in the grammar, specifying arbitrary tokens will not work.
 The *first* reserved word set in the object is the global word set, meaning it applies to every rule in every parse state.
-However, certain keywords are contextual, depending on the rule. For example, in JavaScript, keywords are typically not allowed
-as ordinary variables, however, they *can* be used as a property name. In this situation, the `reserved` function would be used,
-and the word set to pass in would be the name of the word set that is declared in the `reserved` object that corresponds to an
-empty array, signifying *no* keywords are reserved.
+However, certain keywords are contextual, depending on the rule. For example, in JavaScript, keywords are typically not
+allowed as ordinary variables, however, they *can* be used as a property name. In this situation, the `reserved` function
+would be used, and the word set to pass in would be the name of the word set that is declared in the `reserved` object that
+corresponds to an empty array, signifying *no* keywords are reserved.

 [bison-dprec]: https://www.gnu.org/software/bison/manual/html_node/Generalized-LR-Parsing.html
 [ebnf]: https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form
--- a/docs/src/creating-parsers/3-writing-the-grammar.md
+++ b/docs/src/creating-parsers/3-writing-the-grammar.md
@@ -1,7 +1,7 @@
 # Writing the Grammar

-Writing a grammar requires creativity. There are an infinite number of CFGs (context-free grammars) that can be used to describe
-any given language. To produce a good Tree-sitter parser, you need to create a grammar with two important properties:
+Writing a grammar requires creativity. There are an infinite number of CFGs (context-free grammars) that can be used to
+describe any given language. To produce a good Tree-sitter parser, you need to create a grammar with two important properties:

 1. **An intuitive structure** — Tree-sitter's output is a [concrete syntax tree][cst]; each node in the tree corresponds
 directly to a [terminal or non-terminal symbol][non-terminal] in the grammar. So to produce an easy-to-analyze tree, there
@@ -139,8 +139,8 @@ instead. It's often useful to check your progress by trying to parse some real c
 ## Structuring Rules Well

 Imagine that you were just starting work on the [Tree-sitter JavaScript parser][tree-sitter-javascript]. Naively, you might
-try to directly mirror the structure of the [ECMAScript Language Spec][ecmascript-spec]. To illustrate the problem with this
-approach, consider the following line of code:
+try to directly mirror the structure of the [ECMAScript Language Spec][ecmascript-spec]. To illustrate the problem with
+this approach, consider the following line of code:

 ```js
 return x + y;
@@ -181,16 +181,17 @@ which are unrelated to the actual code.

 ## Standard Rule Names

-Tree-sitter places no restrictions on how to name the rules of your grammar. It can be helpful, however, to follow certain conventions
-used by many other established grammars in the ecosystem. Some of these well-established patterns are listed below:
+Tree-sitter places no restrictions on how to name the rules of your grammar. It can be helpful, however, to follow certain
+conventions used by many other established grammars in the ecosystem. Some of these well-established patterns are listed
+below:

 - `source_file`: Represents an entire source file, this rule is commonly used as the root node for a grammar,
-- `expression`/`statement`: Used to represent statements and expressions for a given language. Commonly defined as a choice between several
-more specific sub-expression/sub-statement rules.
+- `expression`/`statement`: Used to represent statements and expressions for a given language. Commonly defined as a choice
+between several more specific sub-expression/sub-statement rules.
 - `block`: Used as the parent node for block scopes, with its children representing the block's contents.
 - `type`: Represents the types of a language such as `int`, `char`, and `void`.
-- `identifier`: Used for constructs like variable names, function arguments, and object fields; this rule is commonly used as the `word`
-token in grammars.
+- `identifier`: Used for constructs like variable names, function arguments, and object fields; this rule is commonly used
+as the `word` token in grammars.
 - `string`: Used to represent `"string literals"`.
 - `comment`: Used to represent comments, this rule is commonly used as an `extra`.

@@ -308,9 +309,9 @@ This is where `prec.left` and `prec.right` come into use. We want to select the

 ## Using Conflicts

-Sometimes, conflicts are actually desirable. In our JavaScript grammar, expressions and patterns can create intentional ambiguity.
-A construct like `[x, y]` could be legitimately parsed as both an array literal (like in `let a = [x, y]`) or as a destructuring
-pattern (like in `let [x, y] = arr`).
+Sometimes, conflicts are actually desirable. In our JavaScript grammar, expressions and patterns can create intentional
+ambiguity. A construct like `[x, y]` could be legitimately parsed as both an array literal (like in `let a = [x, y]`) or
+as a destructuring pattern (like in `let [x, y] = arr`).

 ```js
 export default grammar({
@@ -564,8 +565,8 @@ as mentioned in the previous page, is `token(prec(N, ...))`.
 ## Keywords

 Many languages have a set of _keyword_ tokens (e.g. `if`, `for`, `return`), as well as a more general token (e.g. `identifier`)
-that matches any word, including many of the keyword strings. For example, JavaScript has a keyword `instanceof`, which is
-used as a binary operator, like this:
+that matches any word, including many of the keyword strings. For example, JavaScript has a keyword `instanceof`, which
+is used as a binary operator, like this:

 ```js
 if (a instanceof Something) b();
--- a/docs/src/creating-parsers/4-external-scanners.md
+++ b/docs/src/creating-parsers/4-external-scanners.md
@@ -143,10 +143,10 @@ the second argument, the current character will be treated as whitespace; whites
 associated with tokens emitted by the external scanner.

 - **`void (*mark_end)(TSLexer *)`** — A function for marking the end of the recognized token. This allows matching tokens
-that require multiple characters of lookahead. By default, (if you don't call `mark_end`), any character that you moved past
-using the `advance` function will be included in the size of the token. But once you call `mark_end`, then any later calls
-to `advance` will _not_ increase the size of the returned token. You can call `mark_end` multiple times to increase the size
-of the token.
+that require multiple characters of lookahead. By default, (if you don't call `mark_end`), any character that you moved
+past using the `advance` function will be included in the size of the token. But once you call `mark_end`, then any later
+calls to `advance` will _not_ increase the size of the returned token. You can call `mark_end` multiple times to increase
+the size of the token.

 - **`uint32_t (*get_column)(TSLexer *)`** — A function for querying the current column position of the lexer. It returns
 the number of codepoints since the start of the current line. The codepoint position is recalculated on every call to this
@@ -185,9 +185,9 @@ if (valid_symbols[INDENT] || valid_symbols[DEDENT]) {

 ### Allocator

-Instead of using libc's `malloc`, `calloc`, `realloc`, and `free`, you should use the versions prefixed with `ts_` from `tree_sitter/alloc.h`.
-These macros can allow a potential consumer to override the default allocator with their own implementation, but by default
-will use the libc functions.
+Instead of using libc's `malloc`, `calloc`, `realloc`, and `free`, you should use the versions prefixed with `ts_` from
+`tree_sitter/alloc.h`. These macros can allow a potential consumer to override the default allocator with their own implementation,
+but by default will use the libc functions.

 As a consumer of the tree-sitter core library as well as any parser libraries that might use allocations, you can enable
 overriding the default allocator and have it use the same one as the library allocator, of which you can set with `ts_set_allocator`.
@@ -195,7 +195,8 @@ To enable this overriding in scanners, you must compile them with the `TREE_SITT
 the library must be linked into your final app dynamically, since it needs to resolve the internal functions at runtime.
 If you are compiling an executable binary that uses the core library, but want to load parsers dynamically at runtime, then
 you will have to use a special linker flag on Unix. For non-Darwin systems, that would be `--dynamic-list` and for Darwin
-systems, that would be `-exported_symbols_list`. The CLI does exactly this, so you can use it as a reference (check out `cli/build.rs`).
+systems, that would be `-exported_symbols_list`. The CLI does exactly this, so you can use it as a reference (check out
+`cli/build.rs`).

 For example, assuming you wanted to allocate 100 bytes for your scanner, you'd do so like the following example:

@ -293,9 +294,10 @@ bool tree_sitter_my_language_external_scanner_scan(

 ## Other External Scanner Details

-External scanners have priority over Tree-sitter's normal lexing process. When a token listed in the externals array is valid
-at a given position, the external scanner is called first. This makes external scanners a powerful way to override Tree-sitter's
-default lexing behavior, especially for cases that can't be handled with regular lexical rules, parsing, or dynamic precedence.
+External scanners have priority over Tree-sitter's normal lexing process. When a token listed in the externals array is
+valid at a given position, the external scanner is called first. This makes external scanners a powerful way to override
+Tree-sitter's default lexing behavior, especially for cases that can't be handled with regular lexical rules, parsing, or
+dynamic precedence.

 During error recovery, Tree-sitter's first step is to call the external scanner's scan function with all tokens marked as
 valid. Your scanner should detect and handle this case appropriately. One simple approach is to add an unused "sentinel"
--- a/docs/src/creating-parsers/5-writing-tests.md
+++ b/docs/src/creating-parsers/5-writing-tests.md
@ -39,8 +39,8 @@ It only shows the *named* nodes, as described in [this section][named-vs-anonymo
 ```

  The expected output section can also *optionally* show the [*field names*][node-field-names] associated with each child
-  node. To include field names in your tests, you write a node's field name followed by a colon, before the node itself in
-  the S-expression:
+  node. To include field names in your tests, you write a node's field name followed by a colon, before the node itself
+  in the S-expression:

 ```query
 (source_file
@ -104,8 +104,8 @@ you can repeat the attribute on a new line.

 The following attributes are available:

-* `:cst` - This attribute specifies that the expected output should be in the form of a CST instead of the normal S-expression. This
-CST matches the format given by `parse --cst`.
+* `:cst` - This attribute specifies that the expected output should be in the form of a CST instead of the normal S-expression.
+This CST matches the format given by `parse --cst`.
 * `:error` — This attribute will assert that the parse tree contains an error. It's useful to just validate that a certain
 input is invalid without displaying the whole parse tree, as such you should omit the parse tree below the `---` line.
 * `:fail-fast` — This attribute will stop the testing of additional cases if the test marked with this attribute fails.
--- a/docs/src/creating-parsers/index.md
+++ b/docs/src/creating-parsers/index.md
@ -1,4 +1,4 @@
 # Creating parsers

-Developing Tree-sitter grammars can have a difficult learning curve, but once you get the hang of it, it can be fun and even
-zen-like. This document will help you to get started and to develop a useful mental model.
+Developing Tree-sitter grammars can have a difficult learning curve, but once you get the hang of it, it can be fun and
+even zen-like. This document will help you to get started and to develop a useful mental model.
--- a/docs/src/index.md
+++ b/docs/src/index.md
@ -10,7 +10,8 @@ file and efficiently update the syntax tree as the source file is edited. Tree-s
 - **General** enough to parse any programming language
 - **Fast** enough to parse on every keystroke in a text editor
 - **Robust** enough to provide useful results even in the presence of syntax errors
-- **Dependency-free** so that the runtime library (which is written in pure [C11](https://github.com/tree-sitter/tree-sitter/tree/master/lib)) can be embedded in any application
+- **Dependency-free** so that the runtime library (which is written in pure [C11](https://github.com/tree-sitter/tree-sitter/tree/master/lib))
+can be embedded in any application

 ## Language Bindings

--- a/docs/src/using-parsers/2-basic-parsing.md
+++ b/docs/src/using-parsers/2-basic-parsing.md
@ -2,7 +2,8 @@

 ## Providing the Code

-In the example on the previous page, we parsed source code stored in a simple string using the `ts_parser_parse_string` function:
+In the example on the previous page, we parsed source code stored in a simple string using the `ts_parser_parse_string`
+function:

 ```c
 TSTree *ts_parser_parse_string(
@ -135,10 +136,10 @@ Consider a grammar rule like this:
 if_statement: $ => seq("if", "(", $._expression, ")", $._statement);
 ```

-A syntax node representing an `if_statement` in this language would have 5 children: the condition expression, the body statement,
-as well as the `if`, `(`, and `)` tokens. The expression and the statement would be marked as _named_ nodes, because they
-have been given explicit names in the grammar. But the `if`, `(`, and `)` nodes would _not_ be named nodes, because they
-are represented in the grammar as simple strings.
+A syntax node representing an `if_statement` in this language would have 5 children: the condition expression, the body
+statement, as well as the `if`, `(`, and `)` tokens. The expression and the statement would be marked as _named_ nodes,
+because they have been given explicit names in the grammar. But the `if`, `(`, and `)` nodes would _not_ be named nodes,
+because they are represented in the grammar as simple strings.

 You can check whether any given node is named:

--- a/docs/src/using-parsers/3-advanced-parsing.md
+++ b/docs/src/using-parsers/3-advanced-parsing.md
@ -19,8 +19,8 @@ typedef struct {
 void ts_tree_edit(TSTree *, const TSInputEdit *);
 ```

-Then, you can call `ts_parser_parse` again, passing in the old tree. This will create a new tree that internally shares structure
-with the old tree.
+Then, you can call `ts_parser_parse` again, passing in the old tree. This will create a new tree that internally shares
+structure with the old tree.

 When you edit a syntax tree, the positions of its nodes will change. If you have stored any `TSNode` instances outside of
 the `TSTree`, you must update their positions separately, using the same `TSInputEdit` value, in order to update their
--- a/docs/src/using-parsers/6-static-node-types.md
+++ b/docs/src/using-parsers/6-static-node-types.md
@ -108,9 +108,9 @@ In Tree-sitter grammars, there are usually certain rules that represent abstract
 "type", "declaration"). In the `grammar.js` file, these are often written as [hidden rules][hidden rules]
 whose definition is a simple [`choice`][grammar dsl] where each member is just a single symbol.

-Normally, hidden rules are not mentioned in the node types file, since they don't appear in the syntax tree. But if you add
-a hidden rule to the grammar's [`supertypes` list][grammar dsl], then it _will_ show up in the node
-types file, with the following special entry:
+Normally, hidden rules are not mentioned in the node types file, since they don't appear in the syntax tree. But if you
+add a hidden rule to the grammar's [`supertypes` list][grammar dsl], then it _will_ show up in the node types file, with
+the following special entry:

 - `"subtypes"` — An array of objects that specify the _types_ of nodes that this 'supertype' node can wrap.

--- a/docs/src/using-parsers/7-abi-versions.md
+++ b/docs/src/using-parsers/7-abi-versions.md
@ -15,8 +15,11 @@ A given version of the tree-sitter library is only able to load parsers generate
 | >=0.20.3, <=0.24    | 13                     | 14                     |
 | >=0.25              | 13                     | 15                     |

-By default, the tree-sitter CLI will generate parsers using the latest available ABI for that version, but an older ABI (supported by the CLI) can be selected by passing the [`--abi` option][abi_option] to the `generate` command.
+By default, the tree-sitter CLI will generate parsers using the latest available ABI for that version, but an older ABI
+(supported by the CLI) can be selected by passing the [`--abi` option][abi_option] to the `generate` command.

-Note that the ABI version range supported by the CLI can be smaller than for the library: When a new ABI version is released, older versions will be phased out over a deprecation period, which starts with no longer being able to generate parsers with the oldest ABI version.
+Note that the ABI version range supported by the CLI can be smaller than for the library: When a new ABI version is released,
+older versions will be phased out over a deprecation period, which starts with no longer being able to generate parsers
+with the oldest ABI version.

 [abi_option]: ../cli/generate.md#--abi-version
--- a/docs/src/using-parsers/index.md
+++ b/docs/src/using-parsers/index.md
@ -6,8 +6,8 @@ the core concepts remain the same.

 Tree-sitter's parsing functionality is implemented through its C API, with all functions documented in the [tree_sitter/api.h][api.h]
 header file, but if you're working in another language, you can use one of the following bindings found [here](../index.md#language-bindings),
-each providing idiomatic access to Tree-sitter's functionality. Of these bindings, the official ones have their own API docs
-hosted online at the following pages:
+each providing idiomatic access to Tree-sitter's functionality. Of these bindings, the official ones have their own API
+docs hosted online at the following pages:

 - [Go][go]
 - [Java]
--- a/docs/src/using-parsers/queries/1-syntax.md
+++ b/docs/src/using-parsers/queries/1-syntax.md
@ -1,9 +1,9 @@
 # Query Syntax

-A _query_ consists of one or more _patterns_, where each pattern is an [S-expression][s-exp] that matches a certain set of
-nodes in a syntax tree. The expression to match a given node consists of a pair of parentheses containing two things: the
-node's type, and optionally, a series of other S-expressions that match the node's children. For example, this pattern would
-match any `binary_expression` node whose children are both `number_literal` nodes:
+A _query_ consists of one or more _patterns_, where each pattern is an [S-expression][s-exp] that matches a certain set
+of nodes in a syntax tree. The expression to match a given node consists of a pair of parentheses containing two things:
+the node's type, and optionally, a series of other S-expressions that match the node's children. For example, this pattern
+would match any `binary_expression` node whose children are both `number_literal` nodes:

 ```query
 (binary_expression (number_literal) (number_literal))
@ -99,10 +99,10 @@ by `(ERROR)` queries. Specific missing node types can also be queried:
 ### Supertype Nodes

 Some node types are marked as _supertypes_ in a grammar. A supertype is a node type that contains multiple
-subtypes. For example, in the [JavaScript grammar example][grammar], `expression` is a supertype that can represent any kind
-of expression, such as a `binary_expression`, `call_expression`, or `identifier`. You can use supertypes in queries to match
-any of their subtypes, rather than having to list out each subtype individually. For example, this pattern would match any
-kind of expression, even though it's not a visible node in the syntax tree:
+subtypes. For example, in the [JavaScript grammar example][grammar], `expression` is a supertype that can represent any
+kind of expression, such as a `binary_expression`, `call_expression`, or `identifier`. You can use supertypes in queries
+to match any of their subtypes, rather than having to list out each subtype individually. For example, this pattern would
+match any kind of expression, even though it's not a visible node in the syntax tree:

 ```query
 (expression) @any-expression
--- a/docs/src/using-parsers/queries/3-predicates-and-directives.md
+++ b/docs/src/using-parsers/queries/3-predicates-and-directives.md
@ -128,15 +128,15 @@ This pattern would match any builtin variable that is not a local variable, beca

 # Directives

-Similar to predicates, directives are a way to associate arbitrary metadata with a pattern. The only difference between predicates
-and directives is that directives end in a `!` character instead of `?` character.
+Similar to predicates, directives are a way to associate arbitrary metadata with a pattern. The only difference between
+predicates and directives is that directives end in a `!` character instead of a `?` character.

 Tree-sitter's CLI supports the following directives by default:

 ## The `set!` directive

-This directive allows you to associate key-value pairs with a pattern. The key and value can be any arbitrary text that you
-see fit.
+This directive allows you to associate key-value pairs with a pattern. The key and value can be any arbitrary text that
+you see fit.

 ```query
 ((comment) @injection.content
@ -156,8 +156,8 @@ another capture are preserved. It takes two arguments, both of which are capture
 ### The `#strip!` directive

 The `#strip!` directive allows you to remove text from a capture. It takes two arguments: the first is the capture to strip
-text from, and the second is a regular expression to match against the text. Any text matched by the regular expression will
-be removed from the text associated with the capture.
+text from, and the second is a regular expression to match against the text. Any text matched by the regular expression
+will be removed from the text associated with the capture.

 For an example on the `#select-adjacent!` and `#strip!` directives,
 view the [code navigation](../../4-code-navigation.md#examples) documentation.