docs: migrate to mdbook

2024-12-24 00:43:07 -05:00 · 2024-12-24 00:43:07 -05:00 · 043969ef18
commit 043969ef18
parent 201b41cf11
57 changed files with 5114 additions and 3622 deletions
--- a/docs/src/3-syntax-highlighting.md
+++ b/docs/src/3-syntax-highlighting.md
@ -0,0 +1,433 @@
+# Syntax Highlighting
+
+Syntax highlighting is a very common feature in applications that deal with code. Tree-sitter has built-in support for
+syntax highlighting via the [`tree-sitter-highlight`][highlight crate] library, which is now used on GitHub.com for highlighting
+code written in several languages. You can also perform syntax highlighting at the command line using the
+`tree-sitter highlight` command.
+
+This document explains how the Tree-sitter syntax highlighting system works, using the command line interface. If you are
+using `tree-sitter-highlight` library (either from C or from Rust), all of these concepts are still applicable, but the
+configuration data is provided using in-memory objects, rather than files.
+
+## Overview
+
+All the files needed to highlight a given language are normally included in the same git repository as the Tree-sitter
+grammar for that language (for example, [`tree-sitter-javascript`][js grammar], [`tree-sitter-ruby`][ruby grammar]).
+To run syntax highlighting from the command-line, three types of files are needed:
+
+1. Per-user configuration in `~/.config/tree-sitter/config.json` (see the [init-config][init-config] page for more info).
+2. Language configuration in grammar repositories' `tree-sitter.json` files (see the [init][init] page for more info).
+3. Tree queries in the grammars repositories' `queries` folders.
+
+For an example of the language-specific files, see the [`tree-sitter.json` file][ts json] and [`queries` directory][queries]
+in the `tree-sitter-ruby` repository. The following sections describe the behavior of each file.
+
+## Language Configuration
+
+The `tree-sitter.json` file is used by the Tree-sitter CLI. Within this file, the CLI looks for data nested under the
+top-level `"grammars"` key. This key is expected to contain an array of objects with the following keys:
+
+### Basics
+
+These keys specify basic information about the parser:
+
+- `scope` (required) — A string like `"source.js"` that identifies the language. We strive to match the scope names used
+by popular [TextMate grammars][textmate] and by the [Linguist][linguist] library.
+
+- `path` (optional) — A relative path from the directory containing `tree-sitter.json` to another directory containing
+the `src/` folder, which contains the actual generated parser. The default value is `"."` (so that `src/` is in the same
+folder as `tree-sitter.json`), and this very rarely needs to be overridden.
+
+- `external-files` (optional) — A list of relative paths from the root dir of a
+parser to files that should be checked for modifications during recompilation.
+This is useful during development to have changes to other files besides scanner.c
+be picked up by the cli.
+
+### Language Detection
+
+These keys help to decide whether the language applies to a given file:
+
+- `file-types` — An array of filename suffix strings. The grammar will be used for files whose names end with one of these
+suffixes. Note that the suffix may match an *entire* filename.
+
+- `first-line-regex` — A regex pattern that will be tested against the first line of a file to determine whether this language
+applies to the file. If present, this regex will be used for any file whose language does not match any grammar's `file-types`.
+
+- `content-regex` — A regex pattern that will be tested against the contents of the file to break ties in cases where
+multiple grammars matched the file using the above two criteria. If the regex matches, this grammar will be preferred over
+another grammar with no `content-regex`. If the regex does not match, a grammar with no `content-regex` will be preferred
+over this one.
+
+- `injection-regex` — A regex pattern that will be tested against a *language name* ito determine whether this language
+should be used for a potential *language injection* site. Language injection is described in more detail in [a later section](#language-injection).
+
+### Query Paths
+
+These keys specify relative paths from the directory containing `tree-sitter.json` to the files that control syntax highlighting:
+
+- `highlights` — Path to a *highlight query*. Default: `queries/highlights.scm`
+- `locals` — Path to a *local variable query*. Default: `queries/locals.scm`.
+- `injections` — Path to an *injection query*. Default: `queries/injections.scm`.
+
+The behaviors of these three files are described in the next section.
+
+## Queries
+
+Tree-sitter's syntax highlighting system is based on *tree queries*, which are a general system for pattern-matching on Tree-sitter's
+syntax trees. See [this section][pattern matching] of the documentation for more information
+about tree queries.
+
+Syntax highlighting is controlled by *three* different types of query files that are usually included in the `queries` folder.
+The default names for the query files use the `.scm` file. We chose this extension because it commonly used for files written
+in [Scheme][scheme], a popular dialect of Lisp, and these query files use a Lisp-like syntax.
+
+### Highlights
+
+The most important query is called the highlights query. The highlights query uses *captures* to assign arbitrary
+*highlight names* to different nodes in the tree. Each highlight name can then be mapped to a color
+(as described in the [init-config command][theme]). Commonly used highlight names include
+`keyword`, `function`, `type`, `property`, and `string`. Names can also be dot-separated like `function.builtin`.
+
+#### Example Go Snippet
+
+For example, consider the following Go code:
+
+```go
+func increment(a int) int {
+    return a + 1
+}
+```
+
+With this syntax tree:
+
+```scheme
+(source_file
+  (function_declaration
+    name: (identifier)
+    parameters: (parameter_list
+      (parameter_declaration
+        name: (identifier)
+        type: (type_identifier)))
+    result: (type_identifier)
+    body: (block
+      (return_statement
+        (expression_list
+          (binary_expression
+            left: (identifier)
+            right: (int_literal)))))))
+```
+
+#### Example Query
+
+Suppose we wanted to render this code with the following colors:
+
+- keywords `func` and `return` in purple
+- function `increment` in blue
+- type `int` in green
+- number `5` brown
+
+We can assign each of these categories a *highlight name* using a query like this:
+
+```scheme
+; highlights.scm
+
+"func" @keyword
+"return" @keyword
+(type_identifier) @type
+(int_literal) @number
+(function_declaration name: (identifier) @function)
+```
+
+Then, in our config file, we could map each of these highlight names to a color:
+
+```json
+{
+  "theme": {
+    "keyword": "purple",
+    "function": "blue",
+    "type": "green",
+    "number": "brown"
+  }
+}
+```
+
+#### Highlights Result
+
+Running `tree-sitter highlight` on this Go file would produce output like this:
+
+<pre class='highlight' style='border: 1px solid #aaa;'>
+<span style='color: purple;'>func</span> <span style='color: #005fd7;'>increment</span>(<span>a</span> <span style='color: green;'>int</span>) <span style='color: green;'>int</span> {
+    <span style='color: purple;'>return</span> <span>a</span> <span style='font-weight: bold;color: #4e4e4e;'>+</span> <span style='font-weight: bold;color: #875f00;'>1</span>
+}
+</pre>
+
+### Local Variables
+
+Good syntax highlighting helps the reader to quickly distinguish between the different types of *entities* in their code.
+Ideally, if a given entity appears in *multiple* places, it should be colored the same in each place. The Tree-sitter syntax
+highlighting system can help you to achieve this by keeping track of local scopes and variables.
+
+The *local variables* query is different from the highlights query in that, while the highlights query uses *arbitrary*
+capture names, which can then be mapped to colors, the locals variable query uses a fixed set of capture names, each of
+which has a special meaning.
+
+The capture names are as follows:
+
+- `@local.scope` — indicates that a syntax node introduces a new local scope.
+- `@local.definition` — indicates that a syntax node contains the *name* of a definition within the current local scope.
+- `@local.reference` — indicates that a syntax node contains the *name*, which *may* refer to an earlier definition within
+some enclosing scope.
+
+When highlighting a file, Tree-sitter will keep track of the set of scopes that contains any given position, and the set
+of definitions within each scope. When processing a syntax node that is captured as a `local.reference`, Tree-sitter will
+try to find a definition for a name that matches the node's text. If it finds a match, Tree-sitter will ensure that the
+*reference*, and the *definition* are colored the same.
+
+The information produced by this query can also be *used* by the highlights query. You can *disable* a pattern for nodes,
+which have been identified as local variables by adding the predicate `(#is-not? local)` to the pattern. This is used in
+the example below:
+
+#### Example Ruby Snippet
+
+Consider this Ruby code:
+
+```ruby
+def process_list(list)
+  context = current_context
+  list.map do |item|
+    process_item(item, context)
+  end
+end
+
+item = 5
+list = [item]
+```
+
+With this syntax tree:
+
+```scheme
+(program
+  (method
+    name: (identifier)
+    parameters: (method_parameters
+      (identifier))
+    (assignment
+      left: (identifier)
+      right: (identifier))
+    (method_call
+      method: (call
+        receiver: (identifier)
+        method: (identifier))
+      block: (do_block
+        (block_parameters
+          (identifier))
+        (method_call
+          method: (identifier)
+          arguments: (argument_list
+            (identifier)
+            (identifier))))))
+  (assignment
+    left: (identifier)
+    right: (integer))
+  (assignment
+    left: (identifier)
+    right: (array
+      (identifier))))
+```
+
+There are several types of names within this method:
+
+- `process_list` is a method.
+- Within this method, `list` is a formal parameter
+- `context` is a local variable.
+- `current_context` is *not* a local variable, so it must be a method.
+- Within the `do` block, `item` is a formal parameter
+- Later on, `item` and `list` are both local variables (not formal parameters).
+
+#### Example Queries
+
+Let's write some queries that let us clearly distinguish between these types of names. First, set up the highlighting query,
+as described in the previous section. We'll assign distinct colors to method calls, method definitions, and formal parameters:
+
+```scheme
+; highlights.scm
+
+(call method: (identifier) @function.method)
+(method_call method: (identifier) @function.method)
+
+(method name: (identifier) @function.method)
+
+(method_parameters (identifier) @variable.parameter)
+(block_parameters (identifier) @variable.parameter)
+
+((identifier) @function.method
+ (#is-not? local))
+```
+
+Then, we'll set up a local variable query to keep track of the variables and scopes. Here, we're indicating that methods
+and blocks create local *scopes*, parameters and assignments create *definitions*, and other identifiers should be considered
+*references*:
+
+```scheme
+; locals.scm
+
+(method) @local.scope
+(do_block) @local.scope
+
+(method_parameters (identifier) @local.definition)
+(block_parameters (identifier) @local.definition)
+
+(assignment left:(identifier) @local.definition)
+
+(identifier) @local.reference
+```
+
+#### Locals Result
+
+Running `tree-sitter highlight` on this ruby file would produce output like this:
+
+<pre class='highlight' style='border: 1px solid #aaa;'>
+<span style='color: purple;'>def</span> <span style='color: #005fd7;'>process_list</span><span style='color: #4e4e4e;'>(</span><span style='text-decoration: underline;'>list</span><span style='color: #4e4e4e;'>)</span>
+  <span>context</span> <span style='font-weight: bold;color: #4e4e4e;'>=</span> <span style='color: #005fd7;'>current_context</span>
+  <span style='text-decoration: underline;'>list</span><span style='color: #4e4e4e;'>.</span><span style='color: #005fd7;'>map</span> <span style='color: purple;'>do</span> |<span style='text-decoration: underline;'>item</span>|
+    <span style='color: #005fd7;'>process_item</span>(<span style='text-decoration: underline;'>item</span><span style='color: #4e4e4e;'>,</span> <span>context</span><span style='color: #4e4e4e;'>)</span>
+  <span style='color: purple;'>end</span>
+<span style='color: purple;'>end</span>
+
+<span>item</span> <span style='font-weight: bold;color: #4e4e4e;'>=</span> <span style='font-weight: bold;color: #875f00;'>5</span>
+<span>list</span> <span style='font-weight: bold;color: #4e4e4e;'>=</span> [<span>item</span><span style='color: #4e4e4e;'>]</span>
+</pre>
+
+### Language Injection
+
+Some source files contain code written in multiple different languages. Examples include:
+
+- HTML files, which can contain JavaScript inside `<script>` tags and CSS inside `<style>` tags
+- [ERB][erb] files, which contain Ruby inside `<% %>` tags, and HTML outside those tags
+- PHP files, which can contain  HTML between the `<php` tags
+- JavaScript files, which contain regular expression syntax within regex literals
+- Ruby, which can contain snippets of code inside heredoc literals, where the heredoc delimiter often indicates the language
+
+All of these examples can be modeled in terms a *parent* syntax tree and one or more *injected* syntax trees, which reside
+*inside* of certain nodes in the parent tree. The language injection query allows you to specify these "injections" using
+the following captures:
+
+- `@injection.content` — indicates that the captured node should have its contents re-parsed using another language.
+- `@injection.language` — indicates that the captured node's text may contain the *name* of a language that should be used
+to re-parse the `@injection.content`.
+
+The language injection behavior can also be configured by some properties associated with patterns:
+
+- `injection.language` — can be used to hard-code the name of a specific language.
+- `injection.combined` — indicates that *all* the matching nodes in the tree
+  should have their content parsed as *one* nested document.
+- `injection.include-children` — indicates that the `@injection.content` node's
+  *entire* text should be re-parsed, including the text of its child nodes. By default,
+child nodes' text will be *excluded* from the injected document.
+- `injection.self` — indicates that the `@injection.content` node should be parsed
+  using the same language as the node itself. This is useful for cases where the
+  node's language is not known until runtime (e.g. via inheriting another language)
+- `injection.parent` indicates that the `@injection.content` node should be parsed
+  using the same language as the node's parent language. This is only meant for injections
+  that need to refer back to the parent language to parse the node's text inside
+  the injected language.
+
+#### Examples
+
+Consider this ruby code:
+
+```ruby
+system <<-BASH.strip!
+  abc --def | ghi > jkl
+BASH
+```
+
+With this syntax tree:
+
+```scheme
+(program
+  (method_call
+    method: (identifier)
+    arguments: (argument_list
+      (call
+        receiver: (heredoc_beginning)
+        method: (identifier))))
+  (heredoc_body
+    (heredoc_end)))
+```
+
+The following query would specify that the contents of the heredoc should be parsed using a language named "BASH"
+(because that is the text of the `heredoc_end` node):
+
+```scheme
+(heredoc_body
+  (heredoc_end) @injection.language) @injection.content
+```
+
+You can also force the language using the `#set!` predicate.
+For example, this will force the language to be always `ruby`.
+
+```scheme
+((heredoc_body) @injection.content
+ (#set! injection.language "ruby"))
+```
+
+## Unit Testing
+
+Tree-sitter has a built-in way to verify the results of syntax highlighting. The interface is based on [Sublime Text's system][sublime]
+for testing highlighting.
+
+Tests are written as normal source code files that contain specially-formatted *comments* that make assertions about the
+surrounding syntax highlighting. These files are stored in the `test/highlight` directory in a grammar repository.
+
+Here is an example of a syntax highlighting test for JavaScript:
+
+```js
+var abc = function(d) {
+  // <- keyword
+  //          ^ keyword
+  //               ^ variable.parameter
+  // ^ function
+
+  if (a) {
+  // <- keyword
+  // ^ punctuation.bracket
+
+    foo(`foo ${bar}`);
+    // <- function
+    //    ^ string
+    //          ^ variable
+  }
+
+  baz();
+  // <- !variable
+};
+```
+
+From the Sublime text docs:
+
+> The two types of tests are:
+>
+> **Caret**: ^ this will test the following selector against the scope on the most recent non-test line. It will test it
+> at the same column the ^ is in. Consecutive ^s will test each column against the selector.
+>
+> **Arrow**: <- this will test the following selector against the scope on the most recent non-test line. It will test it
+> at the same column as the comment character is in.
+
+Note that an exclamation mark (`!`) can be used to negate a selector. For example, `!keyword` will match any scope that is
+not the `keyword` class.
+
+[erb]: https://en.wikipedia.org/wiki/ERuby
+[highlight crate]: https://github.com/tree-sitter/tree-sitter/tree/master/highlight
+[init-config]: ./cli/init-config.md
+[init]: ./cli/init.md#structure-of-tree-sitterjson
+[js grammar]: https://github.com/tree-sitter/tree-sitter-javascript
+[linguist]: https://github.com/github/linguist
+[pattern matching]: ./using-parsers/queries/index.md
+[queries]: https://github.com/tree-sitter/tree-sitter-ruby/tree/master/queries
+[ruby grammar]: https://github.com/tree-sitter/tree-sitter-ruby
+[scheme]: https://en.wikipedia.org/wiki/Scheme_%28programming_language%29
+[sublime]: https://www.sublimetext.com/docs/3/syntax.html#testing
+[textmate]: https://macromates.com/manual/en/language_grammars
+[theme]: ./cli/init-config.md#theme
+[ts json]: https://github.com/tree-sitter/tree-sitter-ruby/blob/master/tree-sitter.json
--- a/docs/src/4-code-navigation.md
+++ b/docs/src/4-code-navigation.md
@ -0,0 +1,144 @@
+# Code Navigation Systems
+
+Tree-sitter can be used in conjunction with its [query language][query language] as a part of code navigation systems.
+An example of such a system can be seen in the `tree-sitter tags` command, which emits a textual dump of the interesting
+syntactic nodes in its file argument. A notable application of this is GitHub's support for [search-based code navigation][gh search].
+This document exists to describe how to integrate with such systems, and how to extend this functionality to any language with a Tree-sitter grammar.
+
+## Tagging and captures
+
+_Tagging_ is the act of identifying the entities that can be named in a program. We use Tree-sitter queries to find those
+entities. Having found them, you use a syntax capture to label the entity and its name.
+
+The essence of a given tag lies in two pieces of data: the _role_ of the entity that is matched
+(i.e. whether it is a definition or a reference) and the _kind_ of that entity, which describes how the entity is used
+(i.e. whether it's a class definition, function call, variable reference, and so on). Our convention is to use a syntax capture
+following the `@role.kind` capture name format, and another inner capture, always called `@name`, that pulls out the name
+of a given identifier.
+
+You may optionally include a capture named `@doc` to bind a docstring. For convenience purposes, the tagging system provides
+two built-in functions, `#select-adjacent!` and `#strip!` that are convenient for removing comment syntax from a docstring.
+`#strip!` takes a capture as its first argument and a regular expression as its second, expressed as a quoted string.
+Any text patterns matched by the regular expression will be removed from the text associated with the passed capture.
+`#select-adjacent!`, when passed two capture names, filters the text associated with the first capture so that only nodes
+adjacent to the second capture are preserved. This can be useful when writing queries that would otherwise include too much
+information in matched comments.
+
+## Examples
+
+This [query][query] recognizes Python function definitions and captures their declared name. The `function_definition`
+syntax node is defined in the [Python Tree-sitter grammar][node].
+
+```query
+(function_definition
+  name: (identifier) @name) @definition.function
+```
+
+A more sophisticated query can be found in the [JavaScript Tree-sitter repository][js query]:
+
+```query
+(assignment_expression
+  left: [
+    (identifier) @name
+    (member_expression
+      property: (property_identifier) @name)
+  ]
+  right: [(arrow_function) (function)]
+) @definition.function
+```
+
+An even more sophisticated query is in the [Ruby Tree-sitter repository][ruby query], which uses built-in functions to
+strip the Ruby comment character (`#`) from the docstrings associated with a class or singleton-class declaration, then
+selects only the docstrings adjacent to the node matched as `@definition.class`.
+
+```query
+(
+  (comment)* @doc
+  .
+  [
+    (class
+      name: [
+        (constant) @name
+        (scope_resolution
+          name: (_) @name)
+      ]) @definition.class
+    (singleton_class
+      value: [
+        (constant) @name
+        (scope_resolution
+          name: (_) @name)
+      ]) @definition.class
+  ]
+  (#strip! @doc "^#\\s*")
+  (#select-adjacent! @doc @definition.class)
+)
+```
+
+The below table describes a standard vocabulary for kinds and roles during the tagging process. New applications may extend
+(or only recognize a subset of) these capture names, but it is desirable to standardize on the names below.
+
+| Category                 | Tag                         |
+| ------------------------ | --------------------------- |
+| Class definitions        | `@definition.class`         |
+| Function definitions     | `@definition.function`      |
+| Interface definitions    | `@definition.interface`     |
+| Method definitions       | `@definition.method`        |
+| Module definitions       | `@definition.module`        |
+| Function/method calls    | `@reference.call`           |
+| Class reference          | `@reference.class`          |
+| Interface implementation | `@reference.implementation` |
+
+## Command-line invocation
+
+You can use the `tree-sitter tags` command to test out a tags query file, passing as arguments one or more files to tag.
+We can run this tool from within the Tree-sitter Ruby repository, over code in a file called `test.rb`:
+
+```ruby
+module Foo
+  class Bar
+    # won't be included
+
+    # is adjacent, will be
+    def baz
+    end
+  end
+end
+```
+
+Invoking `tree-sitter tags test.rb` produces the following console output, representing matched entities' name, role, location,
+first line, and docstring:
+
+```text
+    test.rb
+        Foo              | module       def (0, 7) - (0, 10) `module Foo`
+        Bar              | class        def (1, 8) - (1, 11) `class Bar`
+        baz              | method       def (2, 8) - (2, 11) `def baz`  "is adjacent, will be"
+```
+
+It is expected that tag queries for a given language are located at `queries/tags.scm` in that language's repository.
+
+## Unit Testing
+
+Tags queries may be tested with `tree-sitter test`. Files under `test/tags/` are checked using the same comment system as
+[highlights queries][unit testing]. For example, the above Ruby tags can be tested with these comments:
+
+```ruby
+module Foo
+  #     ^ definition.module
+  class Bar
+    #    ^ definition.class
+
+    def baz
+      #  ^ definition.method
+    end
+  end
+end
+```
+
+[gh search]: https://docs.github.com/en/repositories/working-with-files/using-files/navigating-code-on-github#precise-and-search-based-navigation
+[js query]: https://github.com/tree-sitter/tree-sitter-javascript/blob/fdeb68ac8d2bd5a78b943528bb68ceda3aade2eb/queries/tags.scm#L63-L70
+[node]: https://github.com/tree-sitter/tree-sitter-python/blob/78c4e9b6b2f08e1be23b541ffced47b15e2972ad/grammar.js#L354
+[query]: https://github.com/tree-sitter/tree-sitter-python/blob/78c4e9b6b2f08e1be23b541ffced47b15e2972ad/queries/tags.scm#L4-L5
+[ruby query]: https://github.com/tree-sitter/tree-sitter-ruby/blob/1ebfdb288842dae5a9233e2509a135949023dd82/queries/tags.scm#L24-L43
+[query language]: ./using-parsers/queries/index.md
+[unit testing]: ./3-syntax-highlighting.md#unit-testing
--- a/docs/src/5-implementation.md
+++ b/docs/src/5-implementation.md
@ -0,0 +1,60 @@
+# Implementation
+
+Tree-sitter consists of two components: a C library (`libtree-sitter`), and a command-line tool (the `tree-sitter` CLI).
+
+The library, `libtree-sitter`, is used in combination with the parsers
+generated by the CLI, to produce syntax trees from source code and keep the
+syntax trees up-to-date as the source code changes. `libtree-sitter` is designed to be embedded in applications. It is
+written in plain C. Its interface is specified in the header file [`tree_sitter/api.h`][api.h].
+
+The CLI is used to generate a parser for a language by supplying a [context-free grammar][cfg] describing the
+language. The CLI is a build tool; it is no longer needed once a parser has been generated. It is written in Rust, and is
+available on [crates.io][crates], [npm][npm], and as a pre-built binary [on GitHub][gh].
+
+## The CLI
+
+The `tree-sitter` CLI's most important feature is the `generate` command. This subcommand reads in a context-free grammar
+from a file called `grammar.js` and outputs a parser as a C file called `parser.c`. The source files in the [`cli/src`][src]
+directory all play a role in producing the code in `parser.c`. This section will describe some key parts of this process.
+
+### Parsing a Grammar
+
+First, Tree-sitter must evaluate the JavaScript code in `grammar.js` and convert the grammar to a JSON format. It does this
+by shelling out to `node`. The format of the grammars is formally specified by the JSON schema in [grammar.schema.json][schema].
+The parsing is implemented in [parse_grammar.rs][parse grammar].
+
+### Grammar Rules
+
+A Tree-sitter grammar is composed of a set of *rules* — objects that describe how syntax nodes can be composed of other
+syntax nodes. There are several types of rules: symbols, strings, regexes, sequences, choices, repetitions, and a few others.
+Internally, these are all represented using an [enum][enum] called [`Rule`][rules.rs].
+
+### Preparing a Grammar
+
+Once a grammar has been parsed, it must be transformed in several ways before it can be used to generate a parser. Each
+transformation is implemented by a separate file in the [`prepare_grammar`][prepare grammar] directory, and the transformations
+are ultimately composed together in `prepare_grammar/mod.rs`.
+
+At the end of these transformations, the initial grammar is split into two grammars: a *syntax grammar* and a *lexical grammar*.
+The syntax grammar describes how the language's [*non-terminal symbols*][symbols] are constructed from other grammar symbols,
+and the lexical grammar describes how the grammar's *terminal symbols* (strings and regexes) can be
+composed of individual characters.
+
+### Building Parse Tables
+
+## The Runtime
+
+WIP
+
+[api.h]: https://github.com/tree-sitter/tree-sitter/blob/master/lib/include/tree_sitter/api.h
+[cfg]: https://en.wikipedia.org/wiki/Context-free_grammar
+[crates]: https://crates.io
+[npm]: https://npmjs.com
+[gh]: https://github.com/tree-sitter/tree-sitter/releases/latest
+[src]: https://github.com/tree-sitter/tree-sitter/tree/master/cli/src
+[schema]: https://tree-sitter.github.io/tree-sitter/assets/schemas/grammar.schema.json
+[parse grammar]: https://github.com/tree-sitter/tree-sitter/blob/master/cli/generate/src/parse_grammar.rs
+[enum]: https://doc.rust-lang.org/book/ch06-01-defining-an-enum.html
+[rules.rs]: https://github.com/tree-sitter/tree-sitter/blob/master/cli/generate/src/rules.rs
+[prepare grammar]: https://github.com/tree-sitter/tree-sitter/tree/master/cli/generate/src/prepare_grammar
+[symbols]: https://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols
--- a/docs/src/6-contributing.md
+++ b/docs/src/6-contributing.md
@ -0,0 +1,184 @@
+# Contributing
+
+## Code of Conduct
+
+Contributors to Tree-sitter should abide by the [Contributor Covenant][covenant].
+
+## Developing Tree-sitter
+
+### Prerequisites
+
+To make changes to Tree-sitter, you should have:
+
+1. A C compiler, for compiling the core library and the generated parsers.
+2. A [Rust toolchain][rust], for compiling the Rust bindings, the highlighting library, and the CLI.
+3. Node.js and NPM, for generating parsers from `grammar.js` files.
+4. Either [Emscripten][emscripten], [Docker][docker], or [podman][podman] for
+compiling the library to WASM.
+
+### Building
+
+Clone the repository:
+
+```sh
+git clone https://github.com/tree-sitter/tree-sitter
+cd tree-sitter
+```
+
+Optionally, build the WASM library. If you skip this step, then the `tree-sitter playground` command will require an internet
+connection. If you have Emscripten installed, this will use your `emcc` compiler. Otherwise, it will use Docker or Podman:
+
+```sh
+cargo xtask build-wasm
+```
+
+Build the Rust libraries and the CLI:
+
+```sh
+cargo build --release
+```
+
+This will create the `tree-sitter` CLI executable in the `target/release` folder.
+
+If you want to automatically install the `tree-sitter` CLI in your system, you can run:
+
+```sh
+cargo install --path cli
+```
+
+If you're going to be in a fast iteration cycle and would like the CLI to build faster, you can use the `release-dev` profile:
+
+```sh
+cargo build --release --profile release-dev
+# or
+cargo install --path cli --profile release-dev
+```
+
+### Testing
+
+Before you can run the tests, you need to fetch some upstream grammars that are used for testing:
+
+```sh
+cargo xtask fetch-fixtures
+```
+
+To test any changes you've made to the CLI, you can regenerate these parsers using your current CLI code:
+
+```sh
+cargo xtask generate-fixtures
+```
+
+Then you can run the tests:
+
+```sh
+cargo xtask test
+```
+
+Similarly, to test the WASM binding, you need to compile these parsers to WASM:
+
+```sh
+cargo xtask generate-fixtures --wasm
+cargo xtask test-wasm
+```
+
+### Debugging
+
+The test script has a number of useful flags. You can list them all by running `cargo xtask test -h`.
+Here are some of the main flags:
+
+If you want to run a specific unit test, pass its name (or part of its name) as an argument:
+
+```sh
+cargo xtask test test_does_something
+```
+
+You can run the tests under the debugger (either `lldb` or `gdb`) using the `-g` flag:
+
+```sh
+cargo xtask test -g test_does_something
+```
+
+Part of the Tree-sitter test suite involves parsing the _corpus_ tests for several languages and performing randomized edits
+to each example in the corpus. If you just want to run the tests for a particular _language_, you can pass the `-l` flag.
+Additionally, if you want to run a particular _example_ from the corpus, you can pass the `-e` flag:
+
+```sh
+cargo xtask test -l javascript -e Arrays
+```
+
+## Published Packages
+
+The main [`tree-sitter/tree-sitter`][ts repo] repository contains the source code for
+several packages that are published to package registries for different languages:
+
+* Rust crates on [crates.io][crates]:
+  * [`tree-sitter`][lib crate] — A Rust binding to the core library
+  * [`tree-sitter-highlight`][highlight crate] — The syntax-highlighting library
+  * [`tree-sitter-cli`][cli crate] — The command-line tool
+
+* JavaScript modules on [npmjs.com][npmjs]:
+  * [`web-tree-sitter`][web-ts] — A WASM-based JavaScript binding to the core library
+  * [`tree-sitter-cli`][cli package] — The command-line tool
+
+There are also several other dependent repositories that contain other published packages:
+
+* [`tree-sitter/node-tree-sitter`][node ts] — Node.js bindings to the core library,
+published as [`tree-sitter`][node package] on npmjs.com
+* [`tree-sitter/py-tree-sitter`][py ts] — Python bindings to the core library,
+published as [`tree-sitter`][py package] on [PyPI.org][pypi].
+* [`tree-sitter/go-tree-sitter`][go ts] — Go bindings to the core library,
+published as [`tree_sitter`][go package] on [pkg.go.dev][go.dev].
+
+## Publishing New Releases (Maintainers Only)
+
+Publishing a new release of the CLI and lib requires these steps:
+
+1. Commit and push all outstanding changes and verify that CI passes:
+
+   ```sh
+   git commit -m "Fix things"
+   git push
+   ```
+
+2. Upgrade manifest files and create a new tag:
+
+   ```sh
+   cargo xtask bump-version --version <NEXT_VERSION>
+   ```
+
+   This will determine the current version, increment the version to the one specified, and update the relevant files for
+   Rust, Node, Zig, CMake, and Make. It will then create a commit and a tag for the new version. For more information
+   about the arguments that are allowed, see the documentation for the [`npm version`][npm version] command.
+
+3. Push the commit and the tag:
+
+   ```sh
+   git push
+   git push --tags
+   ```
+
+4. CI will build the binaries and upload them to the GitHub release and the NPM registry. It will also publish the Rust
+crates to crates.io.
+
+[cli crate]: https://crates.io/crates/tree-sitter-cli
+[cli package]: https://www.npmjs.com/package/tree-sitter-cli
+[covenant]: https://www.contributor-covenant.org/version/1/4/code-of-conduct
+[crates]: https://crates.io
+[docker]: https://www.docker.com
+[emscripten]: https://emscripten.org
+[go.dev]: https://pkg.go.dev
+[go package]: https://pkg.go.dev/github.com/tree-sitter/go-tree-sitter
+[go ts]: https://github.com/tree-sitter/go-tree-sitter
+[highlight crate]: https://crates.io/crates/tree-sitter-highlight
+[lib crate]: https://crates.io/crates/tree-sitter
+[node package]: https://www.npmjs.com/package/tree-sitter
+[node ts]: https://github.com/tree-sitter/node-tree-sitter
+[npm version]: https://docs.npmjs.com/cli/version
+[npmjs]: https://npmjs.com
+[podman]: https://podman.io
+[py package]: https://pypi.org/project/tree-sitter
+[py ts]: https://github.com/tree-sitter/py-tree-sitter
+[pypi]: https://pypi.org
+[rust]: https://rustup.rs
+[ts repo]: https://github.com/tree-sitter/tree-sitter
+[web-ts]: https://www.npmjs.com/package/web-tree-sitter
--- a/docs/src/7-playground.md
+++ b/docs/src/7-playground.md
@ -0,0 +1,106 @@
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/codemirror/6.65.7/codemirror.min.css">
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/clusterize.js/0.19.0/clusterize.min.css">
+
+<h1>Syntax Tree Playground</h1>
+
+<div id="playground-container" class="ts-playground" style="visibility: hidden;">
+
+<h2>Code</h2>
+
+<div class="custom-select">
+  <button id="language-button" class="select-button">
+    <span class="selected-value">JavaScript</span>
+    <svg class="arrow" width="12" height="12" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
+      <polyline points="6 9 12 15 18 9"></polyline>
+    </svg>
+  </button>
+  <div class="select-dropdown">
+    <div class="option" data-value="bash">Bash</div>
+    <div class="option" data-value="c">C</div>
+    <div class="option" data-value="cpp">C++</div>
+    <div class="option" data-value="c_sharp">C#</div>
+    <div class="option" data-value="go">Go</div>
+    <div class="option" data-value="html">HTML</div>
+    <div class="option" data-value="java">Java</div>
+    <div class="option" data-value="javascript">JavaScript</div>
+    <div class="option" data-value="php">PHP</div>
+    <div class="option" data-value="python">Python</div>
+    <div class="option" data-value="ruby">Ruby</div>
+    <div class="option" data-value="rust">Rust</div>
+    <div class="option" data-value="toml">TOML</div>
+    <div class="option" data-value="typescript">TypeScript</div>
+    <div class="option" data-value="yaml">YAML</div>
+  </div>
+  <select id="language-select" style="display: none;">
+    <option value="bash">Bash</option>
+    <option value="c">C</option>
+    <option value="cpp">C++</option>
+    <option value="c_sharp">C#</option>
+    <option value="go">Go</option>
+    <option value="html">HTML</option>
+    <option value="java">Java</option>
+    <option value="javascript" selected="selected">JavaScript</option>
+    <option value="php">PHP</option>
+    <option value="python">Python</option>
+    <option value="ruby">Ruby</option>
+    <option value="rust">Rust</option>
+    <option value="toml">TOML</option>
+    <option value="typescript">TypeScript</option>
+    <option value="yaml">YAML</option>
+  </select>
+</div>
+
+<input id="logging-checkbox" type="checkbox"></input>
+<label for="logging-checkbox">Log</label>
+
+<input id="anonymous-nodes-checkbox" type="checkbox"></input>
+<label for="anonymous-nodes-checkbox">Show anonymous nodes</label>
+
+<input id="query-checkbox" type="checkbox"></input>
+<label for="query-checkbox">Query</label>
+
+<textarea id="code-input">
+</textarea>
+
+<div id="query-container" style="visibility: hidden; position: absolute;">
+<h2>Query</h2>
+<textarea id="query-input"></textarea>
+</div>
+
+<h2>Tree</h2>
+<span id="update-time"></span>
+<div id="output-container-scroll">
+<pre id="output-container" class="highlight"></pre>
+</div>
+
+<h2 id="about">About </h2>
+<p>You can try out tree-sitter with a few pre-selected grammars on this page.
+You can also run playground locally (with your own grammar) using the
+<a href="/cli/playground.html">CLI</a>'s <code>tree-sitter playground</code> subcommand.
+</p>
+<blockquote>
+<p><strong>Note:</strong> Logging (if enabled) can be viewed in the browser's console.</p>
+</blockquote>
+<p>The syntax tree should update as you type in the code. As you move around the
+code, the current node should be highlighted in the tree; you can also click any
+node in the tree to select the corresponding part of the code.</p>
+<p>You can enter one or more <a href="/using-parsers/queries/index.html">patterns</a>
+into the Query panel. If the query is valid, its captures will be
+highlighted both in the Code and in the Query panels. Otherwise
+the problematic parts of the query will be underlined, and detailed
+diagnostics will be available on hover. Note that to see any results
+you must use at least one capture, like <code>(node_name) @capture-name</code></p>
+
+</div>
+
+<script src="https://cdnjs.cloudflare.com/ajax/libs/codemirror/6.65.7/codemirror.min.js"></script>
+
+<script>LANGUAGE_BASE_URL = "https://tree-sitter.github.io";</script>
+<script src="https://tree-sitter.github.io/tree-sitter.js"></script>
+
+<script src="https://cdnjs.cloudflare.com/ajax/libs/clusterize.js/0.19.0/clusterize.min.js"></script>
+<script>
+setTimeout(() => {
+window.initializePlayground()
+}, 1)
+</script>
--- a/docs/src/SUMMARY.md
+++ b/docs/src/SUMMARY.md
@ -0,0 +1,54 @@
+# Summary
+
+Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source
+file and efficiently update the syntax tree as the source file is edited. Tree-sitter aims to be:
+
+General enough to parse any programming language
+Fast enough to parse on every keystroke in a text editor
+Robust enough to provide useful results even in the presence of syntax errors
+Dependency-free so that the runtime library (which is written in pure C) can be embedded in any application
+
+[Introduction](./index.md)
+
+# User Guide
+
+- [Using Parsers](./using-parsers/index.md)
+  - [Getting Started](./using-parsers/1-getting-started.md)
+  - [Basic Parsing](./using-parsers/2-basic-parsing.md)
+  - [Advanced Parsing](./using-parsers/3-advanced-parsing.md)
+  - [Walking Trees](./using-parsers/4-walking-trees.md)
+  - [Queries](./using-parsers/queries/index.md)
+    - [Basic Syntax](./using-parsers/queries/1-syntax.md)
+    - [Operators](./using-parsers/queries/2-operators.md)
+    - [Predicates and Directives](./using-parsers/queries/3-predicates-and-directives.md)
+    - [API](./using-parsers/queries/4-api.md)
+  - [Static Node Types](./using-parsers/6-static-node-types.md)
+- [Creating Parsers](./creating-parsers/index.md)
+  - [Getting Started](./creating-parsers/1-getting-started.md)
+  - [The Grammar DSL](./creating-parsers/2-the-grammar-dsl.md)
+  - [Writing the Grammar](./creating-parsers/3-writing-the-grammar.md)
+  - [External Scanners](./creating-parsers/4-external-scanners.md)
+  - [Writing Tests](./creating-parsers/5-writing-tests.md)
+- [Syntax Highlighting](./3-syntax-highlighting.md)
+- [Code Navigation](./4-code-navigation.md)
+- [Implementation](./5-implementation.md)
+- [Contributing](./6-contributing.md)
+- [Playground](./7-playground.md)
+
+# Reference Guide
+
+- [Command Line Interface](./cli/index.md)
+  - [Init Config](./cli/init-config.md)
+  - [Init](./cli/init.md)
+  - [Generate](./cli/generate.md)
+  - [Build](./cli/build.md)
+  - [Parse](./cli/parse.md)
+  - [Test](./cli/test.md)
+  - [Version](./cli/version.md)
+  - [Fuzz](./cli/fuzz.md)
+  - [Query](./cli/query.md)
+  - [Highlight](./cli/highlight.md)
+  - [Tags](./cli/tags.md)
+  - [Playground](./cli/playground.md)
+  - [Dump Languages](./cli/dump-languages.md)
+  - [Complete](./cli/complete.md)
--- a/docs/src/cli/build.md
+++ b/docs/src/cli/build.md
@ -0,0 +1,43 @@
+# `tree-sitter build`
+
+The `build` command compiles your parser into a dynamically-loadable library,
+either as a shared object (`.so`, `.dylib`, or `.dll`) or as a WASM module.
+
+```bash
+tree-sitter build [OPTIONS] [PATH] # Aliases: b
+```
+
+You can change the compiler executable via the `CC` environment variable and add extra flags via `CFLAGS`.
+For macOS or iOS, you can set `MACOSX_DEPLOYMENT_TARGET` or `IPHONEOS_DEPLOYMENT_TARGET` respectively to define the
+minimum supported version.
+
+The path argument allows you to specify the directory of the parser to build. If you don't supply this argument, the CLI
+will attempt to build the parser in the current working directory.
+
+## Options
+
+### `-w/--wasm`
+
+Compile the parser as a WASM module.
+
+### `-d/--docker`
+
+Use Docker or Podman to supply Emscripten. This removes the need to install Emscripten on your machine locally.
+Note that this flag is only available when compiling to WASM.
+
+### `-o/--output`
+
+Specify where to output the shared object file (native or WASM). This flag accepts either an absolute path or a relative
+path. If you don't supply this flag, the CLI will attempt to figure out what the language name is based on the parent
+directory name to use for the output file. If the CLI can't figure it out, it will default to `parser`, thus generating
+`parser.so` or `parser.wasm` in the current working directory.
+
+### `--reuse-allocator`
+
+Reuse the allocator that's set in the core library for the parser's external scanner. This is useful in applications
+where the author overrides the default allocator with their own, and wants to ensure every parser that allocates memory
+in the external scanner does so using their allocator.
+
+### `-0/--debug`
+
+Compile the parser with debug flags enabled. This is useful when debugging issues that require a debugger like `gdb` or `lldb`.
--- a/docs/src/cli/complete.md
+++ b/docs/src/cli/complete.md
@ -0,0 +1,16 @@
+# `tree-sitter complete`
+
+The `complete` command generates a completion script for your shell.
+This script can be used to enable autocompletion for the `tree-sitter` CLI.
+
+```bash
+tree-sitter complete --shell <SHELL> # Aliases: comp
+```
+
+## Options
+
+### `--shell <SHELL>`
+
+The shell for which to generate the completion script.
+
+Supported values: `bash`, `elvish`, `fish`, `power-shell`, `zsh`, and `nushell`.
--- a/docs/src/cli/dump-languages.md
+++ b/docs/src/cli/dump-languages.md
@ -0,0 +1,15 @@
+# `tree-sitter dump-languages`
+
+The `dump-languages` command prints out a list of all the languages that the CLI knows about. This can be useful for debugging purposes, or for scripting. The paths to search comes from the config file's [`parser-directories`][parser-directories] object.
+
+```bash
+tree-sitter dump-languages [OPTIONS] # Aliases: langs
+```
+
+## Options
+
+### `--config-path`
+
+The path to the configuration file. Ordinarily, the CLI will use the default location as explained in the [init-config](./init-config.md) command. This flag allows you to explicitly override that default, and use a config defined elsewhere.
+
+[parser-directories]: ./init-config.md#parser-directories
--- a/docs/src/cli/fuzz.md
+++ b/docs/src/cli/fuzz.md
@ -0,0 +1,49 @@
+# `tree-sitter fuzz`
+
+The `fuzz` command is used to fuzz a parser by performing random edits and ensuring that undoing these edits results in
+consistent parse trees. It will fail if the parse trees are not equal, or if the changed ranges are inconsistent.
+
+```bash
+tree-sitter fuzz [OPTIONS] # Aliases: f
+```
+
+## Options
+
+### `-s/--skip <SKIP>`
+
+A list of test names to skip fuzzing.
+
+### `--subdir <SUBDIR>`
+
+The directory containing the parser. This is primarily useful in multi-language repositories.
+
+### `--edits <EDITS>`
+
+The maximum number of edits to perform. The default is 3.
+
+### `--iterations <ITERATIONS>`
+
+The number of iterations to run. The default is 10.
+
+### `-i/--include <INCLUDE>`
+
+Only run tests whose names match this regex.
+
+### `-e/--exclude <EXCLUDE>`
+
+Skip tests whose names match this regex.
+
+### `--log-graphs`
+
+Outputs logs of the graphs of the stack and parse trees during parsing, as well as the actual parsing and lexing message.
+The graphs are constructed with [graphviz dot][dot], and the output is written to `log.html`.
+
+### `-l/--log`
+
+Outputs parsing and lexing logs. This logs to stderr.
+
+### `-r/--rebuild`
+
+Force a rebuild of the parser before running the fuzzer.
+
+[dot]: https://graphviz.org/doc/info/lang.html
--- a/docs/src/cli/generate.md
+++ b/docs/src/cli/generate.md
@ -0,0 +1,62 @@
+# `tree-sitter generate`
+
+The most important command you'll use is `tree-sitter generate`. This command reads the `grammar.js` file in your current
+working directory and creates a file called `src/parser.c`, which implements the parser. After making changes to your grammar,
+just run `tree-sitter generate` again.
+
+```bash
+tree-sitter generate [OPTIONS] [GRAMMAR_PATH] # Aliases: gen, g
+```
+
+The grammar path argument allows you to specify a path to a `grammar.js` JavaScript file, or `grammar.json` JSON file.
+In case your `grammar.js` file is in a non-standard path, you can specify it yourself. But, if you are using a parser
+where `grammar.json` was already generated, or it was hand-written, you can tell the CLI to generate the parser *based*
+on this JSON file. This avoids relying on a JavaScript file and avoids the dependency on a JavaScript runtime.
+
+If there is an ambiguity or *local ambiguity* in your grammar, Tree-sitter will detect it during parser generation, and
+it will exit with a `Unresolved conflict` error message. To learn more about conflicts and how to handle them, check out
+the section on [`Structuring Rules Well`](../creating-parsers/3-writing-the-grammar.md#structuring-rules-well)
+in the user guide.
+
+## Options
+
+### `-l/--log`
+
+Print the log of the parser generation process. This is really only useful if you know what you're doing, or are investigating
+a bug in the CLI itself. It logs info such as what tokens are included in the error recovery state,
+what keywords were extracted, what states were split and why, and the entry point state.
+
+### `--abi <VERSION>`
+
+The ABI to use for parser generation. The default is ABI 15, with ABI 14 being a supported target.
+
+### `-b/--build`
+
+Compile all defined languages in the current directory. The cli will automatically compile the parsers after generation,
+and place them in the cache dir.
+
+### `-0/--debug-build`
+
+Compile the parser with debug flags enabled. This is useful when debugging issues that require a debugger like `gdb` or `lldb`.
+
+### `--libdir <PATH>`
+
+The directory to place the compiled parser(s) in.
+On Unix systems, the default path is `$XDG_CACHE_HOME/tree-sitter` if `$XDG_CACHE_HOME` is set,
+otherwise `$HOME/.config/tree-sitter` is used. On Windows, the default path is `%LOCALAPPDATA%\tree-sitter` if available,
+otherwise `$HOME\AppData\Local\tree-sitter` is used.
+
+### `-o/--output`
+
+The directory to place the generated parser in. The default is `src/` in the current directory.
+
+### `--report-states-for-rule <RULE>`
+
+Print the overview of states from the given rule. This is useful for debugging and understanding the generated parser's
+item sets for all given states in a given rule. To solely view state count numbers for rules, pass in `-` for the rule argument.
+To view the overview of states for every rule, pass in `*` for the rule argument.
+
+### `--js-runtime <EXECUTABLE>`
+
+The path to the JavaScript runtime executable to use when generating the parser. The default is `node`.
+Note that you can also set this with `TREE_SITTER_JS_RUNTIME`.
--- a/docs/src/cli/highlight.md
+++ b/docs/src/cli/highlight.md
@ -0,0 +1,51 @@
+# `tree-sitter highlight`
+
+You can run syntax highlighting on an arbitrary file using `tree-sitter highlight`. This can either output colors directly
+to your terminal using ANSI escape codes, or produce HTML (if the `--html` flag is passed). For more information, see
+[the syntax highlighting page](../3-syntax-highlighting.md).
+
+```bash
+tree-sitter highlight [OPTIONS] [PATHS]... # Aliases: hi
+```
+
+## Options
+
+### `-H/--html`
+
+Output an HTML document with syntax highlighting.
+
+### `--css-classes`
+
+Output HTML with CSS classes instead of inline styles.
+
+### `--check`
+
+Check that the highlighting captures conform strictly to the standards.
+
+### `--captures-path <CAPTURES_PATH>`
+
+The path to a file with captures. These captures would be considered the "standard" captures to compare against.
+
+### `--query-paths <QUERY_PATHS>`
+
+The paths to query files to use for syntax highlighting. These should end in `highlights.scm`.
+
+### `--scope <SCOPE>`
+
+The language scope to use for syntax highlighting. This is useful when the language is ambiguous.
+
+### `-t/--time`
+
+Print the time taken to highlight the file.
+
+### `-q/--quiet`
+
+Suppress main output.
+
+### `--paths <PATHS_FILE>`
+
+The path to a file that contains paths to source files to highlight
+
+### `--config-path <CONFIG_PATH>`
+
+The path to an alternative configuration (`config.json`) file. See [the init-config command](./init-config.md) for more information.
--- a/docs/src/cli/index.md
+++ b/docs/src/cli/index.md
@ -0,0 +1,4 @@
+# CLI Overview
+
+Let's go over all of the functionality of the `tree-sitter` command line interface.
+Once you feel that you have enough of a grasp on the CLI, you can move onto the grammar authoring section to learn more about writing your own parser.
--- a/docs/src/cli/init-config.md
+++ b/docs/src/cli/init-config.md
@ -0,0 +1,146 @@
+# `tree-sitter init-config`
+
+This command initializes a configuration file for the Tree-sitter CLI.
+
+```bash
+tree-sitter init-config
+```
+
+These directories are created in the "default" location for your platform:
+
+* On Unix, `$XDG_CONFIG_HOME/tree-sitter` or `$HOME/.config/tree-sitter`
+* On Windows, `%APPDATA%\tree-sitter` or `$HOME\AppData\Roaming\tree-sitter`
+
+> Note that the CLI will work if there's no config file present, falling back on default values > for each configuration
+> option.
+
+When you run the `init-config` command, it will print out the location of the file that it creates so that you can easily
+find and modify it.
+
+The configuration file is a JSON file that contains the following fields:
+
+## `parser-directories`
+
+The [`tree-sitter highlight`](./highlight.md) command takes one or more file paths, and tries to automatically determine,
+which language should be used to highlight those files. To do this, it needs to know *where* to look for Tree-sitter grammars
+on your filesystem. You can control this using the `"parser-directories"` key in your configuration file:
+
+```json
+{
+  "parser-directories": [
+    "/Users/my-name/code",
+    "/Users/my-name/other-code"
+  ]
+}
+```
+
+Any folder within one of these *parser directories* whose name begins with `tree-sitter-` will be treated as a Tree-sitter
+grammar repository.
+
+## `theme`
+
+The [Tree-sitter highlighting system](../3-syntax-highlighting.md) works by annotating ranges of source code with logical
+"highlight names" like `function.method`, `type.builtin`, `keyword`, etc. To decide what *color* should be used for rendering
+each highlight, a *theme* is needed.
+
+In your config file, the `"theme"` value is an object whose keys are dot-separated highlight names like
+`function.builtin` or `keyword`, and whose values are JSON expressions that represent text styling parameters.
+
+### Highlight Names
+
+A theme can contain multiple keys that share a common subsequence. Examples:
+
+* `variable` and `variable.parameter`
+* `function`, `function.builtin`, and `function.method`
+
+For a given highlight produced, styling will be determined based on the **longest matching theme key**. For example, the
+highlight `function.builtin.static` would match the key `function.builtin` rather than `function`.
+
+### Styling Values
+
+Styling values can be any of the following:
+
+* Integers from 0 to 255, representing ANSI terminal color ids.
+* Strings like `"#e45649"` representing hexadecimal RGB colors.
+* Strings naming basic ANSI colors like `"red"`, `"black"`, `"purple"`, or `"cyan"`.
+* Objects with the following keys:
+  * `color` — An integer or string as described above.
+  * `underline` — A boolean indicating whether the text should be underlined.
+  * `italic` — A boolean indicating whether the text should be italicized.
+  * `bold` — A boolean indicating whether the text should be bold-face.
+
+An example theme can be seen below:
+
+```json
+{
+  "function": 26,
+  "operator": {
+    "bold": true,
+    "color": 239
+  },
+  "variable.builtin": {
+    "bold": true
+  },
+  "variable.parameter": {
+    "underline": true
+  },
+  "type.builtin": {
+    "color": 23,
+    "bold": true
+  },
+  "keyword": 56,
+  "type": 23,
+  "number": {
+    "bold": true,
+    "color": 94
+  },
+  "constant": 94,
+  "attribute": {
+    "color": 124,
+    "italic": true
+  },
+  "comment": {
+    "color": 245,
+    "italic": true
+  },
+  "constant.builtin": {
+    "color": 94,
+    "bold": true
+  },
+}
+```
+
+## `parse-theme`
+
+The [`tree-sitter parse`](./parse.md) command will output a pretty-printed CST when the `-c/--cst` option is used. You can
+control what colors are used for various parts of the tree in your configuration file. Note that omitting a field will cause
+the relevant text to be rendered with its default color.
+
+```json
+{
+  "parse-theme": {
+    // The color of node kinds
+    "node-kind": [20, 20, 20],
+    // The color of text associated with a node
+    "node-text": [255, 255, 255],
+    // The color of node fields
+    "field": [42, 42, 42],
+    // The color of the range information for unnamed nodes
+    "row-color": [255, 255, 255],
+    // The color of the range information for named nodes
+    "row-color-named": [255, 130, 0],
+    // The color of extra nodes
+    "extra": [255, 0, 255],
+    // The color of ERROR nodes
+    "error": [255, 0, 0],
+    // The color of MISSING nodes and their associated text
+    "missing": [153, 75, 0],
+    // The color of newline characters
+    "line-feed": [150, 150, 150],
+    // The color of backtick characters
+    "backtick": [0, 200, 0],
+    // The color of literals
+    "literal": [0, 0, 200],
+  }
+}
+```
--- a/docs/src/cli/init.md
+++ b/docs/src/cli/init.md
@ -0,0 +1,190 @@
+# `tree-sitter init`
+
+The `init` command is your starting point for creating a new grammar. When you run it, it sets up a repository with all
+the essential files and structure needed for grammar development. Since the command includes git-related files by default,
+we recommend using git for version control of your grammar.
+
+```bash
+tree-sitter init [OPTIONS] # Aliases: i
+```
+
+## Options
+
+### `--update`
+
+Update outdated generated files, if needed.
+
+## Structure of `tree-sitter.json`
+
+The main file of interest for users to configure is `tree-sitter.json`, which tells the CLI information about your grammar,
+such as the location of queries.
+
+### The `grammars` field
+
+This field is an array of objects, though you typically only need one object in this array unless your repo has
+multiple grammars (for example, `Typescript` and `TSX`).
+
+### Example
+
+Typically, the objects in the `"tree-sitter"` array only needs to specify a few keys:
+
+```json
+{
+  "tree-sitter": [
+    {
+      "scope": "source.ruby",
+      "file-types": [
+        "rb",
+        "gemspec",
+        "Gemfile",
+        "Rakefile"
+      ],
+      "first-line-regex": "#!.*\\bruby$"
+    }
+  ]
+}
+```
+
+#### Basic Fields
+
+These keys specify basic information about the parser:
+
+- `scope` (required) — A string like `"source.js"` that identifies the language.
+We strive to match the scope names used by popular [TextMate grammars][textmate] and by the [Linguist][linguist] library.
+
+- `path` — A relative path from the directory containing `tree-sitter.json` to another directory containing the `src/`
+folder, which contains the actual generated parser. The default value is `"."`
+(so that `src/` is in the same folder as `tree-sitter.json`), and this very rarely needs to be overridden.
+
+- `external-files` — A list of relative paths from the root dir of a
+parser to files that should be checked for modifications during recompilation.
+This is useful during development to have changes to other files besides scanner.c
+be picked up by the cli.
+
+#### Language Detection
+
+These keys help to decide whether the language applies to a given file:
+
+- `file-types` — An array of filename suffix strings. The grammar will be used for files whose names end with one of
+these suffixes. Note that the suffix may match an *entire* filename.
+
+- `first-line-regex` — A regex pattern that will be tested against the first line of a file
+to determine whether this language applies to the file. If present, this regex will be used for any file whose
+language does not match any grammar's `file-types`.
+
+- `content-regex` — A regex pattern that will be tested against the contents of the file
+to break ties in cases where multiple grammars matched the file using the above two criteria. If the regex matches,
+this grammar will be preferred over another grammar with no `content-regex`. If the regex does not match, a grammar with
+no `content-regex` will be preferred over this one.
+
+- `injection-regex` — A regex pattern that will be tested against a *language name* to determine whether this language
+should be used for a potential *language injection* site.
+Language injection is described in more detail in [the relevant section](../3-syntax-highlighting.md#language-injection).
+
+#### Query Paths
+
+These keys specify relative paths from the directory containing `tree-sitter.json` to the files that control syntax highlighting:
+
+- `highlights` — Path to a *highlight query*. Default: `queries/highlights.scm`
+- `locals` — Path to a *local variable query*. Default: `queries/locals.scm`.
+- `injections` — Path to an *injection query*. Default: `queries/injections.scm`.
+- `tags` — Path to an *tag query*. Default: `queries/tags.scm`.
+
+### The `metadata` field
+
+This field contains information that tree-sitter will use to populate relevant bindings' files, especially their versions.
+Typically, this will all be set up when you run `tree-sitter init`, but you are welcome to update it as you see fit.
+
+- `version` (required) — The current version of your grammar, which should follow [semver][semver]
+- `license` — The license of your grammar, which should be a valid [SPDX license][spdx]
+- `description` — The brief description of your grammar
+- `authors` (required) — An array of objects that contain a `name` field, and optionally an `email` and `url` field.
+Each field is a string
+- `links` — An object that contains a `repository` field, and optionally a `homepage` field. Each field is a string
+- `namespace` — The namespace for the `Java` and `Kotlin` bindings, defaults to `io.github.tree-sitter` if not provided
+
+### The `bindings` field
+
+This field controls what bindings are generated when the `init` command is run.
+Each key is a language name, and the value is a boolean.
+
+- `c` (default: `true`)
+- `go` (default: `true`)
+- `java` (default: `false`)
+- `kotlin` (default: `false`)
+- `node` (default: `true`)
+- `python` (default: `true`)
+- `rust` (default: `true`)
+- `swift` (default: `false`)
+
+## Binding Files
+
+When you run `tree-sitter init`, the CLI will also generate a number of files in your repository that allow for your parser
+to be used from different language. Here is a list of these bindings files that are generated, and what their purpose is:
+
+### C/C++
+
+- `Makefile` — This file tells [`make`][make] how to compile your language.
+- `CMakeLists.txt` — This file tells [`cmake`][cmake] how to compile your language.
+- `bindings/c/tree-sitter-language.h` — This file provides the C interface of your language.
+- `bindings/c/tree-sitter-language.pc` — This file provides [pkg-config][pkg-config] metadata about your language's C library.
+- `src/tree_sitter/parser.h` — This file provides some basic C definitions that are used in your generated `parser.c` file.
+- `src/tree_sitter/alloc.h` — This file provides some memory allocation macros that are to be used in your external scanner,
+if you have one.
+- `src/tree_sitter/array.h` — This file provides some array macros that are to be used in your external scanner,
+if you have one.
+
+### Go
+
+- `go.mod` — This file is the manifest of the Go module.
+- `bindings/go/binding.go` — This file wraps your language in a Go module.
+- `bindings/go/binding_test.go` — This file contains a test for the Go package.
+
+### Node
+
+- `binding.gyp` — This file tells Node.js how to compile your language.
+- `package.json` — This file is the manifest of the Node.js package.
+- `bindings/node/binding.cc` — This file wraps your language in a JavaScript module for Node.js.
+- `bindings/node/index.js` — This is the file that Node.js initially loads when using your language.
+- `bindings/node/index.d.ts` — This file provides type hints for your parser when used in TypeScript.
+- `bindings/node/binding_test.js` — This file contains a test for the Node.js package.
+
+### Python
+
+- `pyproject.toml` — This file is the manifest of the Python package.
+- `setup.py` — This file tells Python how to compile your language.
+- `bindings/python/tree_sitter_language/binding.c` — This file wraps your language in a Python module.
+- `bindings/python/tree_sitter_language/__init__.py` — This file tells Python how to load your language.
+ `bindings/python/tree_sitter_language/__init__.pyi` — This file provides type hints for your parser when used in Python.
+- `bindings/python/tree_sitter_language/py.typed` — This file provides type hints for your parser when used in Python.
+- `bindings/python/tests/test_binding.py` — This file contains a test for the Python package.
+
+### Rust
+
+- `Cargo.toml` — This file is the manifest of the Rust package.
+- `bindings/rust/lib.rs` — This file wraps your language in a Rust crate when used in Rust.
+- `bindings/rust/build.rs` — This file wraps the building process for the Rust crate.
+
+### Swift
+
+- `Package.swift` — This file tells Swift how to compile your language.
+- `bindings/swift/TreeSitterLanguage/language.h` — This file wraps your language in a Swift module when used in Swift.
+- `bindings/swift/TreeSitterLanguageTests/TreeSitterLanguageTests.swift` — This file contains a test for the Swift package.
+
+### Additional Files
+
+Additionally, there's a few other files that are generated when you run `tree-sitter init`,
+that aim to improve the development experience:
+
+- `.editorconfig` — This file tells your editor how to format your code. More information about this file can be found [here][editorconfig]
+- `.gitattributes` — This file tells Git how to handle line endings, and tells GitHub what files are generated.
+- `.gitignore` — This file tells Git what files to ignore when committing changes.
+
+[cmake]: https://cmake.org/cmake/help/latest
+[editorconfig]: https://editorconfig.org
+[linguist]: https://github.com/github/linguist
+[make]: https://www.gnu.org/software/make/manual/make.html
+[pkg-config]: https://www.freedesktop.org/wiki/Software/pkg-config
+[semver]: https://semver.org
+[spdx]: https://spdx.org/licenses
+[textmate]: https://macromates.com/manual/en/language_grammars
--- a/docs/src/cli/parse.md
+++ b/docs/src/cli/parse.md
@ -0,0 +1,97 @@
+# `tree-sitter parse`
+
+The `parse` command parses source files using a Tree-sitter parser. You can pass any number of file paths and glob patterns
+to `tree-sitter parse`, and it will parse all the given files. The command will exit with a non-zero status code if any
+parse errors occurred.
+
+```bash
+tree-sitter parse [OPTIONS] [PATHS]... # Aliases: p
+```
+
+## Options
+
+### `--paths <PATHS_FILE>`
+
+The path to a file that contains paths to source files to parse.
+
+### `--scope <SCOPE>`
+
+The language scope to use for parsing. This is useful when the language is ambiguous.
+
+### `-d/--debug`
+
+Outputs parsing and lexing logs. This logs to stderr.
+
+### `-0/--debug-build`
+
+Compile the parser with debug flags enabled. This is useful when debugging issues that require a debugger like `gdb` or `lldb`.
+
+### `-D/--debug-graph`
+
+Outputs logs of the graphs of the stack and parse trees during parsing, as well as the actual parsing and lexing message.
+The graphs are constructed with [graphviz dot][dot], and the output is written to `log.html`.
+
+### `--wasm`
+
+Compile and run the parser as a WASM module.
+
+### `--dot`
+
+Output the parse tree with [graphviz dot][dot].
+
+### `-x/--xml`
+
+Output the parse tree in XML format.
+
+### `-c/--cst`
+
+Output the parse tree in a pretty-printed CST format.
+
+### `-s/--stat`
+
+Show parsing statistics.
+
+### `--timeout <TIMEOUT>`
+
+Set the timeout for parsing a single file, in microseconds.
+
+### `-t/--time`
+
+Print the time taken to parse the file. If edits are provided, this will also print the time taken to parse the file after
+each edit.
+
+### `-q/--quiet`
+
+Suppress main output.
+
+### `--edits <EDITS>...`
+
+Apply edits after parsing the file. Edits are in the form of `row, col delcount insert_text` where row and col are 0-indexed.
+
+### `--encoding <ENCODING>`
+
+Set the encoding of the input file. By default, the CLI will look for the [`BOM`][bom] to determine if the file is encoded
+in `UTF-16BE` or `UTF-16LE`. If no `BOM` is present, `UTF-8` is the default. One of `utf8`, `utf16-le`, `utf16-be`.
+
+### `--open-log`
+
+When using the `--debug-graph` option, open the log file in the default browser.
+
+### `--config-path <CONFIG_PATH>`
+
+The path to an alternative configuration (`config.json`) file. See [the init-config command](./init-config.md) for more information.
+
+### `-n/--test-number <TEST_NUMBER>`
+
+Parse a specific test in the corpus. The test number is the same number that appears in the output of `tree-sitter test`.
+
+### `-r/--rebuild`
+
+Force a rebuild of the parser before running tests.
+
+### `--no-ranges`
+
+Omit the node's ranges from the default parse output. This is useful when copying S-Expressions to a test file.
+
+[dot]: https://graphviz.org/doc/info/lang.html
+[bom]: https://en.wikipedia.org/wiki/Byte_order_mark
--- a/docs/src/cli/playground.md
+++ b/docs/src/cli/playground.md
@ -0,0 +1,20 @@
+# `tree-sitter playground`
+
+The `playground` command allows you to start a local playground to test your parser interactively.
+
+```bash
+tree-sitter playground [OPTIONS] # Aliases: play, pg, web-ui
+```
+
+Note that you must have already built the parser as a WASM module. This can be done with the [`build`](./build.md) subcommand
+(`tree-sitter build --wasm`).
+
+## Options
+
+### `-q/--quiet`
+
+Don't automatically open the playground in the default browser.
+
+### `--grammar-path <GRAMMAR_PATH>`
+
+The path to the directory containing the grammar and wasm files.
--- a/docs/src/cli/query.md
+++ b/docs/src/cli/query.md
@ -0,0 +1,45 @@
+# `tree-sitter query`
+
+The `query` command is used to run a query on a parser, and view the results.
+
+```bash
+tree-sitter query [OPTIONS] <QUERY_PATH> [PATHS]... # Aliases: q
+```
+
+## Options
+
+### `-t/--time`
+
+Print the time taken to execute the query on the file.
+
+### `-q/--quiet`
+
+Suppress main output.
+
+### `--paths <PATHS_FILE>`
+
+The path to a file that contains paths to source files in which the query will be executed.
+
+### `--byte-range <BYTE_RANGE>`
+
+The range of byte offsets in which the query will be executed. The format is `start_byte:end_byte`.
+
+### `--row-range <ROW_RANGE>`
+
+The range of rows in which the query will be executed. The format is `start_row:end_row`.
+
+### `--scope <SCOPE>`
+
+The language scope to use for parsing and querying. This is useful when the language is ambiguous.
+
+### `-c/--captures`
+
+Order the query results by captures instead of matches.
+
+### `--test`
+
+Whether to run query tests or not.
+
+### `--config-path <CONFIG_PATH>`
+
+The path to an alternative configuration (`config.json`) file. See [the init-config command](./init-config.md) for more information.
--- a/docs/src/cli/tags.md
+++ b/docs/src/cli/tags.md
@ -0,0 +1,30 @@
+# `tree-sitter tags`
+
+You can run symbol tagging on an arbitrary file using `tree-sitter tags`. This will output a list of tags.
+For more information, see [the code navigation page](../4-code-navigation.md#tagging-and-captures).
+
+```bash
+tree-sitter tags [OPTIONS] [PATHS]...
+```
+
+## Options
+
+### `--scope <SCOPE>`
+
+The language scope to use for symbol tagging. This is useful when the language is ambiguous.
+
+### `-t/--time`
+
+Print the time taken to generate tags for the file.
+
+### `-q/--quiet`
+
+Suppress main output.
+
+### `--paths <PATHS_FILE>`
+
+The path to a file that contains paths to source files to tag.
+
+### `--config-path <CONFIG_PATH>`
+
+The path to an alternative configuration (`config.json`) file. See [the init-config command](./init-config.md) for more information.
--- a/docs/src/cli/test.md
+++ b/docs/src/cli/test.md
@ -0,0 +1,68 @@
+# `tree-sitter test`
+
+The `test` command is used to run the test suite for a parser.
+
+```bash
+tree-sitter test [OPTIONS] # Aliases: t
+```
+
+## Options
+
+### `-i/--include <INCLUDE>`
+
+Only run tests whose names match this regex.
+
+### `-e/--exclude <EXCLUDE>`
+
+Skip tests whose names match this regex.
+
+### `-u/--update`
+
+Update the expected output of tests. Note that tests containing `ERROR` nodes or `MISSING` nodes will not be updated.
+
+### `-d/--debug`
+
+Outputs parsing and lexing logs. This logs to stderr.
+
+### `-0/--debug-build`
+
+Compile the parser with debug flags enabled. This is useful when debugging issues that require a debugger like `gdb` or `lldb`.
+
+### `-D/--debug-graph`
+
+Outputs logs of the graphs of the stack and parse trees during parsing, as well as the actual parsing and lexing message.
+The graphs are constructed with [graphviz dot][dot], and the output is written to `log.html`.
+
+### `--wasm`
+
+Compile and run the parser as a WASM module.
+
+### `--open-log`
+
+When using the `--debug-graph` option, open the log file in the default browser.
+
+### `--config-path <CONFIG_PATH>`
+
+The path to an alternative configuration (`config.json`) file. See [the init-config command](./init-config.md) for more information.
+
+### `--show-fields`
+
+Force showing fields in test diffs.
+
+### `--stat <STAT>`
+
+Show parsing statistics when tests are being run. One of `all`, `outliers-and-total`, or `total-only`.
+
+- `all`: Show statistics for every test.
+
+- `outliers-and-total`: Show statistics only for outliers, and total statistics.
+
+- `total-only`: Show only total statistics.
+
+### `-r/--rebuild`
+
+Force a rebuild of the parser before running tests.
+
+### `--overview-only`
+
+Only show the overview of the test results, and not the diff.
--- a/docs/src/cli/version.md
+++ b/docs/src/cli/version.md
@ -0,0 +1,24 @@
+# `tree-sitter version`
+
+The `version` command upgrades the version of your grammar.
+
+```bash
+tree-sitter version <VERSION> # Aliases: publish
+```
+
+This will update the version in several files, if they exist:
+
+* tree-sitter.json
+* Cargo.toml
+* Cargo.lock
+* package.json
+* package-lock.json
+* Makefile
+* CMakeLists.txt
+* pyproject.toml
+
+As a grammar author, you should keep the version of your grammar in sync across
+different bindings. However, doing so manually is error-prone and tedious, so
+this command takes care of the burden. If you are using a version control system,
+it is recommended to commit the changes made by this command, and to tag the
+commit with the new version.
--- a/docs/src/creating-parsers/1-getting-started.md
+++ b/docs/src/creating-parsers/1-getting-started.md
@ -0,0 +1,132 @@
+# Getting Started
+
+## Dependencies
+
+To develop a Tree-sitter parser, there are two dependencies that you need to install:
+
+- **A JavaScript runtime** — Tree-sitter grammars are written in JavaScript, and Tree-sitter uses a JavaScript runtime
+(the default being [Node.js][node.js]) to interpret JavaScript files. It requires this runtime command (default: `node`)
+to be in one of the directories in your [`PATH`][path-env].
+
+- **A C Compiler** — Tree-sitter creates parsers that are written in C. To run and test these parsers with the
+`tree-sitter parse` or `tree-sitter test` commands, you must have a C/C++ compiler installed. Tree-sitter will try to look
+for these compilers in the standard places for each platform.
+
+## Installation
+
+To create a Tree-sitter parser, you need to use [the `tree-sitter` CLI][tree-sitter-cli]. You can install the CLI in a few
+different ways:
+
+- Build the `tree-sitter-cli` [Rust crate][crate] from source using [`cargo`][cargo], the Rust package manager. This works
+on any platform. See [the contributing docs](../6-contributing.md#developing-tree-sitter) for more information.
+
+- Install the `tree-sitter-cli` [Rust crate][crate] from [crates.io][crates.io] using [`cargo`][cargo]. You can do so by
+running the following command: `cargo install tree-sitter-cli --locked`
+
+- Install the `tree-sitter-cli` [Node.js module][node-module] using [`npm`][npm], the Node package manager. This approach
+is fast, but is only works on certain platforms, because it relies on pre-built binaries.
+
+- Download a binary for your platform from [the latest GitHub release][releases], and put it into a directory on your `PATH`.
+
+## Project Setup
+
+The preferred convention is to name the parser repository "tree-sitter-" followed by the name of the language, in lowercase.
+
+```sh
+mkdir tree-sitter-${LOWER_PARSER_NAME}
+cd tree-sitter-${LOWER_PARSER_NAME}
+```
+
+Note that the `LOWER-` prefix here means the "lowercase" name of the language.
+
+### Init
+
+Once you've installed the `tree-sitter` CLI tool, you can start setting up your project, which will allow your parser to
+be used from multiple languages.
+
+```sh
+# This will prompt you for input
+tree-sitter init
+```
+
+The `init` command will create a bunch of files in the project.
+There should be a file called `grammar.js` with the following contents:
+
+```js
+/**
+ * @file PARSER_DESCRIPTION
+ * @author PARSER_AUTHOR_NAME PARSER_AUTHOR_EMAIL
+ * @license PARSER_LICENSE
+ */
+
+/// <reference types="tree-sitter-cli/dsl" />
+// @ts-check
+
+module.exports = grammar({
+  name: 'LOWER_PARSER_NAME',
+
+  rules: {
+    // TODO: add the actual grammar rules
+    source_file: $ => 'hello'
+  }
+});
+```
+
+Note that the placeholders shown above would be replaced with the corresponding data you provided in the `init` sub-command's
+prompts.
+
+To learn more about this command, check the [reference page](../cli/init.md).
+
+### Generate
+
+Next, run the following command:
+
+```sh
+tree-sitter generate
+```
+
+This will generate the C code required to parse this trivial language.
+
+You can test this parser by creating a source file with the contents "hello" and parsing it:
+
+```sh
+echo 'hello' > example-file
+tree-sitter parse example-file
+```
+
+Alternatively, in Windows PowerShell:
+
+```pwsh
+"hello" | Out-File example-file -Encoding utf8
+tree-sitter parse example-file
+```
+
+This should print the following:
+
+```text
+(source_file [0, 0] - [1, 0])
+```
+
+You now have a working parser.
+
+Finally, look back at the [triple-slash][] and [`@ts-check`][ts-check] comments in `grammar.js`; these tell your editor
+to provide documentation and type information as you edit your grammar. For these to work, you must download Tree-sitter's
+TypeScript API from npm into a `node_modules` directory in your project:
+
+```sh
+npm install # or your package manager of choice
+```
+
+To learn more about this command, check the [reference page](../cli/generate.md).
+
+[cargo]: https://doc.rust-lang.org/cargo/getting-started/installation.html
+[crate]: https://crates.io/crates/tree-sitter-cli
+[crates.io]: https://crates.io/crates/tree-sitter-cli
+[node-module]: https://www.npmjs.com/package/tree-sitter-cli
+[node.js]: https://nodejs.org
+[npm]: https://docs.npmjs.com
+[path-env]: https://en.wikipedia.org/wiki/PATH_(variable)
+[releases]: https://github.com/tree-sitter/tree-sitter/releases/latest
+[tree-sitter-cli]: https://github.com/tree-sitter/tree-sitter/tree/master/cli
+[triple-slash]: https://www.typescriptlang.org/docs/handbook/triple-slash-directives.html
+[ts-check]: https://www.typescriptlang.org/docs/handbook/intro-to-js-ts.html
--- a/docs/src/creating-parsers/2-the-grammar-dsl.md
+++ b/docs/src/creating-parsers/2-the-grammar-dsl.md
@ -0,0 +1,132 @@
+# The Grammar DSL
+
+The following is a complete list of built-in functions you can use in your `grammar.js` to define rules. Use-cases for some
+of these functions will be explained in more detail in later sections.
+
+- **Symbols (the `$` object)** — Every grammar rule is written as a JavaScript function that takes a parameter conventionally
+called `$`. The syntax `$.identifier` is how you refer to another grammar symbol within a rule. Names starting with `$.MISSING`
+or `$.UNEXPECTED` should be avoided as they have special meaning for the `tree-sitter test` command.
+- **String and Regex literals** — The terminal symbols in a grammar are described using JavaScript strings and regular
+expressions. Of course during parsing, Tree-sitter does not actually use JavaScript's regex engine to evaluate these regexes;
+it generates its own regex-matching logic as part of each parser. Regex literals are just used as a convenient way of writing
+regular expressions in your grammar.
+- **Regex Limitations** — Only a subset of the Regex engine is actually
+supported. This is due to certain features like lookahead and lookaround assertions
+not feasible to use in an LR(1) grammar, as well as certain flags being unnecessary
+for tree-sitter. However, plenty of features are supported by default:
+
+  - Character classes
+  - Character ranges
+  - Character sets
+  - Quantifiers
+  - Alternation
+  - Grouping
+  - Unicode character escapes
+  - Unicode property escapes
+
+- **Sequences : `seq(rule1, rule2, ...)`** — This function creates a rule that matches any number of other rules, one after
+another. It is analogous to simply writing multiple symbols next to each other in [EBNF notation][ebnf].
+
+- **Alternatives : `choice(rule1, rule2, ...)`** — This function creates a rule that matches *one* of a set of possible
+rules. The order of the arguments does not matter. This is analogous to the `|` (pipe) operator in EBNF notation.
+
+- **Repetitions : `repeat(rule)`** — This function creates a rule that matches *zero-or-more* occurrences of a given rule.
+It is analogous to the `{x}` (curly brace) syntax in EBNF notation.
+
+- **Repetitions : `repeat1(rule)`** — This function creates a rule that matches *one-or-more* occurrences of a given rule.
+The previous `repeat` rule is implemented in `repeat1` but is included because it is very commonly used.
+
+- **Options : `optional(rule)`** — This function creates a rule that matches *zero or one* occurrence of a given rule.
+It is analogous to the `[x]` (square bracket) syntax in EBNF notation.
+
+- **Precedence : `prec(number, rule)`** — This function marks the given rule with a numerical precedence, which will be used
+to resolve [*LR(1) Conflicts*][lr-conflict] at parser-generation time. When two rules overlap in a way that represents either
+a true ambiguity or a *local* ambiguity given one token of lookahead, Tree-sitter will try to resolve the conflict by matching
+the rule with the higher precedence. The default precedence of all rules is zero. This works similarly to the
+[precedence directives][yacc-prec] in Yacc grammars.
+
+- **Left Associativity : `prec.left([number], rule)`** — This function marks the given rule as left-associative (and optionally
+applies a numerical precedence). When an LR(1) conflict arises in which all the rules have the same numerical precedence,
+Tree-sitter will consult the rules' associativity. If there is a left-associative rule, Tree-sitter will prefer matching
+a rule that ends *earlier*. This works similarly to [associativity directives][yacc-prec] in Yacc grammars.
+
+- **Right Associativity : `prec.right([number], rule)`** — This function is like `prec.left`, but it instructs Tree-sitter
+to prefer matching a rule that ends *later*.
+
+- **Dynamic Precedence : `prec.dynamic(number, rule)`** — This function is similar to `prec`, but the given numerical precedence
+is applied at *runtime* instead of at parser generation time. This is only necessary when handling a conflict dynamically
+using the `conflicts` field in the grammar, and when there is a genuine *ambiguity*: multiple rules correctly match a given
+piece of code. In that event, Tree-sitter compares the total dynamic precedence associated with each rule, and selects the
+one with the highest total. This is similar to [dynamic precedence directives][bison-dprec] in Bison grammars.
+
+- **Tokens : `token(rule)`** — This function marks the given rule as producing only
+a single token. Tree-sitter's default is to treat each String or RegExp literal
+in the grammar as a separate token. Each token is matched separately by the lexer
+and returned as its own leaf node in the tree. The `token` function allows you to
+express a complex rule using the functions described above (rather than as a single
+regular expression) but still have Tree-sitter treat it as a single token.
+The token function will only accept terminal rules, so `token($.foo)` will not work.
+You can think of it as a shortcut for squashing complex rules of strings or regexes
+down to a single token.
+
+- **Immediate Tokens : `token.immediate(rule)`** — Usually, whitespace (and any other extras, such as comments) is optional
+before each token. This function means that the token will only match if there is no whitespace.
+
+- **Aliases : `alias(rule, name)`** — This function causes the given rule to *appear* with an alternative name in the syntax
+tree. If `name` is a *symbol*, as in `alias($.foo, $.bar)`, then the aliased rule will *appear* as a [named node][named-vs-anonymous-nodes]
+called `bar`. And if `name` is a *string literal*, as in `alias($.foo, 'bar')`, then the aliased rule will appear as an
+[anonymous node][named-vs-anonymous-nodes], as if the rule had been written as the simple string.
+
+- **Field Names : `field(name, rule)`** — This function assigns a *field name* to the child node(s) matched by the given
+rule. In the resulting syntax tree, you can then use that field name to access specific children.
+
+- **Reserved Keywords : `reserved(wordset, rule)`**  — This function will override the global reserved word set with the
+one passed into the `wordset` parameter. This is useful for contextual keywords, such as `if` in JavaScript, which cannot
+be used as a variable name in most contexts, but can be used as a property name.
+
+In addition to the `name` and `rules` fields, grammars have a few other optional public fields that influence the behavior
+of the parser.
+
+- **`extras`** — an array of tokens that may appear *anywhere* in the language. This is often used for whitespace and
+comments. The default value of `extras` is to accept whitespace. To control whitespace explicitly, specify
+`extras: $ => []` in your grammar.
+
+- **`inline`** — an array of rule names that should be automatically *removed* from the grammar by replacing all of their
+usages with a copy of their definition. This is useful for rules that are used in multiple places but for which you *don't*
+want to create syntax tree nodes at runtime.
+
+- **`conflicts`** — an array of arrays of rule names. Each inner array represents a set of rules that's involved in an
+*LR(1) conflict* that is *intended to exist* in the grammar. When these conflicts occur at runtime, Tree-sitter will use
+the GLR algorithm to explore all the possible interpretations. If *multiple* parses end up succeeding, Tree-sitter will pick
+the subtree whose corresponding rule has the highest total *dynamic precedence*.
+
+- **`externals`** — an array of token names which can be returned by an
+[*external scanner*][external-scanners]. External scanners allow you to write custom C code which runs during the lexing
+process to handle lexical rules (e.g. Python's indentation tokens) that cannot be described by regular expressions.
+
+- **`precedences`** — an array of arrays of strings, where each array of strings defines named precedence levels in descending
+order. These names can be used in the `prec` functions to define precedence relative only to other names in the array, rather
+than globally. Can only be used with parse precedence, not lexical precedence.
+
+- **`word`** — the name of a token that will match keywords to the
+[keyword extraction][keyword-extraction] optimization.
+
+- **`supertypes`** — an array of hidden rule names which should be considered to be 'supertypes' in the generated
+[*node types* file][static-node-types].
+
+- **`reserved`** — similar in structure to the main `rules` property, an object of reserved word sets associated with an
+array of reserved rules. The reserved rule in the array must be a terminal token meaning it must be a string, regex, or token,
+or a terminal rule. The *first* reserved word set in the object is the global word set, meaning it applies to every rule
+in every parse state. However, certain keywords are contextual, depending on the rule. For example, in JavaScript, keywords
+are typically not allowed as ordinary variables, however, they *can* be used as a property name. In this situation, the `reserved`
+function would be used, and the word set to pass in would be the name of the word set that is declared in the `reserved`
+object that coreesponds an empty array, signifying *no* keywords are reserved.
+
+[bison-dprec]: https://www.gnu.org/software/bison/manual/html_node/Generalized-LR-Parsing.html
+[ebnf]: https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form
+[external-scanners]: ./4-external-scanners.md
+[keyword-extraction]: ./3-writing-the-grammar.md#keyword-extraction
+[lr-conflict]: https://en.wikipedia.org/wiki/LR_parser#Conflicts_in_the_constructed_tables
+[named-vs-anonymous-nodes]: ../using-parsers/2-basic-parsing.md#named-vs-anonymous-nodes
+[static-node-types]: ../using-parsers/6-static-node-types.md
+[yacc-prec]: https://docs.oracle.com/cd/E19504-01/802-5880/6i9k05dh3/index.html
--- a/docs/src/creating-parsers/3-writing-the-grammar.md
+++ b/docs/src/creating-parsers/3-writing-the-grammar.md
@ -0,0 +1,446 @@
+# Writing the Grammar
+
+Writing a grammar requires creativity. There are an infinite number of CFGs (context-free grammars) that can be used to describe
+any given language. To produce a good Tree-sitter parser, you need to create a grammar with two important properties:
+
+1. **An intuitive structure** — Tree-sitter's output is a [concrete syntax tree][cst]; each node in the tree corresponds
+directly to a [terminal or non-terminal symbol][non-terminal] in the grammar. So to produce an easy-to-analyze tree, there
+should be a direct correspondence between the symbols in your grammar and the recognizable constructs in the language.
+This might seem obvious, but it is very different from the way that context-free grammars are often written in contexts
+like [language specifications][language-spec] or [Yacc][yacc]/[Bison][bison] parsers.
+
+2. **A close adherence to LR(1)** — Tree-sitter is based on the [GLR parsing][glr-parsing] algorithm. This means that while
+it can handle any context-free grammar, it works most efficiently with a class of context-free grammars called [LR(1) Grammars][lr-grammars].
+In this respect, Tree-sitter's grammars are similar to (but less restrictive than) [Yacc][yacc] and [Bison][bison] grammars,
+but _different_ from [ANTLR grammars][antlr], [Parsing Expression Grammars][peg], or the [ambiguous grammars][ambiguous-grammar]
+commonly used in language specifications.
+
+It's unlikely that you'll be able to satisfy these two properties just by translating an existing context-free grammar directly
+into Tree-sitter's grammar format. There are a few kinds of adjustments that are often required.
+The following sections will explain these adjustments in more depth.
+
+## The First Few Rules
+
+It's usually a good idea to find a formal specification for the language you're trying to parse. This specification will
+most likely contain a context-free grammar. As you read through the rules of this CFG, you will probably discover a complex
+and cyclic graph of relationships. It might be unclear how you should navigate this graph as you define your grammar.
+
+Although languages have very different constructs, their constructs can often be categorized in to similar groups like
+_Declarations_, _Definitions_, _Statements_, _Expressions_, _Types_ and _Patterns_. In writing your grammar, a good first
+step is to create just enough structure to include all of these basic _groups_ of symbols. For a language like Go,
+you might start with something like this:
+
+```js
+{
+  // ...
+
+  rules: {
+    source_file: $ => repeat($._definition),
+
+    _definition: $ => choice(
+      $.function_definition
+      // TODO: other kinds of definitions
+    ),
+
+    function_definition: $ => seq(
+      'func',
+      $.identifier,
+      $.parameter_list,
+      $._type,
+      $.block
+    ),
+
+    parameter_list: $ => seq(
+      '(',
+       // TODO: parameters
+      ')'
+    ),
+
+    _type: $ => choice(
+      'bool'
+      // TODO: other kinds of types
+    ),
+
+    block: $ => seq(
+      '{',
+      repeat($._statement),
+      '}'
+    ),
+
+    _statement: $ => choice(
+      $.return_statement
+      // TODO: other kinds of statements
+    ),
+
+    return_statement: $ => seq(
+      'return',
+      $._expression,
+      ';'
+    ),
+
+    _expression: $ => choice(
+      $.identifier,
+      $.number
+      // TODO: other kinds of expressions
+    ),
+
+    identifier: $ => /[a-z]+/,
+
+    number: $ => /\d+/
+  }
+}
+```
+
+One important fact to know up front is that the start rule for the grammar is the first property in the `rules` object.
+In the example above, that would correspond to `source_file`, but it can be named anything.
+
+Some details of this grammar will be explained in more depth later on, but if you focus on the `TODO` comments, you can
+see that the overall strategy is _breadth-first_. Notably, this initial skeleton does not need to directly match an exact
+subset of the context-free grammar in the language specification. It just needs to touch on the major groupings of rules
+in as simple and obvious a way as possible.
+
+With this structure in place, you can now freely decide what part of the grammar to flesh out next. For example, you might
+decide to start with _types_. One-by-one, you could define the rules for writing basic types and composing them into more
+complex types:
+
+```js
+{
+  // ...
+
+  _type: $ => choice(
+    $.primitive_type,
+    $.array_type,
+    $.pointer_type
+  ),
+
+  primitive_type: $ => choice(
+    'bool',
+    'int'
+  ),
+
+  array_type: $ => seq(
+    '[',
+    ']',
+    $._type
+  ),
+
+  pointer_type: $ => seq(
+    '*',
+    $._type
+  )
+}
+```
+
+After developing the _type_ sublanguage a bit further, you might decide to switch to working on _statements_ or _expressions_
+instead. It's often useful to check your progress by trying to parse some real code using `tree-sitter parse`.
+
+**And remember to add tests for each rule in your `test/corpus` folder!**
+
+## Structuring Rules Well
+
+Imagine that you were just starting work on the [Tree-sitter JavaScript parser][tree-sitter-javascript]. Naively, you might
+try to directly mirror the structure of the [ECMAScript Language Spec][ecmascript-spec]. To illustrate the problem with this
+approach, consider the following line of code:
+
+```js
+return x + y;
+```
+
+According to the specification, this line is a `ReturnStatement`, the fragment `x + y` is an `AdditiveExpression`,
+and `x` and `y` are both `IdentifierReferences`. The relationship between these constructs is captured by a complex series
+of production rules:
+
+```text
+ReturnStatement          ->  'return' Expression
+Expression               ->  AssignmentExpression
+AssignmentExpression     ->  ConditionalExpression
+ConditionalExpression    ->  LogicalORExpression
+LogicalORExpression      ->  LogicalANDExpression
+LogicalANDExpression     ->  BitwiseORExpression
+BitwiseORExpression      ->  BitwiseXORExpression
+BitwiseXORExpression     ->  BitwiseANDExpression
+BitwiseANDExpression     ->  EqualityExpression
+EqualityExpression       ->  RelationalExpression
+RelationalExpression     ->  ShiftExpression
+ShiftExpression          ->  AdditiveExpression
+AdditiveExpression       ->  MultiplicativeExpression
+MultiplicativeExpression ->  ExponentiationExpression
+ExponentiationExpression ->  UnaryExpression
+UnaryExpression          ->  UpdateExpression
+UpdateExpression         ->  LeftHandSideExpression
+LeftHandSideExpression   ->  NewExpression
+NewExpression            ->  MemberExpression
+MemberExpression         ->  PrimaryExpression
+PrimaryExpression        ->  IdentifierReference
+```
+
+The language spec encodes the twenty different precedence levels of JavaScript expressions using twenty levels of indirection
+between `IdentifierReference` and `Expression`. If we were to create a concrete syntax tree representing this statement
+according to the language spec, it would have twenty levels of nesting, and it would contain nodes with names like `BitwiseXORExpression`,
+which are unrelated to the actual code.
+
+## Using Precedence
+
+To produce a readable syntax tree, we'd like to model JavaScript expressions using a much flatter structure like this:
+
+```js
+{
+  // ...
+
+  _expression: $ => choice(
+    $.identifier,
+    $.unary_expression,
+    $.binary_expression,
+    // ...
+  ),
+
+  unary_expression: $ => choice(
+    seq('-', $._expression),
+    seq('!', $._expression),
+    // ...
+  ),
+
+  binary_expression: $ => choice(
+    seq($._expression, '*', $._expression),
+    seq($._expression, '+', $._expression),
+    // ...
+  ),
+}
+```
+
+Of course, this flat structure is highly ambiguous. If we try to generate a parser, Tree-sitter gives us an error message:
+
+```text
+Error: Unresolved conflict for symbol sequence:
+
+  '-'  _expression  •  '*'  …
+
+Possible interpretations:
+
+  1:  '-'  (binary_expression  _expression  •  '*'  _expression)
+  2:  (unary_expression  '-'  _expression)  •  '*'  …
+
+Possible resolutions:
+
+  1:  Specify a higher precedence in `binary_expression` than in the other rules.
+  2:  Specify a higher precedence in `unary_expression` than in the other rules.
+  3:  Specify a left or right associativity in `unary_expression`
+  4:  Add a conflict for these rules: `binary_expression` `unary_expression`
+```
+
+<div class="warning">
+The • character in the error message indicates where exactly during
+parsing the conflict occurs, or in other words, where the parser is encountering
+ambiguity.
+</div>
+
+For an expression like `-a * b`, it's not clear whether the `-` operator applies to the `a * b` or just to the `a`. This
+is where the `prec` function [described in the previous page][grammar dsl] comes into play. By wrapping a rule with `prec`,
+we can indicate that certain sequence of symbols should _bind to each other more tightly_ than others. For example, the
+`'-', $._expression` sequence in `unary_expression` should bind more tightly than the `$._expression, '+', $._expression`
+sequence in `binary_expression`:
+
+```js
+{
+  // ...
+
+  unary_expression: $ =>
+    prec(
+      2,
+      choice(
+        seq("-", $._expression),
+        seq("!", $._expression),
+        // ...
+      ),
+    );
+}
+```
+
+## Using Associativity
+
+Applying a higher precedence in `unary_expression` fixes that conflict, but there is still another conflict:
+
+```text
+Error: Unresolved conflict for symbol sequence:
+
+  _expression  '*'  _expression  •  '*'  …
+
+Possible interpretations:
+
+  1:  _expression  '*'  (binary_expression  _expression  •  '*'  _expression)
+  2:  (binary_expression  _expression  '*'  _expression)  •  '*'  …
+
+Possible resolutions:
+
+  1:  Specify a left or right associativity in `binary_expression`
+  2:  Add a conflict for these rules: `binary_expression`
+```
+
+For an expression like `a * b * c`, it's not clear whether we mean `a * (b * c)` or `(a * b) * c`.
+This is where `prec.left` and `prec.right` come into use. We want to select the second interpretation, so we use `prec.left`.
+
+```js
+{
+  // ...
+
+  binary_expression: $ => choice(
+    prec.left(2, seq($._expression, '*', $._expression)),
+    prec.left(1, seq($._expression, '+', $._expression)),
+    // ...
+  ),
+}
+```
+
+## Hiding Rules
+
+You may have noticed in the above examples that some grammar rule name like `_expression` and `_type` began with an underscore.
+Starting a rule's name with an underscore causes the rule to be _hidden_ in the syntax tree. This is useful for rules like
+`_expression` in the grammars above, which always just wrap a single child node. If these nodes were not hidden, they would
+add substantial depth and noise to the syntax tree without making it any easier to understand.
+
+## Using Fields
+
+Often, it's easier to analyze a syntax node if you can refer to its children by _name_ instead of by their position in an
+ordered list. Tree-sitter grammars support this using the `field` function. This function allows you to assign unique names
+to some or all of a node's children:
+
+```js
+function_definition: $ =>
+  seq(
+    "func",
+    field("name", $.identifier),
+    field("parameters", $.parameter_list),
+    field("return_type", $._type),
+    field("body", $.block),
+  );
+```
+
+Adding fields like this allows you to retrieve nodes using the [field APIs][field-names-section].
+
+# Lexical Analysis
+
+Tree-sitter's parsing process is divided into two phases: parsing (which is described above) and [lexing][lexing] — the
+process of grouping individual characters into the language's fundamental _tokens_. There are a few important things to
+know about how Tree-sitter's lexing works.
+
+## Conflicting Tokens
+
+Grammars often contain multiple tokens that can match the same characters. For example, a grammar might contain the tokens
+(`"if"` and `/[a-z]+/`). Tree-sitter differentiates between these conflicting tokens in a few ways.
+
+1. **Context-aware Lexing** — Tree-sitter performs lexing on-demand, during the parsing process. At any given position
+in a source document, the lexer only tries to recognize tokens that are _valid_ at that position in the document.
+
+2. **Lexical Precedence** — When the precedence functions described [in the previous page][grammar dsl] are used _within_
+the `token` function, the given explicit precedence values serve as instructions to the lexer. If there are two valid tokens
+that match the characters at a given position in the document, Tree-sitter will select the one with the higher precedence.
+
+3. **Match Length** — If multiple valid tokens with the same precedence match the characters at a given position in a document,
+Tree-sitter will select the token that matches the [longest sequence of characters][longest-match].
+
+4. **Match Specificity** — If there are two valid tokens with the same precedence, and they both match the same number
+of characters, Tree-sitter will prefer a token that is specified in the grammar as a `String` over a token specified as
+a `RegExp`.
+
+5. **Rule Order** — If none of the above criteria can be used to select one token over another, Tree-sitter will prefer
+the token that appears earlier in the grammar.
+
+If there is an external scanner it may have [an additional impact][external scanner] over regular tokens
+defined in the grammar.
+
+## Lexical Precedence vs. Parse Precedence
+
+One common mistake involves not distinguishing _lexical precedence_ from _parse precedence_. Parse precedence determines
+which rule is chosen to interpret a given sequence of tokens. _Lexical precedence_ determines which token is chosen to interpret
+at a given position of text, and it is a lower-level operation that is done first. The above list fully captures Tree-sitter's
+lexical precedence rules, and you will probably refer back to this section of the documentation more often than any other.
+Most of the time when you really get stuck, you're dealing with a lexical precedence problem. Pay particular attention to
+the difference in meaning between using `prec` inside the `token` function versus outside it. The _lexical precedence_ syntax
+is `token(prec(N, ...))`.
+
+## Keywords
+
+Many languages have a set of _keyword_ tokens (e.g. `if`, `for`, `return`), as well as a more general token (e.g. `identifier`)
+that matches any word, including many of the keyword strings. For example, JavaScript has a keyword `instanceof`, which is
+used as a binary operator, like this:
+
+```js
+if (a instanceof Something) b();
+```
+
+The following, however, is not valid JavaScript:
+
+```js
+if (a instanceofSomething) b();
+```
+
+A keyword like `instanceof` cannot be followed immediately by another letter, because then it would be tokenized as an `identifier`,
+**even though an identifier is not valid at that position**. Because Tree-sitter uses context-aware lexing, as described
+[above](#conflicting-tokens), it would not normally impose this restriction. By default, Tree-sitter would recognize `instanceofSomething`
+as two separate tokens: the `instanceof` keyword followed by an `identifier`.
+
+## Keyword Extraction
+
+Fortunately, Tree-sitter has a feature that allows you to fix this, so that you can match the behavior of other standard
+parsers: the `word` token. If you specify a `word` token in your grammar, Tree-sitter will find the set of _keyword_ tokens
+that match strings also matched by the `word` token. Then, during lexing, instead of matching each of these keywords individually,
+Tree-sitter will match the keywords via a two-step process where it _first_ matches the `word` token.
+
+For example, suppose we added `identifier` as the `word` token in our JavaScript grammar:
+
+```js
+grammar({
+  name: "javascript",
+
+  word: $ => $.identifier,
+
+  rules: {
+    _expression: $ =>
+      choice(
+        $.identifier,
+        $.unary_expression,
+        $.binary_expression,
+        // ...
+      ),
+
+    binary_expression: $ =>
+      choice(
+        prec.left(1, seq($._expression, "instanceof", $._expression)),
+        // ...
+      ),
+
+    unary_expression: $ =>
+      choice(
+        prec.left(2, seq("typeof", $._expression)),
+        // ...
+      ),
+
+    identifier: $ => /[a-z_]+/,
+  },
+});
+```
+
+Tree-sitter would identify `typeof` and `instanceof` as keywords. Then, when parsing the invalid code above, rather than
+scanning for the `instanceof` token individually, it would scan for an `identifier` first, and find `instanceofSomething`.
+It would then correctly recognize the code as invalid.
+
+Aside from improving error detection, keyword extraction also has performance benefits. It allows Tree-sitter to generate
+a smaller, simpler lexing function, which means that **the parser will compile much more quickly**.
+
+[ambiguous-grammar]: https://en.wikipedia.org/wiki/Ambiguous_grammar
+[antlr]: https://www.antlr.org
+[bison]: https://en.wikipedia.org/wiki/GNU_bison
+[cst]: https://en.wikipedia.org/wiki/Parse_tree
+[ecmascript-spec]: https://262.ecma-international.org/6.0/
+[external scanner]: ./4-external-scanners.md#other-external-scanner-details
+[glr-parsing]: https://en.wikipedia.org/wiki/GLR_parser
+[grammar dsl]: ./2-the-grammar-dsl.md
+[language-spec]: https://en.wikipedia.org/wiki/Programming_language_specification
+[lexing]: https://en.wikipedia.org/wiki/Lexical_analysis
+[longest-match]: https://en.wikipedia.org/wiki/Maximal_munch
+[lr-grammars]: https://en.wikipedia.org/wiki/LR_parser
+[field-names-section]: ../using-parsers/2-basic-parsing.md#node-field-names
+[non-terminal]: https://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols
+[peg]: https://en.wikipedia.org/wiki/Parsing_expression_grammar
+[tree-sitter-javascript]: https://github.com/tree-sitter/tree-sitter-javascript
+[yacc]: https://en.wikipedia.org/wiki/Yacc
--- a/docs/src/creating-parsers/4-external-scanners.md
+++ b/docs/src/creating-parsers/4-external-scanners.md
@ -0,0 +1,376 @@
+# External Scanners
+
+Many languages have some tokens whose structure is impossible or inconvenient to describe with a regular expression.
+Some examples:
+
+- [Indent and dedent][indent-tokens] tokens in Python
+- [Heredocs][heredoc] in Bash and Ruby
+- [Percent strings][percent-string] in Ruby
+
+Tree-sitter allows you to handle these kinds of tokens using _external scanners_. An external scanner is a set of C functions
+that you, the grammar author, can write by hand to add custom logic for recognizing certain tokens.
+
+To use an external scanner, there are a few steps. First, add an `externals` section to your grammar. This section should
+list the names of all of your external tokens. These names can then be used elsewhere in your grammar.
+
+```js
+grammar({
+  name: "my_language",
+
+  externals: $ => [$.indent, $.dedent, $.newline],
+
+  // ...
+});
+```
+
+Then, add another C source file to your project. Its path must be src/scanner.c for the CLI to recognize it. Be sure to add
+this file to the sources section of your `binding.gyp` file so that it will be included when your project is compiled by
+Node.js and uncomment the appropriate block in your bindings/rust/build.rs file so that it will be included in your Rust
+crate.
+
+In this new source file, define an [`enum`][enum] type containing the names of all of your external tokens. The ordering
+of this enum must match the order in your grammar's `externals` array; the actual names do not matter.
+
+```c
+#include "tree_sitter/parser.h"
+#include "tree_sitter/alloc.h"
+#include "tree_sitter/array.h"
+
+enum TokenType {
+  INDENT,
+  DEDENT,
+  NEWLINE
+}
+```
+
+Finally, you must define five functions with specific names, based on your language's name and five actions:
+_create_, _destroy_, _serialize_, _deserialize_, and _scan_.
+
+## Create
+
+```c
+void * tree_sitter_my_language_external_scanner_create() {
+  // ...
+}
+```
+
+This function should create your scanner object. It will only be called once anytime your language is set on a parser.
+Often, you will want to allocate memory on the heap and return a pointer to it. If your external scanner doesn't need to
+maintain any state, it's ok to return `NULL`.
+
+## Destroy
+
+```c
+void tree_sitter_my_language_external_scanner_destroy(void *payload) {
+  // ...
+}
+```
+
+This function should free any memory used by your scanner. It is called once when a parser is deleted or assigned a different
+language. It receives as an argument the same pointer that was returned from the _create_ function. If your _create_ function
+didn't allocate any memory, this function can be a noop.
+
+## Serialize
+
+```c
+unsigned tree_sitter_my_language_external_scanner_serialize(
+  void *payload,
+  char *buffer
+) {
+  // ...
+}
+```
+
+This function should copy the complete state of your scanner into a given byte buffer, and return the number of bytes written.
+The function is called every time the external scanner successfully recognizes a token. It receives a pointer to your scanner
+and a pointer to a buffer. The maximum number of bytes that you can write is given by the `TREE_SITTER_SERIALIZATION_BUFFER_SIZE`
+constant, defined in the `tree_sitter/parser.h` header file.
+
+The data that this function writes will ultimately be stored in the syntax tree so that the scanner can be restored to the
+right state when handling edits or ambiguities. For your parser to work correctly, the `serialize` function must store its
+entire state, and `deserialize` must restore the entire state. For good performance, you should design your scanner so that
+its state can be serialized as quickly and compactly as possible.
+
+## Deserialize
+
+```c
+void tree_sitter_my_language_external_scanner_deserialize(
+  void *payload,
+  const char *buffer,
+  unsigned length
+) {
+  // ...
+}
+```
+
+This function should _restore_ the state of your scanner based the bytes that were previously written by the `serialize`
+function. It is called with a pointer to your scanner, a pointer to the buffer of bytes, and the number of bytes that should
+be read. It is good practice to explicitly erase your scanner state variables at the start of this function, before restoring
+their values from the byte buffer.
+
+## Scan
+
+```c
+bool tree_sitter_my_language_external_scanner_scan(
+  void *payload,
+  TSLexer *lexer,
+  const bool *valid_symbols
+) {
+  // ...
+}
+```
+
+This function is responsible for recognizing external tokens. It should return `true` if a token was recognized, and `false`
+otherwise. It is called with a "lexer" struct with the following fields:
+
+- **`int32_t lookahead`** — The current next character in the input stream, represented as a 32-bit unicode code point.
+
+- **`TSSymbol result_symbol`** — The symbol that was recognized. Your scan function should _assign_ to this field one of
+the values from the `TokenType` enum, described above.
+
+- **`void (*advance)(TSLexer *, bool skip)`** — A function for advancing to the next character. If you pass `true` for
+the second argument, the current character will be treated as whitespace; whitespace won't be included in the text range
+associated with tokens emitted by the external scanner.
+
+- **`void (*mark_end)(TSLexer *)`** — A function for marking the end of the recognized token. This allows matching tokens
+that require multiple characters of lookahead. By default, (if you don't call `mark_end`), any character that you moved past
+using the `advance` function will be included in the size of the token. But once you call `mark_end`, then any later calls
+to `advance` will _not_ increase the size of the returned token. You can call `mark_end` multiple times to increase the size
+of the token.
+
+- **`uint32_t (*get_column)(TSLexer *)`** — A function for querying the current column position of the lexer. It returns
+the number of codepoints since the start of the current line. The codepoint position is recalculated on every call to this
+function by reading from the start of the line.
+
+- **`bool (*is_at_included_range_start)(const TSLexer *)`** — A function for checking whether the parser has just skipped
+some characters in the document. When parsing an embedded document using the `ts_parser_set_included_ranges` function
+(described in the [multi-language document section][multi-language-section]), the scanner may want to apply some special
+behavior when moving to a disjoint part of the document. For example, in [EJS documents][ejs], the JavaScript parser uses
+this function to enable inserting automatic semicolon tokens in between the code directives, delimited by `<%` and `%>`.
+
+- **`bool (*eof)(const TSLexer *)`** — A function for determining whether the lexer is at the end of the file. The value
+of `lookahead` will be `0` at the end of a file, but this function should be used instead of checking for that value because
+the `0` or "NUL" value is also a valid character that could be present in the file being parsed.
+
+The third argument to the `scan` function is an array of booleans that indicates which of external tokens are expected by
+the parser. You should only look for a given token if it is valid according to this array. At the same time, you cannot
+backtrack, so you may need to combine certain pieces of logic.
+
+```c
+if (valid_symbols[INDENT] || valid_symbols[DEDENT]) {
+
+  // ... logic that is common to both `INDENT` and `DEDENT`
+
+  if (valid_symbols[INDENT]) {
+
+    // ... logic that is specific to `INDENT`
+
+    lexer->result_symbol = INDENT;
+    return true;
+  }
+}
+```
+
+## External Scanner Helpers
+
+### Allocator
+
+Instead of using libc's `malloc`, `calloc`, `realloc`, and `free`, you should use the versions prefixed with `ts_` from `tree_sitter/alloc.h`.
+These macros can allow a potential consumer to override the default allocator with their own implementation, but by default
+will use the libc functions.
+
+As a consumer of the tree-sitter core library as well as any parser libraries that might use allocations, you can enable
+overriding the default allocator and have it use the same one as the library allocator, of which you can set with `ts_set_allocator`.
+To enable this overriding in scanners, you must compile them with the `TREE_SITTER_REUSE_ALLOCATOR` macro defined, and tree-sitter
+the library must be linked into your final app dynamically, since it needs to resolve the internal functions at runtime.
+If you are compiling an executable binary that uses the core library, but want to load parsers dynamically at runtime, then
+you will have to use a special linker flag on Unix. For non-Darwin systems, that would be `--dynamic-list` and for Darwin
+systems, that would be `-exported_symbols_list`. The CLI does exactly this, so you can use it as a reference (check out `cli/build.rs`).
+
+For example, assuming you wanted to allocate 100 bytes for your scanner, you'd do so like the following example:
+
+```c
+#include "tree_sitter/parser.h"
+#include "tree_sitter/alloc.h"
+
+// ...
+
+void* tree_sitter_my_language_external_scanner_create() {
+  return ts_calloc(100, 1); // or ts_malloc(100)
+}
+
+// ...
+
+```
+
+### Arrays
+
+If you need to use array-like types in your scanner, such as tracking a stack of indentations or tags, you should use the
+array macros from `tree_sitter/array.h`.
+
+There are quite a few of them provided for you, but here's how you could get started tracking some . Check out the header
+itself for more detailed documentation.
+
+<div class="warning">
+Do not use any of the array functions or macros that are prefixed with an underscore and have comments saying
+that it is not what you are looking for. These are internal functions used as helpers by other macros that are public.
+They are not meant to be used directly, nor are they what you want.
+</div>
+
+```c
+#include "tree_sitter/parser.h"
+#include "tree_sitter/array.h"
+
+enum TokenType {
+  INDENT,
+  DEDENT,
+  NEWLINE,
+  STRING,
+}
+
+// Create the array in your create function
+
+void* tree_sitter_my_language_external_scanner_create() {
+  return ts_calloc(1, sizeof(Array(int)));
+
+  // or if you want to zero out the memory yourself
+
+  Array(int) *stack = ts_malloc(sizeof(Array(int)));
+  array_init(&stack);
+  return stack;
+}
+
+bool tree_sitter_my_language_external_scanner_scan(
+  void *payload,
+  TSLexer *lexer,
+  const bool *valid_symbols
+) {
+  Array(int) *stack = payload;
+  if (valid_symbols[INDENT]) {
+    array_push(stack, lexer->get_column(lexer));
+    lexer->result_symbol = INDENT;
+    return true;
+  }
+  if (valid_symbols[DEDENT]) {
+    array_pop(stack); // this returns the popped element by value, but we don't need it
+    lexer->result_symbol = DEDENT;
+    return true;
+  }
+
+  // we can also use an array on the stack to keep track of a string
+
+  Array(char) next_string = array_new();
+
+  if (valid_symbols[STRING] && lexer->lookahead == '"') {
+    lexer->advance(lexer, false);
+    while (lexer->lookahead != '"' && lexer->lookahead != '\n' && !lexer->eof(lexer)) {
+      array_push(&next_string, lexer->lookahead);
+      lexer->advance(lexer, false);
+    }
+
+    // assume we have some arbitrary constraint of not having more than 100 characters in a string
+    if (lexer->lookahead == '"' && next_string.size <= 100) {
+      lexer->advance(lexer, false);
+      lexer->result_symbol = STRING;
+      return true;
+    }
+  }
+
+  return false;
+}
+
+```
+
+## Other External Scanner Details
+
+External scanners have priority over Tree-sitter's normal lexing process. When a token listed in the externals array is valid
+at a given position, the external scanner is called first. This makes external scanners a powerful way to override Tree-sitter's
+default lexing behavior, especially for cases that can't be handled with regular lexical rules, parsing, or dynamic precedence.
+
+During error recovery, Tree-sitter's first step is to call the external scanner's scan function with all tokens marked as
+valid. Your scanner should detect and handle this case appropriately. One simple approach is to add an unused "sentinel"
+token at the end of your externals array:
+
+```js
+{
+  name: "my_language",
+
+  externals: $ => [$.token1, $.token2, $.error_sentinel]
+
+  // ...
+}
+```
+
+You can then check if this sentinel token is marked valid to determine if Tree-sitter is in error recovery mode.
+
+If you would rather not handle the error recovery case explicitly, the easiest way to "opt-out" and let tree-sitter's internal
+lexer handle it is to return `false` from your scan function when `valid_symbols` contains the error sentinel.
+
+```c
+bool tree_sitter_my_language_external_scanner_scan(
+  void *payload,
+  TSLexer *lexer,
+  const bool *valid_symbols
+) {
+  if (valid_symbols[ERROR_SENTINEL]) {
+    return false;
+  }
+  // ...
+}
+```
+
+When you include literal keywords in the externals array, for example:
+
+```js
+externals: $ => ['if', 'then', 'else']
+```
+
+_those_ keywords will
+be tokenized by the external scanner whenever they appear in the grammar.
+
+This is equivalent to declaring named tokens and aliasing them:
+
+```js
+{
+  name: "my_language",
+
+  externals: $ => [$.if_keyword, $.then_keyword, $.else_keyword],
+
+  rules: {
+
+    // then using it in a rule like so:
+    if_statement: $ => seq(alias($.if_keyword, 'if'), ...),
+
+    // ...
+  }
+}
+```
+
+The tokenization process for external keywords works in two stages:
+
+1. The external scanner attempts to recognize the token first
+2. If the scanner returns true and sets a token, that token is used
+3. If the scanner returns false, Tree-sitter falls back to its internal lexer
+
+However, when you use rule references (like `$.if_keyword`) in the externals array without defining the corresponding rules
+in the grammar, Tree-sitter cannot fall back to its internal lexer. In this case, the external scanner is solely responsible
+for recognizing these tokens.
+
+<div class="warning">
+
+**Important Warnings**
+
+⚠️ External scanners can easily create infinite loops
+
+⚠️ Be extremely careful when emitting zero-width tokens
+
+⚠️ Always use the `eof` function when looping through characters
+
+</div>
+
+[ejs]: https://ejs.co
+[enum]: https://en.wikipedia.org/wiki/Enumerated_type#C
+[heredoc]: https://en.wikipedia.org/wiki/Here_document
+[indent-tokens]: https://en.wikipedia.org/wiki/Off-side_rule
+[multi-language-section]: ../using-parsers/3-advanced-parsing.md#multi-language-documents
+[percent-string]: https://docs.ruby-lang.org/en/2.5.0/doc/syntax/literals_rdoc.html#label-Percent+Strings
--- a/docs/src/creating-parsers/5-writing-tests.md
+++ b/docs/src/creating-parsers/5-writing-tests.md
@ -0,0 +1,163 @@
+# Writing Tests
+
+For each rule that you add to the grammar, you should first create a *test* that describes how the syntax trees should look
+when parsing that rule. These tests are written using specially-formatted text files in the `test/corpus/` directory within
+your parser's root folder.
+
+For example, you might have a file called `test/corpus/statements.txt` that contains a series of entries like this:
+
+```text
+==================
+Return statements
+==================
+
+func x() int {
+  return 1;
+}
+
+---
+
+(source_file
+  (function_definition
+    (identifier)
+    (parameter_list)
+    (primitive_type)
+    (block
+      (return_statement (number)))))
+```
+
+* The **name** of each test is written between two lines containing only `=` (equal sign) characters.
+
+* Then the **input source code** is written, followed by a line containing three or more `-` (dash) characters.
+
+* Then, the **expected output syntax tree** is written as an [S-expression][s-exp]. The exact placement of whitespace in
+the S-expression doesn't matter, but ideally the syntax tree should be legible. Note that the S-expression does not show
+syntax nodes like `func`, `(` and `;`, which are expressed as strings and regexes in the grammar. It only shows the *named*
+nodes, as described in [this section][named-vs-anonymous-nodes] of the page on parser usage.
+
+  The expected output section can also *optionally* show the [*field names*][node-field-names] associated with each child
+  node. To include field names in your tests, you write a node's field name followed by a colon, before the node itself in
+  the S-expression:
+
+```query
+(source_file
+  (function_definition
+    name: (identifier)
+    parameters: (parameter_list)
+    result: (primitive_type)
+    body: (block
+      (return_statement (number)))))
+```
+
+* If your language's syntax conflicts with the `===` and `---` test separators, you can optionally add an arbitrary identical
+suffix (in the below example, `|||`) to disambiguate them:
+
+```text
+==================|||
+Basic module
+==================|||
+
+---- MODULE Test ----
+increment(n) == n + 1
+====
+
+---|||
+
+(source_file
+  (module (identifier)
+    (operator (identifier)
+      (parameter_list (identifier))
+      (plus (identifier_ref) (number)))))
+```
+
+These tests are important. They serve as the parser's API documentation, and they can be run every time you change the grammar
+to verify that everything still parses correctly.
+
+By default, the `tree-sitter test` command runs all the tests in your `test/corpus/` folder. To run a particular test, you
+can use the `-f` flag:
+
+```sh
+tree-sitter test -f 'Return statements'
+```
+
+The recommendation is to be comprehensive in adding tests. If it's a visible node, add it to a test file in your `test/corpus`
+directory. It's typically a good idea to test all the permutations of each language construct. This increases test coverage,
+but doubly acquaints readers with a way to examine expected outputs and understand the "edges" of a language.
+
+## Attributes
+
+Tests can be annotated with a few `attributes`. Attributes must be put in the header, below the test name, and start with
+a `:`. A couple of attributes also take in a parameter, which require the use of parenthesis.
+
+**Note**: If you'd like to supply in multiple parameters, e.g. to run tests on multiple platforms or to test multiple languages,
+you can repeat the attribute on a new line.
+
+The following attributes are available:
+
+* `:skip` — This attribute will skip the test when running `tree-sitter test`.
+  This is useful when you want to temporarily disable running a test without deleting it.
+* `:error` — This attribute will assert that the parse tree contains an error. It's useful to just validate that a certain
+input is invalid without displaying the whole parse tree, as such you should omit the parse tree below the `---` line.
+* `:fail-fast` — This attribute will stop the testing additional tests if the test marked with this attribute fails.
+* `:language(LANG)` — This attribute will run the tests using the parser for the specified language. This is useful for
+multi-parser repos, such as XML and DTD, or Typescript and TSX. The default parser used will always be the first entry in
+the `grammars` field in the `tree-sitter.json` config file, so having a way to pick a second or even third parser is useful.
+* `:platform(PLATFORM)` — This attribute specifies the platform on which the test should run. It is useful to test platform-specific
+behavior (e.g. Windows newlines are different from Unix). This attribute must match up with Rust's [`std::env::consts::OS`][constants].
+
+Examples using attributes:
+
+```text
+=========================
+Test that will be skipped
+:skip
+=========================
+
+int main() {}
+
+-------------------------
+
+====================================
+Test that will run on Linux or macOS
+
+:platform(linux)
+:platform(macos)
+====================================
+
+int main() {}
+
+------------------------------------
+
+========================================================================
+Test that expects an error, and will fail fast if there's no parse error
+:fail-fast
+:error
+========================================================================
+
+int main ( {}
+
+------------------------------------------------------------------------
+
+=================================================
+Test that will parse with both Typescript and TSX
+:language(typescript)
+:language(tsx)
+=================================================
+
+console.log('Hello, world!');
+
+-------------------------------------------------
+```
+
+### Automatic Compilation
+
+You might notice that the first time you run `tree-sitter test` after regenerating your parser, it takes some extra time.
+This is because Tree-sitter automatically compiles your C code into a dynamically-loadable library. It recompiles your parser
+as-needed whenever you update it by re-running `tree-sitter generate`, or whenever the [external scanner][external-scanners]
+file is changed.
+
+[constants]: https://doc.rust-lang.org/std/env/consts/constant.OS.html
+[external-scanners]: ./4-external-scanners.md
+[named-vs-anonymous-nodes]: ../using-parsers/2-basic-parsing.md#named-vs-anonymous-nodes
+[node-field-names]: ../using-parsers/2-basic-parsing.md#node-field-names
+[s-exp]: https://en.wikipedia.org/wiki/S-expression
--- a/docs/src/creating-parsers/index.md
+++ b/docs/src/creating-parsers/index.md
@ -0,0 +1,4 @@
+# Creating parsers
+
+Developing Tree-sitter grammars can have a difficult learning curve, but once you get the hang of it, it can be fun and even
+zen-like. This document will help you to get started and to develop a useful mental model.
--- a/docs/src/index.md
+++ b/docs/src/index.md
@ -0,0 +1,91 @@
+# Introduction
+
+Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited. Tree-sitter aims to be:
+
+- **General** enough to parse any programming language
+- **Fast** enough to parse on every keystroke in a text editor
+- **Robust** enough to provide useful results even in the presence of syntax errors
+- **Dependency-free** so that the runtime library (which is written in pure [C11](https://github.com/tree-sitter/tree-sitter/tree/master/lib)) can be embedded in any application
+
+### Language Bindings
+
+There are currently bindings that allow Tree-sitter to be used from the following languages:
+
+#### Official
+
+- [C#](https://github.com/tree-sitter/csharp-tree-sitter)
+- [Go](https://github.com/tree-sitter/go-tree-sitter)
+- [Haskell](https://github.com/tree-sitter/haskell-tree-sitter)
+- [Java (JDK 22)](https://github.com/tree-sitter/java-tree-sitter)
+- [JavaScript (Node.js)](https://github.com/tree-sitter/node-tree-sitter)
+- [JavaScript (Wasm)](https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_web)
+- [Kotlin](https://github.com/tree-sitter/kotlin-tree-sitter)
+- [Python](https://github.com/tree-sitter/py-tree-sitter)
+- [Rust](https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_rust)
+- [Zig](https://github.com/tree-sitter/zig-tree-sitter)
+
+#### Third-party
+
+- [Delphi](https://github.com/modersohn/delphi-tree-sitter)
+- [ELisp](https://www.gnu.org/software/emacs/manual/html_node/elisp/Parsing-Program-Source.html)
+- [Guile](https://github.com/Z572/guile-ts)
+- [Java (JDK 8+)](https://github.com/bonede/tree-sitter-ng)
+- [Java (JDK 11+)](https://github.com/seart-group/java-tree-sitter)
+- [Julia](https://github.com/MichaelHatherly/TreeSitter.jl)
+- [Lua](https://github.com/euclidianAce/ltreesitter)
+- [Lua](https://github.com/xcb-xwii/lua-tree-sitter)
+- [OCaml](https://github.com/returntocorp/ocaml-tree-sitter-core)
+- [Odin](https://github.com/laytan/odin-tree-sitter)
+- [Perl](https://metacpan.org/pod/Text::Treesitter)
+- [R](https://github.com/DavisVaughan/r-tree-sitter)
+- [Ruby](https://github.com/Faveod/ruby-tree-sitter)
+- [Ruby](https://github.com/calicoday/ruby-tree-sitter-ffi)
+- [Swift](https://github.com/ChimeHQ/SwiftTreeSitter)
+
+### Parsers
+
+The following parsers can be found in the upstream organization:
+
+- [Agda](https://github.com/tree-sitter/tree-sitter-agda)
+- [Bash](https://github.com/tree-sitter/tree-sitter-bash)
+- [C](https://github.com/tree-sitter/tree-sitter-c)
+- [C++](https://github.com/tree-sitter/tree-sitter-cpp)
+- [C#](https://github.com/tree-sitter/tree-sitter-c-sharp)
+- [CSS](https://github.com/tree-sitter/tree-sitter-css)
+- [ERB / EJS](https://github.com/tree-sitter/tree-sitter-embedded-template)
+- [Go](https://github.com/tree-sitter/tree-sitter-go)
+- [Haskell](https://github.com/tree-sitter/tree-sitter-haskell)
+- [HTML](https://github.com/tree-sitter/tree-sitter-html)
+- [Java](https://github.com/tree-sitter/tree-sitter-java)
+- [JavaScript](https://github.com/tree-sitter/tree-sitter-javascript)
+- [JSDoc](https://github.com/tree-sitter/tree-sitter-jsdoc)
+- [JSON](https://github.com/tree-sitter/tree-sitter-json)
+- [Julia](https://github.com/tree-sitter/tree-sitter-julia)
+- [OCaml](https://github.com/tree-sitter/tree-sitter-ocaml)
+- [PHP](https://github.com/tree-sitter/tree-sitter-php)
+- [Python](https://github.com/tree-sitter/tree-sitter-python)
+- [Regex](https://github.com/tree-sitter/tree-sitter-regex)
+- [Ruby](https://github.com/tree-sitter/tree-sitter-ruby)
+- [Rust](https://github.com/tree-sitter/tree-sitter-rust)
+- [Scala](https://github.com/tree-sitter/tree-sitter-scala)
+- [TypeScript](https://github.com/tree-sitter/tree-sitter-typescript)
+- [Verilog](https://github.com/tree-sitter/tree-sitter-verilog)
+
+A list of known parsers can be found in the [wiki](https://github.com/tree-sitter/tree-sitter/wiki/List-of-parsers).
+
+### Talks on Tree-sitter
+
+- [Strange Loop 2018](https://www.thestrangeloop.com/2018/tree-sitter---a-new-parsing-system-for-programming-tools.html)
+- [FOSDEM 2018](https://www.youtube.com/watch?v=0CGzC_iss-8)
+- [GitHub Universe 2017](https://www.youtube.com/watch?v=a1rC79DHpmY)
+
+### Underlying Research
+
+The design of Tree-sitter was greatly influenced by the following research papers:
+
+- [Practical Algorithms for Incremental Software Development Environments](https://www2.eecs.berkeley.edu/Pubs/TechRpts/1997/CSD-97-946.pdf)
+- [Context Aware Scanning for Parsing Extensible Languages](https://www-users.cse.umn.edu/~evw/pubs/vanwyk07gpce/vanwyk07gpce.pdf)
+- [Efficient and Flexible Incremental Parsing](https://harmonia.cs.berkeley.edu/papers/twagner-parsing.pdf)
+- [Incremental Analysis of Real Programming Languages](https://harmonia.cs.berkeley.edu/papers/twagner-glr.pdf)
+- [Error Detection and Recovery in LR Parsers](https://web.archive.org/web/20240302031213/https://what-when-how.com/compiler-writing/bottom-up-parsing-compiler-writing-part-13)
+- [Error Recovery for LR Parsers](https://apps.dtic.mil/sti/pdfs/ADA043470.pdf)
--- a/docs/src/using-parsers/1-getting-started.md
+++ b/docs/src/using-parsers/1-getting-started.md
@ -0,0 +1,134 @@
+# Getting Started
+
+## Building the Library
+
+To build the library on a POSIX system, just run `make` in the Tree-sitter directory. This will create a static library
+called `libtree-sitter.a` as well as dynamic libraries.
+
+Alternatively, you can incorporate the library in a larger project's build system by adding one source file to the build.
+This source file needs two directories to be in the include path when compiled:
+
+**source file:**
+
+- `tree-sitter/lib/src/lib.c`
+
+**include directories:**
+
+- `tree-sitter/lib/src`
+- `tree-sitter/lib/include`
+
+## The Basic Objects
+
+There are four main types of objects involved when using Tree-sitter: languages, parsers, syntax trees, and syntax nodes.
+In C, these are called `TSLanguage`, `TSParser`, `TSTree`, and `TSNode`.
+
+- A `TSLanguage` is an opaque object that defines how to parse a particular programming language. The code for each `TSLanguage`
+is generated by Tree-sitter. Many languages are already available in separate git repositories within the
+[Tree-sitter GitHub organization][ts org] and the [Tree-sitter grammars GitHub organization][tsg org].
+See [the next section][creating parsers] for how to create new languages.
+
+- A `TSParser` is a stateful object that can be assigned a `TSLanguage` and used to produce a `TSTree` based on some
+source code.
+
+- A `TSTree` represents the syntax tree of an entire source code file. It contains `TSNode` instances that indicate the
+structure of the source code. It can also be edited and used to produce a new `TSTree` in the event that the
+source code changes.
+
+- A `TSNode` represents a single node in the syntax tree. It tracks its start and end positions in the source code, as
+well as its relation to other nodes like its parent, siblings and children.
+
+## An Example Program
+
+Here's an example of a simple C program that uses the Tree-sitter [JSON parser][json].
+
+```c
+// Filename - test-json-parser.c
+
+#include <assert.h>
+#include <string.h>
+#include <stdio.h>
+#include <tree_sitter/api.h>
+
+// Declare the `tree_sitter_json` function, which is
+// implemented by the `tree-sitter-json` library.
+const TSLanguage *tree_sitter_json(void);
+
+int main() {
+  // Create a parser.
+  TSParser *parser = ts_parser_new();
+
+  // Set the parser's language (JSON in this case).
+  ts_parser_set_language(parser, tree_sitter_json());
+
+  // Build a syntax tree based on source code stored in a string.
+  const char *source_code = "[1, null]";
+  TSTree *tree = ts_parser_parse_string(
+    parser,
+    NULL,
+    source_code,
+    strlen(source_code)
+  );
+
+  // Get the root node of the syntax tree.
+  TSNode root_node = ts_tree_root_node(tree);
+
+  // Get some child nodes.
+  TSNode array_node = ts_node_named_child(root_node, 0);
+  TSNode number_node = ts_node_named_child(array_node, 0);
+
+  // Check that the nodes have the expected types.
+  assert(strcmp(ts_node_type(root_node), "document") == 0);
+  assert(strcmp(ts_node_type(array_node), "array") == 0);
+  assert(strcmp(ts_node_type(number_node), "number") == 0);
+
+  // Check that the nodes have the expected child counts.
+  assert(ts_node_child_count(root_node) == 1);
+  assert(ts_node_child_count(array_node) == 5);
+  assert(ts_node_named_child_count(array_node) == 2);
+  assert(ts_node_child_count(number_node) == 0);
+
+  // Print the syntax tree as an S-expression.
+  char *string = ts_node_string(root_node);
+  printf("Syntax tree: %s\n", string);
+
+  // Free all of the heap-allocated memory.
+  free(string);
+  ts_tree_delete(tree);
+  ts_parser_delete(parser);
+  return 0;
+}
+```
+
+This program requires three components to build:
+
+1. The Tree-sitter C API from `tree-sitter/api.h` (requiring `tree-sitter/lib/include` in our include path)
+2. The Tree-sitter library (`libtree-sitter.a`)
+3. The JSON grammar's source code, which we compile directly into the binary
+
+```sh
+clang                                   \
+  -I tree-sitter/lib/include            \
+  test-json-parser.c                    \
+  tree-sitter-json/src/parser.c         \
+  tree-sitter/libtree-sitter.a          \
+  -o test-json-parser
+./test-json-parser
+```
+
+When using dynamic linking, you'll need to ensure the shared library is discoverable through `LD_LIBRARY_PATH` or your system's
+equivalent environment variable. Here's how to compile with dynamic linking:
+
+```sh
+clang                                   \
+  -I tree-sitter/lib/include            \
+  test-json-parser.c                    \
+  tree-sitter-json/src/parser.c         \
+  -ltree-sitter                         \
+  -o test-json-parser
+./test-json-parser
+```
+
+[creating parsers]: ../creating-parsers/index.md
+[json]: https://github.com/tree-sitter/tree-sitter-json
+[ts org]: https://github.com/tree-sitter
+[tsg org]: https://github.com/tree-sitter-grammars
--- a/docs/src/using-parsers/2-basic-parsing.md
+++ b/docs/src/using-parsers/2-basic-parsing.md
@ -0,0 +1,187 @@
+# Basic Parsing
+
+## Providing the Code
+
+In the example on the previous page, we parsed source code stored in a simple string using the `ts_parser_parse_string` function:
+
+```c
+TSTree *ts_parser_parse_string(
+  TSParser *self,
+  const TSTree *old_tree,
+  const char *string,
+  uint32_t length
+);
+```
+
+You may want to parse source code that's stored in a custom data structure, like a [piece table][piece table] or a [rope][rope].
+In this case, you can use the more general `ts_parser_parse` function:
+
+```c
+TSTree *ts_parser_parse(
+  TSParser *self,
+  const TSTree *old_tree,
+  TSInput input
+);
+```
+
+The `TSInput` structure lets you provide your own function for reading a chunk of text at a given byte offset and row/column
+position. The function can return text encoded in either UTF-8 or UTF-16. This interface allows you to efficiently parse
+text that is stored in your own data structure.
+
+```c
+typedef struct {
+  void *payload;
+  const char *(*read)(
+    void *payload,
+    uint32_t byte_offset,
+    TSPoint position,
+    uint32_t *bytes_read
+  );
+  TSInputEncoding encoding;
+  DecodeFunction decode;
+} TSInput;
+```
+
+If you want to decode text that is not encoded in UTF-8 or UTF-16, you can set the `decode` field of the input to your function
+that will decode text. The signature of the `DecodeFunction` is as follows:
+
+```c
+typedef uint32_t (*DecodeFunction)(
+  const uint8_t *string,
+  uint32_t length,
+  int32_t *code_point
+);
+```
+
+> Note that the `TSInputEncoding` must be set to `TSInputEncodingCustom` for the `decode` function to be called.
+
+The `string` argument is a pointer to the text to decode, which comes from the `read` function, and the `length` argument
+is the length of the `string`. The `code_point` argument is a pointer to an integer that represents the decoded code point,
+and should be written to in your `decode` callback. The function should return the number of bytes decoded.
+
+## Syntax Nodes
+
+Tree-sitter provides a [DOM][dom]-style interface for inspecting syntax trees.
+A syntax node's _type_ is a string that indicates which grammar rule the node represents.
+
+```c
+const char *ts_node_type(TSNode);
+```
+
+Syntax nodes store their position in the source code both in raw bytes and row/column coordinates:
+
+```c
+uint32_t ts_node_start_byte(TSNode);
+uint32_t ts_node_end_byte(TSNode);
+typedef struct {
+  uint32_t row;
+  uint32_t column;
+} TSPoint;
+TSPoint ts_node_start_point(TSNode);
+TSPoint ts_node_end_point(TSNode);
+```
+
+## Retrieving Nodes
+
+Every tree has a _root node_:
+
+```c
+TSNode ts_tree_root_node(const TSTree *);
+```
+
+Once you have a node, you can access the node's children:
+
+```c
+uint32_t ts_node_child_count(TSNode);
+TSNode ts_node_child(TSNode, uint32_t);
+```
+
+You can also access its siblings and parent:
+
+```c
+TSNode ts_node_next_sibling(TSNode);
+TSNode ts_node_prev_sibling(TSNode);
+TSNode ts_node_parent(TSNode);
+```
+
+These methods may all return a _null node_ to indicate, for example, that a node does not _have_ a next sibling.
+You can check if a node is null:
+
+```c
+bool ts_node_is_null(TSNode);
+```
+
+## Named vs Anonymous Nodes
+
+Tree-sitter produces [_concrete_ syntax trees][cst] — trees that contain nodes for
+every individual token in the source code, including things like commas and parentheses. This is important for use-cases
+that deal with individual tokens, like [syntax highlighting][syntax highlighting]. But some
+types of code analysis are easier to perform using an [_abstract_ syntax tree][ast] — a tree in which the less important
+details have been removed. Tree-sitter's trees support these use cases by making a distinction between
+_named_ and _anonymous_ nodes.
+
+Consider a grammar rule like this:
+
+```js
+if_statement: $ => seq("if", "(", $._expression, ")", $._statement);
+```
+
+A syntax node representing an `if_statement` in this language would have 5 children: the condition expression, the body statement,
+as well as the `if`, `(`, and `)` tokens. The expression and the statement would be marked as _named_ nodes, because they
+have been given explicit names in the grammar. But the `if`, `(`, and `)` nodes would _not_ be named nodes, because they
+are represented in the grammar as simple strings.
+
+You can check whether any given node is named:
+
+```c
+bool ts_node_is_named(TSNode);
+```
+
+When traversing the tree, you can also choose to skip over anonymous nodes by using the `_named_` variants of all of the
+methods described above:
+
+```c
+TSNode ts_node_named_child(TSNode, uint32_t);
+uint32_t ts_node_named_child_count(TSNode);
+TSNode ts_node_next_named_sibling(TSNode);
+TSNode ts_node_prev_named_sibling(TSNode);
+```
+
+If you use this group of methods, the syntax tree functions much like an abstract syntax tree.
+
+## Node Field Names
+
+To make syntax nodes easier to analyze, many grammars assign unique _field names_ to particular child nodes.
+In the [creating parsers][using fields] section, it's explained how to do this in your own grammars. If a syntax node has
+fields, you can access its children using their field name:
+
+```c
+TSNode ts_node_child_by_field_name(
+  TSNode self,
+  const char *field_name,
+  uint32_t field_name_length
+);
+```
+
+Fields also have numeric ids that you can use, if you want to avoid repeated string comparisons. You can convert between
+strings and ids using the `TSLanguage`:
+
+```c
+uint32_t ts_language_field_count(const TSLanguage *);
+const char *ts_language_field_name_for_id(const TSLanguage *, TSFieldId);
+TSFieldId ts_language_field_id_for_name(const TSLanguage *, const char *, uint32_t);
+```
+
+The field ids can be used in place of the name:
+
+```c
+TSNode ts_node_child_by_field_id(TSNode, TSFieldId);
+```
+
+[ast]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
+[cst]: https://en.wikipedia.org/wiki/Parse_tree
+[dom]: https://en.wikipedia.org/wiki/Document_Object_Model
+[piece table]: <https://en.wikipedia.org/wiki/Piece_table>
+[rope]: <https://en.wikipedia.org/wiki/Rope_(data_structure)>
+[syntax highlighting]: https://en.wikipedia.org/wiki/Syntax_highlighting
+[using fields]: ../creating-parsers/3-writing-the-grammar.md#using-fields
--- a/docs/src/using-parsers/3-advanced-parsing.md
+++ b/docs/src/using-parsers/3-advanced-parsing.md
@ -0,0 +1,161 @@
+# Advanced Parsing
+
+## Editing
+
+In applications like text editors, you often need to re-parse a file after its source code has changed. Tree-sitter is designed
+to support this use case efficiently. There are two steps required. First, you must _edit_ the syntax tree, which adjusts
+the ranges of its nodes so that they stay in sync with the code.
+
+```c
+typedef struct {
+  uint32_t start_byte;
+  uint32_t old_end_byte;
+  uint32_t new_end_byte;
+  TSPoint start_point;
+  TSPoint old_end_point;
+  TSPoint new_end_point;
+} TSInputEdit;
+
+void ts_tree_edit(TSTree *, const TSInputEdit *);
+```
+
+Then, you can call `ts_parser_parse` again, passing in the old tree. This will create a new tree that internally shares structure
+with the old tree.
+
+When you edit a syntax tree, the positions of its nodes will change. If you have stored any `TSNode` instances outside of
+the `TSTree`, you must update their positions separately, using the same `TSInput` value, in order to update their
+cached positions.
+
+```c
+void ts_node_edit(TSNode *, const TSInputEdit *);
+```
+
+This `ts_node_edit` function is _only_ needed in the case where you have retrieved `TSNode` instances _before_ editing the
+tree, and then _after_ editing the tree, you want to continue to use those specific node instances. Often, you'll just want
+to re-fetch nodes from the edited tree, in which case `ts_node_edit` is not needed.
+
+## Multi-language Documents
+
+Sometimes, different parts of a file may be written in different languages. For example, templating languages like [EJS][ejs]
+and [ERB][erb] allow you to generate HTML by writing a mixture of HTML and another language like JavaScript or Ruby.
+
+Tree-sitter handles these types of documents by allowing you to create a syntax tree based on the text in certain
+_ranges_ of a file.
+
+```c
+typedef struct {
+  TSPoint start_point;
+  TSPoint end_point;
+  uint32_t start_byte;
+  uint32_t end_byte;
+} TSRange;
+
+void ts_parser_set_included_ranges(
+  TSParser *self,
+  const TSRange *ranges,
+  uint32_t range_count
+);
+```
+
+For example, consider this ERB document:
+
+```erb
+<ul>
+  <% people.each do |person| %>
+    <li><%= person.name %></li>
+  <% end %>
+</ul>
+```
+
+Conceptually, it can be represented by three syntax trees with overlapping ranges: an ERB syntax tree, a Ruby syntax tree,
+and an HTML syntax tree. You could generate these syntax trees with the following code:
+
+```c
+#include <string.h>
+#include <tree_sitter/api.h>
+
+// These functions are each implemented in their own repo.
+const TSLanguage *tree_sitter_embedded_template(void);
+const TSLanguage *tree_sitter_html(void);
+const TSLanguage *tree_sitter_ruby(void);
+
+int main(int argc, const char **argv) {
+  const char *text = argv[1];
+  unsigned len = strlen(text);
+
+  // Parse the entire text as ERB.
+  TSParser *parser = ts_parser_new();
+  ts_parser_set_language(parser, tree_sitter_embedded_template());
+  TSTree *erb_tree = ts_parser_parse_string(parser, NULL, text, len);
+  TSNode erb_root_node = ts_tree_root_node(erb_tree);
+
+  // In the ERB syntax tree, find the ranges of the `content` nodes,
+  // which represent the underlying HTML, and the `code` nodes, which
+  // represent the interpolated Ruby.
+  TSRange html_ranges[10];
+  TSRange ruby_ranges[10];
+  unsigned html_range_count = 0;
+  unsigned ruby_range_count = 0;
+  unsigned child_count = ts_node_child_count(erb_root_node);
+
+  for (unsigned i = 0; i < child_count; i++) {
+    TSNode node = ts_node_child(erb_root_node, i);
+    if (strcmp(ts_node_type(node), "content") == 0) {
+      html_ranges[html_range_count++] = (TSRange) {
+        ts_node_start_point(node),
+        ts_node_end_point(node),
+        ts_node_start_byte(node),
+        ts_node_end_byte(node),
+      };
+    } else {
+      TSNode code_node = ts_node_named_child(node, 0);
+      ruby_ranges[ruby_range_count++] = (TSRange) {
+        ts_node_start_point(code_node),
+        ts_node_end_point(code_node),
+        ts_node_start_byte(code_node),
+        ts_node_end_byte(code_node),
+      };
+    }
+  }
+
+  // Use the HTML ranges to parse the HTML.
+  ts_parser_set_language(parser, tree_sitter_html());
+  ts_parser_set_included_ranges(parser, html_ranges, html_range_count);
+  TSTree *html_tree = ts_parser_parse_string(parser, NULL, text, len);
+  TSNode html_root_node = ts_tree_root_node(html_tree);
+
+  // Use the Ruby ranges to parse the Ruby.
+  ts_parser_set_language(parser, tree_sitter_ruby());
+  ts_parser_set_included_ranges(parser, ruby_ranges, ruby_range_count);
+  TSTree *ruby_tree = ts_parser_parse_string(parser, NULL, text, len);
+  TSNode ruby_root_node = ts_tree_root_node(ruby_tree);
+
+  // Print all three trees.
+  char *erb_sexp = ts_node_string(erb_root_node);
+  char *html_sexp = ts_node_string(html_root_node);
+  char *ruby_sexp = ts_node_string(ruby_root_node);
+  printf("ERB: %s\n", erb_sexp);
+  printf("HTML: %s\n", html_sexp);
+  printf("Ruby: %s\n", ruby_sexp);
+  return 0;
+}
+```
+
+This API allows for great flexibility in how languages can be composed. Tree-sitter is not responsible for mediating the
+interactions between languages. Instead, you are free to do that using arbitrary application-specific logic.
+
+## Concurrency
+
+Tree-sitter supports multi-threaded use cases by making syntax trees very cheap to copy.
+
+```c
+TSTree *ts_tree_copy(const TSTree *);
+```
+
+Internally, copying a syntax tree just entails incrementing an atomic reference count. Conceptually, it provides you a new
+tree which you can freely query, edit, reparse, or delete on a new thread while continuing to use the original tree on a
+different thread. Note that individual `TSTree` instances are _not_ thread safe; you must copy a tree if you want to use
+it on multiple threads simultaneously.
+
+[ejs]: https://ejs.co
+[erb]: https://ruby-doc.org/stdlib-2.5.1/libdoc/erb/rdoc/ERB.html
--- a/docs/src/using-parsers/4-walking-trees.md
+++ b/docs/src/using-parsers/4-walking-trees.md
@ -0,0 +1,42 @@
+# Walking Trees with Tree Cursors
+
+You can access every node in a syntax tree using the `TSNode` APIs [described earlier][retrieving nodes], but if you need
+to access a large number of nodes, the fastest way to do so is with a _tree cursor_. A cursor is a stateful object that
+allows you to walk a syntax tree with maximum efficiency.
+
+<div class="warning">
+
+Note that the given input node is considered the root of the cursor, and the cursor cannot walk outside this node.
+Going to the parent or any sibling of the root node will always return `false`.
+
+This has no unexpected effects if the given input node is the actual `root` node of the tree, but is something to keep in
+mind when using cursors constructed with a node that is not the `root` node.
+</div>
+
+You can initialize a cursor from any node:
+
+```c
+TSTreeCursor ts_tree_cursor_new(TSNode);
+```
+
+You can move the cursor around the tree:
+
+```c
+bool ts_tree_cursor_goto_first_child(TSTreeCursor *);
+bool ts_tree_cursor_goto_next_sibling(TSTreeCursor *);
+bool ts_tree_cursor_goto_parent(TSTreeCursor *);
+```
+
+These methods return `true` if the cursor successfully moved and `false` if there was no node to move to.
+
+You can always retrieve the cursor's current node, as well as the [field name][node-field-names] that is associated with
+the current node.
+
+```c
+TSNode ts_tree_cursor_current_node(const TSTreeCursor *);
+const char *ts_tree_cursor_current_field_name(const TSTreeCursor *);
+TSFieldId ts_tree_cursor_current_field_id(const TSTreeCursor *);
+```
+
+[retrieving nodes]: ./2-basic-parsing.md#retrieving-nodes
+[node-field-names]: ./2-basic-parsing.md#node-field-names
--- a/docs/src/using-parsers/6-static-node-types.md
+++ b/docs/src/using-parsers/6-static-node-types.md
@ -0,0 +1,162 @@
+# Static Node Types
+
+In languages with static typing, it can be helpful for syntax trees to provide specific type information about individual
+syntax nodes. Tree-sitter makes this information available via a generated file called `node-types.json`. This _node types_
+file provides structured data about every possible syntax node in a grammar.
+
+You can use this data to generate type declarations in statically-typed programming languages.
+
+The node types file contains an array of objects, each of which describes a particular type of syntax node using the
+following entries:
+
+## Basic Info
+
+Every object in this array has these two entries:
+
+- `"type"` — A string that indicates, which grammar rule the node represents. This corresponds to the `ts_node_type` function
+described [here][syntax nodes].
+- `"named"` — A boolean that indicates whether this kind of node corresponds to a rule name in the grammar or just a string
+literal. See [here][named-vs-anonymous-nodes] for more info.
+
+Examples:
+
+```json
+{
+  "type": "string_literal",
+  "named": true
+}
+{
+  "type": "+",
+  "named": false
+}
+```
+
+Together, these two fields constitute a unique identifier for a node type; no two top-level objects in the `node-types.json`
+should have the same values for both `"type"` and `"named"`.
+
+## Internal Nodes
+
+Many syntax nodes can have _children_. The node type object describes the possible children that a node can have using the
+following entries:
+
+- `"fields"` — An object that describes the possible [fields][node-field-names] that the node can have. The keys of this
+object are field names, and the values are _child type_ objects, described below.
+- `"children"` — Another _child type_ object that describes all the node's possible _named_ children _without_ fields.
+
+A _child type_ object describes a set of child nodes using the following entries:
+
+- `"required"` — A boolean indicating whether there is always _at least one_ node in this set.
+- `"multiple"` — A boolean indicating whether there can be _multiple_ nodes in this set.
+- `"types"`- An array of objects that represent the possible types of nodes in this set. Each object has two keys: `"type"`
+and `"named"`, whose meanings are described above.
+
+Example with fields:
+
+```json
+{
+  "type": "method_definition",
+  "named": true,
+  "fields": {
+    "body": {
+      "multiple": false,
+      "required": true,
+      "types": [{ "type": "statement_block", "named": true }]
+    },
+    "decorator": {
+      "multiple": true,
+      "required": false,
+      "types": [{ "type": "decorator", "named": true }]
+    },
+    "name": {
+      "multiple": false,
+      "required": true,
+      "types": [
+        { "type": "computed_property_name", "named": true },
+        { "type": "property_identifier", "named": true }
+      ]
+    },
+    "parameters": {
+      "multiple": false,
+      "required": true,
+      "types": [{ "type": "formal_parameters", "named": true }]
+    }
+  }
+}
+```
+
+Example with children:
+
+```json
+{
+  "type": "array",
+  "named": true,
+  "fields": {},
+  "children": {
+    "multiple": true,
+    "required": false,
+    "types": [
+      { "type": "_expression", "named": true },
+      { "type": "spread_element", "named": true }
+    ]
+  }
+}
+```
+
+## Supertype Nodes
+
+In Tree-sitter grammars, there are usually certain rules that represent abstract _categories_ of syntax nodes (e.g. "expression",
+"type", "declaration"). In the `grammar.js` file, these are often written as [hidden rules][hidden rules]
+whose definition is a simple [`choice`][grammar dsl] where each member is just a single symbol.
+
+Normally, hidden rules are not mentioned in the node types file, since they don't appear in the syntax tree. But if you add
+a hidden rule to the grammar's [`supertypes` list][grammar dsl], then it _will_ show up in the node
+types file, with the following special entry:
+
+- `"subtypes"` — An array of objects that specify the _types_ of nodes that this 'supertype' node can wrap.
+
+Example:
+
+```json
+{
+  "type": "_declaration",
+  "named": true,
+  "subtypes": [
+    { "type": "class_declaration", "named": true },
+    { "type": "function_declaration", "named": true },
+    { "type": "generator_function_declaration", "named": true },
+    { "type": "lexical_declaration", "named": true },
+    { "type": "variable_declaration", "named": true }
+  ]
+}
+```
+
+Supertype nodes will also appear elsewhere in the node types file, as children of other node types, in a way that corresponds
+with how the supertype rule was used in the grammar. This can make the node types much shorter and easier to read, because
+a single supertype will take the place of multiple subtypes.
+
+Example:
+
+```json
+{
+  "type": "export_statement",
+  "named": true,
+  "fields": {
+    "declaration": {
+      "multiple": false,
+      "required": false,
+      "types": [{ "type": "_declaration", "named": true }]
+    },
+    "source": {
+      "multiple": false,
+      "required": false,
+      "types": [{ "type": "string", "named": true }]
+    }
+  }
+}
+```
+
+[grammar dsl]: ../creating-parsers/2-the-grammar-dsl.md
+[hidden rules]: ../creating-parsers/3-writing-the-grammar.md#hiding-rules
+[named-vs-anonymous-nodes]: ./2-basic-parsing.md#named-vs-anonymous-nodes
+[node-field-names]: ./2-basic-parsing.md#node-field-names
+[syntax nodes]: ./2-basic-parsing.md#syntax-nodes
--- a/docs/src/using-parsers/index.md
+++ b/docs/src/using-parsers/index.md
@ -0,0 +1,27 @@
+# Using Parsers
+
+This guide covers the fundamental concepts of using Tree-sitter, which is applicable across all programming languages.
+Although we'll explore some C-specific details that are valuable for direct C API usage or creating new language bindings,
+the core concepts remain the same.
+
+Tree-sitter's parsing functionality is implemented through its C API, with all functions documented in the [tree_sitter/api.h][api.h]
+header file, but if you're working in another language, you can use one of the following bindings found [here](../index.md#language-bindings),
+each providing idiomatic access to Tree-sitter's functionality. Of these bindings, the official ones have their own API docs
+hosted online at the following pages:
+
+- [Go][go]
+- [Java]
+- [JavaScript (Node.js)][javascript]
+- [Kotlin][kotlin]
+- [Python][python]
+- [Rust][rust]
+- [Zig][zig]
+
+[api.h]: https://github.com/tree-sitter/tree-sitter/blob/master/lib/include/tree_sitter/api.h
+[go]: https://pkg.go.dev/github.com/tree-sitter/go-tree-sitter
+[java]: https://tree-sitter.github.io/java-tree-sitter
+[javascript]: https://tree-sitter.github.io/node-tree-sitter
+[kotlin]: https://tree-sitter.github.io/kotlin-tree-sitter
+[python]: https://tree-sitter.github.io/py-tree-sitter
+[rust]: https://docs.rs/tree-sitter
+[zig]: https://tree-sitter.github.io/zig-tree-sitter
--- a/docs/src/using-parsers/queries/1-syntax.md
+++ b/docs/src/using-parsers/queries/1-syntax.md
@ -0,0 +1,101 @@
+# Query Syntax
+
+A _query_ consists of one or more _patterns_, where each pattern is an [S-expression][s-exp] that matches a certain set of
+nodes in a syntax tree. The expression to match a given node consists of a pair of parentheses containing two things: the
+node's type, and optionally, a series of other S-expressions that match the node's children. For example, this pattern would
+match any `binary_expression` node whose children are both `number_literal` nodes:
+
+```query
+(binary_expression (number_literal) (number_literal))
+```
+
+Children can also be omitted. For example, this would match any `binary_expression` where at least _one_ of child is a
+`string_literal` node:
+
+```query
+(binary_expression (string_literal))
+```
+
+## Fields
+
+In general, it's a good idea to make patterns more specific by specifying [field names][node-field-names] associated with
+child nodes. You do this by prefixing a child pattern with a field name followed by a colon. For example, this pattern would
+match an `assignment_expression` node where the `left` child is a `member_expression` whose `object` is a `call_expression`.
+
+```query
+(assignment_expression
+  left: (member_expression
+    object: (call_expression)))
+```
+
+## Negated Fields
+
+You can also constrain a pattern so that it only matches nodes that _lack_ a certain field. To do this, add a field name
+prefixed by a `!` within the parent pattern. For example, this pattern would match a class declaration with no type parameters:
+
+```query
+(class_declaration
+  name: (identifier) @class_name
+  !type_parameters)
+```
+
+## Anonymous Nodes
+
+The parenthesized syntax for writing nodes only applies to [named nodes][named-vs-anonymous-nodes]. To match specific anonymous
+nodes, you write their name between double quotes. For example, this pattern would match any `binary_expression` where the
+operator is `!=` and the right side is `null`:
+
+```query
+(binary_expression
+  operator: "!="
+  right: (null))
+```
+
+## Special Nodes
+
+### The Wildcard Node
+
+A wildcard node is represented with an underscore (`_`), it matches any node.
+This is similar to `.` in regular expressions.
+There are two types, `(_)` will match any named node,
+and `_` will match any named or anonymous node.
+
+For example, this pattern would match any node inside a call:
+
+```query
+(call (_) @call.inner)
+```
+
+### The `ERROR` Node
+
+When the parser encounters text it does not recognize, it represents this node
+as `(ERROR)` in the syntax tree. These error nodes can be queried just like
+normal nodes:
+
+```scheme
+(ERROR) @error-node
+```
+
+### The `MISSING` Node
+
+If the parser is able to recover from erroneous text by inserting a missing token and then reducing, it will insert that
+missing node in the final tree so long as that tree has the lowest error cost. These missing nodes appear as seemingly normal
+nodes in the tree, but they are zero tokens wide, and are internally represented as a property of the actual terminal node
+that was inserted, instead of being its own kind of node, like the `ERROR` node. These special missing nodes can be queried
+using `(MISSING)`:
+
+```scheme
+(MISSING) @missing-node
+```
+
+This is useful when attempting to detect all syntax errors in a given parse tree, since these missing node are not captured
+by `(ERROR)` queries. Specific missing node types can also be queried:
+
+```scheme
+(MISSING identifier) @missing-identifier
+(MISSING ";") @missing-semicolon
+```
+
+[node-field-names]: ../2-basic-parsing.md#node-field-names
+[named-vs-anonymous-nodes]: ../2-basic-parsing.md#named-vs-anonymous-nodes
+[s-exp]: https://en.wikipedia.org/wiki/S-expression
--- a/docs/src/using-parsers/queries/2-operators.md
+++ b/docs/src/using-parsers/queries/2-operators.md
@ -0,0 +1,151 @@
+# Operators
+
+## Capturing Nodes
+
+When matching patterns, you may want to process specific nodes within the pattern. Captures allow you to associate names
+with specific nodes in a pattern, so that you can later refer to those nodes by those names. Capture names are written _after_
+the nodes that they refer to, and start with an `@` character.
+
+For example, this pattern would match any assignment of a `function` to an `identifier`, and it would associate the name
+`the-function-name` with the identifier:
+
+```query
+(assignment_expression
+  left: (identifier) @the-function-name
+  right: (function))
+```
+
+And this pattern would match all method definitions, associating the name `the-method-name` with the method name, `the-class-name`
+with the containing class name:
+
+```query
+(class_declaration
+  name: (identifier) @the-class-name
+  body: (class_body
+    (method_definition
+      name: (property_identifier) @the-method-name)))
+```
+
+## Quantification Operators
+
+You can match a repeating sequence of sibling nodes using the postfix `+` and `*` _repetition_ operators, which work analogously
+to the `+` and `*` operators [in regular expressions][regex]. The `+` operator matches _one or more_ repetitions of a pattern,
+and the `*` operator matches _zero or more_.
+
+For example, this pattern would match a sequence of one or more comments:
+
+```query
+(comment)+
+```
+
+This pattern would match a class declaration, capturing all of the decorators if any were present:
+
+```query
+(class_declaration
+  (decorator)* @the-decorator
+  name: (identifier) @the-name)
+```
+
+You can also mark a node as optional using the `?` operator. For example, this pattern would match all function calls, capturing
+a string argument if one was present:
+
+```query
+(call_expression
+  function: (identifier) @the-function
+  arguments: (arguments (string)? @the-string-arg))
+```
+
+## Grouping Sibling Nodes
+
+You can also use parentheses for grouping a sequence of _sibling_ nodes. For example, this pattern would match a comment
+followed by a function declaration:
+
+```query
+(
+  (comment)
+  (function_declaration)
+)
+```
+
+Any of the quantification operators mentioned above (`+`, `*`, and `?`) can also be applied to groups. For example, this
+pattern would match a comma-separated series of numbers:
+
+```query
+(
+  (number)
+  ("," (number))*
+)
+```
+
+## Alternations
+
+An alternation is written as a pair of square brackets (`[]`) containing a list of alternative patterns.
+This is similar to _character classes_ from regular expressions (`[abc]` matches either a, b, or c).
+
+For example, this pattern would match a call to either a variable or an object property.
+In the case of a variable, capture it as `@function`, and in the case of a property, capture it as `@method`:
+
+```query
+(call_expression
+  function: [
+    (identifier) @function
+    (member_expression
+      property: (property_identifier) @method)
+  ])
+```
+
+This pattern would match a set of possible keyword tokens, capturing them as `@keyword`:
+
+```query
+[
+  "break"
+  "delete"
+  "else"
+  "for"
+  "function"
+  "if"
+  "return"
+  "try"
+  "while"
+] @keyword
+```
+
+## Anchors
+
+The anchor operator, `.`, is used to constrain the ways in which child patterns are matched. It has different behaviors
+depending on where it's placed inside a query.
+
+When `.` is placed before the _first_ child within a parent pattern, the child will only match when it is the first named
+node in the parent. For example, the below pattern matches a given `array` node at most once, assigning the `@the-element`
+capture to the first `identifier` node in the parent `array`:
+
+```query
+(array . (identifier) @the-element)
+```
+
+Without this anchor, the pattern would match once for every identifier in the array, with `@the-element` bound
+to each matched identifier.
+
+Similarly, an anchor placed after a pattern's _last_ child will cause that child pattern to only match nodes that are the
+last named child of their parent. The below pattern matches only nodes that are the last named child within a `block`.
+
+```query
+(block (_) @last-expression .)
+```
+
+Finally, an anchor _between_ two child patterns will cause the patterns to only match nodes that are immediate siblings.
+The pattern below, given a long dotted name like `a.b.c.d`, will only match pairs of consecutive identifiers:
+`a, b`, `b, c`, and `c, d`.
+
+```query
+(dotted_name
+  (identifier) @prev-id
+  .
+  (identifier) @next-id)
+```
+
+Without the anchor, non-consecutive pairs like `a, c` and `b, d` would also be matched.
+
+The restrictions placed on a pattern by an anchor operator ignore anonymous nodes.
+
+[regex]: https://en.wikipedia.org/wiki/Regular_expression#Basic_concepts
--- a/docs/src/using-parsers/queries/3-predicates-and-directives.md
+++ b/docs/src/using-parsers/queries/3-predicates-and-directives.md
@ -0,0 +1,199 @@
+# Predicates
+
+You can also specify arbitrary metadata and conditions associated with a pattern
+by adding _predicate_ S-expressions anywhere within your pattern. Predicate S-expressions
+start with a _predicate name_ beginning with a `#` character, and ending with a `?` character. After that, they can
+contain an arbitrary number of `@`-prefixed capture names or strings.
+
+Tree-sitter's CLI supports the following predicates by default:
+
+## The `eq?` predicate
+
+This family of predicates allows you to match against a single capture or string
+value.
+
+The first argument to this predicate must be a capture, but the second can be either a capture to
+compare the two captures' text, or a string to compare first capture's text
+against.
+
+The base predicate is `#eq?`, but its complement, `#not-eq?`, can be used to _not_
+match a value. Additionally, you can prefix either of these with `any-` to match
+if _any_ of the nodes match the predicate. This is only useful when dealing with
+quantified captures, as by default a quantified capture will only match if _all_ the captured nodes match the predicate.
+
+Thus, there are four predicates in total:
+
+- `#eq?`
+- `#not-eq?`
+- `#any-eq?`
+- `#any-not-eq?`
+
+Consider the following example targeting C:
+
+```query
+((identifier) @variable.builtin
+  (#eq? @variable.builtin "self"))
+```
+
+This pattern would match any identifier that is `self`.
+
+Now consider the following example:
+
+```query
+(
+  (pair
+    key: (property_identifier) @key-name
+    value: (identifier) @value-name)
+  (#eq? @key-name @value-name)
+)
+```
+
+This pattern would match key-value pairs where the `value` is an identifier
+with the same text as the key (meaning they are the same):
+
+As mentioned earlier, the `any-` prefix is meant for use with quantified captures. Here's
+an example finding an empty comment within a group of comments:
+
+```query
+((comment)+ @comment.empty
+  (#any-eq? @comment.empty "//"))
+```
+
+## The `match?` predicate
+
+These predicates are similar to the `eq?` predicates, but they use regular expressions
+to match against the capture's text instead of string comparisons.
+
+The first argument must be a capture, and the second must be a string containing
+a regular expression.
+
+Like the `eq?` predicate family, we can tack on `not-` to the beginning of the predicate
+to negate the match, and `any-` to match if _any_ of the nodes in a quantified capture match the predicate.
+
+This pattern matches identifiers written in `SCREAMING_SNAKE_CASE`.
+
+```query
+((identifier) @constant
+  (#match? @constant "^[A-Z][A-Z_]+"))
+```
+
+This query identifies documentation comments in C that begin with three forward slashes (`///`).
+
+```query
+((comment)+ @comment.documentation
+  (#match? @comment.documentation "^///\\s+.*"))
+```
+
+This query finds C code embedded in Go comments that appear just before a "C" import statement.
+These are known as [`Cgo`][cgo] comments and are used to inject C code into Go programs.
+
+```query
+((comment)+ @injection.content
+  .
+  (import_declaration
+    (import_spec path: (interpreted_string_literal) @_import_c))
+  (#eq? @_import_c "\"C\"")
+  (#match? @injection.content "^//"))
+```
+
+## The `any-of?` predicate
+
+The `any-of?` predicate allows you to match a capture against multiple strings,
+and will match if the capture's text is equal to any of the strings.
+
+The query below will match any of the builtin variables in JavaScript.
+
+```query
+((identifier) @variable.builtin
+  (#any-of? @variable.builtin
+        "arguments"
+        "module"
+        "console"
+        "window"
+        "document"))
+```
+
+## The `is?` predicate
+
+The `is?` predicate allows you to assert that a capture has a given property. This isn't widely used, but the CLI uses it
+to determine whether a given node is a local variable or not, for example:
+
+```query
+((identifier) @variable.builtin
+  (#match? @variable.builtin "^(arguments|module|console|window|document)$")
+  (#is-not? local))
+```
+
+This pattern would match any builtin variable that is not a local variable, because the `#is-not? local` predicate is used.
+
+# Directives
+
+Similar to predicates, directives are a way to associate arbitrary metadata with a pattern. The only difference between predicates
+and directives is that directives end in a `!` character instead of `?` character.
+
+Tree-sitter's CLI supports the following directives by default:
+
+## The `set!` directive
+
+This directive allows you to associate key-value pairs with a pattern. The key and value can be any arbitrary text that you
+see fit.
+
+```query
+((comment) @injection.content
+  (#lua-match? @injection.content "/[*\/][!*\/]<?[^a-zA-Z]")
+  (#set! injection.language "doxygen"))
+```
+
+This pattern would match any comment that contains a Doxygen-style comment, and then sets the `injection.language` key to
+`"doxygen"`. Programmatically, when iterating the captures of this pattern, you can access this property to then parse the
+comment with the Doxygen parser.
+
+### The `#select-adjacent!` directive
+
+The `#select-adjacent!` directive allows you to filter the text associated with a capture so that only nodes adjacent to
+another capture are preserved. It takes two arguments, both of which are capture names.
+
+### The `#strip!` directive
+
+The `#strip!` directive allows you to remove text from a capture. It takes two arguments: the first is the capture to strip
+text from, and the second is a regular expression to match against the text. Any text matched by the regular expression will
+be removed from the text associated with the capture.
+
+For an example on the `#select-adjacent!` and `#strip!` directives,
+view the [code navigation](../../4-code-navigation.md#examples) documentation.
+
+## Recap
+
+To recap about the predicates and directives Tree-Sitter's bindings support:
+
+- `#eq?` checks for a direct match against a capture or string
+
+- `#match?` checks for a match against a regular expression
+
+- `#any-of?` checks for a match against a list of strings
+
+- `#is?` checks for a property on a capture
+
+- Adding `not-` to the beginning of these predicates will negate the match
+
+- By default, a quantified capture will only match if _all_ the nodes match the predicate
+
+- Adding `any-` before the `eq` or `match` predicates will instead match if any of the nodes match the predicate
+
+- `#set!` associates key-value pairs with a pattern
+
+- `#select-adjacent!` filters the text associated with a capture so that only nodes adjacent to another capture are preserved
+
+- `#strip!` removes text from a capture
+
+_Note_ — Predicates and directives are not handled directly by the Tree-sitter C library.
+They are just exposed in a structured form so that higher-level code can perform
+the filtering. However, higher-level bindings to Tree-sitter like
+[the Rust Crate][rust crate]
+or the [WebAssembly binding][wasm binding]
+do implement a few common predicates like those explained above. In the future, more "standard" predicates and directives
+may be added.
+
+[cgo]: https://pkg.go.dev/cmd/cgo
+[rust crate]: https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_rust
+[wasm binding]: https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_web
--- a/docs/src/using-parsers/queries/4-api.md
+++ b/docs/src/using-parsers/queries/4-api.md
@ -0,0 +1,61 @@
+# The Query API
+
+Create a query by specifying a string containing one or more patterns:
+
+```c
+TSQuery *ts_query_new(
+  const TSLanguage *language,
+  const char *source,
+  uint32_t source_len,
+  uint32_t *error_offset,
+  TSQueryError *error_type
+);
+```
+
+If there is an error in the query, then the `error_offset` argument will be set to the byte offset of the error, and the
+`error_type` argument will be set to a value that indicates the type of error:
+
+```c
+typedef enum {
+  TSQueryErrorNone = 0,
+  TSQueryErrorSyntax,
+  TSQueryErrorNodeType,
+  TSQueryErrorField,
+  TSQueryErrorCapture,
+} TSQueryError;
+```
+
+The `TSQuery` value is immutable and can be safely shared between threads. To execute the query, create a `TSQueryCursor`,
+which carries the state needed for processing the queries. The query cursor should not be shared between threads, but can
+be reused for many query executions.
+
+```c
+TSQueryCursor *ts_query_cursor_new(void);
+```
+
+You can then execute the query on a given syntax node:
+
+```c
+void ts_query_cursor_exec(TSQueryCursor *, const TSQuery *, TSNode);
+```
+
+You can then iterate over the matches:
+
+```c
+typedef struct {
+  TSNode node;
+  uint32_t index;
+} TSQueryCapture;
+
+typedef struct {
+  uint32_t id;
+  uint16_t pattern_index;
+  uint16_t capture_count;
+  const TSQueryCapture *captures;
+} TSQueryMatch;
+
+bool ts_query_cursor_next_match(TSQueryCursor *, TSQueryMatch *match);
+```
+
+This function will return `false` when there are no more matches. Otherwise, it will populate the `match` with data about
+which pattern matched and which nodes were captured.
--- a/docs/src/using-parsers/queries/index.md
+++ b/docs/src/using-parsers/queries/index.md
@ -0,0 +1,7 @@
+# Pattern Matching with Queries
+
+Code analysis often requires finding specific patterns in source code. Tree-sitter provides a simple pattern-matching
+language for this purpose, similar to what's used in its [unit test system][unit testing].
+This allows you to express and search for code structures without writing complex parsing logic.
+
+[unit testing]: ../../creating-parsers/5-writing-tests.md