docs: update badges; fix markdown lint complaints

Linter config `.vscode/settings.json`:
```json
{
    "[markdown]": {
        "files.trimTrailingWhitespace": false,
    },
    "markdownlint.config": {
        "default": true,
        // "ul-style": {
        //     "style": "asterisk"
        // },
        "MD001": false,
        "MD024": false,
        "MD025": false,
        "MD033": false,
        "MD041": false,
        "MD053": false,
    },
}
```
Andrew Hlynskyi 2023-04-16 21:14:19 +03:00
parent 6c520452ad
commit 613382c70a
14 changed files with 121 additions and 95 deletions

@ -1,8 +1,11 @@
# tree-sitter
[![CICD](https://github.com/tree-sitter/tree-sitter/actions/workflows/CICD.yml/badge.svg)](https://github.com/tree-sitter/tree-sitter/actions/workflows/CICD.yml)
[![CICD badge]][CICD]
[![DOI](https://zenodo.org/badge/14164618.svg)](https://zenodo.org/badge/latestdoi/14164618)
[CICD badge]: https://github.com/tree-sitter/tree-sitter/actions/workflows/CICD.yml/badge.svg
[CICD]: https://github.com/tree-sitter/tree-sitter/actions/workflows/CICD.yml
Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited. Tree-sitter aims to be:
- **General** enough to parse any programming language

@ -1,7 +1,11 @@
Tree-sitter CLI
===============
# Tree-sitter CLI
[![Crates.io](https://img.shields.io/crates/v/tree-sitter-cli.svg)](https://crates.io/crates/tree-sitter-cli)
[![crates.io badge]][crates.io] [![npmjs.com badge]][npmjs.com]
[crates.io]: https://crates.io/crates/tree-sitter-cli
[crates.io badge]: https://img.shields.io/crates/v/tree-sitter-cli.svg?color=%23B48723
[npmjs.com]: https://www.npmjs.org/package/tree-sitter-cli
[npmjs.com badge]: https://img.shields.io/npm/v/tree-sitter-cli.svg?color=%23BF4A4A
The Tree-sitter CLI allows you to develop, test, and use Tree-sitter grammars from the command line. It works on macOS, Linux, and Windows.
@ -19,7 +23,7 @@ or with `npm`:
npm install tree-sitter-cli
```
You can also download a pre-built binary for your platform from [the releases page](https://github.com/tree-sitter/tree-sitter/releases/latest).
You can also download a pre-built binary for your platform from [the releases page].
### Dependencies
@ -30,8 +34,11 @@ The `tree-sitter` binary itself has no dependencies, but specific commands have
### Commands
* `generate` - The `tree-sitter generate` command will generate a Tree-sitter parser based on the grammar in the current working directory. See [the documentation](https://tree-sitter.github.io/tree-sitter/creating-parsers) for more information.
* `generate` - The `tree-sitter generate` command will generate a Tree-sitter parser based on the grammar in the current working directory. See [the documentation] for more information.
* `test` - The `tree-sitter test` command will run the unit tests for the Tree-sitter parser in the current working directory. See [the documentation](https://tree-sitter.github.io/tree-sitter/creating-parsers) for more information.
* `test` - The `tree-sitter test` command will run the unit tests for the Tree-sitter parser in the current working directory. See [the documentation] for more information.
* `parse` - The `tree-sitter parse` command will parse a file (or list of files) using Tree-sitter parsers.
[the documentation]: https://tree-sitter.github.io/tree-sitter/creating-parsers
[the releases page]: https://github.com/tree-sitter/tree-sitter/releases/latest

@ -160,9 +160,9 @@ By convention, parsers are named with the language last, eg. tree-sitter-ruby.
The design of Tree-sitter was greatly influenced by the following research papers:
- [Practical Algorithms for Incremental Software Development Environments](https://www2.eecs.berkeley.edu/Pubs/TechRpts/1997/CSD-97-946.pdf)
- [Context Aware Scanning for Parsing Extensible Languages](https://www-users.cse.umn.edu/~evw/pubs/vanwyk07gpce/vanwyk07gpce.pdf)
- [Efficient and Flexible Incremental Parsing](https://harmonia.cs.berkeley.edu/papers/twagner-parsing.pdf)
- [Incremental Analysis of Real Programming Languages](https://harmonia.cs.berkeley.edu/papers/twagner-glr.pdf)
- [Error Detection and Recovery in LR Parsers](https://what-when-how.com/compiler-writing/bottom-up-parsing-compiler-writing-part-13)
- [Error Recovery for LR Parsers](https://apps.dtic.mil/sti/pdfs/ADA043470.pdf)
* [Practical Algorithms for Incremental Software Development Environments](https://www2.eecs.berkeley.edu/Pubs/TechRpts/1997/CSD-97-946.pdf)
* [Context Aware Scanning for Parsing Extensible Languages](https://www-users.cse.umn.edu/~evw/pubs/vanwyk07gpce/vanwyk07gpce.pdf)
* [Efficient and Flexible Incremental Parsing](https://harmonia.cs.berkeley.edu/papers/twagner-parsing.pdf)
* [Incremental Analysis of Real Programming Languages](https://harmonia.cs.berkeley.edu/papers/twagner-glr.pdf)
* [Error Detection and Recovery in LR Parsers](https://what-when-how.com/compiler-writing/bottom-up-parsing-compiler-writing-part-13)
* [Error Recovery for LR Parsers](https://apps.dtic.mil/sti/pdfs/ADA043470.pdf)

@ -21,21 +21,21 @@ Alternatively, you can incorporate the library in a larger project's build syste
**source file:**
- `tree-sitter/lib/src/lib.c`
* `tree-sitter/lib/src/lib.c`
**include directories:**
- `tree-sitter/lib/src`
- `tree-sitter/lib/include`
* `tree-sitter/lib/src`
* `tree-sitter/lib/include`
### The Basic Objects
There are four main types of objects involved when using Tree-sitter: languages, parsers, syntax trees, and syntax nodes. In C, these are called `TSLanguage`, `TSParser`, `TSTree`, and `TSNode`.
- A `TSLanguage` is an opaque object that defines how to parse a particular programming language. The code for each `TSLanguage` is generated by Tree-sitter. Many languages are already available in separate git repositories within the [Tree-sitter GitHub organization](https://github.com/tree-sitter). See [the next page](./creating-parsers) for how to create new languages.
- A `TSParser` is a stateful object that can be assigned a `TSLanguage` and used to produce a `TSTree` based on some source code.
- A `TSTree` represents the syntax tree of an entire source code file. It contains `TSNode` instances that indicate the structure of the source code. It can also be edited and used to produce a new `TSTree` in the event that the source code changes.
- A `TSNode` represents a single node in the syntax tree. It tracks its start and end positions in the source code, as well as its relation to other nodes like its parent, siblings and children.
* A `TSLanguage` is an opaque object that defines how to parse a particular programming language. The code for each `TSLanguage` is generated by Tree-sitter. Many languages are already available in separate git repositories within the [Tree-sitter GitHub organization](https://github.com/tree-sitter). See [the next page](./creating-parsers) for how to create new languages.
* A `TSParser` is a stateful object that can be assigned a `TSLanguage` and used to produce a `TSTree` based on some source code.
* A `TSTree` represents the syntax tree of an entire source code file. It contains `TSNode` instances that indicate the structure of the source code. It can also be edited and used to produce a new `TSTree` in the event that the source code changes.
* A `TSNode` represents a single node in the syntax tree. It tracks its start and end positions in the source code, as well as its relation to other nodes like its parent, siblings and children.
### An Example Program
@ -442,13 +442,13 @@ Many code analysis tasks involve searching for patterns in syntax trees. Tree-si
A _query_ consists of one or more _patterns_, where each pattern is an [S-expression](https://en.wikipedia.org/wiki/S-expression) that matches a certain set of nodes in a syntax tree. The expression to match a given node consists of a pair of parentheses containing two things: the node's type, and optionally, a series of other S-expressions that match the node's children. For example, this pattern would match any `binary_expression` node whose children are both `number_literal` nodes:
``` scheme
```scheme
(binary_expression (number_literal) (number_literal))
```
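As a rough mental model of the matching rule described above, here is a toy matcher in Python. It is only a sketch and not Tree-sitter's query engine: the `Node` class, the tuple-based pattern encoding, and the `matches` function are hypothetical names introduced for illustration. The toy compares node types only and accepts a node when the child patterns match some of its children in order.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy syntax node (hypothetical): a type name plus ordered named children."""
    type: str
    children: list = field(default_factory=list)


def matches(node, pattern):
    """Return True if `node` has the pattern's type and every child pattern
    matches one of the node's children, in order. Extra children are allowed."""
    node_type, child_patterns = pattern
    if node.type != node_type:
        return False
    remaining = iter(node.children)
    # Each child pattern consumes children until one matches, so the child
    # patterns must be satisfied in left-to-right order.
    return all(
        any(matches(child, child_pattern) for child in remaining)
        for child_pattern in child_patterns
    )


# `(binary_expression (number_literal) (number_literal))` in the toy encoding.
both_numbers = ("binary_expression", [("number_literal", []), ("number_literal", [])])

numeric = Node("binary_expression", [Node("number_literal"), Node("number_literal")])
mixed = Node("binary_expression", [Node("identifier"), Node("number_literal")])

print(matches(numeric, both_numbers))
print(matches(mixed, both_numbers))
```

The first call reports a match and the second does not, because `mixed` has only one `number_literal` child.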
Children can also be omitted. For example, this would match any `binary_expression` where at least _one_ child is a `string_literal` node:
``` scheme
```scheme
(binary_expression (string_literal))
```
@ -456,7 +456,7 @@ Children can also be omitted. For example, this would match any `binary_expressi
In general, it's a good idea to make patterns more specific by specifying [field names](#node-field-names) associated with child nodes. You do this by prefixing a child pattern with a field name followed by a colon. For example, this pattern would match an `assignment_expression` node where the `left` child is a `member_expression` whose `object` is a `call_expression`.
``` scheme
```scheme
(assignment_expression
left: (member_expression
object: (call_expression)))
@ -464,9 +464,9 @@ In general, it's a good idea to make patterns more specific by specifying [field
#### Negated Fields
You can also constrain a pattern so that it only matches nodes that *lack* a certain field. To do this, add a field name prefixed by a `!` within the parent pattern. For example, this pattern would match a class declaration with no type parameters:
You can also constrain a pattern so that it only matches nodes that _lack_ a certain field. To do this, add a field name prefixed by a `!` within the parent pattern. For example, this pattern would match a class declaration with no type parameters:
``` scheme
```scheme
(class_declaration
name: (identifier) @class_name
!type_parameters)
@ -476,7 +476,7 @@ You can also constrain a pattern so that it only matches nodes that *lack* a cer
The parenthesized syntax for writing nodes only applies to [named nodes](#named-vs-anonymous-nodes). To match specific anonymous nodes, you write their name between double quotes. For example, this pattern would match any `binary_expression` where the operator is `!=` and the right side is `null`:
``` scheme
```scheme
(binary_expression
operator: "!="
right: (null))
@ -488,7 +488,7 @@ When matching patterns, you may want to process specific nodes within the patter
For example, this pattern would match any assignment of a `function` to an `identifier`, and it would associate the name `the-function-name` with the identifier:
``` scheme
```scheme
(assignment_expression
left: (identifier) @the-function-name
right: (function))
@ -496,7 +496,7 @@ For example, this pattern would match any assignment of a `function` to an `iden
And this pattern would match all method definitions, associating the name `the-method-name` with the method name, `the-class-name` with the containing class name:
``` scheme
```scheme
(class_declaration
name: (identifier) @the-class-name
body: (class_body
@ -510,13 +510,13 @@ You can match a repeating sequence of sibling nodes using the postfix `+` and `*
For example, this pattern would match a sequence of one or more comments:
``` scheme
```scheme
(comment)+
```
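To make the "one or more" reading concrete, the sketch below finds maximal runs of consecutive `comment` siblings in a flat list of node types. This is a simplified illustration in Python, not how Tree-sitter evaluates quantifiers; the `runs_of` helper and the list-of-strings representation of siblings are assumptions made for the example, and the real query engine may report matches differently.

```python
from itertools import groupby


def runs_of(sibling_types, wanted):
    """Return each maximal run of consecutive `wanted` siblings as (start, length)."""
    runs = []
    index = 0
    for node_type, group in groupby(sibling_types):
        length = len(list(group))
        if node_type == wanted:
            runs.append((index, length))
        index += length
    return runs


siblings = ["comment", "comment", "function_declaration", "comment"]
print(runs_of(siblings, "comment"))  # [(0, 2), (3, 1)]
```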
This pattern would match a class declaration, capturing all of the decorators if any were present:
``` scheme
```scheme
(class_declaration
(decorator)* @the-decorator
name: (identifier) @the-name)
@ -524,7 +524,7 @@ This pattern would match a class declaration, capturing all of the decorators if
You can also mark a node as optional using the `?` operator. For example, this pattern would match all function calls, capturing a string argument if one was present:
``` scheme
```scheme
(call_expression
function: (identifier) @the-function
arguments: (arguments (string)? @the-string-arg))
@ -534,7 +534,7 @@ You can also mark a node as optional using the `?` operator. For example, this p
You can also use parentheses for grouping a sequence of _sibling_ nodes. For example, this pattern would match a comment followed by a function declaration:
``` scheme
```scheme
(
(comment)
(function_declaration)
@ -543,7 +543,7 @@ You can also use parentheses for grouping a sequence of _sibling_ nodes. For exa
Any of the quantification operators mentioned above (`+`, `*`, and `?`) can also be applied to groups. For example, this pattern would match a comma-separated series of numbers:
``` scheme
```scheme
(
(number)
("," (number))*
@ -558,7 +558,7 @@ This is similar to _character classes_ from regular expressions (`[abc]` matches
For example, this pattern would match a call to either a variable or an object property.
In the case of a variable, capture it as `@function`, and in the case of a property, capture it as `@method`:
``` scheme
```scheme
(call_expression
function: [
(identifier) @function
@ -569,7 +569,7 @@ In the case of a variable, capture it as `@function`, and in the case of a prope
This pattern would match a set of possible keyword tokens, capturing them as `@keyword`:
``` scheme
```scheme
[
"break"
"delete"
@ -592,7 +592,7 @@ and `_` will match any named or anonymous node.
For example, this pattern would match any node inside a call:
``` scheme
```scheme
(call (_) @call.inner)
```
@ -602,7 +602,7 @@ The anchor operator, `.`, is used to constrain the ways in which child patterns
When `.` is placed before the _first_ child within a parent pattern, the child will only match when it is the first named node in the parent. For example, the below pattern matches a given `array` node at most once, assigning the `@the-element` capture to the first `identifier` node in the parent `array`:
``` scheme
```scheme
(array . (identifier) @the-element)
```
@ -610,13 +610,13 @@ Without this anchor, the pattern would match once for every identifier in the ar
Similarly, an anchor placed after a pattern's _last_ child will cause that child pattern to only match nodes that are the last named child of their parent. The below pattern matches only nodes that are the last named child within a `block`.
``` scheme
```scheme
(block (_) @last-expression .)
```
Finally, an anchor _between_ two child patterns will cause the patterns to only match nodes that are immediate siblings. The pattern below, given a long dotted name like `a.b.c.d`, will only match pairs of consecutive identifiers: `a, b`, `b, c`, and `c, d`.
``` scheme
```scheme
(dotted_name
(identifier) @prev-id
.
@ -633,7 +633,7 @@ You can also specify arbitrary metadata and conditions associated with a pattern
For example, this pattern would match identifiers whose names are written in `SCREAMING_SNAKE_CASE`:
``` scheme
```scheme
(
(identifier) @constant
(#match? @constant "^[A-Z][A-Z_]+")
@ -642,7 +642,7 @@ For example, this pattern would match identifier whose names is written in `SCRE
And this pattern would match key-value pairs where the `value` is an identifier with the same name as the key:
``` scheme
```scheme
(
(pair
key: (property_identifier) @key-name
@ -723,8 +723,8 @@ The node types file contains an array of objects, each of which describes a part
Every object in this array has these two entries:
- `"type"` - A string that indicates which grammar rule the node represents. This corresponds to the `ts_node_type` function described [above](#syntax-nodes).
- `"named"` - A boolean that indicates whether this kind of node corresponds to a rule name in the grammar or just a string literal. See [above](#named-vs-anonymous-nodes) for more info.
* `"type"` - A string that indicates which grammar rule the node represents. This corresponds to the `ts_node_type` function described [above](#syntax-nodes).
* `"named"` - A boolean that indicates whether this kind of node corresponds to a rule name in the grammar or just a string literal. See [above](#named-vs-anonymous-nodes) for more info.
Examples:
@ -745,14 +745,14 @@ Together, these two fields constitute a unique identifier for a node type; no tw
Many syntax nodes can have _children_. The node type object describes the possible children that a node can have using the following entries:
- `"fields"` - An object that describes the possible [fields](#node-field-names) that the node can have. The keys of this object are field names, and the values are _child type_ objects, described below.
- `"children"` - Another _child type_ object that describes all of the node's possible _named_ children _without_ fields.
* `"fields"` - An object that describes the possible [fields](#node-field-names) that the node can have. The keys of this object are field names, and the values are _child type_ objects, described below.
* `"children"` - Another _child type_ object that describes all of the node's possible _named_ children _without_ fields.
A _child type_ object describes a set of child nodes using the following entries:
- `"required"` - A boolean indicating whether there is always _at least one_ node in this set.
- `"multiple"` - A boolean indicating whether there can be _multiple_ nodes in this set.
- `"types"` - An array of objects that represent the possible types of nodes in this set. Each object has two keys: `"type"` and `"named"`, whose meanings are described above.
* `"required"` - A boolean indicating whether there is always _at least one_ node in this set.
* `"multiple"` - A boolean indicating whether there can be _multiple_ nodes in this set.
* `"types"` - An array of objects that represent the possible types of nodes in this set. Each object has two keys: `"type"` and `"named"`, whose meanings are described above.
Example with fields:
@ -812,7 +812,7 @@ In Tree-sitter grammars, there are usually certain rules that represent abstract
Normally, hidden rules are not mentioned in the node types file, since they don't appear in the syntax tree. But if you add a hidden rule to the grammar's [`supertypes` list](./creating-parsers#the-grammar-dsl), then it _will_ show up in the node types file, with the following special entry:
- `"subtypes"` - An array of objects that specify the _types_ of nodes that this 'supertype' node can wrap.
* `"subtypes"` - An array of objects that specify the _types_ of nodes that this 'supertype' node can wrap.
Example:

@ -80,7 +80,9 @@ You can test this parser by creating a source file with the contents "hello" and
echo 'hello' > example-file
tree-sitter parse example-file
```
Alternatively, in Windows PowerShell:
```pwsh
"hello" | Out-File example-file -Encoding utf8
tree-sitter parse example-file
@ -88,7 +90,7 @@ tree-sitter parse example-file
This should print the following:
```
```text
(source_file [0, 0] - [1, 0])
```
@ -121,7 +123,7 @@ For each rule that you add to the grammar, you should first create a *test* that
For example, you might have a file called `test/corpus/statements.txt` that contains a series of entries like this:
```
```text
==================
Return statements
==================
@ -147,7 +149,7 @@ func x() int {
The expected output section can also *optionally* show the [*field names*][field-names-section] associated with each child node. To include field names in your tests, you write a node's field name followed by a colon, before the node itself in the S-expression:
```
```text
(source_file
(function_definition
name: (identifier)
@ -159,7 +161,7 @@ func x() int {
* If your language's syntax conflicts with the `===` and `---` test separators, you can optionally add an arbitrary identical suffix (in the below example, `|||`) to disambiguate them:
```
```text
==================|||
Basic module
==================|||
@ -199,7 +201,7 @@ The `tree-sitter test` command will *also* run any syntax highlighting tests in
You can run your parser on an arbitrary file using `tree-sitter parse`. This will print the resulting syntax tree, including nodes' ranges and field names, like this:
```
```text
(source_file [0, 0] - [3, 0]
(function_declaration [0, 0] - [2, 1]
name: (identifier [0, 5] - [0, 9])
@ -251,7 +253,6 @@ In addition to the `name` and `rules` fields, grammars have a few other optional
* **`word`** - the name of a token that will match keywords for the purpose of the [keyword extraction](#keyword-extraction) optimization.
* **`supertypes`** - an array of hidden rule names which should be considered to be 'supertypes' in the generated [*node types* file][static-node-types].
## Writing the Grammar
Writing a grammar requires creativity. There are an infinite number of CFGs (context-free grammars) that can be used to describe any given language. In order to produce a good Tree-sitter parser, you need to create a grammar with two important properties:
@ -375,7 +376,7 @@ return x + y;
According to the specification, this line is a `ReturnStatement`, the fragment `x + y` is an `AdditiveExpression`, and `x` and `y` are both `IdentifierReferences`. The relationship between these constructs is captured by a complex series of production rules:
```
```text
ReturnStatement -> 'return' Expression
Expression -> AssignmentExpression
AssignmentExpression -> ConditionalExpression
@ -432,7 +433,7 @@ To produce a readable syntax tree, we'd like to model JavaScript expressions usi
Of course, this flat structure is highly ambiguous. If we try to generate a parser, Tree-sitter gives us an error message:
```
```text
Error: Unresolved conflict for symbol sequence:
'-' _expression • '*' …
@ -468,7 +469,7 @@ For an expression like `-a * b`, it's not clear whether the `-` operator applies
Applying a higher precedence in `unary_expression` fixes that conflict, but there is still another conflict:
```
```text
Error: Unresolved conflict for symbol sequence:
_expression '*' _expression • '*' …
@ -606,6 +607,7 @@ Aside from improving error detection, keyword extraction also has performance be
### External Scanners
Many languages have some tokens whose structure is impossible or inconvenient to describe with a regular expression. Some examples:
* [Indent and dedent][indent-tokens] tokens in Python
* [Heredocs][heredoc] in Bash and Ruby
* [Percent strings][percent-string] in Ruby
@ -654,7 +656,6 @@ void * tree_sitter_my_language_external_scanner_create() {
This function should create your scanner object. It is called once each time your language is set on a parser. Often, you will want to allocate memory on the heap and return a pointer to it. If your external scanner doesn't need to maintain any state, it's ok to return `NULL`.
#### Destroy
```c
@ -714,10 +715,10 @@ This function is responsible for recognizing external tokens. It should return `
* **`void (*advance)(TSLexer *, bool skip)`** - A function for advancing to the next character. If you pass `true` for the second argument, the current character will be treated as whitespace; whitespace won't be included in the text range associated with tokens emitted by the external scanner.
* **`void (*mark_end)(TSLexer *)`** - A function for marking the end of the recognized token. This allows matching tokens that require multiple characters of lookahead. By default (if you don't call `mark_end`), any character that you moved past using the `advance` function will be included in the size of the token. But once you call `mark_end`, then any later calls to `advance` will *not* increase the size of the returned token. You can call `mark_end` multiple times to increase the size of the token.
* **`uint32_t (*get_column)(TSLexer *)`** - A function for querying the current column position of the lexer. It returns the number of codepoints since the start of the current line. The codepoint position is recalculated on every call to this function by reading from the start of the line.
* **`bool (*is_at_included_range_start)(const TSLexer *)`** - A function for checking whether the parser has just skipped some characters in the document. When parsing an embedded document using the `ts_parser_set_included_ranges` function (described in the [multi-language document section][multi-language-section]), your scanner may want to apply some special behavior when moving to a disjoint part of the document. For example, in [EJS documents][ejs], the JavaScript parser uses this function to enable inserting automatic semicolon tokens in between the code directives, delimited by `<%` and `%>`.
* **`bool (*is_at_included_range_start)(const TSLexer *)`** - A function for checking whether the parser has just skipped some characters in the document. When parsing an embedded document using the `ts_parser_set_included_ranges` function (described in the [multi-language document section][multi-language-section]), the scanner may want to apply some special behavior when moving to a disjoint part of the document. For example, in [EJS documents][ejs], the JavaScript parser uses this function to enable inserting automatic semicolon tokens in between the code directives, delimited by `<%` and `%>`.
* **`bool (*eof)(const TSLexer *)`** - A function for determining whether the lexer is at the end of the file. The value of `lookahead` will be `0` at the end of a file, but this function should be used instead of checking for that value because the `0` or "NUL" value is also a valid character that could be present in the file being parsed.
The third argument to the `scan` function is an array of booleans that indicates which of your external tokens are currently expected by the parser. You should only look for a given token if it is valid according to this array. At the same time, you cannot backtrack, so you may need to combine certain pieces of logic.
The third argument to the `scan` function is an array of booleans that indicates which of the external tokens are currently expected by the parser. You should only look for a given token if it is valid according to this array. At the same time, you cannot backtrack, so you may need to combine certain pieces of logic.
```c
if (valid_symbols[INDENT] || valid_symbols[DEDENT]) {

@ -25,9 +25,9 @@ The Tree-sitter CLI automatically creates two directories in your home folder.
These directories are created in the "normal" place for your platform:
- On Linux, `~/.config/tree-sitter` and `~/.cache/tree-sitter`
- On Mac, `~/Library/Application Support/tree-sitter` and `~/Library/Caches/tree-sitter`
- On Windows, `C:\Users\[username]\AppData\Roaming\tree-sitter` and `C:\Users\[username]\AppData\Local\tree-sitter`
* On Linux, `~/.config/tree-sitter` and `~/.cache/tree-sitter`
* On Mac, `~/Library/Application Support/tree-sitter` and `~/Library/Caches/tree-sitter`
* On Windows, `C:\Users\[username]\AppData\Roaming\tree-sitter` and `C:\Users\[username]\AppData\Local\tree-sitter`
The CLI will work if there's no config file present, falling back on default values for each configuration option. To create a config file that you can edit, run this command:
@ -61,6 +61,7 @@ In your config file, the `"theme"` value is an object whose keys are dot-separat
#### Highlight Names
A theme can contain multiple keys that share a common subsequence. Examples:
* `variable` and `variable.parameter`
* `function`, `function.builtin`, and `function.method`
@ -158,7 +159,7 @@ func increment(a int) int {
With this syntax tree:
```
```scheme
(source_file
(function_declaration
name: (identifier)
@ -178,6 +179,7 @@ With this syntax tree:
#### Example Query
Suppose we wanted to render this code with the following colors:
* keywords `func` and `return` in purple
* function `increment` in blue
* type `int` in green
@ -185,7 +187,7 @@ Suppose we wanted to render this code with the following colors:
We can assign each of these categories a *highlight name* using a query like this:
```
```scheme
; highlights.scm
"func" @keyword
@ -252,7 +254,7 @@ list = [item]
With this syntax tree:
```
```scheme
(program
(method
name: (identifier)
@ -295,7 +297,7 @@ There are several different types of names within this method:
Let's write some queries that let us clearly distinguish between these types of names. First, set up the highlighting query, as described in the previous section. We'll assign distinct colors to method calls, method definitions, and formal parameters:
```
```scheme
; highlights.scm
(call method: (identifier) @function.method)
@ -312,7 +314,7 @@ Let's write some queries that let us clearly distinguish between these types of
Then, we'll set up a local variable query to keep track of the variables and scopes. Here, we're indicating that methods and blocks create local *scopes*, parameters and assignments create *definitions*, and other identifiers should be considered *references*:
```
```scheme
; locals.scm
(method) @local.scope
@ -345,6 +347,7 @@ Running `tree-sitter highlight` on this ruby file would produce output like this
### Language Injection
Some source files contain code written in multiple different languages. Examples include:
* HTML files, which can contain JavaScript inside of `<script>` tags and CSS inside of `<style>` tags
* [ERB](https://en.wikipedia.org/wiki/ERuby) files, which contain Ruby inside of `<% %>` tags, and HTML outside of those tags
* PHP files, which can contain HTML between the `<?php` tags
@ -374,7 +377,7 @@ BASH
With this syntax tree:
```
```scheme
(program
(method_call
method: (identifier)
@ -388,7 +391,7 @@ With this syntax tree:
The following query would specify that the contents of the heredoc should be parsed using a language named "BASH" (because that is the text of the `heredoc_end` node):
```
```scheme
(heredoc_body
(heredoc_end) @injection.language) @injection.content
```
@ -396,7 +399,7 @@ The following query would specify that the contents of the heredoc should be par
You can also force the language using the `#set!` predicate.
For example, this will force the language to always be `ruby`.
```
```scheme
((heredoc_body) @injection.content
(#set! injection.language "ruby"))
```

@ -35,8 +35,6 @@ At the end of these transformations, the initial grammar is split into two gramm
### Building Parse Tables
## The Runtime
WIP

@ -96,18 +96,18 @@ script/test -l javascript -e Arrays
The main [`tree-sitter/tree-sitter`](https://github.com/tree-sitter/tree-sitter) repository contains the source code for several packages that are published to package registries for different languages:
- Rust crates on [crates.io](https://crates.io):
- [`tree-sitter`](https://crates.io/crates/tree-sitter) - A Rust binding to the core library
- [`tree-sitter-highlight`](https://crates.io/crates/tree-sitter-highlight) - The syntax-highlighting library
- [`tree-sitter-cli`](https://crates.io/crates/tree-sitter-cli) - The command-line tool
- JavaScript modules on [npmjs.com](https://npmjs.com):
- [`web-tree-sitter`](https://www.npmjs.com/package/web-tree-sitter) - A WASM-based JavaScript binding to the core library
- [`tree-sitter-cli`](https://www.npmjs.com/package/tree-sitter-cli) - The command-line tool
* Rust crates on [crates.io](https://crates.io):
* [`tree-sitter`](https://crates.io/crates/tree-sitter) - A Rust binding to the core library
* [`tree-sitter-highlight`](https://crates.io/crates/tree-sitter-highlight) - The syntax-highlighting library
* [`tree-sitter-cli`](https://crates.io/crates/tree-sitter-cli) - The command-line tool
* JavaScript modules on [npmjs.com](https://npmjs.com):
* [`web-tree-sitter`](https://www.npmjs.com/package/web-tree-sitter) - A WASM-based JavaScript binding to the core library
* [`tree-sitter-cli`](https://www.npmjs.com/package/tree-sitter-cli) - The command-line tool
There are also several other dependent repositories that contain other published packages:
- [`tree-sitter/node-tree-sitter`](https://github.com/tree-sitter/node-tree-sitter) - Node.js bindings to the core library, published as [`tree-sitter`](https://www.npmjs.com/package/tree-sitter) on npmjs.com
- [`tree-sitter/py-tree-sitter`](https://github.com/tree-sitter/py-tree-sitter) - Python bindings to the core library, published as [`tree-sitter`](https://pypi.org/project/tree-sitter) on [PyPI.org](https://pypi.org).
* [`tree-sitter/node-tree-sitter`](https://github.com/tree-sitter/node-tree-sitter) - Node.js bindings to the core library, published as [`tree-sitter`](https://www.npmjs.com/package/tree-sitter) on npmjs.com
* [`tree-sitter/py-tree-sitter`](https://github.com/tree-sitter/py-tree-sitter) - Python bindings to the core library, published as [`tree-sitter`](https://pypi.org/project/tree-sitter) on [PyPI.org](https://pypi.org).
## Publishing New Releases

@ -9,7 +9,7 @@ Tree-sitter can be used in conjunction with its [tree query language](https://tr
## Tagging and captures
*Tagging* is the act of identifying the entities that can be named in a program. We use Tree-sitter queries to find those entities. Having found them, you use a syntax capture to label the entity and its name.
_Tagging_ is the act of identifying the entities that can be named in a program. We use Tree-sitter queries to find those entities. Having found them, you use a syntax capture to label the entity and its name.
The essence of a given tag lies in two pieces of data: the _role_ of the entity that is matched (i.e. whether it is a definition or a reference) and the _kind_ of that entity, which describes how the entity is used (i.e. whether it's a class definition, function call, variable reference, and so on). Our convention is to use a syntax capture following the `@role.kind` capture name format, and another inner capture, always called `@name`, that pulls out the name of a given identifier.
@ -19,14 +19,14 @@ You may optionally include a capture named `@doc` to bind a docstring. For conve
This [query](https://github.com/tree-sitter/tree-sitter-python/blob/78c4e9b6b2f08e1be23b541ffced47b15e2972ad/queries/tags.scm#L4-L5) recognizes Python function definitions and captures their declared name. The `function_definition` syntax node is defined in the [Python Tree-sitter grammar](https://github.com/tree-sitter/tree-sitter-python/blob/78c4e9b6b2f08e1be23b541ffced47b15e2972ad/grammar.js#L354).
``` scheme
```scheme
(function_definition
name: (identifier) @name) @definition.function
```
A more sophisticated query can be found in the [JavaScript Tree-sitter repository](https://github.com/tree-sitter/tree-sitter-javascript/blob/fdeb68ac8d2bd5a78b943528bb68ceda3aade2eb/queries/tags.scm#L63-L70):
``` scheme
```scheme
(assignment_expression
left: [
(identifier) @name
@ -39,7 +39,7 @@ A more sophisticated query can be found in the [JavaScript Tree-sitter repositor
An even more sophisticated query is in the [Ruby Tree-sitter repository](https://github.com/tree-sitter/tree-sitter-ruby/blob/1ebfdb288842dae5a9233e2509a135949023dd82/queries/tags.scm#L24-L43), which uses built-in functions to strip the Ruby comment character (`#`) from the docstrings associated with a class or singleton-class declaration, then selects only the docstrings adjacent to the node matched as `@definition.class`.
``` scheme
```scheme
(
(comment)* @doc
.
@ -79,7 +79,7 @@ The below table describes a standard vocabulary for kinds and roles during the t
You can use the `tree-sitter tags` command to test out a tags query file, passing as arguments one or more files to tag. We can run this tool from within the Tree-sitter Ruby repository, over code in a file called `test.rb`:
``` ruby
```ruby
module Foo
class Bar
# won't be included
@ -93,7 +93,7 @@ end
Invoking `tree-sitter tags test.rb` produces the following console output, representing matched entities' name, role, location, first line, and docstring:
```
```text
test.rb
Foo | module def (0, 7) - (0, 10) `module Foo`
Bar | class def (1, 8) - (1, 11) `class Bar`

@ -1,6 +1,9 @@
# `tree-sitter-highlight`
[![Crates.io](https://img.shields.io/crates/v/tree-sitter-highlight.svg)](https://crates.io/crates/tree-sitter-highlight)
[![crates.io badge]][crates.io]
[crates.io]: https://crates.io/crates/tree-sitter-highlight
[crates.io badge]: https://img.shields.io/crates/v/tree-sitter-highlight.svg?color=%23B48723
### Usage

@ -1,5 +1,4 @@
Subdirectories
--------------
## Subdirectories
* [`src`](./src) - C source code for the Tree-sitter library
* [`include`](./include) - C headers for the Tree-sitter library

@ -1,6 +1,9 @@
# Rust Tree-sitter
[![Crates.io](https://img.shields.io/crates/v/tree-sitter.svg)](https://crates.io/crates/tree-sitter)
[![crates.io badge]][crates.io]
[crates.io]: https://crates.io/crates/tree-sitter
[crates.io badge]: https://img.shields.io/crates/v/tree-sitter.svg?color=%23B48723
Rust bindings to the [Tree-sitter][] parsing library.

@ -1,5 +1,9 @@
Web Tree-sitter
===============
# Web Tree-sitter
[![npmjs.com badge]][npmjs.com]
[npmjs.com]: https://www.npmjs.org/package/web-tree-sitter
[npmjs.com badge]: https://img.shields.io/npm/v/web-tree-sitter.svg?color=%23BF4A4A
WebAssembly bindings to the [Tree-sitter](https://github.com/tree-sitter/tree-sitter) parsing library.

@ -1,5 +1,10 @@
# `tree-sitter-tags`
[![crates.io badge]][crates.io]
[crates.io]: https://crates.io/crates/tree-sitter-tags
[crates.io badge]: https://img.shields.io/crates/v/tree-sitter-tags.svg?color=%23B48723
### Usage
Add this crate, and the language-specific crates for whichever languages you want to parse, to your `Cargo.toml`: