diff --git a/cli/src/tests/query_test.rs b/cli/src/tests/query_test.rs index 34cf40a9..ed1f9e25 100644 --- a/cli/src/tests/query_test.rs +++ b/cli/src/tests/query_test.rs @@ -4587,7 +4587,7 @@ fn test_query_quantified_captures() { // #[rustfmt::skip] let rows = &[ Row { - description: "doc comments where all must match the prefiix", + description: "doc comments where all must match the prefix", language: get_language("c"), code: indoc! {" /// foo diff --git a/docs/section-2-using-parsers.md b/docs/section-2-using-parsers.md index 87c049e7..5106a49c 100644 --- a/docs/section-2-using-parsers.md +++ b/docs/section-2-using-parsers.md @@ -21,21 +21,21 @@ Alternatively, you can incorporate the library in a larger project's build syste **source file:** -* `tree-sitter/lib/src/lib.c` +- `tree-sitter/lib/src/lib.c` **include directories:** -* `tree-sitter/lib/src` -* `tree-sitter/lib/include` +- `tree-sitter/lib/src` +- `tree-sitter/lib/include` ### The Basic Objects There are four main types of objects involved when using Tree-sitter: languages, parsers, syntax trees, and syntax nodes. In C, these are called `TSLanguage`, `TSParser`, `TSTree`, and `TSNode`. -* A `TSLanguage` is an opaque object that defines how to parse a particular programming language. The code for each `TSLanguage` is generated by Tree-sitter. Many languages are already available in separate git repositories within the [Tree-sitter GitHub organization](https://github.com/tree-sitter). See [the next page](./creating-parsers) for how to create new languages. -* A `TSParser` is a stateful object that can be assigned a `TSLanguage` and used to produce a `TSTree` based on some source code. -* A `TSTree` represents the syntax tree of an entire source code file. It contains `TSNode` instances that indicate the structure of the source code. It can also be edited and used to produce a new `TSTree` in the event that the source code changes. -* A `TSNode` represents a single node in the syntax tree. 
It tracks its start and end positions in the source code, as well as its relation to other nodes like its parent, siblings and children. +- A `TSLanguage` is an opaque object that defines how to parse a particular programming language. The code for each `TSLanguage` is generated by Tree-sitter. Many languages are already available in separate git repositories within the [Tree-sitter GitHub organization](https://github.com/tree-sitter). See [the next page](./creating-parsers) for how to create new languages. +- A `TSParser` is a stateful object that can be assigned a `TSLanguage` and used to produce a `TSTree` based on some source code. +- A `TSTree` represents the syntax tree of an entire source code file. It contains `TSNode` instances that indicate the structure of the source code. It can also be edited and used to produce a new `TSTree` in the event that the source code changes. +- A `TSNode` represents a single node in the syntax tree. It tracks its start and end positions in the source code, as well as its relation to other nodes like its parent, siblings and children. ### An Example Program @@ -629,18 +629,36 @@ The restrictions placed on a pattern by an anchor operator ignore anonymous node #### Predicates -You can also specify arbitrary metadata and conditions associated with a pattern by adding _predicate_ S-expressions anywhere within your pattern. Predicate S-expressions start with a _predicate name_ beginning with a `#` character. After that, they can contain an arbitrary number of `@`-prefixed capture names or strings. +You can also specify arbitrary metadata and conditions associated with a pattern +by adding _predicate_ S-expressions anywhere within your pattern. Predicate S-expressions +start with a _predicate name_ beginning with a `#` character. After that, they can +contain an arbitrary number of `@`-prefixed capture names or strings. 
-For example, this pattern would match identifier whose names is written in `SCREAMING_SNAKE_CASE`: +Tree-sitter's CLI supports the following predicates by default: + +##### eq?, not-eq?, any-eq?, any-not-eq? + +This family of predicates allows you to match against a single capture or string +value. + +The first argument must be a capture, but the second can be either a capture to +compare the two captures' text, or a string to compare the first capture's text +against. + +The base predicate is "#eq?", but its complement "#not-eq?" can be used to _not_ +match a value. + +Consider the following example targeting C: ```scheme -( - (identifier) @constant - (#match? @constant "^[A-Z][A-Z_]+") -) +((identifier) @variable.builtin + (#eq? @variable.builtin "self")) ``` -And this pattern would match key-value pairs where the `value` is an identifier with the same name as the key: +This pattern would match any identifier that is `self`. + +And this pattern would match key-value pairs where the `value` is an identifier +with the same name as the key: ```scheme ( @@ -651,7 +669,87 @@ And this pattern would match key-value pairs where the `value` is an identifier ) ``` -_Note_ - Predicates are not handled directly by the Tree-sitter C library. They are just exposed in a structured form so that higher-level code can perform the filtering. However, higher-level bindings to Tree-sitter like [the Rust crate](https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_rust) or the [WebAssembly binding](https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_web) implement a few common predicates like `#eq?` and `#match?`. +The prefix "any-" is meant for use with quantified captures. Here's +an example finding a segment of empty comments: + +```scheme +((comment)+ @comment.empty + (#any-eq? @comment.empty "//")) +``` + +Note that "#any-eq?" 
will match a quantified capture if +_any_ of the nodes match the predicate, while by default a quantified capture +will only match if _all_ the nodes match the predicate. + +##### match?, not-match?, any-match?, any-not-match? + +These predicates are similar to the eq? predicates, but they use regular expressions +to match against the capture's text. + +The first argument must be a capture, and the second must be a string containing +a regular expression. + +For example, this pattern would match identifiers whose names are written in `SCREAMING_SNAKE_CASE`: + +```scheme +((identifier) @constant + (#match? @constant "^[A-Z][A-Z_]+")) +``` + +Here's an example finding potential documentation comments in C: + +```scheme +((comment)+ @comment.documentation + (#match? @comment.documentation "^///\\s+.*")) +``` + +Here's another example finding Cgo comments to potentially inject with C: + +```scheme +((comment)+ @injection.content + . + (import_declaration + (import_spec path: (interpreted_string_literal) @_import_c)) + (#eq? @_import_c "\"C\"") + (#match? @injection.content "^//")) +``` + +##### any-of?, not-any-of? + +The "any-of?" predicate allows you to match a capture against multiple strings, +and will match if the capture's text is equal to any of the strings. + +Consider this example that targets JavaScript: + +```scheme +((identifier) @variable.builtin + (#any-of? @variable.builtin + "arguments" + "module" + "console" + "window" + "document")) +``` + +This will match any identifier whose text is one of the listed builtin variable names. + +_Note_ - Predicates are not handled directly by the Tree-sitter C library. +They are just exposed in a structured form so that higher-level code can perform +the filtering. 
However, higher-level bindings to Tree-sitter like +[the Rust crate](https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_rust) +or the [WebAssembly binding](https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_web) +do implement a few common predicates like the `#eq?`, `#match?`, and `#any-of?` +predicates explained above. + +To recap the predicates that Tree-sitter's bindings support: + +- `#eq?` checks for a direct match against a capture or string +- `#match?` checks for a match against a regular expression +- `#any-of?` checks for a match against a list of strings +- Adding `not-` to the beginning of any of these predicates will negate the match +- By default, a quantified capture will only match if _all_ of the nodes match the predicate +- Adding `any-` before the `eq` or `match` predicates will instead match if _any_ of the nodes match the predicate + ### The Query API @@ -723,8 +821,8 @@ The node types file contains an array of objects, each of which describes a part Every object in this array has these two entries: -* `"type"` - A string that indicates which grammar rule the node represents. This corresponds to the `ts_node_type` function described [above](#syntax-nodes). -* `"named"` - A boolean that indicates whether this kind of node corresponds to a rule name in the grammar or just a string literal. See [above](#named-vs-anonymous-nodes) for more info. +- `"type"` - A string that indicates which grammar rule the node represents. This corresponds to the `ts_node_type` function described [above](#syntax-nodes). +- `"named"` - A boolean that indicates whether this kind of node corresponds to a rule name in the grammar or just a string literal. See [above](#named-vs-anonymous-nodes) for more info. Examples: @@ -745,14 +843,14 @@ Together, these two fields constitute a unique identifier for a node type; no tw Many syntax nodes can have _children_. 
The node type object describes the possible children that a node can have using the following entries: -* `"fields"` - An object that describes the possible [fields](#node-field-names) that the node can have. The keys of this object are field names, and the values are _child type_ objects, described below. -* `"children"` - Another _child type_ object that describes all of the node's possible _named_ children _without_ fields. +- `"fields"` - An object that describes the possible [fields](#node-field-names) that the node can have. The keys of this object are field names, and the values are _child type_ objects, described below. +- `"children"` - Another _child type_ object that describes all of the node's possible _named_ children _without_ fields. A _child type_ object describes a set of child nodes using the following entries: -* `"required"` - A boolean indicating whether there is always _at least one_ node in this set. -* `"multiple"` - A boolean indicating whether there can be _multiple_ nodes in this set. -* `"types"`- An array of objects that represent the possible types of nodes in this set. Each object has two keys: `"type"` and `"named"`, whose meanings are described above. +- `"required"` - A boolean indicating whether there is always _at least one_ node in this set. +- `"multiple"` - A boolean indicating whether there can be _multiple_ nodes in this set. +- `"types"`- An array of objects that represent the possible types of nodes in this set. Each object has two keys: `"type"` and `"named"`, whose meanings are described above. Example with fields: @@ -812,7 +910,7 @@ In Tree-sitter grammars, there are usually certain rules that represent abstract Normally, hidden rules are not mentioned in the node types file, since they don't appear in the syntax tree. 
But if you add a hidden rule to the grammar's [`supertypes` list](./creating-parsers#the-grammar-dsl), then it _will_ show up in the node types file, with the following special entry: -* `"subtypes"` - An array of objects that specify the _types_ of nodes that this 'supertype' node can wrap. +- `"subtypes"` - An array of objects that specify the _types_ of nodes that this 'supertype' node can wrap. Example: