Add information about queries on the docs site

parent 4b7c36a40b
commit 6146c39b0a
1 changed file with 179 additions and 12 deletions
@@ -11,7 +11,9 @@ This document will describes the general concepts of how to use Tree-sitter, whi

All of the API functions shown here are declared and documented in the `tree_sitter/api.h` header file.

## Building the Library
## Getting Started

### Building the Library

To build the library on a POSIX system, run this script, which will create a static library called `libtree-sitter.a` in the Tree-sitter folder:

@@ -28,15 +30,15 @@ Alternatively, you can use the library in a larger project by adding one source

* `tree-sitter/lib/src`
* `tree-sitter/lib/include`

## The Objects
### The Basic Objects

There are four main types of objects involved when using Tree-sitter: languages, parsers, syntax trees, and syntax nodes. In C, these are called `TSLanguage`, `TSParser`, `TSTree`, and `TSNode`.

* An `TSLanguage` is an opaque object that defines how to parse a particular programming language. The code for each `TSLanguage` is generated by Tree-sitter. Many languages are already available in separate git repositories within the the [Tree-sitter GitHub organization](https://github.com/tree-sitter). See [the next page](./creating-parsers) for how to create new languages.
* A `TSLanguage` is an opaque object that defines how to parse a particular programming language. The code for each `TSLanguage` is generated by Tree-sitter. Many languages are already available in separate git repositories within the [Tree-sitter GitHub organization](https://github.com/tree-sitter). See [the next page](./creating-parsers) for how to create new languages.
* A `TSParser` is a stateful object that can be assigned a `TSLanguage` and used to produce a `TSTree` based on some source code.
* A `TSTree` represents the syntax tree of an entire source code file. It contains `TSNode` instances that indicate the structure of the source code. It can also be edited and used to produce a new `TSTree` in the event that the source code changes.
* A `TSNode` represents a single node in the syntax tree. It tracks its start and end positions in the source code, as well as its relation to other nodes like its parent, siblings and children.

## An Example Program
### An Example Program

Here's an example of a simple C program that uses the Tree-sitter [JSON parser](https://github.com/tree-sitter/tree-sitter-json).

@@ -111,7 +113,9 @@ clang \

./test-json-parser
```

## Providing the Source Code
## Basic Parsing

### Providing the Code

In the example above, we parsed source code stored in a simple string using the `ts_parser_parse_string` function:

@@ -149,7 +153,7 @@ typedef struct {

} TSInput;
```
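
The following sketch illustrates the `read` callback in `TSInput`. It serves a document stored as several separate chunks, as a text editor's buffer might be. This is not code from the library, and it makes some assumptions so that it compiles without the Tree-sitter headers: it uses a hypothetical `ToyPoint` in place of `TSPoint` and a hypothetical `ChunkedDoc` payload. For a given `byte_index`, the callback returns a pointer into the chunk that contains that byte and reports how many bytes remain in that chunk. Once the end of the document is reached, it writes `0` to `bytes_read`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TSPoint, so this sketch compiles on its own. */
typedef struct {
  uint32_t row;
  uint32_t column;
} ToyPoint;

/* Hypothetical payload: a document stored as several separate chunks. */
typedef struct {
  const char **chunks;
  const uint32_t *lengths;
  size_t count;
} ChunkedDoc;

/* Same shape as the TSInput `read` callback: return a pointer to the text
 * starting at `byte_index` and store its length in `*bytes_read`.
 * Writing 0 to `*bytes_read` signals the end of the document. */
const char *chunked_read(void *payload, uint32_t byte_index,
                         ToyPoint position, uint32_t *bytes_read) {
  assert(payload != NULL && bytes_read != NULL);
  (void)position; /* this payload is addressed by byte offset only */
  const ChunkedDoc *doc = payload;
  uint32_t chunk_start = 0;
  for (size_t i = 0; i < doc->count; i++) {
    uint32_t len = doc->lengths[i];
    if (byte_index < chunk_start + len) {
      uint32_t offset = byte_index - chunk_start;
      *bytes_read = len - offset; /* the rest of this chunk */
      return doc->chunks[i] + offset;
    }
    chunk_start += len;
  }
  *bytes_read = 0; /* past the last chunk: end of input */
  return "";
}
```

With the real API, a callback of this shape (taking a `TSPoint`) would be stored in the `read` field of a `TSInput`, with the document as its `payload`. The `TSInput` would then be passed to `ts_parser_parse`. Returning whole chunks lets the parser read a large document without first copying it into one contiguous string.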

## Syntax Nodes
### Syntax Nodes

Tree-sitter provides a [DOM](https://en.wikipedia.org/wiki/Document_Object_Model)-style interface for inspecting syntax trees. A syntax node's *type* is a string that indicates which grammar rule the node represents.

@@ -172,7 +176,7 @@ TSPoint ts_node_start_point(TSNode);

TSPoint ts_node_end_point(TSNode);
```

## Retrieving Nodes
### Retrieving Nodes

Every tree has a *root node*:

@@ -201,7 +205,7 @@ These methods may all return a *null node* to indicate, for example, that a node

bool ts_node_is_null(TSNode);
```

## Named vs Anonymous Nodes
### Named vs Anonymous Nodes

Tree-sitter produces [*concrete* syntax trees](https://en.wikipedia.org/wiki/Parse_tree) - trees that contain nodes for every individual token in the source code, including things like commas and parentheses. This is important for use-cases that deal with individual tokens, like [syntax highlighting](https://en.wikipedia.org/wiki/Syntax_highlighting). But some types of code analysis are easier to perform using an [*abstract* syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree) - a tree in which the less important details have been removed. Tree-sitter's trees support these use cases by making a distinction between *named* and *anonymous* nodes.

@@ -236,7 +240,7 @@ TSNode ts_node_prev_named_sibling(TSNode);

If you use this group of methods, the syntax tree functions much like an abstract syntax tree.

## Node Field Names
### Node Field Names

To make syntax nodes easier to analyze, many grammars assign unique *field names* to particular child nodes. The next page [explains](./creating-parsers#using-fields) how to do this on your own grammars. If a syntax node has fields, you can access its children using their field name:

@@ -262,7 +266,9 @@ The field ids can be used in place of the name:

TSNode ts_node_child_by_field_id(TSNode, TSFieldId);
```

## Editing
## Advanced Parsing

### Editing

In applications like text editors, you often need to re-parse a file after its source code has changed. Tree-sitter is designed to support this use case efficiently. There are two steps required. First, you must *edit* the syntax tree, which adjusts the ranges of its nodes so that they stay in sync with the code.

@@ -289,7 +295,7 @@ void ts_node_edit(TSNode *, const TSInputEdit *);

This `ts_node_edit` function is *only* needed in the case where you have retrieved `TSNode` instances *before* editing the tree, and then *after* editing the tree, you want to continue to use those specific node instances. Often, you'll just want to re-fetch nodes from the edited tree, in which case `ts_node_edit` is not needed.

## Multi-language Documents
### Multi-language Documents

Sometimes, different parts of a file may be written in different languages. For example, templating languages like [EJS](http://ejs.co) and [ERB](https://ruby-doc.org/stdlib-2.5.1/libdoc/erb/rdoc/ERB.html) allow you to generate HTML by writing a mixture of HTML and another language like JavaScript or Ruby.

@@ -395,7 +401,7 @@ int main(int argc, const char **argv) {

This API allows for great flexibility in how languages can be composed. Tree-sitter is not responsible for mediating the interactions between languages. Instead, you are free to do that using arbitrary application-specific logic.

## Concurrency
### Concurrency

Tree-sitter supports multi-threaded use cases by making syntax trees very cheap to copy.

@@ -404,3 +410,164 @@ TSTree *ts_tree_copy(const TSTree *);

```

Internally, copying a syntax tree just entails incrementing an atomic reference count. Conceptually, it provides you a new tree which you can freely query, edit, reparse, or delete on a new thread while continuing to use the original tree on a different thread. Note that individual `TSTree` instances are *not* thread safe; you must copy a tree if you want to use it on multiple threads simultaneously.
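
The reference-counting idea can be sketched with C11 atomics. The code below is a toy model, not Tree-sitter's implementation: `ToyTree`, `SharedTreeData` and the `toy_tree_*` functions are hypothetical. Each copy is a separate small handle that shares the same contents. One thread can therefore delete its handle while another thread keeps using its own.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical shared tree contents. In this toy model they are never
 * modified after creation; only the reference count changes. */
typedef struct {
  atomic_uint ref_count;
  int root_value; /* stand-in for the real syntax tree data */
} SharedTreeData;

/* A handle is small: it just points at the shared contents. */
typedef struct {
  SharedTreeData *data;
} ToyTree;

ToyTree *toy_tree_new(int root_value) {
  SharedTreeData *data = malloc(sizeof *data);
  ToyTree *tree = malloc(sizeof *tree);
  if (data == NULL || tree == NULL) {
    free(data);
    free(tree);
    return NULL;
  }
  atomic_init(&data->ref_count, 1u);
  data->root_value = root_value;
  tree->data = data;
  return tree;
}

/* "Copying" allocates a new handle and atomically increments the count;
 * the tree contents themselves are not duplicated. */
ToyTree *toy_tree_copy(const ToyTree *tree) {
  assert(tree != NULL);
  ToyTree *copy = malloc(sizeof *copy);
  if (copy == NULL) return NULL;
  atomic_fetch_add_explicit(&tree->data->ref_count, 1u, memory_order_relaxed);
  copy->data = tree->data;
  return copy;
}

/* Each thread deletes its own handle; whichever handle is deleted last
 * frees the shared contents. */
void toy_tree_delete(ToyTree *tree) {
  if (tree == NULL) return;
  if (atomic_fetch_sub_explicit(&tree->data->ref_count, 1u,
                                memory_order_acq_rel) == 1u) {
    free(tree->data);
  }
  free(tree);
}

unsigned toy_tree_ref_count(const ToyTree *tree) {
  return atomic_load(&tree->data->ref_count);
}
```

With the real library, the equivalent steps are to call `ts_tree_copy` before handing a tree to another thread, and to call `ts_tree_delete` on every copy once its thread is done with it.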

## Other Tree Operations

### Walking Trees with Tree Cursors

You can access every node in a syntax tree using the `TSNode` APIs [described above](#retrieving-nodes), but if you need to access a large number of nodes, the most efficient way to do it is with a *tree cursor*. A cursor is a stateful object that allows you to walk a syntax tree with maximum efficiency.

You can initialize a cursor from any node:

```c
TSTreeCursor ts_tree_cursor_new(TSNode);
```

You can move the cursor around the tree:

```c
bool ts_tree_cursor_goto_first_child(TSTreeCursor *);
bool ts_tree_cursor_goto_next_sibling(TSTreeCursor *);
bool ts_tree_cursor_goto_parent(TSTreeCursor *);
```

These methods return `true` if the cursor successfully moved and `false` if there was no node to move to.

You can always retrieve the cursor's current node, as well as the [field name](#node-field-names) that is associated with the current node:

```c
TSNode ts_tree_cursor_current_node(const TSTreeCursor *);
const char *ts_tree_cursor_current_field_name(const TSTreeCursor *);
TSFieldId ts_tree_cursor_current_field_id(const TSTreeCursor *);
```
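
A common way to visit every node with these methods is a loop with three moves. It descends when it can. Otherwise it steps to the next sibling, climbing back up whenever a level is exhausted. The sketch below runs that loop on a hypothetical in-memory `ToyNode` tree, using hypothetical `toy_cursor_*` functions. These functions are assumed to follow the same contract as the real `goto` methods: they return `false` when there is nowhere to move, and the cursor never leaves the node it was created from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy tree node with the links a cursor needs. */
typedef struct ToyNode {
  const char *type;
  struct ToyNode *parent;
  struct ToyNode *first_child;
  struct ToyNode *next_sibling;
} ToyNode;

/* Hypothetical cursor: `depth` is measured from the starting node, so the
 * cursor can never move above (or beside) that node. */
typedef struct {
  ToyNode *current;
  unsigned depth;
} ToyCursor;

ToyCursor toy_cursor_new(ToyNode *node) {
  ToyCursor cursor = {node, 0};
  return cursor;
}

bool toy_cursor_goto_first_child(ToyCursor *c) {
  if (c->current->first_child == NULL) return false;
  c->current = c->current->first_child;
  c->depth++;
  return true;
}

bool toy_cursor_goto_next_sibling(ToyCursor *c) {
  if (c->depth == 0 || c->current->next_sibling == NULL) return false;
  c->current = c->current->next_sibling;
  return true;
}

bool toy_cursor_goto_parent(ToyCursor *c) {
  if (c->depth == 0) return false;
  c->current = c->current->parent;
  c->depth--;
  return true;
}

/* Pre-order walk: records up to `max` node types in `out` and returns the
 * total number of nodes visited. */
size_t walk_preorder(ToyNode *root, const char **out, size_t max) {
  assert(root != NULL);
  ToyCursor c = toy_cursor_new(root);
  size_t n = 0;
  for (;;) {
    if (n < max) out[n] = c.current->type;
    n++;
    if (toy_cursor_goto_first_child(&c)) continue;
    while (!toy_cursor_goto_next_sibling(&c)) {
      if (!toy_cursor_goto_parent(&c)) return n; /* back at the start node */
    }
  }
}
```

With the real library, the same loop would call `ts_tree_cursor_goto_first_child`, `ts_tree_cursor_goto_next_sibling` and `ts_tree_cursor_goto_parent` on a `TSTreeCursor`, and read each node with `ts_tree_cursor_current_node`. Afterwards, the cursor would be released with `ts_tree_cursor_delete`.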

### Pattern Matching with Queries

Many code analysis tasks involve searching for patterns in syntax trees. Tree-sitter provides a small declarative language for expressing these patterns and searching for matches.

The language is similar to the format of Tree-sitter's [unit test system](./creating-parsers#command-test).

#### Basics

Syntax trees are written as [S-expressions](https://en.wikipedia.org/wiki/S-expression). An S-expression representation of a node consists of a pair of parentheses containing the node's name and, optionally, a series of S-expressions representing the node's children.

For example, this pattern would match a `binary_expression` node whose children are both `number_literal` nodes:

```
(binary_expression (number_literal) (number_literal))
```
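
The sketch below shows how this notation relates to an actual tree. It prints a hypothetical in-memory tree as an S-expression, writing each named node as `(name children...)` and leaving out anonymous nodes. Consider a tree for an expression like `1 + 2`, where the `+` operator is an anonymous node: the sketch produces exactly the pattern shown above. `SexpNode` and `node_to_sexp` are illustrative names, not part of the Tree-sitter API. For real nodes, `ts_node_string` returns a comparable S-expression as a heap-allocated string that the caller frees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy node: a type name, a named/anonymous flag, children. */
typedef struct SexpNode {
  const char *type;
  bool named;
  const struct SexpNode **children;
  size_t child_count;
} SexpNode;

/* Appends `s` to `buf`, truncating if it does not fit. */
static void sexp_append(char *buf, size_t cap, size_t *len, const char *s) {
  size_t n = strlen(s);
  if (*len + n >= cap) n = cap - 1 - *len;
  memcpy(buf + *len, s, n);
  *len += n;
  buf[*len] = '\0';
}

static void write_sexp(const SexpNode *node, char *buf, size_t cap,
                       size_t *len) {
  sexp_append(buf, cap, len, "(");
  sexp_append(buf, cap, len, node->type);
  for (size_t i = 0; i < node->child_count; i++) {
    const SexpNode *child = node->children[i];
    if (!child->named) continue; /* anonymous tokens like "+" are omitted */
    sexp_append(buf, cap, len, " ");
    write_sexp(child, buf, cap, len);
  }
  sexp_append(buf, cap, len, ")");
}

/* Writes the S-expression for a named node into `buf`. */
void node_to_sexp(const SexpNode *node, char *buf, size_t cap) {
  assert(node != NULL && buf != NULL && cap > 0);
  size_t len = 0;
  buf[0] = '\0';
  write_sexp(node, buf, cap, &len);
}
```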

Children can also be omitted. For example, this would match any `binary_expression` where at least *one* child is a `string_literal` node:

```
(binary_expression (string_literal))
```

#### Fields

In general, it's a good idea to make patterns more specific by specifying field names associated with child nodes. For example, this pattern would match an `assignment_expression` node whose `left` child is a `member_expression` whose `object` is a `call_expression`:

```
(assignment_expression
  left: (member_expression
    object: (call_expression)))
```

#### Anonymous Nodes

The parenthesized syntax for writing nodes only applies to [named nodes](#named-vs-anonymous-nodes). To match specific anonymous nodes, you write their name between double quotes. For example, this pattern would match any `binary_expression` where the operator is `!=` and the right side is `null`:

```
(binary_expression
  operator: "!="
  right: (null))
```

#### Capturing Nodes

When matching patterns, you may want to process specific nodes within the pattern. Captures allow you to associate names with specific nodes in a pattern, so that you can later refer to those nodes by those names. Capture names are written *after* the nodes that they refer to, and start with an `@` character.

For example, this pattern would match any assignment of a `function` to an `identifier`, and it would associate the name `function-definition` with the identifier:

```
(assignment_expression
  (identifier) @function-definition
  (function))
```

And this pattern would match all method definitions, associating the name `the-method-name` with the method name and `the-class-name` with the containing class name:

```
(class_declaration
  name: (identifier) @the-class-name
  body: (class_body
    (method_definition
      name: (property_identifier) @the-method-name)))
```

#### Predicates

You can also specify other conditions that should restrict the nodes that match a given pattern. You do this by enclosing the pattern in an additional pair of parentheses, and specifying one or more *predicate* S-expressions after your main pattern. A predicate S-expression must start with a predicate name, followed by `@`-prefixed capture names or strings.

For example, this pattern would match identifier nodes whose names start with a capital letter and contain only capital letters and underscores, such as `MAX_SIZE`:

```
((identifier) @constant
  (match? @constant "^[A-Z][A-Z_]+$"))
```

*Note* - Predicates are not handled directly by the Tree-sitter library. They are just exposed in a structured form so that higher-level code can perform the filtering.
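
Because the library leaves predicates to the caller, a program that supports `match?` has to test each captured node's text itself. The following is a minimal sketch using POSIX `<regex.h>`. It assumes the caller already has two things: the source text the tree was parsed from, and the capture's byte range (with the real API, `ts_node_start_byte` and `ts_node_end_byte` on the captured node). `capture_text_matches` is a hypothetical helper name. POSIX extended regular expressions may also differ in detail from the regex flavor a particular Tree-sitter binding uses.

```c
#include <assert.h>
#include <regex.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the capture text source[start_byte, end_byte) matches the
 * POSIX extended regex `pattern`, 0 if it does not, and -1 on error. */
int capture_text_matches(const char *source, uint32_t start_byte,
                         uint32_t end_byte, const char *pattern) {
  assert(source != NULL && pattern != NULL && start_byte <= end_byte);
  size_t len = end_byte - start_byte;
  char *text = malloc(len + 1);
  if (text == NULL) return -1;
  memcpy(text, source + start_byte, len);
  text[len] = '\0'; /* regexec needs a NUL-terminated string */

  regex_t re;
  if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
    free(text);
    return -1;
  }
  int rc = regexec(&re, text, 0, NULL, 0);
  regfree(&re);
  free(text);
  if (rc == 0) return 1;
  if (rc == REG_NOMATCH) return 0;
  return -1;
}
```

For the source `int MAX_SIZE = Foo;` and the pattern `^[A-Z][A-Z_]+$`, a capture covering bytes 4 to 12 (`MAX_SIZE`) passes, and one covering bytes 15 to 18 (`Foo`) is filtered out. The trailing `$` anchors the end, so the whole name must match.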

#### The Query API

Create a query by specifying a string containing one or more patterns:

```c
TSQuery *ts_query_new(
  const TSLanguage *language,
  const char *source,
  uint32_t source_len,
  uint32_t *error_offset,
  TSQueryError *error_type
);
```

If there is an error in the query, then the `error_offset` argument will be set to the byte offset of the error, and the `error_type` argument will be set to a value that indicates the type of error:

```c
typedef enum {
  TSQueryErrorNone = 0,
  TSQueryErrorSyntax,
  TSQueryErrorNodeType,
  TSQueryErrorField,
  TSQueryErrorCapture,
} TSQueryError;
```

The `TSQuery` value is immutable and can be safely shared between threads. To execute the query, create a `TSQueryCursor`, which carries the state needed for processing the queries. The query cursor should not be shared between threads, but can be reused for many query executions.

```c
TSQueryCursor *ts_query_cursor_new(void);
```

You can then execute the query on a given syntax node:

```c
void ts_query_cursor_exec(TSQueryCursor *, const TSQuery *, TSNode);
```

After that, you can iterate over the matches:

```c
typedef struct {
  uint32_t id;
  uint16_t pattern_index;
  uint16_t capture_count;
  const TSQueryCapture *captures;
} TSQueryMatch;

bool ts_query_cursor_next_match(TSQueryCursor *, TSQueryMatch *match);
```

`ts_query_cursor_next_match` returns `false` when there are no more matches.