# Getting Started

## Building the Library

To build the library on a POSIX system, just run `make` in the Tree-sitter directory. This will create a static library
called `libtree-sitter.a` as well as dynamic libraries.

Alternatively, you can incorporate the library in a larger project's build system by adding one source file to the build.
This source file needs two directories to be in the include path when compiled:

**source file:**

- `tree-sitter/lib/src/lib.c`

**include directories:**

- `tree-sitter/lib/src`
- `tree-sitter/lib/include`

## The Basic Objects

There are four main types of objects involved when using Tree-sitter: languages, parsers, syntax trees, and syntax nodes.
In C, these are called `TSLanguage`, `TSParser`, `TSTree`, and `TSNode`.

- A `TSLanguage` is an opaque object that defines how to parse a particular programming language. The code for each `TSLanguage`
is generated by Tree-sitter. Many languages are already available in separate git repositories within the
[Tree-sitter GitHub organization][ts org] and the [Tree-sitter grammars GitHub organization][tsg org].
See [the next section][creating parsers] for how to create new languages.

- A `TSParser` is a stateful object that can be assigned a `TSLanguage` and used to produce a `TSTree` based on some
source code.

- A `TSTree` represents the syntax tree of an entire source code file. It contains `TSNode` instances that indicate the
structure of the source code. It can also be edited and used to produce a new `TSTree` in the event that the
source code changes.

- A `TSNode` represents a single node in the syntax tree. It tracks its start and end positions in the source code, as
well as its relation to other nodes such as its parent, siblings, and children.

## An Example Program

Here's an example of a simple C program that uses the Tree-sitter [JSON parser][json].

```c
// Filename - test-json-parser.c

#include <assert.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h> // for free()
#include <tree_sitter/api.h>

// Declare the `tree_sitter_json` function, which is
// implemented by the `tree-sitter-json` library.
const TSLanguage *tree_sitter_json(void);

int main() {
  // Create a parser.
  TSParser *parser = ts_parser_new();

  // Set the parser's language (JSON in this case).
  ts_parser_set_language(parser, tree_sitter_json());

  // Build a syntax tree based on source code stored in a string.
  const char *source_code = "[1, null]";
  TSTree *tree = ts_parser_parse_string(
    parser,
    NULL,
    source_code,
    strlen(source_code)
  );

  // Get the root node of the syntax tree.
  TSNode root_node = ts_tree_root_node(tree);

  // Get some child nodes.
  TSNode array_node = ts_node_named_child(root_node, 0);
  TSNode number_node = ts_node_named_child(array_node, 0);

  // Check that the nodes have the expected types.
  assert(strcmp(ts_node_type(root_node), "document") == 0);
  assert(strcmp(ts_node_type(array_node), "array") == 0);
  assert(strcmp(ts_node_type(number_node), "number") == 0);

  // Check that the nodes have the expected child counts.
  assert(ts_node_child_count(root_node) == 1);
  assert(ts_node_child_count(array_node) == 5);
  assert(ts_node_named_child_count(array_node) == 2);
  assert(ts_node_child_count(number_node) == 0);

  // Print the syntax tree as an S-expression.
  char *string = ts_node_string(root_node);
  printf("Syntax tree: %s\n", string);

  // Free all of the heap-allocated memory.
  free(string);
  ts_tree_delete(tree);
  ts_parser_delete(parser);
  return 0;
}
```

This program requires three components to build:

1. The Tree-sitter C API from `tree_sitter/api.h` (requiring `tree-sitter/lib/include` in our include path)
2. The Tree-sitter library (`libtree-sitter.a`)
3. The JSON grammar's source code, which we compile directly into the binary

```sh
clang \
  -I tree-sitter/lib/include \
  test-json-parser.c \
  tree-sitter-json/src/parser.c \
  tree-sitter/libtree-sitter.a \
  -o test-json-parser
./test-json-parser
```

When using dynamic linking, you'll need to ensure the shared library is discoverable at run time through `LD_LIBRARY_PATH`
or your system's equivalent environment variable. If the library is not installed in a standard location, the linker also
needs to be told where to find it, for example with `-L`. Here's how to compile with dynamic linking:

```sh
clang \
  -I tree-sitter/lib/include \
  test-json-parser.c \
  tree-sitter-json/src/parser.c \
  -ltree-sitter \
  -o test-json-parser
./test-json-parser
```
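
The child-count assertions in the example program can look surprising at first: the array node for `[1, null]` has five children but only two named children. The sketch below is a toy model of that distinction, written without Tree-sitter so it compiles on its own. `ToyNode`, `toy_named_child_count`, `toy_node_string`, and `toy_check` are hypothetical names invented for this illustration and are not part of the Tree-sitter API. It assumes that punctuation tokens such as `[`, `,`, and `]` are anonymous nodes and that an S-expression lists only named nodes, which is consistent with the assertions in the example program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

// Hypothetical toy node type for illustration only. This is NOT Tree-sitter's TSNode.
typedef struct ToyNode {
  const char *type;                  // node type, e.g. "array" or "["
  int is_named;                      // 1 for named nodes, 0 for anonymous tokens
  const struct ToyNode *children[8]; // fixed-size child list, enough for this example
  size_t child_count;                // number of entries used in `children`
} ToyNode;

// Count only the named children of a node.
static size_t toy_named_child_count(const ToyNode *node) {
  size_t count = 0;
  for (size_t i = 0; i < node->child_count; i++) {
    if (node->children[i]->is_named) count++;
  }
  return count;
}

// Append an S-expression containing only named nodes to `out`.
// `out` must already hold a NUL-terminated string and have room for `cap` bytes.
static void toy_node_string(const ToyNode *node, char *out, size_t cap) {
  size_t len = strlen(out);
  snprintf(out + len, cap - len, "(%s", node->type);
  for (size_t i = 0; i < node->child_count; i++) {
    const ToyNode *child = node->children[i];
    if (!child->is_named) continue; // anonymous tokens are skipped
    len = strlen(out);
    snprintf(out + len, cap - len, " ");
    toy_node_string(child, out, cap);
  }
  len = strlen(out);
  snprintf(out + len, cap - len, ")");
}

// A hand-built toy tree for the input "[1, null]".
static const ToyNode toy_open = {"[", 0, {NULL}, 0};
static const ToyNode toy_number = {"number", 1, {NULL}, 0};
static const ToyNode toy_comma = {",", 0, {NULL}, 0};
static const ToyNode toy_null = {"null", 1, {NULL}, 0};
static const ToyNode toy_close = {"]", 0, {NULL}, 0};
static const ToyNode toy_array = {
  "array", 1,
  {&toy_open, &toy_number, &toy_comma, &toy_null, &toy_close}, 5
};
static const ToyNode toy_document = {"document", 1, {&toy_array}, 1};

// Mirror the child-count assertions from the example program on the toy tree.
static void toy_check(void) {
  assert(toy_document.child_count == 1);
  assert(toy_array.child_count == 5);              // "[", number, ",", null, "]"
  assert(toy_named_child_count(&toy_array) == 2);  // number and null
  assert(toy_number.child_count == 0);
}
```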

[creating parsers]: ../creating-parsers/index.md
[json]: https://github.com/tree-sitter/tree-sitter-json
[ts org]: https://github.com/tree-sitter
[tsg org]: https://github.com/tree-sitter-grammars