Link to the documentation site from the README
This commit is contained in:
parent
a7ffbd022f
commit
5b18fe672b
2 changed files with 7 additions and 221 deletions
226
README.md
226
README.md
|
|
@ -3,225 +3,11 @@
|
|||
[](https://travis-ci.org/tree-sitter/tree-sitter)
|
||||
[](https://ci.appveyor.com/project/maxbrunsfeld/tree-sitter/branch/master)
|
||||
|
||||
Tree-sitter is a C library for incremental parsing, intended to be used via
|
||||
[bindings](https://github.com/tree-sitter/node-tree-sitter) to higher-level
|
||||
languages. It can be used to build a concrete syntax tree for a program and
|
||||
efficiently update the syntax tree as the program is edited. This makes it suitable
|
||||
for use in text-editing programs.
|
||||
Tree-sitter is an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited. Tree-sitter aims to be:
|
||||
|
||||
Tree-sitter uses an incremental [LR parsing](https://en.wikipedia.org/wiki/LR_parser)
|
||||
algorithm, as described in the paper *[Incremental Analysis of Real Programming Languages](https://www.semanticscholar.org/paper/Incremental-Analysis-of-real-Programming-Languages-Wagner-Graham/163592ac3777ee396f32318fcd83b1c563f2e496)*
|
||||
by Tim Wagner & Susan Graham. It handles ambiguity at compile-time via [precedence annotations](https://en.wikipedia.org/wiki/Operator-precedence_parser),
|
||||
and at run-time via the [GLR algorithm](https://en.wikipedia.org/wiki/GLR_parser).
|
||||
This allows it to generate a fast parser for any language that can be described with a context-free grammar.
|
||||
* **General** enough to parse any programming language
|
||||
* **Fast** enough to parse on every keystroke in a text editor
|
||||
* **Robust** enough to provide useful results even in the presence of syntax errors,
|
||||
* **Dependency-free** (and written in pure C) so that it can be embedded in any application
|
||||
|
||||
### Installation
|
||||
|
||||
```sh
|
||||
script/configure # Generate a Makefile
|
||||
make # Build static libraries for the compiler and runtime
|
||||
```
|
||||
|
||||
### Overview
|
||||
|
||||
Tree-sitter consists of two libraries. The first library, `libcompiler`, can be
|
||||
used to generate a parser for a language by supplying a [context-free grammar](https://en.wikipedia.org/wiki/Context-free_grammar) describing the
|
||||
language. Once the parser has been generated, `libcompiler` is no longer needed.
|
||||
|
||||
The second library, `libruntime`, is used in combination with the parsers
|
||||
generated by `libcompiler`, to generate syntax trees based on text documents, and keep the
|
||||
syntax trees up-to-date as changes are made to the documents.
|
||||
|
||||
### Writing a grammar
|
||||
|
||||
Tree-sitter's grammars are specified as JSON strings. This format allows them
|
||||
to be easily created and manipulated in high-level languages like [JavaScript](https://github.com/tree-sitter/node-tree-sitter-compiler).
|
||||
The structure of a grammar is formally specified by [this JSON schema](./src/compiler/grammar-schema.json).
|
||||
You can generate a parser for a grammar using the `ts_compile_grammar` function
|
||||
provided by `libcompiler`.
|
||||
|
||||
Here's a simple example of using `ts_compile_grammar` to create a parser for basic
|
||||
arithmetic expressions. It uses C++11 multi-line strings for readability.
|
||||
|
||||
```cpp
|
||||
// arithmetic_grammar.cc
|
||||
|
||||
#include <stdio.h>
|
||||
#include "tree_sitter/compiler.h"
|
||||
|
||||
int main() {
|
||||
TSCompileResult result = ts_compile_grammar(R"JSON(
|
||||
{
|
||||
"name": "arithmetic",
|
||||
|
||||
// Things that can appear anywhere in the language, like comments
|
||||
// and whitespace, are expressed as 'extras'.
|
||||
"extras": [
|
||||
{"type": "PATTERN", "value": "\\s"},
|
||||
{"type": "SYMBOL", "name": "comment"}
|
||||
],
|
||||
|
||||
"rules": {
|
||||
|
||||
// The first rule listed in the grammar becomes the 'start rule'.
|
||||
"expression": {
|
||||
"type": "CHOICE",
|
||||
"members": [
|
||||
{"type": "SYMBOL", "name": "sum"},
|
||||
{"type": "SYMBOL", "name": "product"},
|
||||
{"type": "SYMBOL", "name": "number"},
|
||||
{"type": "SYMBOL", "name": "variable"},
|
||||
{
|
||||
"type": "SEQ",
|
||||
"members": [
|
||||
{"type": "STRING", "value": "("},
|
||||
{"type": "SYMBOL", "name": "expression"},
|
||||
{"type": "STRING", "value": ")"}
|
||||
]
|
||||
}
|
||||
]
|
||||
},
|
||||
|
||||
// Tokens like '+' and '*' are described directly within the
|
||||
// grammar's rules, as opposed to in a seperate lexer description.
|
||||
"sum": {
|
||||
"type": "PREC_LEFT",
|
||||
"value": 1,
|
||||
"content": {
|
||||
"type": "SEQ",
|
||||
"members": [
|
||||
{"type": "SYMBOL", "name": "expression"},
|
||||
{"type": "STRING", "value": "+"},
|
||||
{"type": "SYMBOL", "name": "expression"}
|
||||
]
|
||||
}
|
||||
},
|
||||
|
||||
// Ambiguities can be resolved at compile time by assigning precedence
|
||||
// values to rule subtrees.
|
||||
"product": {
|
||||
"type": "PREC_LEFT",
|
||||
"value": 2,
|
||||
"content": {
|
||||
"type": "SEQ",
|
||||
"members": [
|
||||
{"type": "SYMBOL", "name": "expression"},
|
||||
{"type": "STRING", "value": "*"},
|
||||
{"type": "SYMBOL", "name": "expression"}
|
||||
]
|
||||
}
|
||||
},
|
||||
|
||||
// Tokens can be specified using ECMAScript regexps.
|
||||
"number": {"type": "PATTERN", "value": "\\d+"},
|
||||
"comment": {"type": "PATTERN", "value": "#.*"},
|
||||
"variable": {"type": "PATTERN", "value": "[a-zA-Z]\\w*"},
|
||||
}
|
||||
}
|
||||
)JSON");
|
||||
|
||||
if (result.error_type != TSCompileErrorTypeNone) {
|
||||
fprintf(stderr, "Compilation failed: %s\n", result.error_message);
|
||||
return 1;
|
||||
}
|
||||
|
||||
puts(result.code);
|
||||
|
||||
return 0;
|
||||
}
|
||||
```
|
||||
|
||||
To create the parser, compile this file like this:
|
||||
|
||||
```sh
|
||||
clang++ -std=c++11 \
|
||||
-I tree-sitter/include \
|
||||
arithmetic_grammar.cc \
|
||||
"$(find tree-sitter/out/Release -name libcompiler.a)" \
|
||||
-o arithmetic_grammar
|
||||
```
|
||||
|
||||
Then run the executable to print out the C code for the parser:
|
||||
|
||||
```sh
|
||||
./arithmetic_grammar > arithmetic_parser.c
|
||||
```
|
||||
|
||||
### Using the parser
|
||||
|
||||
#### Providing the text to parse
|
||||
|
||||
Text input is provided to a tree-sitter parser via a `TSInput` struct, which
|
||||
contains function pointers for seeking to positions in the text, and for reading
|
||||
chunks of text. The text can be encoded in either UTF8 or UTF16. This interface
|
||||
allows you to efficiently parse text that is stored in your own data structure.
|
||||
|
||||
#### Querying the syntax tree
|
||||
|
||||
The `libruntime` API provides a DOM-style interface for inspecting
|
||||
syntax trees. Functions like `ts_node_child(node, index)` and `ts_node_next_sibling(node)`
|
||||
expose every node in the concrete syntax tree. This is useful for operations
|
||||
like syntax-highlighting, which operate on a token-by-token basis. You can also
|
||||
traverse the tree in a more abstract way by using functions like
|
||||
`ts_node_named_child(node, index)` and `ts_node_next_named_sibling(node)`. These
|
||||
functions don't expose nodes that were specified in the grammar as anonymous
|
||||
tokens, like `(` and `+`. This is useful when analyzing the meaning of a document.
|
||||
|
||||
```c
|
||||
// test_parser.c
|
||||
|
||||
#include <assert.h>
|
||||
#include <string.h>
|
||||
#include <stdio.h>
|
||||
#include "tree_sitter/runtime.h"
|
||||
|
||||
// Declare the language function that was generated from your grammar.
|
||||
TSLanguage *tree_sitter_arithmetic();
|
||||
|
||||
int main() {
|
||||
TSParser *parser = ts_parser_new();
|
||||
ts_parser_set_language(parser, tree_sitter_arithmetic());
|
||||
|
||||
const char *source_code = "a + b * 5";
|
||||
TSTree *tree = ts_parser_parse(parser, NULL, source_code, strlen(source_code));
|
||||
|
||||
TSNode root_node = ts_tree_root_node(tree);
|
||||
assert(!strcmp(ts_node_type(root_node), "expression"));
|
||||
assert(ts_node_named_child_count(root_node) == 1);
|
||||
|
||||
TSNode sum_node = ts_node_named_child(root_node, 0);
|
||||
assert(!strcmp(ts_node_type(sum_node), "sum"));
|
||||
assert(ts_node_named_child_count(sum_node) == 2);
|
||||
|
||||
TSNode product_node = ts_node_child(ts_node_named_child(sum_node, 1), 0);
|
||||
assert(!strcmp(ts_node_type(product_node), "product"));
|
||||
assert(ts_node_named_child_count(product_node) == 2);
|
||||
|
||||
printf("Syntax tree: %s\n", ts_node_string(root_node));
|
||||
|
||||
ts_tree_delete(tree);
|
||||
ts_parser_delete(parser);
|
||||
return 0;
|
||||
}
|
||||
```
|
||||
|
||||
To demo this parser's capabilities, compile this program like this:
|
||||
|
||||
```sh
|
||||
clang \
|
||||
-I tree-sitter/include \
|
||||
test_parser.c arithmetic_parser.c \
|
||||
"$(find tree-sitter/out/Release -name libruntime.a)" \
|
||||
-o test_parser
|
||||
|
||||
./test_parser
|
||||
```
|
||||
|
||||
### References
|
||||
|
||||
- [Practical Algorithms for Incremental Software Development Environments](https://www2.eecs.berkeley.edu/Pubs/TechRpts/1997/CSD-97-946.pdf)
|
||||
- [Context Aware Scanning for Parsing Extensible Languages](http://www.umsec.umn.edu/publications/Context-Aware-Scanning-Parsing-Extensible)
|
||||
- [Efficient and Flexible Incremental Parsing](http://ftp.cs.berkeley.edu/sggs/toplas-parsing.ps)
|
||||
- [Incremental Analysis of Real Programming Languages](https://pdfs.semanticscholar.org/ca69/018c29cc415820ed207d7e1d391e2da1656f.pdf)
|
||||
- [Error Detection and Recovery in LR Parsers](http://what-when-how.com/compiler-writing/bottom-up-parsing-compiler-writing-part-13)
|
||||
- [Error Recovery for LR Parsers](http://www.dtic.mil/dtic/tr/fulltext/u2/a043470.pdf)
|
||||
[Documentation](http://tree-sitter.github.io/tree-sitter/)
|
||||
|
|
|
|||
|
|
@ -4,7 +4,7 @@ title: Introduction
|
|||
|
||||
# Introduction
|
||||
|
||||
Tree-sitter is an incremental parsing library. It can be used to build a concrete syntax tree for a source file and to efficiently update the syntax tree as the source file is edited. Tree-sitter aims to be:
|
||||
Tree-sitter is an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited. Tree-sitter aims to be:
|
||||
|
||||
* **General** enough to parse any programming language
|
||||
* **Fast** enough to parse on every keystroke in a text editor
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue