This commit is contained in:
Max Brunsfeld 2018-06-11 19:17:10 -07:00
parent 7ad50f2731
commit d1665da21c
5 changed files with 100 additions and 7 deletions

View file

@ -8,7 +8,8 @@
}
#table-of-contents {
border-right: 1px solid #ddd;
border-right: 1px solid #ccc;
border-bottom: 1px solid #ccc;
}
.nav-link.active {
@ -21,7 +22,7 @@
display: block;
}
.toc-section, .logo {
.toc-section:not(:last-child), .logo {
border-bottom: 1px solid #ccc;
}

View file

@ -4,11 +4,11 @@ title: Introduction
# Introduction
Tree-sitter is a library for parsing source code. It aims to be:
Tree-sitter is an incremental parsing library. It can be used to build a concrete syntax tree for a source file and to efficiently update the syntax tree as the source file is edited. Tree-sitter aims to be:
* **Fast** and incremental so that it can be used in a text editor
* **Robust** enough to provide useful results even in the presence of syntax errors
* **General** enough to parse any programming language
* **Fast** enough to parse on every keystroke in a text editor
* **Robust** enough to provide useful results even in the presence of syntax errors
* **Dependency-free** (and written in pure C) so that it can be embedded in any application
### Language Bindings
@ -48,3 +48,14 @@ There are parsers in development for these languages:
* [FOSDEM 2018](https://www.youtube.com/watch?v=0CGzC_iss-8)
* [GitHub Universe 2017](https://www.youtube.com/watch?v=a1rC79DHpmY)
### Underlying Research
The design of Tree-sitter was greatly influenced by the following research papers:
- [Practical Algorithms for Incremental Software Development Environments](https://www2.eecs.berkeley.edu/Pubs/TechRpts/1997/CSD-97-946.pdf)
- [Context Aware Scanning for Parsing Extensible Languages](http://www.umsec.umn.edu/publications/Context-Aware-Scanning-Parsing-Extensible)
- [Efficient and Flexible Incremental Parsing](http://ftp.cs.berkeley.edu/sggs/toplas-parsing.ps)
- [Incremental Analysis of Real Programming Languages](https://pdfs.semanticscholar.org/ca69/018c29cc415820ed207d7e1d391e2da1656f.pdf)
- [Error Detection and Recovery in LR Parsers](http://what-when-how.com/compiler-writing/bottom-up-parsing-compiler-writing-part-13)
- [Error Recovery for LR Parsers](http://www.dtic.mil/dtic/tr/fulltext/u2/a043470.pdf)

View file

@ -9,10 +9,16 @@ Tree-sitter consists of two separate libraries, both of which expose C APIs.
The first library, `libcompiler`, is
used to generate a parser for a language by supplying a [context-free grammar](https://en.wikipedia.org/wiki/Context-free_grammar) describing the
language. `libcompiler` is a build tool; once the parser has been generated, it is no longer needed. Its public interface is specified in the header file [`compiler.h`](https://github.com/tree-sitter/tree-sitter/blob/master/include/tree_sitter/compiler.h).
language. `libcompiler` is a build tool; it is no longer needed once a parser has been generated. Its public interface is specified in the header file [`compiler.h`](https://github.com/tree-sitter/tree-sitter/blob/master/include/tree_sitter/compiler.h).
The second library, `libruntime`, is used in combination with the parsers
generated by `libcompiler`, to produce syntax trees from source code and keep the
syntax trees up-to-date as the source code changes. `libruntime` is designed to be embedded in applications. Its interface is specified in the header file [`runtime.h`](https://github.com/tree-sitter/tree-sitter/blob/master/include/tree_sitter/runtime.h).
## The Compiler
WIP
## The Runtime
WIP

View file

@ -357,6 +357,8 @@ You may have noticed in the above examples that some of the grammar rule name li
## Dealing with LR conflicts
TODO
[cst]: https://en.wikipedia.org/wiki/Parse_tree
[non-terminal]: https://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols
[language-spec]: https://en.wikipedia.org/wiki/Programming_language_specification

View file

@ -5,4 +5,77 @@ permalink: using-parsers
# Using Parsers
WIP
A Tree-sitter parser consists of a single C source file that exports one function with the naming scheme `tree_sitter_${LANGUAGE_NAME}`. This function returns a pointer to a `TSLanguage` struct, which can be used in conjunction with a `TSParser` to produce syntax trees.
## The Raw C API
Here's an example of a simple C program that uses the Tree-sitter [JSON parser](https://github.com/tree-sitter/tree-sitter-json).
```c
// Filename - test-json-parser.c
#include <assert.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h> // for free()
#include "tree_sitter/runtime.h"

// Declared by the generated JSON parser (tree-sitter-json/src/parser.c).
TSLanguage *tree_sitter_json();

int main() {
  // Create a parser with the JSON language.
  TSParser *parser = ts_parser_new();
  ts_parser_set_language(parser, tree_sitter_json());

  // Parse some source code.
  const char *source_code = "[1, null]";
  TSTree *tree = ts_parser_parse_string(parser, NULL, source_code, strlen(source_code));

  // Find some syntax tree nodes.
  TSNode root_node = ts_tree_root_node(tree);
  TSNode array_node = ts_node_named_child(root_node, 0);
  TSNode number_node = ts_node_named_child(array_node, 0);

  // Check that the nodes have the expected types.
  assert(!strcmp(ts_node_type(root_node), "value"));
  assert(!strcmp(ts_node_type(array_node), "array"));
  assert(!strcmp(ts_node_type(number_node), "number"));

  // Check that the nodes have the expected child counts.
  // The array has five children: "[", number, ",", null, "]".
  assert(ts_node_child_count(root_node) == 1);
  assert(ts_node_child_count(array_node) == 5);
  assert(ts_node_named_child_count(array_node) == 2);
  assert(ts_node_child_count(number_node) == 0);

  // Print the syntax tree as an S-expression.
  char *string = ts_node_string(root_node);
  printf("Syntax tree: %s\n", string);

  // Free all of the heap allocations.
  free(string);
  ts_tree_delete(tree);
  ts_parser_delete(parser);
  return 0;
}
```
This program uses the Tree-sitter C API, which is declared in the header file `tree_sitter/runtime.h`, so we need to add the `tree-sitter/include` directory to the include path. We also need to link `libruntime.a` into the binary.
```sh
clang \
-I tree-sitter/include \
test-json-parser.c \
tree-sitter-json/src/parser.c \
tree-sitter/out/Release/libruntime.a \
-o test-json-parser
./test-json-parser
```
### Providing the text to parse
Text input is provided to a Tree-sitter parser via a `TSInput` struct, which contains function pointers for seeking to positions in the text and for reading chunks of text. The text can be encoded in either UTF-8 or UTF-16. This interface allows you to efficiently parse text that is stored in your own data structure.
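The seek/read pattern described above can be sketched without Tree-sitter itself. The block below is only an illustration: `SketchInput`, `PieceList`, and every function in it are hypothetical names invented for this example, and the real `TSInput` struct in `tree_sitter/runtime.h` may have different fields and signatures. It shows text stored as separate pieces (as an editor buffer might store it) being exposed through two callbacks, one that seeks to a byte offset and one that returns the next chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical stand-in for a seek/read input interface.
// This is NOT the real TSInput definition.
typedef struct {
  void *payload;
  // Move the read position to a byte offset; returns nonzero on success.
  int (*seek)(void *payload, uint32_t byte_index);
  // Return the next chunk and store its length; length 0 means end of input.
  const char *(*read)(void *payload, uint32_t *bytes_read);
} SketchInput;

// Example storage: the text is split across several NUL-terminated pieces.
typedef struct {
  const char **pieces;
  uint32_t piece_count;
  uint32_t piece_index;   // piece that the next read starts in
  uint32_t piece_offset;  // byte offset inside that piece
} PieceList;

int piece_list_seek(void *payload, uint32_t byte_index) {
  PieceList *list = payload;
  uint32_t i = 0;
  for (; i < list->piece_count; i++) {
    uint32_t length = (uint32_t)strlen(list->pieces[i]);
    if (byte_index < length) break;
    byte_index -= length;
  }
  // Seeking exactly to the end is allowed; seeking past it is not.
  if (i == list->piece_count && byte_index > 0) return 0;
  list->piece_index = i;
  list->piece_offset = byte_index;
  return 1;
}

const char *piece_list_read(void *payload, uint32_t *bytes_read) {
  PieceList *list = payload;
  while (list->piece_index < list->piece_count) {
    const char *chunk = list->pieces[list->piece_index] + list->piece_offset;
    list->piece_index++;
    list->piece_offset = 0;
    if (*chunk != '\0') {  // skip empty pieces so 0 only means end of input
      *bytes_read = (uint32_t)strlen(chunk);
      return chunk;
    }
  }
  *bytes_read = 0;
  return "";
}

// Consumer: copy the text from byte_index onward into out (NUL-terminated).
// Returns the number of bytes copied, or 0 if the seek fails.
size_t sketch_input_copy_from(SketchInput input, uint32_t byte_index,
                              char *out, size_t capacity) {
  assert(capacity > 0);
  size_t written = 0;
  out[0] = '\0';
  if (!input.seek(input.payload, byte_index)) return 0;
  for (;;) {
    uint32_t n = 0;
    const char *chunk = input.read(input.payload, &n);
    if (n == 0) break;
    if (written + n >= capacity) n = (uint32_t)(capacity - 1 - written);
    memcpy(out + written, chunk, n);
    written += n;
    if (written == capacity - 1) break;
  }
  out[written] = '\0';
  return written;
}
```

A caller would fill a `SketchInput` with a pointer to its `PieceList` and the two callbacks, then let the consumer pull chunks on demand. The point of the design is that the text never has to be copied into one contiguous string before it is read.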
### Querying the syntax tree
Tree-sitter provides a DOM-style interface for inspecting syntax trees. Functions like `ts_node_child(node, index)` and `ts_node_next_sibling(node)` expose every node in the concrete syntax tree. This is useful for operations like syntax highlighting, which operate on a token-by-token basis. You can also traverse the tree in a more abstract way by using functions like `ts_node_named_child(node, index)` and `ts_node_next_named_sibling(node)`. These functions don't expose nodes that were specified in the grammar as anonymous tokens, like `:` and `{`. This is useful when analyzing the meaning of a document.
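To make the difference between the two views concrete, here is a toy model that does not use Tree-sitter at all. `ToyNode` and the `toy_*` functions are hypothetical names made up for this sketch; they only imitate the behavior described above and are not how `TSNode` is implemented. Each node records whether it is named, and the "named" accessors skip anonymous tokens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical toy node, not Tree-sitter's TSNode.
typedef struct ToyNode {
  const char *type;                // e.g. "array", "number", or "[" for a token
  bool is_named;                   // false for anonymous tokens like "[" and ","
  const struct ToyNode *children;  // all children, in source order
  uint32_t child_count;
} ToyNode;

// Count only the named children, skipping anonymous tokens.
uint32_t toy_named_child_count(const ToyNode *node) {
  assert(node != NULL);
  uint32_t count = 0;
  for (uint32_t i = 0; i < node->child_count; i++) {
    if (node->children[i].is_named) count++;
  }
  return count;
}

// Return the index-th named child, or NULL if there are not that many.
const ToyNode *toy_named_child(const ToyNode *node, uint32_t index) {
  assert(node != NULL);
  for (uint32_t i = 0; i < node->child_count; i++) {
    if (!node->children[i].is_named) continue;
    if (index == 0) return &node->children[i];
    index--;
  }
  return NULL;
}

// Compare a node's type with a string, like !strcmp(ts_node_type(n), s).
bool toy_type_is(const ToyNode *node, const char *type) {
  return node != NULL && strcmp(node->type, type) == 0;
}
```

For a toy array node modeling `[1, null]`, the full child list holds five entries (`[`, `number`, `,`, `null`, `]`), while the named view holds only `number` and `null`. A syntax highlighter would walk the full list; code that analyzes meaning would usually walk the named view.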