diff --git a/docs/section-2-architecture.md b/docs/section-2-architecture.md
deleted file mode 100644
index a1101d44..00000000
--- a/docs/section-2-architecture.md
+++ /dev/null
@@ -1,24 +0,0 @@
----
-title: Architecture
-permalink: architecture
----
-
-# Architecture
-
-Tree-sitter consists of two separate libraries, both of which expose C APIs.
-
-The first library, `libcompiler`, is
-used to generate a parser for a language by supplying a [context-free grammar](https://en.wikipedia.org/wiki/Context-free_grammar) describing the
-language. `libcompiler` is a build tool; it is no longer needed once a parser has been generated. Its public interface is specified in the header file [`compiler.h`](https://github.com/tree-sitter/tree-sitter/blob/master/include/tree_sitter/compiler.h).
-
-The second library, `libruntime`, is used in combination with the parsers
-generated by `libcompiler`, to produce syntax trees from source code and keep the
-syntax trees up-to-date as the source code changes. `libruntime` is designed to be embedded in applications. Its interface is specified in the header file [`runtime.h`](https://github.com/tree-sitter/tree-sitter/blob/master/include/tree_sitter/runtime.h).
-
-## The Compiler
-
-WIP
-
-## The Runtime
-
-WIP
diff --git a/docs/section-4-using-parsers.md b/docs/section-2-using-parsers.md
similarity index 100%
rename from docs/section-4-using-parsers.md
rename to docs/section-2-using-parsers.md
diff --git a/docs/section-4-implementation.md b/docs/section-4-implementation.md
new file mode 100644
index 00000000..e072b426
--- /dev/null
+++ b/docs/section-4-implementation.md
@@ -0,0 +1,42 @@
+---
+title: Implementation
+permalink: implementation
+---
+
+# Implementation
+
+Tree-sitter consists of two separate libraries, both of which expose C APIs.
+
+The first library, `libcompiler`, is
+used to generate a parser for a language by supplying a [context-free grammar](https://en.wikipedia.org/wiki/Context-free_grammar) describing the
+language. `libcompiler` is a build tool; it is no longer needed once a parser has been generated. It is written in C++, but exposes a C interface, which is declared in the header file [`compiler.h`](https://github.com/tree-sitter/tree-sitter/blob/master/include/tree_sitter/compiler.h).
+
+The second library, `libruntime`, is used in combination with the parsers
+generated by `libcompiler`, to produce syntax trees from source code and keep the
+syntax trees up-to-date as the source code changes. `libruntime` is designed to be embedded in applications. It is written in plain C. Its interface is specified in the header file [`runtime.h`](https://github.com/tree-sitter/tree-sitter/blob/master/include/tree_sitter/runtime.h).
+
+## The Compiler
+
+The `libcompiler` library exports only one function: `ts_compile_grammar`. This function takes a context-free grammar as a JSON string and returns a parser as a string of C code. The source files in the [`src/compiler`](https://github.com/tree-sitter/tree-sitter/tree/master/src/compiler) directory all play a role in producing this C code. This section will describe some key parts of this process.
+
+### Parsing a Grammar
+
+First, `libcompiler` must parse the JSON grammar. The format of the grammars is formally specified by the JSON schema in [grammar-schema.json](https://github.com/tree-sitter/tree-sitter/blob/master/src/compiler/grammar-schema.json). The parsing is implemented in [parse_grammar.cc](https://github.com/tree-sitter/tree-sitter/blob/master/src/compiler/parse_grammar.cc). It uses [udp/json-parser](https://github.com/udp/json-parser), one of Tree-sitter's few library dependencies.
+
+### Grammar Rules
+
+A Tree-sitter grammar is composed of a set of *rules* - objects that describe how syntax nodes can be composed from other syntax nodes. There are several types of rules: symbols, strings, regexes, sequences, choices, repetitions, and a few others. Internally, these are all represented using a [tagged union](https://en.wikipedia.org/wiki/Tagged_union) class called [`Rule`](https://github.com/tree-sitter/tree-sitter/blob/master/src/compiler/rule.h). This class has a method called `match`, which makes it easy to [pattern-match](https://en.wikipedia.org/wiki/Pattern_matching) a rule, processing each type of rule with separate code.
+
+### Preparing a Grammar
+
+Once a grammar has been parsed, it must be transformed in several ways before it can be used to generate a parser. Each transformation is implemented by a separate file in the [`src/compiler/prepare_grammar`](https://github.com/tree-sitter/tree-sitter/tree/master/src/compiler/prepare_grammar) directory, and the transformations are ultimately composed together in `prepare_grammar.cc`.
+
+At the end of these transformations, the initial grammar is split into two grammars: a *syntax grammar* and a *lexical grammar*. The syntax grammar describes how the language's [*non-terminal symbols*](https://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols) are constructed from other grammar symbols, and the lexical grammar describes how the grammar's *terminal symbols* (strings and regexes) can be composed from individual characters.
+
+### Building Parse Tables
+
+
+
+## The Runtime
+
+WIP
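
Note on the "Grammar Rules" section added above: the tagged-union-plus-`match` idea can be sketched in a few lines of C++17. This is only an illustration, not the code in `src/compiler/rule.h`. It uses `std::variant` and `std::visit` as a stand-in for whatever representation the real `Rule` class uses, and the names `Symbol`, `String`, `Seq`, `Choice`, `Overloaded`, `count_strings`, and `example_rule` are hypothetical. Only four of the rule types listed in the section are modeled.

```cpp
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

struct Rule;  // forward declaration so composite rules can contain rules

// Hypothetical rule types, loosely following the list in "Grammar Rules".
struct Symbol { std::string name; };
struct String { std::string value; };
struct Seq    { std::vector<Rule> members; };
struct Choice { std::vector<Rule> members; };

// Helper that merges several lambdas into one overloaded callable.
template <typename... Fs>
struct Overloaded : Fs... { using Fs::operator()...; };
template <typename... Fs>
Overloaded(Fs...) -> Overloaded<Fs...>;

// Assumed shape of a tagged union with a `match` method; std::variant
// stores the tag and the payload together.
struct Rule {
  std::variant<Symbol, String, Seq, Choice> value;

  // Dispatch to the handler whose parameter type matches the stored rule.
  template <typename... Fs>
  auto match(Fs... handlers) const {
    return std::visit(Overloaded{handlers...}, value);
  }
};

// Example of processing each rule type with separate code:
// count the string literals that appear anywhere inside a rule.
std::size_t count_strings(const Rule &rule) {
  return rule.match(
    [](const Symbol &) -> std::size_t { return 0; },
    [](const String &) -> std::size_t { return 1; },
    // Generic handler shared by Seq and Choice: sum over the members.
    [](const auto &composite) -> std::size_t {
      std::size_t total = 0;
      for (const Rule &member : composite.members) {
        total += count_strings(member);
      }
      return total;
    });
}

// Builds the rule: "(" expression ")"
Rule example_rule() {
  return Rule{Seq{{
    Rule{String{"("}},
    Rule{Symbol{"expression"}},
    Rule{String{")"}},
  }}};
}
```

With these definitions, `count_strings(example_rule())` visits the sequence and its three members, and each rule type is handled by its own lambda. Adding a new rule type to the variant without adding a handler becomes a compile error, which is the main reason to prefer this style over a chain of type checks.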