diff --git a/cli/README.md b/cli/README.md new file mode 100644 index 00000000..b6f526e9 --- /dev/null +++ b/cli/README.md @@ -0,0 +1,39 @@ +Tree-sitter CLI +=============== + +[![Build Status](https://travis-ci.org/tree-sitter/tree-sitter.svg?branch=master)](https://travis-ci.org/tree-sitter/tree-sitter) +[![Build status](https://ci.appveyor.com/api/projects/status/vtmbd6i92e97l55w/branch/master?svg=true)](https://ci.appveyor.com/project/maxbrunsfeld/tree-sitter/branch/master) +[![Crates.io](https://img.shields.io/crates/v/tree-sitter-cli.svg)](https://crates.io/crates/tree-sitter-cli) + +The Tree-sitter CLI allows you to develop, test, and use Tree-sitter grammars from the command line. It works on macOS, Linux, and Windows. + +### Installation + +You can install the `tree-sitter-cli` with `cargo`: + +```sh +cargo install tree-sitter-cli +``` + +or with `npm`: + +```sh +npm install tree-sitter-cli +``` + +You can also download a pre-built binary for your platform from [the releases page](https://github.com/tree-sitter/tree-sitter/releases/latest). + +### Dependencies + +The `tree-sitter` binary itself has no dependencies, but specific commands have dependencies that must be present at runtime: + +* To generate a parser from a grammar, you must have [`node`](https://nodejs.org) on your PATH. +* To run and test parsers, you must have a C and C++ compiler on your system. + +### Commands + +* `generate` - The `tree-sitter generate` command will generate a Tree-sitter parser based on the grammar in the current working directory. See [the documentation](http://tree-sitter.github.io/tree-sitter/creating-parsers) for more information. + +* `test` - The `tree-sitter test` command will run the unit tests for the Tree-sitter parser in the current working directory. See [the documentation](http://tree-sitter.github.io/tree-sitter/creating-parsers) for more information. 
+ +* `parse` - The `tree-sitter parse` command will parse a file (or list of files) using Tree-sitter parsers. diff --git a/docs/section-4-implementation.md b/docs/section-4-implementation.md index e072b426..532f1046 100644 --- a/docs/section-4-implementation.md +++ b/docs/section-4-implementation.md @@ -5,31 +5,31 @@ permalink: implementation # Implementation -Tree-sitter consists of two separate libraries, both of which expose C APIs. +Tree-sitter consists of two components: a C library (`libtree-sitter`), and a command-line tool (the `tree-sitter` CLI). -The first library, `libcompiler`, is +The library, `libtree-sitter`, is used in combination with the parsers +generated by the CLI, to produce syntax trees from source code and keep the +syntax trees up-to-date as the source code changes. `libtree-sitter` is designed to be embedded in applications. It is written in plain C. Its interface is specified in the header file [`tree_sitter/api.h`](https://github.com/tree-sitter/tree-sitter/blob/master/lib/include/tree_sitter/api.h). + +The CLI is used to generate a parser for a language by supplying a [context-free grammar](https://en.wikipedia.org/wiki/Context-free_grammar) describing the -language. `libcompiler` is a build tool; it is no longer needed once a parser has been generated. It is written in C++, but exposes a C interface, which is declared in the header file [`compiler.h`](https://github.com/tree-sitter/tree-sitter/blob/master/include/tree_sitter/compiler.h). +language. The CLI is a build tool; it is no longer needed once a parser has been generated. It is written in Rust, and is available on [crates.io](https://crates.io), [npm](http://npmjs.com), and as a pre-built binary [on GitHub](https://github.com/tree-sitter/tree-sitter/releases/latest). -The second library, `libruntime`, is used in combination with the parsers -generated by `libcompiler`, to produce syntax trees from source code and keep the -syntax trees up-to-date as the source code changes. 
`libruntime` is designed to be embedded in applications. It is written in plain C. Its interface is specified in the header file [`runtime.h`](https://github.com/tree-sitter/tree-sitter/blob/master/include/tree_sitter/runtime.h). +## The CLI -## The Compiler - -The `libcompiler` library exports only one function: `ts_compile_grammar`. This function takes a context-free grammar as a JSON string and returns a parser as a string of C code. The source files in the [`src/compiler`](https://github.com/tree-sitter/tree-sitter/tree/master/src/compiler) directory all play a role in producing this C code. This section will describe some key parts of this process. +The `tree-sitter` CLI's most important feature is the `generate` subcommand. This subcommand reads a context-free grammar from a file called `grammar.js` and outputs a parser as a C file called `parser.c`. The source files in the [`cli/src`](https://github.com/tree-sitter/tree-sitter/tree/master/cli/src) directory all play a role in producing the code in `parser.c`. This section will describe some key parts of this process. ### Parsing a Grammar -First, `libcompiler` must parse the JSON grammar. The format of the grammars is formally specified by the JSON schema in [grammar-schema.json](https://github.com/tree-sitter/tree-sitter/blob/master/src/compiler/grammar-schema.json). The parsing is implemented in [parser_grammar.cc](https://github.com/tree-sitter/tree-sitter/blob/master/src/compiler/parse_grammar.cc). It uses [udp/json-parser](https://github.com/udp/json-parser), one of Tree-sitter's few library dependencies. +First, Tree-sitter must evaluate the JavaScript code in `grammar.js` and convert the grammar to a JSON format. It does this by shelling out to `node`. The format of the grammars is formally specified by the JSON schema in [grammar-schema.json](https://github.com/tree-sitter/tree-sitter/blob/master/cli/src/generate/grammar-schema.json). 
The parsing is implemented in [parse_grammar.rs](https://github.com/tree-sitter/tree-sitter/blob/master/cli/src/generate/parse_grammar.rs). ### Grammar Rules -A Tree-sitter grammar is composed of a set of *rules* - objects that describe how syntax nodes can be composed from other syntax nodes. There are several types of rules: symbols, strings, regexes, sequences, choices, repetitions, and a few others. Internally, these are all represented using a [tagged union](https://en.wikipedia.org/wiki/Tagged_union) class called [`Rule`](https://github.com/tree-sitter/tree-sitter/blob/master/src/compiler/rule.h). This class has a method called `match`, which makes it easy to [pattern-match](https://en.wikipedia.org/wiki/Pattern_matching) a rule, processing each type of rule with separate code. +A Tree-sitter grammar is composed of a set of *rules* - objects that describe how syntax nodes can be composed from other syntax nodes. There are several types of rules: symbols, strings, regexes, sequences, choices, repetitions, and a few others. Internally, these are all represented using an [enum](https://doc.rust-lang.org/book/ch06-01-defining-an-enum.html) called [`Rule`](https://github.com/tree-sitter/tree-sitter/blob/master/cli/src/generate/rules.rs). ### Preparing a Grammar -Once a grammar has been parsed, it must be transformed in several ways before it can be used to generate a parser. Each transformation is implemented by a separate file in the [`src/compiler/prepare_grammar`](https://github.com/tree-sitter/tree-sitter/tree/master/src/compiler/prepare_grammar) directory, and the transformations are ultimately composed together in `prepare_grammar.cc`. +Once a grammar has been parsed, it must be transformed in several ways before it can be used to generate a parser. 
Each transformation is implemented by a separate file in the [`prepare_grammar`](https://github.com/tree-sitter/tree-sitter/tree/master/cli/src/generate/prepare_grammar) directory, and the transformations are ultimately composed together in `prepare_grammar/mod.rs`. At the end of these transformations, the initial grammar is split into two grammars: a *syntax grammar* and a *lexical grammar*. The syntax grammar describes how the language's [*non-terminal symbols*](https://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols) are constructed from other grammar symbols, and the lexical grammar describes how the grammar's *terminal symbols* (strings and regexes) can be composed from individual characters. diff --git a/lib/Cargo.toml b/lib/Cargo.toml index e71d0c21..4912c5b0 100644 --- a/lib/Cargo.toml +++ b/lib/Cargo.toml @@ -4,7 +4,7 @@ description = "Rust bindings to the Tree-sitter parsing library" version = "0.3.5" authors = ["Max Brunsfeld "] license = "MIT" -readme = "README.md" +readme = "binding/README.md" keywords = ["incremental", "parsing"] categories = ["api-bindings", "parsing", "text-editors"] diff --git a/lib/README.md b/lib/binding/README.md similarity index 88% rename from lib/README.md rename to lib/binding/README.md index 77ce0072..3e591071 100644 --- a/lib/README.md +++ b/lib/binding/README.md @@ -1,8 +1,8 @@ Rust Tree-sitter -=========================== +================ -[![Build Status](https://travis-ci.org/tree-sitter/rust-tree-sitter.svg)](https://travis-ci.org/tree-sitter/rust-tree-sitter) -[![Build status](https://ci.appveyor.com/api/projects/status/d0f6vqq3rflxx3y6/branch/master?svg=true)](https://ci.appveyor.com/project/maxbrunsfeld/rust-tree-sitter/branch/master) +[![Build Status](https://travis-ci.org/tree-sitter/tree-sitter.svg?branch=master)](https://travis-ci.org/tree-sitter/tree-sitter) +[![Build 
status](https://ci.appveyor.com/api/projects/status/vtmbd6i92e97l55w/branch/master?svg=true)](https://ci.appveyor.com/project/maxbrunsfeld/tree-sitter/branch/master) [![Crates.io](https://img.shields.io/crates/v/tree-sitter.svg)](https://crates.io/crates/tree-sitter) Rust bindings to the [Tree-sitter][] parsing library.