diff --git a/docs/_config.yml b/docs/_config.yml index 10e6731a..891551df 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -1 +1,2 @@ markdown: kramdown +theme: jekyll-theme-cayman diff --git a/docs/_layouts/default.html b/docs/_layouts/default.html new file mode 100644 index 00000000..1bee1a8a --- /dev/null +++ b/docs/_layouts/default.html @@ -0,0 +1,123 @@ + + + + + + + + + {{ page.title }} + + + +
+
+
+
+ +
+
+
+
+ +
+
+
+
+ +
+
+ {{ content }} +
+
+
+
+ + + + + + + + diff --git a/docs/_layouts/table-of-contents.html b/docs/_layouts/table-of-contents.html deleted file mode 100644 index 0e2dc7f6..00000000 --- a/docs/_layouts/table-of-contents.html +++ /dev/null @@ -1,74 +0,0 @@ - - - - - - - - - {{ page.title }} - - - - - -
-
-
-
- -
-
-
-
- - -
-
-
-
- -
-
- {{ content }} -
-
-
-
- - - - - - - - diff --git a/docs/assets/css/style.scss b/docs/assets/css/style.scss new file mode 100644 index 00000000..2fe29224 --- /dev/null +++ b/docs/assets/css/style.scss @@ -0,0 +1,39 @@ +--- +--- + +@import 'jekyll-theme-cayman'; + +#main-content, #table-of-contents { + padding-top: 20px; +} + +#table-of-contents { + border-right: 1px solid #ddd; +} + +.nav-link.active { + text-decoration: underline; +} + +.logo { + padding: 20px; + padding-top: 0; + display: block; +} + +.toc-section, .logo { + border-bottom: 1px solid #ccc; +} + +.toc-section.active { + background-color: #edffcb; +} + +li { + display: block; +} + +body { + overflow-y: scroll; + padding-bottom: 100px; +} diff --git a/docs/assets/images/tree-sitter-small.png b/docs/assets/images/tree-sitter-small.png new file mode 100644 index 00000000..73f7f163 Binary files /dev/null and b/docs/assets/images/tree-sitter-small.png differ diff --git a/docs/css/style.css b/docs/css/style.css deleted file mode 100644 index 7c69d614..00000000 --- a/docs/css/style.css +++ /dev/null @@ -1,13 +0,0 @@ -#main-content, #table-of-contents { - margin-top: 20px; -} - -#table-of-contents { - padding: 10px; - border-radius: 10px; - border: 1px solid #ddd; -} - -.nav-link.active { - text-decoration: underline; -} diff --git a/docs/index.md b/docs/index.md index 7ad56def..d11d34dd 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,10 +1,50 @@ +--- +title: Introduction +--- + +# Introduction + Tree-sitter is a library for parsing source code. 
It aims to be: -* **General** enough to parse any programming language -* **Dependency-free** and written in pure C so that it can be embedded in any application * **Fast** and incremental so that it can be used in a text editor * **Robust** enough to provide useful results even in the presence of syntax errors +* **General** enough to parse any programming language +* **Dependency-free** (and written in pure C) so that it can be embedded in any application -## Table of contents +### Language Bindings -1. [Creating parsers](creating-parsers.md) +There are currently bindings that allow Tree-sitter to be used from the following languages: + +* [JavaScript](https://github.com/tree-sitter/node-tree-sitter) +* [Rust](https://github.com/tree-sitter/rust-tree-sitter) +* [Haskell](https://github.com/tree-sitter/haskell-tree-sitter) +* [Ruby](https://github.com/tree-sitter/ruby-tree-sitter) + +### Available Parsers + +There are fairly complete parsers for the following languages: + +* [Bash](https://github.com/tree-sitter/tree-sitter-bash) +* [C](https://github.com/tree-sitter/tree-sitter-c) +* [C++](https://github.com/tree-sitter/tree-sitter-cpp) +* [Go](https://github.com/tree-sitter/tree-sitter-go) +* [JavaScript](https://github.com/tree-sitter/tree-sitter-javascript) +* [PHP](https://github.com/tree-sitter/tree-sitter-php) +* [Python](https://github.com/tree-sitter/tree-sitter-python) +* [Ruby](https://github.com/tree-sitter/tree-sitter-ruby) +* [Rust](https://github.com/tree-sitter/tree-sitter-rust) +* [TypeScript](https://github.com/tree-sitter/tree-sitter-typescript) + +There are parsers in development for these languages: + +* [Haskell](https://github.com/tree-sitter/tree-sitter-haskell) +* [Java](https://github.com/tree-sitter/tree-sitter-java) +* [OCaml](https://github.com/tree-sitter/tree-sitter-ocaml) +* [C-sharp](https://github.com/tree-sitter/tree-sitter-c-sharp) +* [Julia](https://github.com/tree-sitter/tree-sitter-julia) +* 
[Scala](https://github.com/tree-sitter/tree-sitter-scala) + +### Talks on Tree-sitter + +* [FOSDEM 2018](https://www.youtube.com/watch?v=0CGzC_iss-8) +* [GitHub Universe 2017](https://www.youtube.com/watch?v=a1rC79DHpmY) diff --git a/docs/section-2-architecture.md b/docs/section-2-architecture.md new file mode 100644 index 00000000..ad007cce --- /dev/null +++ b/docs/section-2-architecture.md @@ -0,0 +1,18 @@ +--- +title: Architecture +permalink: architecture +--- + +# Architecture + +Tree-sitter consists of two separate libraries, both of which expose C APIs. + +The first library, `libcompiler`, is +used to generate a parser for a language by supplying a [context-free grammar](https://en.wikipedia.org/wiki/Context-free_grammar) describing the +language. `libcompiler` is a build tool; once the parser has been generated, it is no longer needed. Its public interface is specified in the header file [`compiler.h`](https://github.com/tree-sitter/tree-sitter/blob/master/include/tree_sitter/compiler.h). + +The second library, `libruntime`, is used in combination with the parsers +generated by `libcompiler`, to produce syntax trees from source code and keep the +syntax trees up-to-date as the source code changes. `libruntime` is designed to be embedded in applications. Its interface is specified in the header file [`runtime.h`](https://github.com/tree-sitter/tree-sitter/blob/master/include/tree_sitter/runtime.h). 
+ +## The Compiler diff --git a/docs/creating-parsers.md b/docs/section-3-creating-parsers.md similarity index 91% rename from docs/creating-parsers.md rename to docs/section-3-creating-parsers.md index fa77669d..1de02833 100644 --- a/docs/creating-parsers.md +++ b/docs/section-3-creating-parsers.md @@ -1,5 +1,6 @@ --- -layout: table-of-contents +title: Creating Parsers +permalink: creating-parsers --- # Creating parsers @@ -57,59 +58,63 @@ It's usually a good idea to find a formal specification for the language you're Although languages have very different constructs, their constructs can often be categorized into similar groups like *Declarations*, *Definitions*, *Statements*, *Expressions*, *Types*, and *Patterns*. In writing your grammar, a good first step is to create just enough structure to include all of these basic *groups* of symbols. For an imaginary C-like language, this might look something like this: ```js -rules: $ => { - source_file: $ => repeat($._definition), +{ + // ... 
- _definition: $ => choice( - $.function_definition - // TODO: other kinds of definitions - ), + rules: { + source_file: $ => repeat($._definition), - function_definition: $ => seq( - 'func', - $.identifier, - $.parameter_list, - $._type, - $.block - ), + _definition: $ => choice( + $.function_definition + // TODO: other kinds of definitions + ), - parameter_list: $ => seq( - '(', - // TODO: parameters - ')' - ), + function_definition: $ => seq( + 'func', + $.identifier, + $.parameter_list, + $._type, + $.block + ), - _type: $ => choice( - 'bool' - // TODO: other kinds of types - ), + parameter_list: $ => seq( + '(', + // TODO: parameters + ')' + ), - block: $ => seq( - '{', - repeat($._statement), - '}' - ), + _type: $ => choice( + 'bool' + // TODO: other kinds of types + ), - _statement: $ => choice( - $.return_statement - // TODO: other kinds of statements - ), + block: $ => seq( + '{', + repeat($._statement), + '}' + ), - return_statement: $ => seq( - 'return', - $._expression, - ';' - ), + _statement: $ => choice( + $.return_statement + // TODO: other kinds of statements + ), - _expression: $ => choice( - $.identifier, - $.number - // TODO: other kinds of expressions - ), + return_statement: $ => seq( + 'return', + $._expression, + ';' + ), - identifier: $ => /[a-z]+/, + _expression: $ => choice( + $.identifier, + $.number + // TODO: other kinds of expressions + ), - number: $ => /\d+/ + identifier: $ => /[a-z]+/, + + number: $ => /\d+/ + } } ``` @@ -118,27 +123,31 @@ Some of the details of this grammar will be explained in more depth later on, bu With this structure in place, you can now freely decide what part of the grammar to flesh out next. For example, you might decide to start with *types*. One-by-one, you could define the rules for writing basic types and composing them into more complex types: ```js -_type: $ => choice( - $.primitive_type, - $.array_type, - $.pointer_type -), +{ + // ... 
-primitive_type: $ => choice( - 'bool', - 'int' -), + _type: $ => choice( + $.primitive_type, + $.array_type, + $.pointer_type + ), -array_type: $ => seq( - '[', - ']', - $._type -), + primitive_type: $ => choice( + 'bool', + 'int' + ), -pointer_type: $ => seq( - '*', - $._type -), + array_type: $ => seq( + '[', + ']', + $._type + ), + + pointer_type: $ => seq( + '*', + $._type + ) +} ``` After developing the *type* sublanguage a bit further, you might decide to switch to working on *statements* or *expressions* instead. It's often useful to check your progress by trying to parse some real code using `tree-sitter parse`. @@ -250,24 +259,28 @@ The language spec encodes the 20 precedence levels of JavaScript expressions usi To produce a readable syntax tree, we'd like to model JavaScript expressions using a much flatter structure like this: ```js -_expression: $ => choice( - $.identifier, - $.unary_expression, - $.binary_expression, +{ // ... -), -unary_expression: $ => choice( - seq('-', $._expression), - seq('!', $._expression), - // ... -), + _expression: $ => choice( + $.identifier, + $.unary_expression, + $.binary_expression, + // ... + ), -binary_expression: $ => choice( - seq($._expression, '*', $._expression), - seq($._expression, '+', $._expression), - // ... -), + unary_expression: $ => choice( + seq('-', $._expression), + seq('!', $._expression), + // ... + ), + + binary_expression: $ => choice( + seq($._expression, '*', $._expression), + seq($._expression, '+', $._expression), + // ... + ), +} ``` Of course, this flat structure is highly ambiguous. If we try to generate a parser, Tree-sitter gives us an error message: @@ -293,11 +306,15 @@ Possible resolutions: For an expression like `-a * b`, it's not clear whether the `-` operator applies to the `a * b` or just to the `a`. This is where the `prec` function described above comes into play. 
By wrapping a rule with `prec`, we can indicate that certain sequences of symbols should *bind to each other more tightly* than others. For example, the `'-', $._expression` sequence in `unary_expression` should bind more tightly than the `$._expression, '+', $._expression` sequence in `binary_expression`: ```js -unary_expression: $ => prec(2, choice( - seq('-', $._expression), - seq('!', $._expression), +{ // ... -)) + + unary_expression: $ => prec(2, choice( + seq('-', $._expression), + seq('!', $._expression), + // ... + )) +} ``` ### Using associativity @@ -323,11 +340,15 @@ Possible resolutions: For an expression like `a * b * c`, it's not clear whether we mean `a * (b * c)` or `(a * b) * c`. This is where `prec.left` and `prec.right` come into use. We want to select the second interpretation, so we use `prec.left`. ```js -binary_expression: $ => choice( - prec.left(2, seq($._expression, '*', $._expression)), - prec.left(1, seq($._expression, '+', $._expression)), +{ // ... -), + + binary_expression: $ => choice( + prec.left(2, seq($._expression, '*', $._expression)), + prec.left(1, seq($._expression, '+', $._expression)), + // ... + ), +} ``` ### Hiding rules diff --git a/docs/section-4-using-parsers.md b/docs/section-4-using-parsers.md new file mode 100644 index 00000000..903b4527 --- /dev/null +++ b/docs/section-4-using-parsers.md @@ -0,0 +1,8 @@ +--- +title: Using Parsers +permalink: using-parsers +--- + +# Using Parsers + +WIP