Merge branch 'master' into m-novikov-add-parsers

Max Brunsfeld 2021-09-24 09:04:30 -07:00 committed by GitHub
commit e7dcd2b7c4
GPG key ID: 4AEE18F83AFDEB23
57 changed files with 822 additions and 353 deletions


@@ -6,8 +6,8 @@ GEM
minitest (~> 5.1)
thread_safe (~> 0.3, >= 0.3.4)
tzinfo (~> 1.1)
addressable (2.5.2)
public_suffix (>= 2.0.2, < 4.0)
addressable (2.8.0)
public_suffix (>= 2.0.2, < 5.0)
coffee-script (2.4.1)
coffee-script-source
execjs
@@ -16,12 +16,27 @@ GEM
commonmarker (0.17.8)
ruby-enum (~> 0.5)
concurrent-ruby (1.0.5)
ethon (0.11.0)
ffi (>= 1.3.0)
ethon (0.14.0)
ffi (>= 1.15.0)
execjs (2.7.0)
faraday (0.14.0)
faraday (1.5.1)
faraday-em_http (~> 1.0)
faraday-em_synchrony (~> 1.0)
faraday-excon (~> 1.1)
faraday-httpclient (~> 1.0.1)
faraday-net_http (~> 1.0)
faraday-net_http_persistent (~> 1.1)
faraday-patron (~> 1.0)
multipart-post (>= 1.2, < 3)
ffi (1.9.23)
ruby2_keywords (>= 0.0.4)
faraday-em_http (1.0.0)
faraday-em_synchrony (1.0.0)
faraday-excon (1.1.0)
faraday-httpclient (1.0.1)
faraday-net_http (1.0.1)
faraday-net_http_persistent (1.2.0)
faraday-patron (1.0.0)
ffi (1.15.3)
forwardable-extended (2.6.0)
gemoji (3.0.0)
github-pages (177)
@@ -195,33 +210,35 @@ GEM
minima (2.1.1)
jekyll (~> 3.3)
minitest (5.11.3)
multipart-post (2.0.0)
net-dns (0.8.0)
multipart-post (2.1.1)
net-dns (0.9.0)
nokogiri (1.11.4)
mini_portile2 (~> 2.5.0)
racc (~> 1.4)
octokit (4.8.0)
octokit (4.21.0)
faraday (>= 0.9)
sawyer (~> 0.8.0, >= 0.5.3)
pathutil (0.16.1)
pathutil (0.16.2)
forwardable-extended (~> 2.6)
public_suffix (2.0.5)
racc (1.5.2)
rb-fsevent (0.10.2)
rb-inotify (0.9.10)
ffi (>= 0.5.0, < 2)
rb-fsevent (0.11.0)
rb-inotify (0.10.1)
ffi (~> 1.0)
rouge (2.2.1)
ruby-enum (0.7.2)
i18n
ruby2_keywords (0.0.4)
rubyzip (2.0.0)
safe_yaml (1.0.4)
sass (3.5.5)
safe_yaml (1.0.5)
sass (3.7.4)
sass-listen (~> 4.0.0)
sass-listen (4.0.0)
rb-fsevent (~> 0.9, >= 0.9.4)
rb-inotify (~> 0.9, >= 0.9.7)
sawyer (0.8.1)
addressable (>= 2.3.5, < 2.6)
faraday (~> 0.8, < 1.0)
sawyer (0.8.2)
addressable (>= 2.3.5)
faraday (> 0.8, < 2.0)
terminal-table (1.8.0)
unicode-display_width (~> 1.1, >= 1.1.1)
thread_safe (0.3.6)


@@ -15,12 +15,13 @@ Tree-sitter is a parser generator tool and an incremental parsing library. It ca
There are currently bindings that allow Tree-sitter to be used from the following languages:
* [Rust](https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_rust)
* [JavaScript (Wasm)](https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_web)
* [Haskell](https://github.com/tree-sitter/haskell-tree-sitter)
* [JavaScript (Node.js)](https://github.com/tree-sitter/node-tree-sitter)
* [JavaScript (Wasm)](https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_web)
* [OCaml](https://github.com/returntocorp/ocaml-tree-sitter-core)
* [Python](https://github.com/tree-sitter/py-tree-sitter)
* [Ruby](https://github.com/tree-sitter/ruby-tree-sitter)
* [Haskell](https://github.com/tree-sitter/haskell-tree-sitter)
* [Rust](https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_rust)
### Available Parsers
@@ -31,11 +32,13 @@ Parsers for these languages are fairly complete:
* [C#](https://github.com/tree-sitter/tree-sitter-c-sharp)
* [C++](https://github.com/tree-sitter/tree-sitter-cpp)
* [CSS](https://github.com/tree-sitter/tree-sitter-css)
* [DOT](https://github.com/rydesun/tree-sitter-dot)
* [Elm](https://github.com/elm-tooling/tree-sitter-elm)
* [Eno](https://github.com/eno-lang/tree-sitter-eno)
* [ERB / EJS](https://github.com/tree-sitter/tree-sitter-embedded-template)
* [Fennel](https://github.com/travonted/tree-sitter-fennel)
* [Go](https://github.com/tree-sitter/tree-sitter-go)
* [HCL](https://github.com/MichaHoffmann/tree-sitter-hcl)
* [HTML](https://github.com/tree-sitter/tree-sitter-html)
* [Java](https://github.com/tree-sitter/tree-sitter-java)
* [JavaScript](https://github.com/tree-sitter/tree-sitter-javascript)
@@ -60,6 +63,7 @@ Parsers for these languages are fairly complete:
* [Vue](https://github.com/ikatyang/tree-sitter-vue)
* [YAML](https://github.com/ikatyang/tree-sitter-yaml)
* [WASM](https://github.com/wasm-lsp/tree-sitter-wasm)
* [WGSL WebGPU Shading Language](https://github.com/mehmetoguzderin/tree-sitter-wgsl)
Parsers for these languages are in development:
@@ -67,10 +71,12 @@ Parsers for these languages are in development:
* [Erlang](https://github.com/AbstractMachinesLab/tree-sitter-erlang/)
* [Dockerfile](https://github.com/camdencheek/tree-sitter-dockerfile)
* [Go mod](https://github.com/camdencheek/tree-sitter-go-mod)
* [Hack](https://github.com/slackhq/tree-sitter-hack)
* [Haskell](https://github.com/tree-sitter/tree-sitter-haskell)
* [Julia](https://github.com/tree-sitter/tree-sitter-julia)
* [Kotlin](https://github.com/fwcd/tree-sitter-kotlin)
* [Nix](https://github.com/cstrahan/tree-sitter-nix)
* [Objective-C](https://github.com/jiyee/tree-sitter-objc)
* [Perl](https://github.com/ganezdragon/tree-sitter-perl)
* [Scala](https://github.com/tree-sitter/tree-sitter-scala)
* [Sourcepawn](https://github.com/nilshelmig/tree-sitter-sourcepawn)
@@ -89,8 +95,8 @@ Parsers for these languages are in development:
The design of Tree-sitter was greatly influenced by the following research papers:
- [Practical Algorithms for Incremental Software Development Environments](https://www2.eecs.berkeley.edu/Pubs/TechRpts/1997/CSD-97-946.pdf)
- [Context Aware Scanning for Parsing Extensible Languages](http://www.umsec.umn.edu/publications/Context-Aware-Scanning-Parsing-Extensible)
- [Efficient and Flexible Incremental Parsing](http://ftp.cs.berkeley.edu/sggs/toplas-parsing.ps)
- [Incremental Analysis of Real Programming Languages](https://pdfs.semanticscholar.org/ca69/018c29cc415820ed207d7e1d391e2da1656f.pdf)
- [Context Aware Scanning for Parsing Extensible Languages](https://www-users.cse.umn.edu/~evw/pubs/vanwyk07gpce/vanwyk07gpce.pdf)
- [Efficient and Flexible Incremental Parsing](http://harmonia.cs.berkeley.edu/papers/twagner-parsing.pdf)
- [Incremental Analysis of Real Programming Languages](http://harmonia.cs.berkeley.edu/papers/twagner-glr.pdf)
- [Error Detection and Recovery in LR Parsers](http://what-when-how.com/compiler-writing/bottom-up-parsing-compiler-writing-part-13)
- [Error Recovery for LR Parsers](http://www.dtic.mil/dtic/tr/fulltext/u2/a043470.pdf)
- [Error Recovery for LR Parsers](https://apps.dtic.mil/sti/pdfs/ADA043470.pdf)


@@ -464,7 +464,7 @@ In general, it's a good idea to make patterns more specific by specifying [field
#### Negated Fields
You can also constrain a pattern so that it only mathces nodes that *lack* a certain field. To do this, add a field name prefixed by a `!` within the parent pattern. For example, this pattern would match a class declaration with no type parameters:
You can also constrain a pattern so that it only matches nodes that *lack* a certain field. To do this, add a field name prefixed by a `!` within the parent pattern. For example, this pattern would match a class declaration with no type parameters:
```
(class_declaration
@@ -586,8 +586,10 @@ This pattern would match a set of possible keyword tokens, capturing them as `@k
#### Wildcard Node
A wildcard node is represented with an underscore (`(_)`), it matches any node.
A wildcard node is represented with an underscore (`_`), it matches any node.
This is similar to `.` in regular expressions.
There are two types, `(_)` will match any named node,
and `_` will match any named or anonymous node.
For example, this pattern would match any node inside a call:


@@ -84,7 +84,7 @@ tree-sitter parse example-file
This should print the following:
```
(source_file [1, 0] - [1, 5])
(source_file [0, 0] - [1, 0])
```
You now have a working parser.
@@ -95,7 +95,7 @@ Let's go over all of the functionality of the `tree-sitter` command line tool.
### Command: `generate`
The most important command you'll use is `tree-sitter generate`. This command reads the `grammar.js` file in your current working directory and creates a file called `src/parser.c`, which implements the parser. After making changes to your grammar, just run `tree-sitter` generate again.
The most important command you'll use is `tree-sitter generate`. This command reads the `grammar.js` file in your current working directory and creates a file called `src/parser.c`, which implements the parser. After making changes to your grammar, just run `tree-sitter generate` again.
The first time you run `tree-sitter generate`, it will also generate a few other files:
@@ -674,7 +674,7 @@ This function is responsible for recognizing external tokens. It should return `
* **`TSSymbol result_symbol`** - The symbol that was recognized. Your scan function should *assign* to this field one of the values from the `TokenType` enum, described above.
* **`void (*advance)(TSLexer *, bool skip)`** - A function for advancing to the next character. If you pass `true` for the second argument, the current character will be treated as whitespace.
* **`void (*mark_end)(TSLexer *)`** - A function for marking the end of the recognized token. This allows matching tokens that require multiple characters of lookahead. By default (if you don't call `mark_end`), any character that you moved past using the `advance` function will be included in the size of the token. But once you call `mark_end`, then any later calls to `advance` will *not* increase the size of the returned token. You can call `mark_end` multiple times to increase the size of the token.
* **`uint32_t (*get_column)(TSLexer *)`** - **(Experimental)** A function for querying the current column position of the lexer. It returns the number of unicode code points (not bytes) since the start of the current line.
* **`uint32_t (*get_column)(TSLexer *)`** - A function for querying the current column position of the lexer. It returns the number of bytes (not characters) since the start of the current line.
* **`bool (*is_at_included_range_start)(TSLexer *)`** - A function for checking if the parser has just skipped some characters in the document. When parsing an embedded document using the `ts_parser_set_included_ranges` function (described in the [multi-language document section][multi-language-section]), your scanner may want to apply some special behavior when moving to a disjoint part of the document. For example, in [EJS documents][ejs], the JavaScript parser uses this function to enable inserting automatic semicolon tokens in between the code directives, delimited by `<%` and `%>`.
The third argument to the `scan` function is an array of booleans that indicates which of your external tokens are currently expected by the parser. You should only look for a given token if it is valid according to this array. At the same time, you cannot backtrack, so you may need to combine certain pieces of logic.


@@ -29,7 +29,7 @@ git clone https://github.com/tree-sitter/tree-sitter
cd tree-sitter
```
Optionally, build the WASM library. If you skip this step, then the `tree-sitter web-ui` command will require an internet connection. If you have emscripten installed, this will use your `emcc` compiler. Otherwise, it will use Docker:
Optionally, build the WASM library. If you skip this step, then the `tree-sitter playground` command will require an internet connection. If you have emscripten installed, this will use your `emcc` compiler. Otherwise, it will use Docker:
```sh
./script/build-wasm