Merge branch 'master' into m-novikov-add-parsers
commit
e7dcd2b7c4
57 changed files with 822 additions and 353 deletions
@ -6,8 +6,8 @@ GEM
minitest (~> 5.1)
thread_safe (~> 0.3, >= 0.3.4)
tzinfo (~> 1.1)
addressable (2.5.2)
public_suffix (>= 2.0.2, < 4.0)
addressable (2.8.0)
public_suffix (>= 2.0.2, < 5.0)
coffee-script (2.4.1)
coffee-script-source
execjs
@ -16,12 +16,27 @@ GEM
commonmarker (0.17.8)
ruby-enum (~> 0.5)
concurrent-ruby (1.0.5)
ethon (0.11.0)
ffi (>= 1.3.0)
ethon (0.14.0)
ffi (>= 1.15.0)
execjs (2.7.0)
faraday (0.14.0)
faraday (1.5.1)
faraday-em_http (~> 1.0)
faraday-em_synchrony (~> 1.0)
faraday-excon (~> 1.1)
faraday-httpclient (~> 1.0.1)
faraday-net_http (~> 1.0)
faraday-net_http_persistent (~> 1.1)
faraday-patron (~> 1.0)
multipart-post (>= 1.2, < 3)
ffi (1.9.23)
ruby2_keywords (>= 0.0.4)
faraday-em_http (1.0.0)
faraday-em_synchrony (1.0.0)
faraday-excon (1.1.0)
faraday-httpclient (1.0.1)
faraday-net_http (1.0.1)
faraday-net_http_persistent (1.2.0)
faraday-patron (1.0.0)
ffi (1.15.3)
forwardable-extended (2.6.0)
gemoji (3.0.0)
github-pages (177)
@ -195,33 +210,35 @@ GEM
minima (2.1.1)
jekyll (~> 3.3)
minitest (5.11.3)
multipart-post (2.0.0)
net-dns (0.8.0)
multipart-post (2.1.1)
net-dns (0.9.0)
nokogiri (1.11.4)
mini_portile2 (~> 2.5.0)
racc (~> 1.4)
octokit (4.8.0)
octokit (4.21.0)
faraday (>= 0.9)
sawyer (~> 0.8.0, >= 0.5.3)
pathutil (0.16.1)
pathutil (0.16.2)
forwardable-extended (~> 2.6)
public_suffix (2.0.5)
racc (1.5.2)
rb-fsevent (0.10.2)
rb-inotify (0.9.10)
ffi (>= 0.5.0, < 2)
rb-fsevent (0.11.0)
rb-inotify (0.10.1)
ffi (~> 1.0)
rouge (2.2.1)
ruby-enum (0.7.2)
i18n
ruby2_keywords (0.0.4)
rubyzip (2.0.0)
safe_yaml (1.0.4)
sass (3.5.5)
safe_yaml (1.0.5)
sass (3.7.4)
sass-listen (~> 4.0.0)
sass-listen (4.0.0)
rb-fsevent (~> 0.9, >= 0.9.4)
rb-inotify (~> 0.9, >= 0.9.7)
sawyer (0.8.1)
addressable (>= 2.3.5, < 2.6)
faraday (~> 0.8, < 1.0)
sawyer (0.8.2)
addressable (>= 2.3.5)
faraday (> 0.8, < 2.0)
terminal-table (1.8.0)
unicode-display_width (~> 1.1, >= 1.1.1)
thread_safe (0.3.6)
@ -15,12 +15,13 @@ Tree-sitter is a parser generator tool and an incremental parsing library. It ca

There are currently bindings that allow Tree-sitter to be used from the following languages:

* [Rust](https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_rust)
* [JavaScript (Wasm)](https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_web)
* [Haskell](https://github.com/tree-sitter/haskell-tree-sitter)
* [JavaScript (Node.js)](https://github.com/tree-sitter/node-tree-sitter)
* [JavaScript (Wasm)](https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_web)
* [OCaml](https://github.com/returntocorp/ocaml-tree-sitter-core)
* [Python](https://github.com/tree-sitter/py-tree-sitter)
* [Ruby](https://github.com/tree-sitter/ruby-tree-sitter)
* [Haskell](https://github.com/tree-sitter/haskell-tree-sitter)
* [Rust](https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_rust)

### Available Parsers

@ -31,11 +32,13 @@ Parsers for these languages are fairly complete:
* [C#](https://github.com/tree-sitter/tree-sitter-c-sharp)
* [C++](https://github.com/tree-sitter/tree-sitter-cpp)
* [CSS](https://github.com/tree-sitter/tree-sitter-css)
* [DOT](https://github.com/rydesun/tree-sitter-dot)
* [Elm](https://github.com/elm-tooling/tree-sitter-elm)
* [Eno](https://github.com/eno-lang/tree-sitter-eno)
* [ERB / EJS](https://github.com/tree-sitter/tree-sitter-embedded-template)
* [Fennel](https://github.com/travonted/tree-sitter-fennel)
* [Go](https://github.com/tree-sitter/tree-sitter-go)
* [HCL](https://github.com/MichaHoffmann/tree-sitter-hcl)
* [HTML](https://github.com/tree-sitter/tree-sitter-html)
* [Java](https://github.com/tree-sitter/tree-sitter-java)
* [JavaScript](https://github.com/tree-sitter/tree-sitter-javascript)
@ -60,6 +63,7 @@ Parsers for these languages are fairly complete:
* [Vue](https://github.com/ikatyang/tree-sitter-vue)
* [YAML](https://github.com/ikatyang/tree-sitter-yaml)
* [WASM](https://github.com/wasm-lsp/tree-sitter-wasm)
* [WGSL WebGPU Shading Language](https://github.com/mehmetoguzderin/tree-sitter-wgsl)

Parsers for these languages are in development:

@ -67,10 +71,12 @@ Parsers for these languages are in development:
* [Erlang](https://github.com/AbstractMachinesLab/tree-sitter-erlang/)
* [Dockerfile](https://github.com/camdencheek/tree-sitter-dockerfile)
* [Go mod](https://github.com/camdencheek/tree-sitter-go-mod)
* [Hack](https://github.com/slackhq/tree-sitter-hack)
* [Haskell](https://github.com/tree-sitter/tree-sitter-haskell)
* [Julia](https://github.com/tree-sitter/tree-sitter-julia)
* [Kotlin](https://github.com/fwcd/tree-sitter-kotlin)
* [Nix](https://github.com/cstrahan/tree-sitter-nix)
* [Objective-C](https://github.com/jiyee/tree-sitter-objc)
* [Perl](https://github.com/ganezdragon/tree-sitter-perl)
* [Scala](https://github.com/tree-sitter/tree-sitter-scala)
* [Sourcepawn](https://github.com/nilshelmig/tree-sitter-sourcepawn)
@ -89,8 +95,8 @@ Parsers for these languages are in development:
The design of Tree-sitter was greatly influenced by the following research papers:

- [Practical Algorithms for Incremental Software Development Environments](https://www2.eecs.berkeley.edu/Pubs/TechRpts/1997/CSD-97-946.pdf)
- [Context Aware Scanning for Parsing Extensible Languages](http://www.umsec.umn.edu/publications/Context-Aware-Scanning-Parsing-Extensible)
- [Efficient and Flexible Incremental Parsing](http://ftp.cs.berkeley.edu/sggs/toplas-parsing.ps)
- [Incremental Analysis of Real Programming Languages](https://pdfs.semanticscholar.org/ca69/018c29cc415820ed207d7e1d391e2da1656f.pdf)
- [Context Aware Scanning for Parsing Extensible Languages](https://www-users.cse.umn.edu/~evw/pubs/vanwyk07gpce/vanwyk07gpce.pdf)
- [Efficient and Flexible Incremental Parsing](http://harmonia.cs.berkeley.edu/papers/twagner-parsing.pdf)
- [Incremental Analysis of Real Programming Languages](http://harmonia.cs.berkeley.edu/papers/twagner-glr.pdf)
- [Error Detection and Recovery in LR Parsers](http://what-when-how.com/compiler-writing/bottom-up-parsing-compiler-writing-part-13)
- [Error Recovery for LR Parsers](http://www.dtic.mil/dtic/tr/fulltext/u2/a043470.pdf)
- [Error Recovery for LR Parsers](https://apps.dtic.mil/sti/pdfs/ADA043470.pdf)
@ -464,7 +464,7 @@ In general, it's a good idea to make patterns more specific by specifying [field

#### Negated Fields

You can also constrain a pattern so that it only mathces nodes that *lack* a certain field. To do this, add a field name prefixed by a `!` within the parent pattern. For example, this pattern would match a class declaration with no type parameters:
You can also constrain a pattern so that it only matches nodes that *lack* a certain field. To do this, add a field name prefixed by a `!` within the parent pattern. For example, this pattern would match a class declaration with no type parameters:

```
(class_declaration
@ -586,8 +586,10 @@ This pattern would match a set of possible keyword tokens, capturing them as `@k

#### Wildcard Node

A wildcard node is represented with an underscore (`(_)`), it matches any node.
A wildcard node is represented with an underscore (`_`), it matches any node.
This is similar to `.` in regular expressions.
There are two types, `(_)` will match any named node,
and `_` will match any named or anonymous node.
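
The difference between the two wildcard forms can be modeled with a small sketch. This is only an illustration, not Tree-sitter's query engine: `DemoNode`, `DemoWildcard`, and `demo_wildcard_matches` are hypothetical names introduced here, and a real program would work with `TSNode` values through the query API instead.

```c
#include <stdbool.h>

/* Hypothetical, simplified node: only the properties the wildcard
 * rule depends on. Not part of the Tree-sitter API. */
typedef struct {
  const char *type; /* e.g. "identifier" or "(" */
  bool is_named;    /* named node vs. anonymous token */
} DemoNode;

/* The two wildcard spellings described above. */
typedef enum {
  DEMO_WILDCARD_NAMED, /* written `(_)` in a query */
  DEMO_WILDCARD_ANY    /* written `_` in a query */
} DemoWildcard;

/* `(_)` accepts only named nodes; `_` accepts named or anonymous nodes. */
bool demo_wildcard_matches(DemoWildcard wildcard, DemoNode node) {
  if (wildcard == DEMO_WILDCARD_NAMED) {
    return node.is_named;
  }
  return true;
}
```

Under this model, an anonymous token such as `(` is rejected by `(_)` but accepted by `_`, while a named node such as `identifier` is accepted by both.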

For example, this pattern would match any node inside a call:

@ -84,7 +84,7 @@ tree-sitter parse example-file
This should print the following:

```
(source_file [1, 0] - [1, 5])
(source_file [0, 0] - [1, 0])
```

You now have a working parser.
@ -95,7 +95,7 @@ Let's go over all of the functionality of the `tree-sitter` command line tool.

### Command: `generate`

The most important command you'll use is `tree-sitter generate`. This command reads the `grammar.js` file in your current working directory and creates a file called `src/parser.c`, which implements the parser. After making changes to your grammar, just run `tree-sitter` generate again.
The most important command you'll use is `tree-sitter generate`. This command reads the `grammar.js` file in your current working directory and creates a file called `src/parser.c`, which implements the parser. After making changes to your grammar, just run `tree-sitter generate` again.

The first time you run `tree-sitter generate`, it will also generate a few other files:

@ -674,7 +674,7 @@ This function is responsible for recognizing external tokens. It should return `
* **`TSSymbol result_symbol`** - The symbol that was recognized. Your scan function should *assign* to this field one of the values from the `TokenType` enum, described above.
* **`void (*advance)(TSLexer *, bool skip)`** - A function for advancing to the next character. If you pass `true` for the second argument, the current character will be treated as whitespace.
* **`void (*mark_end)(TSLexer *)`** - A function for marking the end of the recognized token. This allows matching tokens that require multiple characters of lookahead. By default (if you don't call `mark_end`), any character that you moved past using the `advance` function will be included in the size of the token. But once you call `mark_end`, then any later calls to `advance` will *not* increase the size of the returned token. You can call `mark_end` multiple times to increase the size of the token.
* **`uint32_t (*get_column)(TSLexer *)`** - **(Experimental)** A function for querying the current column position of the lexer. It returns the number of unicode code points (not bytes) since the start of the current line.
* **`uint32_t (*get_column)(TSLexer *)`** - A function for querying the current column position of the lexer. It returns the number of bytes (not characters) since the start of the current line.
* **`bool (*is_at_included_range_start)(TSLexer *)`** - A function for checking if the parser has just skipped some characters in the document. When parsing an embedded document using the `ts_parser_set_included_ranges` function (described in the [multi-language document section][multi-language-section]), your scanner may want to apply some special behavior when moving to a disjoint part of the document. For example, in [EJS documents][ejs], the JavaScript parser uses this function to enable inserting automatic semicolon tokens in between the code directives, delimited by `<%` and `%>`.

The third argument to the `scan` function is an array of booleans that indicates which of your external tokens are currently expected by the parser. You should only look for a given token if it is valid according to this array. At the same time, you cannot backtrack, so you may need to combine certain pieces of logic.
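
The interplay of `advance`, `mark_end`, and the array of expected tokens can be sketched as follows. To stay self-contained, this sketch does not include `tree_sitter/parser.h`. Instead it defines `DemoLexer`, a hypothetical stand-in that mimics only the `TSLexer` fields it needs, plus bookkeeping fields a real scanner never sees. The `SHOUT` token (a run of lowercase letters that must be followed by `!!`) and all `demo_*` names are assumptions made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum DemoTokenType { SHOUT }; /* hypothetical external token */

/* Stand-in for TSLexer; the last four fields exist only in this sketch. */
typedef struct DemoLexer DemoLexer;
struct DemoLexer {
  int32_t lookahead;      /* current character */
  uint16_t result_symbol; /* token that was recognized */
  void (*advance)(DemoLexer *, bool skip);
  void (*mark_end)(DemoLexer *);
  const char *input;
  size_t pos;
  size_t token_end;
  bool end_marked;
};

static void demo_advance(DemoLexer *lexer, bool skip) {
  (void)skip; /* whitespace skipping is not modeled here */
  if (lexer->input[lexer->pos] != '\0') lexer->pos++;
  lexer->lookahead = (unsigned char)lexer->input[lexer->pos];
  /* Until mark_end is called, every advance grows the token. */
  if (!lexer->end_marked) lexer->token_end = lexer->pos;
}

static void demo_mark_end(DemoLexer *lexer) {
  lexer->end_marked = true;
  lexer->token_end = lexer->pos;
}

DemoLexer demo_lexer_new(const char *input) {
  DemoLexer lexer = {0};
  lexer.input = input;
  lexer.lookahead = (unsigned char)input[0];
  lexer.advance = demo_advance;
  lexer.mark_end = demo_mark_end;
  return lexer;
}

/* Same shape as a scan function: only look for tokens the parser expects. */
bool demo_scan(DemoLexer *lexer, const bool *valid_symbols) {
  if (!valid_symbols[SHOUT]) return false;
  if (lexer->lookahead < 'a' || lexer->lookahead > 'z') return false;
  while (lexer->lookahead >= 'a' && lexer->lookahead <= 'z') {
    lexer->advance(lexer, false);
  }
  lexer->mark_end(lexer); /* the token ends after the letters */
  /* Two characters of lookahead; these advances do not grow the token. */
  if (lexer->lookahead != '!') return false;
  lexer->advance(lexer, false);
  if (lexer->lookahead != '!') return false;
  lexer->result_symbol = SHOUT;
  return true;
}

/* Returns the recognized token's length, or -1 if nothing was recognized. */
long demo_token_length(const char *input, bool shout_expected) {
  DemoLexer lexer = demo_lexer_new(input);
  bool valid_symbols[] = { shout_expected };
  return demo_scan(&lexer, valid_symbols) ? (long)lexer.token_end : -1;
}
```

With the input `abc!!`, the scan succeeds with a token length of 3: the `!!` is inspected but stays outside the token because `mark_end` was called first. The same input is rejected outright when `SHOUT` is not marked as expected.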
@ -29,7 +29,7 @@ git clone https://github.com/tree-sitter/tree-sitter
cd tree-sitter
```

Optionally, build the WASM library. If you skip this step, then the `tree-sitter web-ui` command will require an internet connection. If you have emscripten installed, this will use your `emcc` compiler. Otherwise, it will use Docker:
Optionally, build the WASM library. If you skip this step, then the `tree-sitter playground` command will require an internet connection. If you have emscripten installed, this will use your `emcc` compiler. Otherwise, it will use Docker:

```sh
./script/build-wasm