Commit graph

49 commits

Author SHA1 Message Date
Amaan Qureshi
5fba369c4a
fix: disallow inlining the first rule
This prevents a panic when indexing symbol_ids during the generation process
2023-07-19 16:14:58 -04:00
Andrew Hlynskyi
0b0cc6c429 Fix rustc 1.71.0 warnings 2023-07-13 17:50:04 +03:00
Nat Mote
4e3179fbc0
Avoid extracting default alias for extras
Fixes #1834
2022-08-10 07:27:34 -07:00
Max Brunsfeld
9866674cf8
Merge pull request #1660 from alex-pinkus/expanded-regex-support
Expand regex support to include emojis and binary ops
2022-02-24 17:14:23 -08:00
Alex Pinkus
8fadf18655 Expand regex support to include emojis and binary ops
The `Emoji` property alias is already present, but the actual property
is not available since it lives in a new file. This adds that file to
the `generate-unicode-categories-json`.

The `emoji-data` file follows the same format as the ones we already
consume in `generate-unicode-categories-json`, so adding emoji support
is fairly easy. Without this, grammars would need to hard-code a set of
unicode ranges in their own regex. The JavaScript library `emoji-regex`
cannot be used because of #451.

For unclear reasons, the characters #, *, and 0-9 are marked as
`Emoji=Yes` by `emoji-data.txt`. Because of this, a grammar that wishes
to use emojis is likely to want to exclude those characters. For that
reason, this change also adds support for binary operations in regexes,
e.g. `[\p{Emoji}&&[^#*0-9]]`.

Lastly (and perhaps controversially), this change introduces new
variables available at grammar compile time, for the major, minor, and
patch versions of the tree-sitter CLI used to compile the grammar. This
will allow grammars to conditionally adopt these new regex features
while remaining backward compatible with older versions of the CLI.
Without this part of the change, grammar authors who do not precompile
and check in their `grammar.json` would need to wait for downstream
systems to adopt a newer tree-sitter CLI version before they could begin
to use these features.
2022-02-19 11:41:36 -08:00
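A minimal sketch of what an intersection like `[\p{Emoji}&&[^#*0-9]]` computes. It assumes a character class is stored as a sorted list of disjoint, inclusive code-point ranges, which is in the spirit of commit ab78ab3f9b further down this log but is not tree-sitter's actual `CharacterSet` type. `Ranges`, `negate`, `intersect`, and `contains` are hypothetical names, and the `emoji` set in `main` is a tiny stand-in, not the real `emoji-data.txt` contents.

```rust
/// A character class as sorted, disjoint, inclusive code-point ranges.
/// Hypothetical representation for illustration only.
type Ranges = Vec<(u32, u32)>;

const MAX_CODE_POINT: u32 = 0x10FFFF;

/// Complement within 0..=MAX_CODE_POINT, as in a negated class like `[^#*0-9]`.
fn negate(set: &Ranges) -> Ranges {
    let mut result = Vec::new();
    let mut next = 0u32; // first code point not yet covered
    for &(start, end) in set {
        if start > next {
            result.push((next, start - 1));
        }
        next = end + 1;
    }
    if next <= MAX_CODE_POINT {
        result.push((next, MAX_CODE_POINT));
    }
    result
}

/// Intersection (`&&`) of two classes via a two-pointer sweep.
fn intersect(a: &Ranges, b: &Ranges) -> Ranges {
    let (mut i, mut j) = (0, 0);
    let mut result = Vec::new();
    while i < a.len() && j < b.len() {
        let start = a[i].0.max(b[j].0);
        let end = a[i].1.min(b[j].1);
        if start <= end {
            result.push((start, end));
        }
        // Drop whichever range finishes first; the other may still overlap later ranges.
        if a[i].1 < b[j].1 {
            i += 1;
        } else {
            j += 1;
        }
    }
    result
}

fn contains(set: &Ranges, c: char) -> bool {
    let c = c as u32;
    set.iter().any(|&(start, end)| start <= c && c <= end)
}

fn main() {
    let excluded: Ranges = vec![(0x23, 0x23), (0x2A, 0x2A), (0x30, 0x39)]; // '#', '*', '0'-'9'
    // Tiny stand-in for \p{Emoji}: the excluded ASCII plus U+1F600..U+1F64F.
    let mut emoji = excluded.clone();
    emoji.push((0x1F600, 0x1F64F));

    let result = intersect(&emoji, &negate(&excluded));
    assert!(!contains(&result, '#') && !contains(&result, '7'));
    assert!(contains(&result, '\u{1F600}'));
    println!("{:x?}", result);
}
```

Keeping the ranges sorted and disjoint is what lets both operations run in a single linear pass.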
Max Brunsfeld
82ceebc10d 🎨 Use base struct syntax to clean up grammar expectations 2022-01-20 17:17:46 -08:00
Max Brunsfeld
a0c085bbec Return an error when trying to inline a token
Fixes #1420
2021-11-19 13:02:04 -08:00
Razze
956705a23d Update to unicode standard 14 2021-10-10 16:40:31 +02:00
FnControlOption
e030434ca7 Handle aliases in unicode property escapes in regexes 2021-08-18 22:22:46 -07:00
Douglas Creager
d2d01e77e3 cli: Use anyhow and thiserror for errors
This patch updates the CLI to use anyhow and thiserror for error
management.  The main feature that our custom `Error` type was providing
was a _list_ of messages, which would allow us to annotate "lower-level"
errors with more contextual information.  This is exactly what's
provided by anyhow's `Context` trait.

(This is setup work for a future PR that will pull the `config` and
`loader` modules out into separate crates; by using `anyhow` we wouldn't
have to deal with a circular dependency between the new crates.)
2021-06-09 16:17:23 -04:00
Max Brunsfeld
dd4cba2625 Allow symbols to be used in precedence lists 2021-03-03 13:11:05 -08:00
Max Brunsfeld
d8a235faa1 Add further static validation of named precedences 2021-02-25 11:54:21 -08:00
Max Brunsfeld
344797c110 Implement named precedence comparison 2021-02-24 16:02:56 -08:00
Max Brunsfeld
d40f118370 Generalize precedence datatype to include strings
Right now, the strings are not used in comparisons, but they
are passed through the grammar processing pipeline, and are
available to the parse table construction algorithm.

This also cleans up a confusing aspect of the parse table
construction, in which precedences and associativities were
temporarily stored in the parse table data structure itself.
2021-02-23 20:48:39 -08:00
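The commit above only threads string precedences through the pipeline; comparing them arrives in 344797c110 ("Implement named precedence comparison") above. A minimal sketch of both ideas, assuming a three-variant enum and assuming, as tree-sitter's documented `precedences` grammar field describes, that each list names levels in descending order. `Precedence` and `compare` are illustrative names, not the crate's actual definitions, and returning `None` for mixed comparisons (integer vs. name, or anything vs. `None`) is a simplification of this sketch.

```rust
use std::cmp::Ordering;

/// Hypothetical precedence value: absent, numeric, or named.
#[derive(Clone, Debug, PartialEq, Eq)]
enum Precedence {
    None,
    Integer(i32),
    Name(String),
}

/// Each inner list names precedence levels from highest to lowest.
fn compare(a: &Precedence, b: &Precedence, lists: &[Vec<String>]) -> Option<Ordering> {
    match (a, b) {
        (Precedence::Integer(x), Precedence::Integer(y)) => Some(x.cmp(y)),
        (Precedence::Name(x), Precedence::Name(y)) => lists.iter().find_map(|list| {
            let ix = list.iter().position(|n| n == x)?;
            let iy = list.iter().position(|n| n == y)?;
            // Earlier in the list means higher precedence, so reverse the index order.
            Some(iy.cmp(&ix))
        }),
        // Names with no common list, and mixed kinds, are left incomparable here.
        _ => None,
    }
}

fn main() {
    let lists = vec![vec!["member".to_string(), "call".to_string(), "binary".to_string()]];
    let member = Precedence::Name("member".into());
    let binary = Precedence::Name("binary".into());
    assert_eq!(compare(&member, &binary, &lists), Some(Ordering::Greater));
    assert_eq!(compare(&Precedence::Integer(1), &Precedence::Integer(2), &lists), Some(Ordering::Less));
    assert_eq!(compare(&Precedence::None, &member, &lists), None);
}
```

Returning `Option<Ordering>` leaves room for the static validation mentioned in d8a235faa1 above, which could report names that cannot be compared.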
Max Brunsfeld
2f28a35e1b Handle unicode property escapes inside bracketed char classes
Refs #906
2021-02-18 22:27:44 -08:00
Max Brunsfeld
29bc26ecd5 Fix test failure after non-terminal extras change 2021-02-18 15:43:01 -08:00
Max Brunsfeld
b46d51f224 Add a unit test for all unicode character escape forms 2021-02-17 17:49:01 -08:00
Max Brunsfeld
5b630054c6 Handle negated unicode property escapes in regexes
Refs #380
2021-02-17 17:22:33 -08:00
Max Brunsfeld
e3ba701344 Start work on handling unicode property escapes in regexes 2021-01-29 16:37:45 -08:00
Max Brunsfeld
ab78ab3f9b Represent CharacterSet internally as a vector of ranges 2021-01-28 16:10:39 -08:00
Max Brunsfeld
026231e93d Merge branch 'master' into HEAD 2020-12-03 09:44:33 -08:00
Max Brunsfeld
a2d760e426 Ensure nodes are aliased consistently within syntax error nodes
Co-Authored-By: Rick Winfrey <rewinfrey@github.com>
2020-10-27 15:46:09 -07:00
Max Brunsfeld
5003064da7 Make supertypes automatically hidden, without underscore prefix 2020-09-23 09:35:14 -07:00
Max Brunsfeld
f4adf0269a Propagate dynamic precedence correctly for inlined rules
Fixes #683
2020-07-17 09:53:01 -07:00
Max Brunsfeld
0cceca7b4e Rename extra_tokens -> extra_symbols 2019-10-21 17:26:01 -07:00
Max Brunsfeld
fcaabea0cf Allow non-terminal extras 2019-10-21 16:08:59 -07:00
Ika
4b0489e2f3 fix: allow lowercase unicode escape (#440) 2019-08-31 23:30:33 -07:00
Max Brunsfeld
56ce4e5d50 Upgrade rsass, remove hashbrown 2019-08-13 10:08:58 -07:00
Max Brunsfeld
d274e81d0d Overhaul CLI error handling to allow multiple levels of context 2019-05-30 16:52:55 -07:00
Max Brunsfeld
9674df0c54 Avoid introducing certain auxiliary repeat rules in hidden rules 2019-05-15 12:36:54 -07:00
Ervin Oro
e5584f82d3 Add test to verify regex unicode codepoints work 2019-04-09 21:55:49 +03:00
Ervin Oro
8c845f29e0 Allow hex characters in unicode code points 2019-04-09 20:37:36 +03:00
Max Brunsfeld
6c65d74810 Restructure node-types.json output 2019-03-26 13:43:10 -07:00
Max Brunsfeld
b79bd8693b Start work on handling node supertypes 2019-03-26 11:51:02 -07:00
Max Brunsfeld
b7e38ccc96 Allow using fields in inlined rules 2019-02-08 17:12:08 -08:00
Max Brunsfeld
18a13b457d Get basic field API working 2019-02-08 15:16:56 -08:00
Max Brunsfeld
108ca989ea Start work on including child refs in generated parsers 2019-02-08 15:16:56 -08:00
Max Brunsfeld
4cac85fec4 Add benchmark script
* Structure `cli` crate as both a library and an executable, so that
benchmarks can import code from the crate.
* Import macros in the Rust 2018 style.
2019-02-01 15:17:35 -08:00
Max Brunsfeld
ed195de8b6 rustfmt 2019-01-17 17:16:04 -08:00
Max Brunsfeld
c27f776d41 Fix word token index issue in a different way
Refs https://github.com/tree-sitter/tree-sitter/issues/258
2019-01-17 13:18:59 -08:00
Max Brunsfeld
9f7079c9c5 Ensure that the word token has a low numerical index
Fixes https://github.com/tree-sitter/tree-sitter/issues/258
2019-01-17 12:44:14 -08:00
Max Brunsfeld
522021b107 Fix NFA generation w/ nested groups 2019-01-15 15:57:29 -08:00
Max Brunsfeld
b799b46f79 Handle repetition range operators with commas in regexes 2019-01-15 13:21:48 -08:00
Max Brunsfeld
e2717a6ad1 Preprocess regexes to allow non-standard escape sequences
Also allow unescaped curly braces to match literal curly braces when 
they don't form a valid repetition operator.
2019-01-14 14:05:19 -08:00
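The commit message does not say how the preprocessing is done. A minimal sketch of the curly-brace part only, assuming a valid repetition operator has the form `{n}`, `{n,}`, or `{n,m}` and that any other unescaped `{` or `}` should be escaped so it matches literally. `escape_literal_braces` and `is_repetition` are hypothetical names; the sketch copies braced escapes such as `\p{L}` and `\u{41}` through unchanged and does not attempt the other non-standard escape sequences the commit mentions.

```rust
/// Returns true if `chars[i]` (a `{`) starts `{n}`, `{n,}`, or `{n,m}`.
fn is_repetition(chars: &[char], i: usize) -> bool {
    let mut j = i + 1;
    let digits_start = j;
    while j < chars.len() && chars[j].is_ascii_digit() {
        j += 1;
    }
    if j == digits_start {
        return false; // this sketch requires a lower bound
    }
    if j < chars.len() && chars[j] == ',' {
        j += 1;
        while j < chars.len() && chars[j].is_ascii_digit() {
            j += 1;
        }
    }
    j < chars.len() && chars[j] == '}'
}

/// Escapes `{` and `}` that do not belong to a repetition operator.
fn escape_literal_braces(pattern: &str) -> String {
    let chars: Vec<char> = pattern.chars().collect();
    let mut out = String::with_capacity(pattern.len());
    let mut in_repetition = false;
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c == '\\' {
            // Copy the escape through unchanged, including a braced body such as `\p{L}` or `\u{41}`.
            out.push(c);
            i += 1;
            if i < chars.len() {
                out.push(chars[i]);
                i += 1;
                if i < chars.len() && chars[i] == '{' && matches!(chars[i - 1], 'p' | 'P' | 'u') {
                    while i < chars.len() {
                        out.push(chars[i]);
                        i += 1;
                        if chars[i - 1] == '}' {
                            break;
                        }
                    }
                }
            }
            continue;
        }
        match c {
            '{' if is_repetition(&chars, i) => {
                in_repetition = true;
                out.push('{');
            }
            '}' if in_repetition => {
                in_repetition = false;
                out.push('}');
            }
            // Any other brace is meant literally, so escape it.
            '{' | '}' => {
                out.push('\\');
                out.push(c);
            }
            _ => out.push(c),
        }
        i += 1;
    }
    out
}

fn main() {
    for pattern in ["a{2,3}", "{foo}", r"\p{L}+{"] {
        println!("{} -> {}", pattern, escape_literal_braces(pattern));
    }
}
```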
Max Brunsfeld
6f242fda0c Fix edge case in flatten rule 2019-01-11 17:43:42 -08:00
Max Brunsfeld
6592fdd24c Fix parser generation error messages 2019-01-11 17:26:45 -08:00
Max Brunsfeld
272046a250 Reorganize tests - move them all into the CLI crate 2019-01-10 17:11:57 -08:00
Max Brunsfeld
2e8b2ab8fb Give strings more implicit precedence than immediate tokens 2019-01-09 09:59:46 -08:00
Max Brunsfeld
f059557a9d Move parser generation code into 'generate' module within CLI crate 2019-01-07 10:23:01 -08:00