Commit graph

14 commits

Author SHA1 Message Date
Amaan Qureshi
b40839cd72 style: prefer turbofish syntax where possible 2024-02-19 16:00:50 -05:00
dundargoc
c8bd6705cf chore: clippy 2024-02-06 23:34:14 -05:00
Amaan Qureshi
04ff704bca chore(cli): apply clippy fixes 2024-02-04 04:18:48 -05:00
Alex Pinkus
8fadf18655 Expand regex support to include emojis and binary ops
The `Emoji` property alias is already present, but the actual property
is not available since it lives in a new file. This adds that file to
the `generate-unicode-categories-json`.

The `emoji-data` file follows the same format as the ones we already
consume in `generate-unicode-categories-json`, so adding emoji support
is fairly easy. Without this, grammars would need to hard-code a set of
Unicode ranges in their own regex. The JavaScript library `emoji-regex`
cannot be used because of #451.
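The shared format mentioned above is the usual Unicode Character Database layout. Each data line holds a code point or a `first..last` range, then a `;`, then a property name, and optionally a trailing `#` comment. The Rust sketch below parses that layout and groups the ranges by property. It is illustrative only. `parse_ucd_line` and `collect_properties` are hypothetical names and are not the code of `generate-unicode-categories-json`. The sample lines are a hand-picked excerpt, not the real file.

```rust
use std::collections::BTreeMap;

/// Parses one data line in the UCD property-file layout used by
/// `emoji-data.txt`, e.g. `0030..0039 ; Emoji # comment`.
/// Returns `(first, last, property)`, or `None` for blank/comment lines.
/// (Hypothetical helper for illustration.)
fn parse_ucd_line(line: &str) -> Option<(u32, u32, String)> {
    // Everything after the first '#' is a comment.
    let data = line.split('#').next()?.trim();
    if data.is_empty() {
        return None;
    }
    let mut fields = data.split(';').map(str::trim);
    let range = fields.next()?;
    let property = fields.next()?.to_string();
    // A field is either a single hex code point or `FIRST..LAST`.
    let (first, last) = match range.split_once("..") {
        Some((a, b)) => (
            u32::from_str_radix(a, 16).ok()?,
            u32::from_str_radix(b, 16).ok()?,
        ),
        None => {
            let c = u32::from_str_radix(range, 16).ok()?;
            (c, c)
        }
    };
    Some((first, last, property))
}

/// Groups every parsed range under its property name.
fn collect_properties(text: &str) -> BTreeMap<String, Vec<(u32, u32)>> {
    let mut map: BTreeMap<String, Vec<(u32, u32)>> = BTreeMap::new();
    for line in text.lines() {
        if let Some((first, last, prop)) = parse_ucd_line(line) {
            map.entry(prop).or_default().push((first, last));
        }
    }
    map
}

fn main() {
    // Illustrative excerpt in the emoji-data.txt layout.
    let sample = "\
# comment lines are skipped
0023          ; Emoji                # hash sign
0030..0039    ; Emoji                # digit zero..digit nine
1F600         ; Emoji_Presentation   # grinning face
";
    let props = collect_properties(sample);
    println!("{:?}", props.get("Emoji"));
}
```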

For unclear reasons, the characters #, *, and 0-9 are marked as
`Emoji=Yes` by `emoji-data.txt`. Because of this, a grammar that wishes
to use emojis is likely to want to exclude those characters. For that
reason, this change also adds support for binary operations in regexes,
e.g. `[\p{Emoji}&&[^#*0-9]]`.
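In `[\p{Emoji}&&[^#*0-9]]`, the `&&` operator denotes set intersection. The class keeps only those emoji code points that also belong to the negated class `[^#*0-9]`. The Rust sketch below shows that semantics, assuming each class is stored as a list of sorted, non-overlapping, inclusive code-point ranges. The helper names `complement` and `intersect` and the tiny emoji table are hypothetical. This is not tree-sitter's implementation.

```rust
/// Inclusive code-point ranges, sorted and non-overlapping.
type Ranges = Vec<(u32, u32)>;

/// Largest Unicode code point.
const MAX_CHAR: u32 = 0x10FFFF;

/// Complement with respect to all code points 0..=MAX_CHAR,
/// i.e. the meaning of a negated class like `[^#*0-9]`.
fn complement(set: &[(u32, u32)]) -> Ranges {
    let mut out = Vec::new();
    let mut next = 0u32; // first code point not yet accounted for
    for &(lo, hi) in set {
        if lo > next {
            out.push((next, lo - 1)); // gap before this range
        }
        next = hi + 1;
    }
    if next <= MAX_CHAR {
        out.push((next, MAX_CHAR)); // tail after the last range
    }
    out
}

/// Intersection of two sorted range lists (the `&&` operator),
/// computed with a two-pointer sweep.
fn intersect(a: &[(u32, u32)], b: &[(u32, u32)]) -> Ranges {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        let lo = a[i].0.max(b[j].0);
        let hi = a[i].1.min(b[j].1);
        if lo <= hi {
            out.push((lo, hi));
        }
        // Advance whichever range ends first.
        if a[i].1 < b[j].1 {
            i += 1;
        } else {
            j += 1;
        }
    }
    out
}

fn main() {
    // Hypothetical, heavily truncated stand-in for \p{Emoji}:
    // '#', '*', '0'-'9', U+00A9, and the U+1F600..U+1F64F block.
    let emoji: Ranges = vec![
        (0x23, 0x23),
        (0x2A, 0x2A),
        (0x30, 0x39),
        (0xA9, 0xA9),
        (0x1F600, 0x1F64F),
    ];
    // The characters inside [^#*0-9], before negation.
    let excluded: Ranges = vec![(0x23, 0x23), (0x2A, 0x2A), (0x30, 0x39)];
    let result = intersect(&emoji, &complement(&excluded));
    println!("{:X?}", result);
}
```

With this toy input, only U+00A9 and the U+1F600..U+1F64F block survive. This matches the intent described above: `#`, `*`, and the digits drop out, and the remaining emoji are kept.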

Lastly (and perhaps controversially), this change introduces new
variables available at grammar compile time, for the major, minor, and
patch versions of the tree-sitter CLI used to compile the grammar. This
will allow grammars to conditionally adopt these new regex features
while remaining backward compatible with older versions of the CLI.
Without this part of the change, grammar authors who do not precompile
and check in their `grammar.json` would need to wait for downstream
systems to adopt a newer tree-sitter CLI version before they could begin
to use these features.
2022-02-19 11:41:36 -08:00
Max Brunsfeld
e12093e8df Fix regression introduced in CharacterSet optimization 2021-03-04 13:50:27 -08:00
Max Brunsfeld
f5a4c14dbe Add some doc comments to CharacterSet 2021-02-16 21:37:52 -08:00
Max Brunsfeld
ab78ab3f9b Represent CharacterSet internally as a vector of ranges 2021-01-28 16:10:39 -08:00
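Representing a character set as a sorted vector of ranges keeps large classes compact, and it makes membership tests a binary search. The Rust sketch below illustrates that general idea. `RangeSet`, `add_range`, and `contains` are hypothetical names, and tree-sitter's actual `CharacterSet` may differ in both API and details.

```rust
/// A set of chars stored as sorted, non-overlapping, non-adjacent
/// inclusive code-point ranges. (Hypothetical sketch, not tree-sitter's
/// `CharacterSet`.)
#[derive(Debug, Default)]
struct RangeSet {
    ranges: Vec<(u32, u32)>,
}

impl RangeSet {
    /// Inserts [lo, hi], merging with any overlapping or adjacent ranges.
    fn add_range(&mut self, lo: u32, hi: u32) {
        let (mut lo, mut hi) = (lo, hi);
        // Index of the first existing range that overlaps or touches [lo, hi].
        let start = self
            .ranges
            .partition_point(|&(_, e)| e.saturating_add(1) < lo);
        // Absorb every following range that starts at or before hi + 1.
        let mut end = start;
        while end < self.ranges.len() && self.ranges[end].0 <= hi.saturating_add(1) {
            lo = lo.min(self.ranges[end].0);
            hi = hi.max(self.ranges[end].1);
            end += 1;
        }
        // Replace the absorbed ranges with the single merged range.
        self.ranges.drain(start..end);
        self.ranges.insert(start, (lo, hi));
    }

    /// Binary-searches for the only range that could contain `c`.
    fn contains(&self, c: char) -> bool {
        let c = c as u32;
        let i = self.ranges.partition_point(|&(_, e)| e < c);
        i < self.ranges.len() && self.ranges[i].0 <= c
    }
}

fn main() {
    let mut set = RangeSet::default();
    set.add_range('a' as u32, 'c' as u32);
    set.add_range('d' as u32, 'f' as u32); // adjacent, so merges into a-f
    set.add_range('0' as u32, '9' as u32);
    println!("{:?}", set.ranges);
    println!("{} {}", set.contains('e'), set.contains('g'));
}
```

Merging adjacent ranges on insert keeps the representation canonical. As a result, two sets containing the same characters always have the same range vector.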
Max Brunsfeld
a6f71328fe Avoid whitelist/blacklist terminology in test comments 2020-06-16 09:22:34 -07:00
Max Brunsfeld
911fb7f1b2 Extract helper functions to reduce the code size of the lexer function (#626)
* Extract helper functions to reduce code size of ts_lex

* Name char set helper functions based on token name
2020-05-26 13:39:11 -07:00
Max Brunsfeld
c9f46b8242 Fix false negative in token conflict detection
Co-Authored-By: Timothy Clem <timothy.clem@gmail.com>
2019-09-19 11:50:38 -07:00
Max Brunsfeld
df76aef067 CLI: In lex function, merge branches with the same body 2019-04-04 16:02:50 -07:00
Max Brunsfeld
d8ab36b2a5 Fix bugs in handling tokens that overlap with separators 2019-01-15 13:21:48 -08:00
Max Brunsfeld
a8292f4fe9 Load all fixture grammars dynamically
This way the build doesn't take forever any time a single grammar has 
been regenerated.
2019-01-15 13:21:48 -08:00
Max Brunsfeld
f059557a9d Move parser generation code into 'generate' module within CLI crate 2019-01-07 10:23:01 -08:00
Renamed from cli/src/nfa.rs