tree-sitter/cli/src/generate
Alex Pinkus 8fadf18655 Expand regex support to include emojis and binary ops
The `Emoji` property alias is already present, but the property itself
is not available because its data lives in a new file. This change adds
that file to the inputs of `generate-unicode-categories-json`.

The `emoji-data` file follows the same format as the ones we already
consume in `generate-unicode-categories-json`, so adding emoji support
is fairly easy. Without this, grammars would need to hard-code a set of
Unicode ranges in their own regex. The JavaScript library `emoji-regex`
cannot be used because of #451.

For unclear reasons, the characters #, *, and 0-9 are marked as
`Emoji=Yes` by `emoji-data.txt`. Because of this, a grammar that wishes
to use emojis is likely to want to exclude those characters. For that
reason, this change also adds support for binary operations in regexes,
e.g. `[\p{Emoji}&&[^#*0-9]]`.
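
To illustrate what the intersection in `[\p{Emoji}&&[^#*0-9]]` means, here
is a minimal sketch over lists of code point ranges. It is not the code in
`nfa.rs` or `prepare_grammar`; the `Range` alias, the helper names
(`negate`, `intersect`, `contains`, `emoji_without_keycaps`) and the tiny
emoji subset are assumptions made for illustration only.

```rust
/// Inclusive code point range (hypothetical representation).
type Range = (u32, u32);

/// Complement a sorted, non-overlapping range list within 0..=0x10FFFF.
fn negate(ranges: &[Range]) -> Vec<Range> {
    let mut out = Vec::new();
    let mut next = 0u32;
    for &(lo, hi) in ranges {
        if lo > next {
            out.push((next, lo - 1));
        }
        next = hi + 1;
    }
    if next <= 0x10FFFF {
        out.push((next, 0x10FFFF));
    }
    out
}

/// Intersect two sorted, non-overlapping range lists (the `&&` operator).
fn intersect(a: &[Range], b: &[Range]) -> Vec<Range> {
    let mut out = Vec::new();
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        let lo = a[i].0.max(b[j].0);
        let hi = a[i].1.min(b[j].1);
        if lo <= hi {
            out.push((lo, hi));
        }
        // Advance whichever range ends first.
        if a[i].1 < b[j].1 {
            i += 1;
        } else {
            j += 1;
        }
    }
    out
}

/// True if `c` falls inside any of the ranges.
fn contains(ranges: &[Range], c: char) -> bool {
    let cp = c as u32;
    ranges.iter().any(|&(lo, hi)| lo <= cp && cp <= hi)
}

/// Models `[\p{Emoji}&&[^#*0-9]]` on a tiny, illustrative subset of the
/// `Emoji=Yes` code points: '#', '*', '0'-'9', and U+1F600..U+1F64F.
fn emoji_without_keycaps() -> Vec<Range> {
    let emoji_subset = [(0x23, 0x23), (0x2A, 0x2A), (0x30, 0x39), (0x1F600, 0x1F64F)];
    let excluded = [(0x23, 0x23), (0x2A, 0x2A), (0x30, 0x39)]; // [#*0-9]
    intersect(&emoji_subset, &negate(&excluded))
}
```

In this sketch, `contains(&emoji_without_keycaps(), '#')` is false while
the emoticon code points remain in the set, which is the behavior a
grammar would want from the regex above.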

Lastly (and perhaps controversially), this change introduces new
variables available at grammar compile time, for the major, minor, and
patch versions of the tree-sitter CLI used to compile the grammar. This
will allow grammars to conditionally adopt these new regex features
while remaining backward compatible with older versions of the CLI.
Without this part of the change, grammar authors who do not precompile
and check in their `grammar.json` would need to wait for downstream
systems to adopt a newer tree-sitter CLI version before they could begin
to use these features.
2022-02-19 11:41:36 -08:00
build_tables Remove unnecessary borrows 2021-08-14 15:44:24 +02:00
prepare_grammar Expand regex support to include emojis and binary ops 2022-02-19 11:41:36 -08:00
templates fix(cli): actual Rust binding version in generated Cargo.toml 2021-06-30 00:36:11 +03:00
binding_files.rs Merge pull request #1194 from ahlinc/fix/1032 2021-06-29 16:48:23 -07:00
char_tree.rs Tweak whitespace in generated character set functions 2021-02-17 16:32:49 -08:00
dedup.rs Move state splitting algorithm into its own file 2019-07-19 12:39:52 -07:00
dsl.js dsl.js: Reuse sym() in RuleBuilder 2021-03-10 23:06:53 +02:00
grammar-schema.json Update cli/src/generate/grammar-schema.json 2020-02-24 18:13:38 -05:00
grammars.rs Fix incorrect merging of states with different inherited fields 2021-03-05 14:49:28 -08:00
mod.rs Expand regex support to include emojis and binary ops 2022-02-19 11:41:36 -08:00
nfa.rs Expand regex support to include emojis and binary ops 2022-02-19 11:41:36 -08:00
node_types.rs Use serde's derive feature everywhere 2021-11-21 13:39:30 -08:00
parse_grammar.rs Use serde's derive feature everywhere 2021-11-21 13:39:30 -08:00
render.rs Fix back compat by moving primary_field_ids to the end 2022-01-17 17:23:02 -08:00
rules.rs Allow symbols to be used in precedence lists 2021-03-03 13:11:05 -08:00
tables.rs Generator::add_parse_table: Store entries in hash map 2021-08-08 21:45:43 +02:00