An incremental parsing system for programming tools https://tree-sitter.github.io
Alex Pinkus 8fadf18655 Expand regex support to include emojis and binary ops
The `Emoji` property alias is already present, but the actual property
is not available since it lives in a new file. This adds that file to
the `generate-unicode-categories-json` script.

The `emoji-data` file follows the same format as the ones we already
consume in `generate-unicode-categories-json`, so adding emoji support
is fairly easy. Without this, grammars would need to hard-code a set of
Unicode ranges in their own regex. The JavaScript library `emoji-regex`
cannot be used because of #451.

For unclear reasons, the characters #, *, and 0-9 are marked as
`Emoji=Yes` by `emoji-data.txt`. Because of this, a grammar that wishes
to use emojis is likely to want to exclude those characters. For that
reason, this change also adds support for binary operations in regexes,
e.g. `[\p{Emoji}&&[^#*0-9]]`.
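
The effect of that exclusion can be previewed outside tree-sitter. The sketch below is only an illustration: it assumes a Node.js runtime with Unicode property escapes (the `u` flag) and emulates the set difference with a negative lookahead, since JavaScript's `u` mode has no `&&` operator. It does not exercise tree-sitter's own regex handling, and the names `emoji`, `emojiWithoutKeycapBases`, and `samples` are made up for the example.

```javascript
// Matches exactly one code point that has the Unicode property Emoji=Yes.
const emoji = /^\p{Emoji}$/u;

// Emulates [\p{Emoji}&&[^#*0-9]]: the lookahead rejects #, *, and the digits
// before the Emoji property is tested.
const emojiWithoutKeycapBases = /^(?![#*0-9])\p{Emoji}$/u;

// "\u{1F600}" is the grinning face; "a" is a control that is not an emoji.
const samples = ["#", "*", "7", "a", "\u{1F600}"];

const results = samples.map((ch) => ({
  ch,
  emoji: emoji.test(ch),
  filtered: emojiWithoutKeycapBases.test(ch),
}));

console.table(results);
```

In the output, `#`, `*`, and `7` are reported as emoji by the plain property test but rejected by the filtered pattern, while the grinning face passes both.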

Lastly (and perhaps controversially), this change introduces new
variables available at grammar compile time, for the major, minor, and
patch versions of the tree-sitter CLI used to compile the grammar. This
will allow grammars to conditionally adopt these new regex features
while remaining backward compatible with older versions of the CLI.
Without this part of the change, grammar authors who do not precompile
and check in their `grammar.json` would need to wait for downstream
systems to adopt a newer tree-sitter CLI version before they could begin
to use these features.
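
A grammar could use such version numbers along the following lines. This is a rough sketch under stated assumptions, not the interface this change exposes: the message above does not name the variables, so the version is passed in as a plain `[major, minor, patch]` array, and `MIN_VERSION`, `atLeast`, and `emojiPatternSource` are hypothetical names. The threshold version and the fallback range are placeholders chosen for illustration.

```javascript
// Placeholder threshold: the first CLI release assumed to accept `&&` in
// regexes. The real version number is not stated in the commit message.
const MIN_VERSION = [0, 20, 5];

// Returns true when `version` is greater than or equal to `minimum`,
// comparing major, then minor, then patch.
function atLeast(version, minimum) {
  for (let i = 0; i < minimum.length; i++) {
    const v = version[i] || 0;
    if (v !== minimum[i]) return v > minimum[i];
  }
  return true;
}

// Picks the regex source a grammar would use for emoji.
function emojiPatternSource(cliVersion) {
  return atLeast(cliVersion, MIN_VERSION)
    ? "[\\p{Emoji}&&[^#*0-9]]" // new syntax described in this change
    : "[\\u{1F600}-\\u{1F64F}]"; // illustrative hard-coded range, not a complete emoji set
}

console.log(emojiPatternSource([0, 20, 4]));
console.log(emojiPatternSource([0, 20, 5]));
```

Keeping the comparison in one helper means the grammar has a single place to update once the real threshold is known.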
2022-02-19 11:41:36 -08:00
.github/workflows Fix ci.yml format 2021-09-28 18:52:12 -04:00
cli Expand regex support to include emojis and binary ops 2022-02-19 11:41:36 -08:00
docs Merge pull request #1581 from tlaplus-community/get-codepoint-column 2022-01-12 10:49:44 -08:00
highlight Bump library versions to 0.20.1 2021-11-21 12:33:12 -08:00
lib Fix back compat by moving primary_field_ids to the end 2022-01-17 17:23:02 -08:00
script Expand regex support to include emojis and binary ops 2022-02-19 11:41:36 -08:00
tags tags: Remove unused field 2021-12-09 22:39:27 -08:00
test Changed decimal unicode codepoint to hex 2022-01-11 19:15:36 -05:00
.appveyor.yml Build and test wasm on CI 2019-04-26 14:38:13 -07:00
.gitattributes lib: remove utf8proc dependency (#436) 2019-10-14 11:18:39 -07:00
.gitignore Bump lib tree-sitter dependency versions in loader crate 2021-09-03 13:29:03 -07:00
Cargo.lock Expand regex support to include emojis and binary ops 2022-02-19 11:41:36 -08:00
Cargo.toml Move code into cli directory 2019-01-04 16:50:52 -08:00
CONTRIBUTING.md Tweak readmes 2020-05-12 16:16:48 -07:00
LICENSE chore(cli): Add the LICENSE file to the tree-sitter-cli npm package 2021-08-22 03:13:46 +03:00
Makefile Fix compilation warnings (#635) 2020-06-03 12:19:57 -07:00
README.md Add zenodo citation badge to readme 2021-03-18 12:16:03 -07:00
tree-sitter.pc.in Add a simple Makefile-based build system. 2020-04-21 23:49:19 -04:00

tree-sitter


Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited. Tree-sitter aims to be:

  • General enough to parse any programming language
  • Fast enough to parse on every keystroke in a text editor
  • Robust enough to provide useful results even in the presence of syntax errors
  • Dependency-free so that the runtime library (which is written in pure C) can be embedded in any application