222 commits

Author SHA1 Message Date
Alex Pinkus
8fadf18655 Expand regex support to include emojis and binary ops
The `Emoji` property alias is already present, but the actual property
is not available since it lives in a new file. This adds that file to
the `generate-unicode-categories-json`.

The `emoji-data` file follows the same format as the ones we already
consume in `generate-unicode-categories-json`, so adding emoji support
is fairly easy. Without this, grammars would need to hard-code a set of
unicode ranges in their own regex. The JavaScript library `emoji-regex`
cannot be used because of #451.

For unclear reasons, the characters #, *, and 0-9 are marked as
`Emoji=Yes` by `emoji-data.txt`. Because of this, a grammar that wishes
to use emojis is likely to want to exclude those characters. For that
reason, this change also adds support for binary operations in regexes,
e.g. `[\p{Emoji}&&[^#*0-9]]`.
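
To illustrate what the intersection in `[\p{Emoji}&&[^#*0-9]]` is meant to
express, here is a minimal Python sketch that models character classes as
plain sets of characters. It is not tree-sitter's implementation: the
`EMOJI_SAMPLE` set is a tiny hand-picked stand-in for the real `Emoji=Yes`
data, and the helper name `intersect_with_negation` is hypothetical.

```python
# Toy model: a character class is just a set of characters.
# EMOJI_SAMPLE is a small illustrative subset (an assumption for this
# sketch), not the full contents of emoji-data.txt.
EMOJI_SAMPLE = set("#*0123456789") | {"\U0001F600", "\U0001F680"}

# The characters a grammar author wants to exclude: the [^#*0-9] part.
EXCLUDED = set("#*0123456789")


def intersect_with_negation(base, negated):
    """Model `[base&&[^negated]]`: keep members of base not in negated."""
    return {ch for ch in base if ch not in negated}


# Only the two pictographic characters remain after the intersection.
EMOJI_ONLY = intersect_with_negation(EMOJI_SAMPLE, EXCLUDED)
```

Intersecting with a negated class is the same as set difference, which is
why a single `&&` operator is enough to carve `#`, `*`, and the digits out
of the `Emoji` property.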

Lastly (and perhaps controversially), this change introduces new
variables available at grammar compile time, for the major, minor, and
patch versions of the tree-sitter CLI used to compile the grammar. This
will allow grammars to conditionally adopt these new regex features
while remaining backward compatible with older versions of the CLI.
Without this part of the change, grammar authors who do not precompile
and check in their `grammar.json` would need to wait for downstream
systems to adopt a newer tree-sitter CLI version before they could begin
to use these features.
2022-02-19 11:41:36 -08:00
Max Brunsfeld
516fd6f6de Add --abi flag to generate command, generate version 13 by default 2022-01-17 14:50:47 -08:00
Max Brunsfeld
8de8c64c95 Remove unnecessary types from binding.rs 2021-12-09 21:02:15 -08:00
Max Brunsfeld
e78413832b Restructure test suite's allocation recording so that tests can run in parallel 2021-10-11 17:24:37 -07:00
Razze
956705a23d Update to unicode standard 14 2021-10-10 16:40:31 +02:00
Max Brunsfeld
4d64c2b939 Put emscripten-version file in cli directory
This lets the CLI crate build without relying on sibling directories.
2021-09-03 13:57:45 -07:00
FnControlOption
e030434ca7 Handle aliases in unicode property escapes in regexes 2021-08-18 22:22:46 -07:00
Vladimir Panteleev
b14ea51e3d Refactor emscripten/emsdk version to a single file 2021-06-29 21:39:12 +00:00
Vladimir Panteleev
725f3f7f2b Pin emscripten/emsdk Docker version
Fixes issues caused by incompatible changes in Emscripten since the
time that tree-sitter was built.
2021-06-26 18:07:12 +00:00
Max Brunsfeld
036aceed57 In script/generate-bindings, add flags for latest bindgen 2021-05-25 18:02:39 -07:00
Max Brunsfeld
8e894ff3f1 Add --no-bindings flag to generate subcommand 2021-03-08 12:01:45 -08:00
Max Brunsfeld
dd4cba2625 Allow symbols to be used in precedence lists 2021-03-03 13:11:05 -08:00
Max Brunsfeld
9d9eb2234f Merge pull request #906 from tree-sitter/unicode-property-escapes
Handle simple unicode property escapes in regexes
2021-02-17 16:14:42 -08:00
Max Brunsfeld
894357d1d1 In version script, add 'v' prefix to version tags 2021-02-11 16:16:19 -08:00
Max Brunsfeld
242ad90770 Only build the CLI crate when running benchmarks 2021-02-05 11:57:34 -08:00
Max Brunsfeld
e3ba701344 Start work on handling unicode property escapes in regexes 2021-01-29 16:37:45 -08:00
Max Brunsfeld
6c8a928253 Avoid always specifying a --target flag in test script 2020-12-02 15:35:28 -08:00
Max Brunsfeld
b661050a61 Simplify setup for enabling/disabling allocation recording in the C lib 2020-12-02 15:35:13 -08:00
Max Brunsfeld
2699c01ab1 Use latest emscripten on CI 2020-12-01 11:04:06 -08:00
Max Brunsfeld
18980b7b99 wasm: Avoid registering uncaught exception/rejection handlers 2020-12-01 11:04:06 -08:00
Max Brunsfeld
751ffd2ee1 Use new emscripten when building with docker 2020-12-01 11:04:06 -08:00
Max Brunsfeld
b118e7d750 Make binding.js syntactically valid
Put the end of the surrounding closure into a separate file, suffix.js.
2020-11-30 15:28:26 -08:00
Max Brunsfeld
533aaa462b Add heap-profiling script 2020-10-23 13:20:57 -07:00
Max Brunsfeld
297e2bcb28 static query analysis: Fix handling of fields in hidden nodes 2020-09-23 16:55:48 -07:00
Max Brunsfeld
4b9db41584 Remove unnecessary echo in test script 2020-09-02 09:17:48 -07:00
Max Brunsfeld
456b1f6771 Fix handling of alternations and optional nodes in query analysis 2020-08-20 16:28:54 -07:00
Max Brunsfeld
a317199215 Add query construction to benchmark 2020-06-26 15:05:27 -07:00
Max Brunsfeld
7b39420de3 Make it easy to build with address sanitizer in test script 2020-05-18 12:00:57 -07:00
Max Brunsfeld
61814b468d Remove build-lib script, recommend make 2020-05-12 16:28:26 -07:00
Max Brunsfeld
96bdcfcf57 mac CI: Use newer emscripten 2020-05-12 16:18:20 -07:00
Max Brunsfeld
ee46218a73 Fix incremental parsing problem with non-terminal extras
Also add PHP grammar as a fixture to test against.
2020-03-02 14:17:12 -08:00
Max Brunsfeld
96c060fc6d wasm: Fix typo in Node.typeId 2020-02-21 17:06:07 -08:00
Max Brunsfeld
7de36a33eb Remove halt_on_error API 2020-01-27 15:36:09 -08:00
Max Brunsfeld
3f109a3cb5 highlight: Fix logic for handling empty injections with no highlights 2020-01-27 12:32:37 -08:00
Max Brunsfeld
cf5a6c0b9f Test against branches of language repos w/ new injection queries 2020-01-16 12:49:00 -08:00
Patrick Thomson
39bfcdf595 Fix build with MinGW tooling. (#514)
Courtesy of @Eli-Zaretskii, these fixes should unblock people from
building tree-sitter with MinGW.

I don't think this is an unreasonable maintenance burden, especially
given the Emacs project's interest in using tree-sitter, but
@maxbrunsfeld gets the final call.
2020-01-06 09:21:40 -08:00
Phil Turnbull
1a9c68aebf Run highlighting logic in fuzzer 2019-10-24 10:44:34 -04:00
Max Brunsfeld
ddd3dc2d6d Use emscripten 1.39.0 on mac CI 2019-10-21 16:40:44 -07:00
Max Brunsfeld
7ccec8c0e2 Tweak wasm binding to work with new upstream LLVM backend 2019-10-21 16:10:29 -07:00
Max Brunsfeld
c49afd5536 Return to using master branches of grammar repos for testing 2019-10-17 15:27:03 -07:00
Max Brunsfeld
060e00463d Implement include-children directive in injection queries 2019-10-14 17:38:42 -07:00
Max Brunsfeld
b3809274f0 Load highlight queries correctly in highlight unit tests 2019-10-14 17:24:16 -07:00
Matthew Krupcale
ee9a3c0ebb lib: remove utf8proc dependency (#436)
* Remove dependency on utf8proc

This removes the only external dependency on utf8proc for UTF-8 decoding. It does so by implementing its own UTF-8 decoder. This decoder is both faster and has a simpler API.

 * .gitmodules: remove utf8proc submodule
 * docs/section-2-using-parsers.md: remove requirement for utf8proc submodule
 * docs/section-6-contributing.md: likewise
 * lib/Cargo.toml: remove utf8proc subdirectory package include
 * lib/README.md: remove utf8proc subdirectory description
 * lib/binding_rust/build.rs: remove utf8proc compiler include directory
 * lib/src/lexer.c: remove utf8proc dependencies and types
 * lib/src/lib.c: remove utf8proc dependency
 * lib/src/unicode.h: define types for Unicode decoders
 * lib/src/utf16.{c,h}: implement more readable UTF-16 decoder
 * lib/src/utf8.{c,h}: implement fast UTF-8 decoder
 * lib/utf8proc: remove utf8proc submodule directory
 * script/build-lib: remove utf8proc compiler include directory
 * script/build-wasm: likewise

* Optimize ts_lexer__get_lookahead.

Try to favor non-failure code path and assign lookahead values directly to lexer

 * lib/src/lexer.c: optimize for non-failure code path

* Fix some compiler errors

 * lib/src/lexer.c: cast from signed to unsigned for decode_next result
 * lib/src/utf16.c: fix non-constant initializers for older compilers

* Remove some missed remnants of utf8proc

 * docs/section-2-using-parsers.md: only two include paths necessary now
 * lib/src/lib.c: no need to define UTF8PROC_STATIC

* Use ICU's utf8 and utf16 decoding routines

* Remove unnecessary casts when calling icu macros

* Check buffer length before attempting to decode a unicode character

* Use new unicode function when parsing Queries

Co-Authored-By: Matthew Krupcale <mkrupcale@matthewkrupcale.com>

* Mark libicu files as vendored for GitHub's stats
2019-10-14 11:18:39 -07:00
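
The utf8proc removal above mentions decoding UTF-8 directly and checking
the buffer length before decoding a character. As a rough illustration of
those two ideas, here is a minimal Python sketch of a single-code-point
decoder. It is an assumption-laden example, not the code in `lib/src` and
not ICU's routines (which the commit ultimately adopts); the function name
`decode_utf8` and its `(code_point, length)` return convention are
hypothetical.

```python
def decode_utf8(buf, pos=0):
    """Decode one code point from `buf` (bytes) starting at `pos`.

    Returns (code_point, byte_length). Malformed or truncated input
    yields (-1, 1) so a caller can skip one byte; an empty remainder
    yields (-1, 0).
    """
    if pos >= len(buf):
        return -1, 0
    b0 = buf[pos]
    if b0 < 0x80:                      # single-byte (ASCII) fast path
        return b0, 1
    if 0xC2 <= b0 <= 0xDF:             # two-byte sequence
        need, cp, min_cp = 1, b0 & 0x1F, 0x80
    elif 0xE0 <= b0 <= 0xEF:           # three-byte sequence
        need, cp, min_cp = 2, b0 & 0x0F, 0x800
    elif 0xF0 <= b0 <= 0xF4:           # four-byte sequence
        need, cp, min_cp = 3, b0 & 0x07, 0x10000
    else:                              # stray continuation or invalid lead
        return -1, 1
    # Check the buffer length before reading continuation bytes.
    if pos + need >= len(buf):
        return -1, 1
    for i in range(1, need + 1):
        b = buf[pos + i]
        if b & 0xC0 != 0x80:           # not a continuation byte
            return -1, 1
        cp = (cp << 6) | (b & 0x3F)
    # Reject overlong forms, surrogates, and out-of-range values.
    if cp < min_cp or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
        return -1, 1
    return cp, need + 1
```

A lexer-style caller would advance by the returned length after each call
and treat `-1` as an invalid character.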
Max Brunsfeld
9323ba52c8 Minify function names in wasm build 2019-09-16 11:38:29 -07:00
Max Brunsfeld
0d913dec65 Fix layout issues in web-ui 2019-09-13 15:19:31 -07:00
Max Brunsfeld
7ad087ce27 Tweak compile flags in build-wasm script 2019-09-04 08:54:13 -07:00
Max Brunsfeld
3ac0ff2a11 Fix error in build-lib script 2019-08-30 22:07:32 -07:00
Max Brunsfeld
c5fc9d7dcb Remove existing static library in build-lib script 2019-08-29 14:30:45 -07:00
Max Brunsfeld
84c3bf1dd9 Make scripts work when repo path contains spaces 2019-08-12 15:13:41 -07:00
Max Brunsfeld
8cdc903d0f Print emcc version after installing emscripten 2019-08-08 10:56:48 -07:00