Commit graph

271 commits

Author SHA1 Message Date
Max Brunsfeld
09b18fad5b Merge pull request #3181 from tree-sitter/handle-wasm-oom
When loading languages via WASM, gracefully handle memory errors and leaks in external scanners
2024-03-18 13:15:06 -07:00
Max Brunsfeld
096ad4669b Return to using master branch of tree-sitter-html in tests 2024-03-18 13:14:52 -07:00
Max Brunsfeld
210a032f59 Temporarily use a branch of the HTML parser for testing 2024-03-18 10:58:42 -07:00
Max Brunsfeld
7a9b3076ef Handle memory errors occurring in wasm scanners
* In WASM, use a custom, simple malloc implementation that lets us
  explicitly reset the heap with a new start location.
* When a WASM call traps or errors, propagate that as a parse failure.
* Reset the WASM heap after every parse.
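
The allocation scheme described in these bullets can be sketched as a bump allocator whose heap can be rewound to a chosen start offset. This is only an illustration of the idea, not the code from this commit: the names `heap_reset`, `bump_malloc`, `bump_free`, and the constants `HEAP_CAPACITY` and `ALIGNMENT` are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names and sizes; not tree-sitter's actual allocator. */
#define HEAP_CAPACITY 4096 /* must be a multiple of ALIGNMENT */
#define ALIGNMENT 8

static _Alignas(ALIGNMENT) uint8_t heap[HEAP_CAPACITY];
static size_t heap_start = 0; /* offset where allocations begin */
static size_t heap_top = 0;   /* offset of the next free byte */

/* Round n up to the next multiple of ALIGNMENT. */
static size_t align_up(size_t n) {
  return (n + (ALIGNMENT - 1)) & ~(size_t)(ALIGNMENT - 1);
}

/* Discard every live allocation and restart the heap at `new_start`. */
void heap_reset(size_t new_start) {
  assert(new_start <= HEAP_CAPACITY);
  heap_start = align_up(new_start);
  heap_top = heap_start;
}

/* Hand out the next aligned chunk, or NULL when the heap is exhausted. */
void *bump_malloc(size_t size) {
  if (size > HEAP_CAPACITY) return NULL; /* also keeps align_up from overflowing */
  size_t rounded = align_up(size);
  if (rounded > HEAP_CAPACITY - heap_top) return NULL;
  void *result = &heap[heap_top];
  heap_top += rounded;
  return result;
}

/* Individual frees are no-ops; memory is reclaimed only by heap_reset. */
void bump_free(void *ptr) {
  (void)ptr;
}
```

In a design like this, a scanner that leaks memory cannot exhaust the heap across parses, because calling `heap_reset` after each parse reclaims everything at once.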

Co-authored-by: Conrad <conrad@zed.dev>
2024-03-17 10:19:42 -07:00
Max Brunsfeld
a85a61e56f Update test script to allow running w/ asan on mac 2024-03-17 09:59:51 -07:00
Amaan Qureshi
99b93d83a1 feat(cli)!: add a separate build command to compile parsers
This allows users to build parsers without having to run `test` or
`parse` to invoke the compilation process, and allows them to output the
object file to wherever they like. The `build-wasm` command was merged
into this by just specifying the `--wasm` flag.
2024-03-17 05:36:30 -04:00
Amaan Qureshi
516f13e89e docs: update changelog 2024-03-12 01:14:37 -04:00
ObserverOfTime
799833f9cf build: use c11 everywhere
And improve the makefiles
2024-02-27 15:54:38 -05:00
Amaan Qureshi
4e2880407c feat: add xtasks to assist with bumping crates 2024-02-25 13:40:03 -05:00
dundargoc
f0b315359a build: improve changelog settings
Quality of life improvements:

- automatically use `gh auth token` when running `make changelog`
- Add full links to related pull requests
- Include non-conventional commits
- Force group order by adding html comments
2024-02-23 11:29:18 +01:00
Amaan Qureshi
d59f950005 docs: add GitHub user and PR info to the changelog 2024-02-21 19:45:18 -05:00
dundargoc
48a1f12ca3 build: enable creating changelogs with git-cliff
Introduce a target called `changelog` that will use git-cliff to create
the changelog of the latest release.

Closes https://github.com/tree-sitter/tree-sitter/issues/527.
2024-02-18 11:44:31 +01:00
dundargoc
5d1db069f5 test: add quotes around bash variables
This allows the script to work on directory names with spaces in them.

Co-authored-by: buckynbrocko <77247638+buckynbrocko@users.noreply.github.com>
2024-02-15 14:33:34 +01:00
dundargoc
1a6f3d39a7 build: remove symbolic links from repository
This will reduce cross-platform differences between Windows and Linux.

Closes https://github.com/tree-sitter/tree-sitter/issues/627.
2024-02-12 14:16:12 +01:00
Luma
94a198d20f ci(windows): exit in script when failing 2024-02-08 14:21:42 +01:00
Max Brunsfeld
d2900510f6 Remove duplicate specification of stdlib symbols for web tree-sitter 2024-02-02 12:04:49 -08:00
Max Brunsfeld
e21d9e7f93 Avoid duplication of wasm stdlib symbol list 2024-02-02 12:00:08 -08:00
Steven Kalt
d35efd4608 feat(cli): support building WASM via podman
Previously, `tree-sitter build-wasm` had the ability to build WASM
by using docker to pull in an image with a complete emscripten toolchain.
This commit adds the ability to use podman to do the same thing.

Using podman requires two notable changes:
1. Using the fully-qualified image name. Docker defaults to prepending
    `docker.io` to the image name, but podman does not.
2. Podman will mount the `/src/` volume as belonging to root unless
  `--userns=keep-id` is passed. I think podman's different
  volume-ownership is related to podman's daemonless execution and
  `--uidmap` functionality, but I'm not 100% sure.

To test, I ran
```sh
script/fetch-fixtures
script/generate-fixtures
script/generate-fixtures-wasm # <- the important one!
```

which worked as well as the docker version.
2024-01-29 00:50:32 -05:00
Max Brunsfeld
68ba9a4d66 Grow memory dynamically as-needed when loading wasm language modules 2023-12-03 12:12:47 -08:00
Max Brunsfeld
6fd7a1e44e Return informative error when load_language fails 2023-11-26 12:15:05 -08:00
Max Brunsfeld
0743edd162 Include two more std::string functions in wasm stdlib 2023-10-27 21:54:23 +01:00
Max Brunsfeld
f4e2f68f14 Merge branch 'master' into wasm-language 2023-10-27 12:11:43 +01:00
Andrew Hlynskyi
dd52cafdd9 chore: switch fetch-fixtures.cmd to all master branches 2023-09-21 11:28:22 +03:00
Amaan Qureshi
ef9cabd4b5 fix: update javascript tests and use cpp/javascript master for fixtures 2023-09-20 11:31:53 -04:00
Andrew Hlynskyi
9cc1daafca chore(ffi): remove enum name prefixes from all C enum values 2023-09-03 07:38:27 +03:00
Andrew Hlynskyi
abd57bc69b chore: simplify script/generate-bindings 2023-08-21 02:56:14 +03:00
Andrew Hlynskyi
759af6d0a4 Remove Copy, Clone from TSLookaheadIterator raw binding struct 2023-08-02 00:04:17 +03:00
Amaan Qureshi
f9e5696bcb ci: rework fuzzer script 2023-07-24 00:44:44 -04:00
Andrew Hlynskyi
b52d9313dd chore: script/test - fix usage, remove trial mention, deleted in 7170ec7c 2023-07-16 19:50:13 +03:00
Andrew Hlynskyi
45aede8bf5 script/generate-bindings - protect from using old incompatible bindgen versions 2023-07-14 00:19:23 +03:00
Andrew Hlynskyi
a2bcc4f448 script/generate-bindings - no derived Copy, Clone for ptr data wrappers 2023-07-14 00:19:23 +03:00
Andrew Hlynskyi
f01c4f8376 Restore Rust bindings generation with newer bindgen 0.65.1 2023-07-13 17:34:32 +03:00
Max Brunsfeld
a2119cb691 Add APIs for retrieving tree cursor's depth and descendant index 2023-06-12 11:50:44 -07:00
Andrew Hlynskyi
a9dfcb9e47 script/build-wasm: update emcc options to use actual non-deprecated names 2023-05-16 04:46:46 +03:00
Andrew Hlynskyi
be179a3c80 script/build-wasm: add a --verbose option 2023-05-15 17:03:24 +03:00
Andrew Hlynskyi
09fe5f29d9 fix(test): stick tree-sitter-cpp fixture grammar to a specific hash
This is needed to fix a test failure: https://github.com/tree-sitter/tree-sitter-cpp/pull/202#issuecomment-1546279646

See CLI xtask notes in https://github.com/tree-sitter/tree-sitter/issues/1223
2023-05-13 18:57:42 +03:00
Andrew Hlynskyi
cc6596be82 chore(bindgen): update bindgen to 0.65.1 and regenerate bindings 2023-04-17 11:24:05 +03:00
Andrew Hlynskyi
c38f78345e binding(rust): update script/generate-bindings to use latest rust-bindgen 0.64.0 version 2023-04-04 22:16:27 +03:00
Andrew Hlynskyi
cc4f932d17 cicd: new workflow 2023-04-04 03:42:16 +03:00
Max Brunsfeld
97fd990822 Add --dot flag to parse subcommand, for printing tree as DOT graph 2023-02-13 12:33:34 -08:00
Boris Verkhovskiy
61b85b2664 Make error message more specific 2023-01-08 08:10:14 -07:00
Max Brunsfeld
98ccfcffb0 Provide minimal C/C++ std library to wasm external scanners 2022-11-15 17:14:33 -08:00
Max Brunsfeld
d47713ee4a Integrate WASM compilation into the CLI's Loader 2022-11-15 17:14:33 -08:00
Max Brunsfeld
3f1a7f9cd4 Start work on ability to load wasm languages from native lib, via wasmtime 2022-11-15 17:14:33 -08:00
Jonathan Arnett
a8988339c3 Add 'stringToUTF16' and 'AsciiToString' to exported method 2022-11-15 16:39:17 -08:00
Max Brunsfeld
15190a497d Build core wasm library with C++ exceptions disabled 2022-09-02 14:55:50 -07:00
Kian-Meng Ang
b8552ec6c4 Fix typos 2022-06-28 19:57:42 +08:00
Max Brunsfeld
be463c7789 Update test script with flags for new randomized test options 2022-06-24 14:24:21 -07:00
Max Brunsfeld
7170ec7c96 Improve randomized testing setup
* Allow iterations to be specified via an env var
* Randomly decide the edit count, with a maximum
  specified via an env var.
* Instead of separate env vars for starting seed + trial, just accept a seed
* Remove some noisy output
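
A rough sketch of how such a setup might read its settings and pick an edit count is shown here. The environment variable names in the usage note, the helper names `env_u32` and `pick_edit_count`, the xorshift generator, and the choice of a 1..max range are all assumptions for illustration; none of them are taken from the test suite itself.

```c
#include <stdint.h>
#include <stdlib.h>

/* Read an unsigned 32-bit decimal value from the environment. Returns
   `fallback` when the variable is unset, does not start with a digit,
   has trailing characters, or does not fit in 32 bits. */
uint32_t env_u32(const char *name, uint32_t fallback) {
  const char *text = getenv(name);
  if (text == NULL || text[0] < '0' || text[0] > '9') return fallback;
  char *end = NULL;
  unsigned long value = strtoul(text, &end, 10);
  if (*end != '\0' || value > UINT32_MAX) return fallback;
  return (uint32_t)value;
}

/* xorshift32 step; the state must be nonzero. */
static uint32_t next_random(uint32_t *state) {
  uint32_t x = *state;
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  *state = x;
  return x;
}

/* Pick an edit count in 1..max_edits (assumed range), or 0 if max_edits is 0. */
uint32_t pick_edit_count(uint32_t *rng_state, uint32_t max_edits) {
  if (max_edits == 0) return 0;
  return 1 + next_random(rng_state) % max_edits;
}
```

A test driver could call `env_u32("EXAMPLE_SEED", 1)`, `env_u32("EXAMPLE_ITERATIONS", 10)`, and `env_u32("EXAMPLE_MAX_EDITS", 3)` once at startup, then draw one edit count per iteration. Because the generator is deterministic, rerunning with the same seed reproduces the same sequence of trials, which is what makes a single seed variable sufficient.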
2022-03-02 17:12:25 -08:00
Alex Pinkus
8fadf18655 Expand regex support to include emojis and binary ops
The `Emoji` property alias is already present, but the actual property
is not available since it lives in a new file. This adds that file to
the `generate-unicode-categories-json`.

The `emoji-data` file follows the same format as the ones we already
consume in `generate-unicode-categories-json`, so adding emoji support
is fairly easy. Without this, grammars would need to hard-code a set of
Unicode ranges in their own regex. The JavaScript library `emoji-regex`
cannot be used because of #451.

For unclear reasons, the characters #, *, and 0-9 are marked as
`Emoji=Yes` by `emoji-data.txt`. Because of this, a grammar that wishes
to use emojis is likely to want to exclude those characters. For that
reason, this change also adds support for binary operations in regexes,
e.g. `[\p{Emoji}&&[^#*0-9]]`.
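
The meaning of the `&&` operator in that example can be sketched as plain set intersection over code point ranges. The code below is an illustration only and does not reflect how the CLI implements it: the type and function names are hypothetical, and `EMOJI_SAMPLE` is a tiny hand-picked subset (the characters named above plus the Emoticons block U+1F600 to U+1F64F), not the full `Emoji` property.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An inclusive range of Unicode code points. */
typedef struct {
  uint32_t first, last;
} Range;

/* A character class: a list of ranges, optionally negated like [^...]. */
typedef struct {
  const Range *ranges;
  size_t count;
  bool negated;
} CharClass;

static bool class_contains(const CharClass *cls, uint32_t cp) {
  bool found = false;
  for (size_t i = 0; i < cls->count; i++) {
    if (cp >= cls->ranges[i].first && cp <= cls->ranges[i].last) {
      found = true;
      break;
    }
  }
  return found != cls->negated; /* negation flips the result */
}

/* Illustrative subset only; the real Emoji property is much larger. */
static const Range EMOJI_SAMPLE[] = {
  {'#', '#'}, {'*', '*'}, {'0', '9'}, {0x1F600, 0x1F64F},
};
static const Range EXCLUDED[] = {
  {'#', '#'}, {'*', '*'}, {'0', '9'},
};

/* Models [\p{Emoji}&&[^#*0-9]]: in the left class AND in the right class. */
bool matches_emoji_without_ascii(uint32_t cp) {
  const CharClass emoji = {
    EMOJI_SAMPLE, sizeof EMOJI_SAMPLE / sizeof EMOJI_SAMPLE[0], false,
  };
  const CharClass not_excluded = {
    EXCLUDED, sizeof EXCLUDED / sizeof EXCLUDED[0], true,
  };
  return class_contains(&emoji, cp) && class_contains(&not_excluded, cp);
}
```

Under this model, U+1F600 matches because it is in the sample emoji set and outside `#*0-9`, while `#` and `7` are rejected by the negated right-hand class even though they are marked `Emoji=Yes`.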

Lastly (and perhaps controversially), this change introduces new
variables available at grammar compile time, for the major, minor, and
patch versions of the tree-sitter CLI used to compile the grammar. This
will allow grammars to conditionally adopt these new regex features
while remaining backward compatible with older versions of the CLI.
Without this part of the change, grammar authors who do not precompile
and check-in their `grammar.json` would need to wait for downstream
systems to adopt a newer tree-sitter CLI version before they could begin
to use these features.
2022-02-19 11:41:36 -08:00