Compare commits


133 commits

Author SHA1 Message Date
Amaan Qureshi
da6fe9beb4
0.25.10 2025-09-22 17:52:56 -04:00
WillLillis
f75d710ec6 fix(rust): pass correct fd to C lib's ts_tree_print_dot_graph
Co-authored-by: Amaan Qureshi <git@amaanq.com>
(cherry picked from commit 92678f0fc5)
2025-09-22 17:45:30 -04:00
Amaan Qureshi
9275aeba08 build: define _DARWIN_C_SOURCE
(cherry picked from commit 95ab17e444)
2025-09-22 17:45:23 -04:00
Amaan Qureshi
bb1634a544 perf(xtask): check out the tag directly for fixtures 2025-09-22 17:19:21 -04:00
Will Lillis
8eed0c7693 feat(test): pin fixture grammars to specific commits 2025-09-22 17:19:21 -04:00
Amaan Qureshi
88b3f9cc2b fix: lint 2025-09-22 17:19:21 -04:00
Amaan Qureshi
590f6d7c9a build: update Cargo.lock 2025-09-22 17:19:21 -04:00
Nia
32d784ec8f fix(wasm): fix alias map size computation
This fixes a crash when parsing with certain languages, caused by how the alias map was allocated and laid out in wasm memory

(cherry picked from commit f09dc3cf46)
2025-09-22 17:19:21 -04:00
Amaan Qureshi
17d3f85582 fix(lib/wasm): keep track of freed blocks that are not the last allocated pointer
This fixes issues where the scanner allocates and frees a lot of data
during a single parse.

Co-authored-by: Will Lillis <will.lillis24@gmail.com>
2025-09-22 17:19:21 -04:00
Amaan Qureshi
e29a5ee82e fix(xtask): make building the wasm stdlib work again
Co-authored-by: Will Lillis <will.lillis24@gmail.com>
(cherry picked from commit b863b16454)
2025-09-22 17:19:21 -04:00
Will Lillis
20dad25cce fix(lib): improve wasm scanner serialization error handling
Co-authored-by: Amaan Qureshi <contact@amaanq.com>
(cherry picked from commit 0c35511aea)
2025-09-22 17:19:21 -04:00
Will Lillis
a467ea8502 fix(rust): correct crate versions in root Cargo.toml file 2025-09-06 21:51:17 +02:00
Christian Clason
6cd25aadd5 0.25.9 2025-09-06 21:17:26 +02:00
Will Lillis
027136c98a fix(generate): use correct state id when adding terminal states to non-terminal extras

(cherry picked from commit 5fd818babe)
2025-09-04 04:52:45 -04:00
Will Lillis
14c4d2f8ca fix(generate): return error when single state transitions have
indirectly recursive cycles.

This can cause infinite loops in the parser near EOF.

Co-authored-by: Amaan Qureshi <amaanq12@gmail.com>
(cherry picked from commit 310c0b86a7)
2025-09-04 01:47:36 -04:00
Will Lillis
8e2b5ad2a4 fix(test): improve readability of corpus error message mismatch
(cherry picked from commit cc5463ad44)
2025-09-04 01:47:36 -04:00
vemoo
bb82b94ded fix(web): correct type errors, improve build
(cherry picked from commit 4db3edadf4)
2025-08-31 13:22:28 +03:00
ObserverOfTime
59f3cb91c2 fix(npm): add directory to repository fields
and remove non-existent "main" entry point

(cherry picked from commit 90bdd63a71)
2025-08-30 17:18:14 -04:00
ObserverOfTime
a80cd86d47 fix(cli): fix DSL type declarations
(cherry picked from commit ca27fb5d43)
2025-08-30 17:16:05 -04:00
Will Lillis
253003ccf8 fix(generate): warn users when extra rule can lead to parser hang
When a *named* rule in the extras is able to match the empty string,
parsing can hang in certain situations (i.e. near EOF).

(cherry picked from commit ac171eb280)
2025-08-29 23:32:57 -04:00
ObserverOfTime
e61407cc36 fix(bindings): properly detect MSVC compiler
(cherry picked from commit 0be215e152)
2025-08-29 15:30:52 +03:00
ObserverOfTime
cd503e803d build(zig): support wasmtime for ARM64 Windows (MSVC)
(cherry picked from commit e0edfe1cb3)
2025-08-28 20:27:11 -04:00
Amaan Qureshi
77e5c1c8aa fix(lib): allow error nodes to match when they are child nodes
(cherry picked from commit 8387101a61)
2025-08-28 20:26:16 -04:00
Amaan Qureshi
22fa144016 fix(lib): check if an ERROR node is named before assuming it's the builtin error node
(cherry picked from commit b7f36a13ba)
2025-08-28 23:53:00 +02:00
ObserverOfTime
1083795af6 style(zig): reformat files
(cherry picked from commit 66ea1a6dda)
2025-08-28 22:32:43 +03:00
ObserverOfTime
dc0b5530b3 build(zig): use ArrayListUnmanaged
This is supported in 0.14 and 0.15

(cherry picked from commit 298b6775c6)
2025-08-28 22:32:43 +03:00
ObserverOfTime
910b3c738c build(zig): don't link wasmtime in static build
(cherry picked from commit 2e4b7d26b1)
2025-08-28 22:32:43 +03:00
ObserverOfTime
f764f485d2 build(zig): expose wasmtimeDep function
This allows consumers to reuse the dependency.

(cherry picked from commit dab84a1b10)
2025-08-28 22:32:43 +03:00
ObserverOfTime
d5b8c19d0b fix(bindings): add tree-sitter as npm dev dependency
npm is supposed to automatically install peer dependencies since v7,
but sometimes it does not, and we need this dependency for tests

(cherry picked from commit e67f9f8f7a)
2025-08-28 13:57:22 +02:00
ObserverOfTime
9504c247d6 fix(bindings): improve zig dependency fetching logic
Currently, including a tree-sitter parser as a dependency in a zig
project and running `zig build test` on the project will fetch the
zig-tree-sitter dependency declared by the parser. This is a problem
because (a) consumers may not want this dependency for whatever reason
and (b) due to how often Zig breaks everything and how scarcely most
tree-sitter parsers are updated, the zig-tree-sitter version pinned
by the parser module will often be outdated and broken.

The workaround I used was taken from https://ziggit.dev/t/11234

(cherry picked from commit 107bd800b0)
2025-08-28 10:59:06 +02:00
Quentin LE DILAVREC
17cb10a677 fix(rust): EqCapture accepted cases where number of captured nodes differed by one
Problem: When using alternations, the `#eq?` predicate does not always use the same capture name.

Solution: Iterate the left and right captured nodes more independently.
(cherry picked from commit 79177a1cd5)
2025-08-27 11:02:52 +02:00
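The `#eq?` fix above boils down to a classic list-comparison pitfall: zipping two sequences stops at the shorter one, so lists whose lengths differ by one can compare as equal. A minimal Rust sketch of that bug class (illustrative names only, not the actual tree-sitter query code):

```rust
// Buggy form: `zip` silently drops the trailing element of the longer list,
// so a length difference of one goes unnoticed.
fn eq_zip(left: &[&str], right: &[&str]) -> bool {
    left.iter().zip(right.iter()).all(|(l, r)| l == r)
}

// Fixed form: treat the two lists independently and require both to end,
// i.e. also compare lengths.
fn eq_independent(left: &[&str], right: &[&str]) -> bool {
    left.len() == right.len() && left.iter().zip(right.iter()).all(|(l, r)| l == r)
}

fn main() {
    let a = ["x", "y"];
    let b = ["x", "y", "z"];
    assert!(eq_zip(&a, &b)); // false positive: trailing "z" is ignored
    assert!(!eq_independent(&a, &b)); // correct: lengths differ
    println!("ok");
}
```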
WillLillis
25d63ab7ab fix(wasm): delete var_i32_type after initializing global stack
pointer value

(cherry picked from commit 0d914c860a)
2025-08-25 17:51:30 -04:00
Alexander von Gluck
629093d2c3 fix(c): add Haiku support to endian.h
(cherry picked from commit be888a5fef)
2025-08-22 18:00:09 +03:00
JacobCrabill
dc4e5b5999 build(zig): fix package hashes for Zig 0.14
Zig 0.14 changed how package hashes are computed and used, and if the
old package hashes are left, every call to `zig build` will re-download
every package every time.  Updating to the new hash format solves this.

(cherry picked from commit e3db212b0b)
2025-08-20 22:07:53 +03:00
Niklas Koll
a53058b84d build(zig): update build.zig.zon for zig 0.14
(cherry picked from commit 1850762118)
2025-08-20 22:07:53 +03:00
Omar-xt
de141362d5
build(zig): fix package name 2025-08-19 11:13:50 +03:00
Ronald T. Casili
8f7539af72 fix(bindings): update zig template files (#4637)
(cherry picked from commit d87921bb9c)
2025-08-09 14:41:43 +03:00
ObserverOfTime
c70d6c2dfd fix(bindings): use custom class name
(cherry picked from commit 9d619d6fdc)
2025-08-08 12:38:41 +03:00
Will Lillis
1b2fc42e45 fix(ci): ignore mismatched_lifetime_syntaxes lint when building wasmtime
(cherry picked from commit 49ae48f7fe)
2025-08-08 11:39:12 +03:00
Will Lillis
dbbe8c642d fix(rust): ignore new mismatched-lifetime-syntaxes lint
(cherry picked from commit 46a0e94de7)
2025-08-08 11:39:12 +03:00
Will Lillis
362419836e fix(rust): correct indices for Node::utf16_text
(cherry picked from commit d3c2fed4b3)
2025-08-02 16:35:20 -04:00
Will Lillis
0c83a5d03e fix(cli): improve error message when language in list can't be found (#4643)
Problem: When multiple input paths are provided to the `parse` command (a la `tree-sitter parse --paths [...]`), if a language can't be found for one of the paths, it can be a little unclear *which* path caused the failure. The loader *can* fail with `Failed to load language for file name <foo.bar>`, but this isn't guaranteed.

Solution: Attach some additional context in the case where multiple paths can be provided, displaying the problematic path on failure.
(cherry picked from commit 9ced6172de)
2025-08-02 12:17:52 +02:00
Pieter Goetschalckx
05bfeb5b69 fix(cli): add reserved type declarations and schema
- Use `globalThis` for `reserved` function export
- Add `reserved` field and function to DSL declarations
- Add `reserved` rule to grammar schema

(cherry picked from commit 07b4c8d05d)
2025-08-02 11:51:09 +02:00
Riley Bruins
e7f4dfcd4a fix(query): prevent cycles when analyzing hidden children
**Problem:** `query.c` compares the current analysis state with the
previous analysis state to see if they are equal, so that it can return
early if so. This prevents redundant work. However, the comparison
function here differs from the one used for sorted insertion/lookup in
that it does not check any state data other than the child index. This
is problematic because it leads to infinite analysis when hidden nodes
have cycles.

**Solution:** Remove the custom comparison function, and apply the
insertion/lookup comparison function in place of it.

**NOTE:** This commit also changes the comparison function slightly, so
that some comparisons are reordered. Namely, for performance, it returns
early if the lhs depth is less than the rhs depth. Is this acceptable?
Tests still pass and nothing hangs in my testing, but it still seems
sketchy. Returning early if the lhs depth is greater than the rhs depth
does seem to make query analysis hang, weirdly enough... Keeping the
depth checks at the end of the loop also works, but it introduces a
noticeable performance regression (for queries that otherwise wouldn't
have had analysis cycles, of course).

(cherry picked from commit 6850df969d)
2025-07-30 01:15:58 -04:00
Robert Muir
d507a2defb feat(bindings): improve python binding test
Previously, the test would not detect ABI incompatibilities.

(cherry picked from commit 8c61bbdb73)
2025-07-29 23:52:26 -04:00
ObserverOfTime
3c0088f037 fix(bindings): improve python platform detection
(cherry picked from commit 99988b7081)
2025-07-29 23:52:14 -04:00
ObserverOfTime
e920009d60 fix(bindings): only include top level LICENSE file
Ref: tree-sitter/workflows#33
(cherry picked from commit 436162ae7c)
2025-07-29 23:52:03 -04:00
ObserverOfTime
b4fd46fdc0 fix(bindings): use parser title in lib.rs description
(cherry picked from commit c3012a7d8a)
2025-07-29 23:51:51 -04:00
Riley Bruins
81e7410b78 fix(rust): prevent overflow in error message calculation
**Problem:** When encountering an invalid symbol at the beginning of the
file, the rust bindings attempt to index the character at position -1 of
the query source, which leads to an overflow and thus invalid character
index which causes a panic.

**Solution:** Bounds check the offset before performing the subtraction.

(cherry picked from commit dff828cdbe)
2025-07-25 12:14:35 +02:00
Will Lillis
58edb3a11c perf(generate): reserve more Vec capacities
(cherry picked from commit 0f79c61188)
2025-07-19 12:32:09 +02:00
Ronald T. Casili
ad95b2b906 fix(build.zig): remove deprecated addStaticLibrary()
(cherry picked from commit 618b9dd66e)
2025-07-16 11:40:36 +02:00
Alex Aron
d991edf074 fix(lib): add wasm32 support to portable/endian.h (#4607)
(cherry picked from commit aeab755033)
2025-07-14 19:30:38 +02:00
Will Lillis
f2f197b6b2 0.25.8 2025-07-13 20:32:42 +02:00
Will Lillis
8bb33f7d8c perf: reorder conditional operands
(cherry picked from commit 854f527f6e)
2025-07-13 20:05:01 +02:00
Will Lillis
6f944de32f fix(generate): propagate node types error
(cherry picked from commit c740f244ba)
2025-07-13 20:05:01 +02:00
Will Lillis
c15938532d 0.25.7 2025-07-12 20:47:20 +02:00
Will Lillis
94b55bfcdc perf: reorder expensive conditional operand
(cherry picked from commit 5ed2c77b59)
2025-07-12 20:17:47 +02:00
WillLillis
bcb30f7951 fix(generate): use topological sort for subtype map 2025-07-10 17:43:08 -04:00
Antonin Delpeuch
3bd8f7df8e perf: More efficient computation of used symbols
As the call to `symbol_is_used` does not depend
on the production, it is more efficient to call it
only once outside the loop over productions.

I'm not sure if `rustc` is able to do this optimization
on its own (it would need to know that the function
is pure, which sounds difficult in general).

(cherry picked from commit 36d93aeff3)
2025-07-10 09:25:22 +02:00
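The optimization described above is loop-invariant code motion done by hand: a pure call that does not depend on the inner loop variable is hoisted so it runs once per symbol instead of once per production. A hedged Rust sketch of the pattern (`symbol_is_used` and the loop shape are illustrative stand-ins, not the real generate code):

```rust
// Stand-in for a pure, production-independent predicate.
fn symbol_is_used(symbol: u32) -> bool {
    symbol % 2 == 0
}

fn count_used(symbols: &[u32], productions: &[&str]) -> usize {
    let mut count = 0;
    for &symbol in symbols {
        // Hoisted: before the change, this call sat inside the loop over
        // productions even though its result never varies with the production.
        let used = symbol_is_used(symbol);
        for _production in productions {
            if used {
                count += 1;
            }
        }
    }
    count
}

fn main() {
    // Symbols 2 and 4 are "used", counted once per production.
    let n = count_used(&[1, 2, 3, 4], &["a", "b"]);
    assert_eq!(n, 4);
    println!("{n}");
}
```

As the commit message notes, the compiler can only do this itself if it can prove the call is pure, so hoisting it explicitly is the safe choice.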
Will Lillis
d7529c3265 perf: reserve Vec capacities where appropriate
(cherry picked from commit 1e7d77c517)
2025-07-09 22:33:57 -04:00
Bernardo Uriarte
bf4217f0ff fix(web): wasm export paths 2025-07-09 21:07:29 +02:00
Antonin Delpeuch
bb7b339ae2 Fix 'extra' field generation for node-types.json
(cherry picked from commit 1a3b0375fa)
2025-07-07 21:58:47 -04:00
Antonin Delpeuch
9184a32b4b Add test demonstrating failure to populate 'extra'
The test is currently failing, will be fixed by the next commit.

(cherry picked from commit 59bcffe83b)
2025-07-07 21:58:47 -04:00
WillLillis
78a040d78a fix(rust): ignore new nightly lint, correct order of lint list
(cherry picked from commit 8938309f4b)
2025-07-06 19:11:59 +02:00
Veesh Goldman
ab6c98eed7 fix(cli): require correct setuptools version
(cherry picked from commit b09a15eb54)
2025-06-27 14:46:01 +02:00
Will Lillis
6b84118e33 fix(generate): only display conflicting symbol name in non-terminal
word token error message if available

(cherry picked from commit a9818e4b17)
2025-06-26 15:40:48 +02:00
Christian Clason
2bc8aa939f ci(lint): stop linting with nightly 2025-06-26 15:06:52 +02:00
ObserverOfTime
462fcd7c30 fix(loader): fix no-default-features build (#4505) 2025-06-11 18:10:21 +02:00
Will Lillis
ffbe504242 fix(xtask): limit test command to a single thread on windows (#4489)
(cherry picked from commit e1f6e38b57)
2025-06-08 19:03:52 +02:00
tree-sitter-ci-bot[bot]
4fcf78cfec
fix(bindings): update swift & node dependencies (#4432) (#4499)
Co-authored-by: ObserverOfTime <chronobserver@disroot.org>
2025-06-07 15:09:22 -04:00
James McCoy
415a657d08 fix(test): remove period in test_flatten_grammar_with_recursive_inline_variable
The period was dropped in the `thiserror` refactor
(79444e07f9), which caused the
`test_flatten_grammar_with_recursive_inline_variable` test to fail.

Signed-off-by: James McCoy <jamessan@jamessan.com>
(cherry picked from commit a6e530b33d)
2025-06-06 16:39:45 +02:00
Thalia Archibald
a293dcc1c5 fix(highlight): account for carriage return at EOF and chunk ends
(cherry picked from commit 6ba73fd888)
2025-06-05 09:16:09 +02:00
Will Lillis
b890e8bea0 fix(lib): replace raw array accesses with array_get
(cherry picked from commit 8bd923ab9e)
2025-06-05 01:42:29 -04:00
Max Brunsfeld
bf655c0bea 0.25.6 2025-06-04 09:08:14 -07:00
Olive Easton
8ef6f0685b fix(generate): re-enable default url features
(cherry picked from commit 50622f71f8)
2025-06-04 10:56:00 +02:00
tree-sitter-ci-bot[bot]
057c6ad2ba
Fully fix field underflow in go_to_previous_sibling (#4483) (#4485)
(cherry picked from commit 2ab9c9b590)

Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Conrad Irwin <conrad.irwin@gmail.com>
2025-06-02 16:12:16 -07:00
Kai Pastor
c44110c29f fix(build): pkgconfig and use of GNUInstallDirs (#4319)
* Fix pkgconfig

Init CMAKE_INSTALL_INCLUDEDIR before pc file generation.
Install pc file to CMAKE_INSTALL_LIBDIR/pkgconfig -
it accompanies the architecture-dependent library.

* Include GNUInstallDirs early

The CMake module initializes variables which are used for
exported information (CMake and pkgconfig).

* Change pc file install destination

(cherry picked from commit 0bdf698673)
2025-05-31 12:12:29 +02:00
Christian Clason
baf222f772 Revert "feat: add build sha to parser.c header comment" (#4475)
This reverts commit dc4e232e6e.

Reason: The sha in the generated output (which most distro builds of
tree-sitter, including `cargo install`, strip) produces too many
conflicts when verifying via CI that parsers are regenerated on every
grammar change.

(cherry picked from commit e7f9160867)
2025-05-29 23:14:25 +02:00
Max Brunsfeld
4cac30b54a Ignore lock files in grammar repos
It is very common practice to ignore
these lock files for libraries, since they do not apply to applications
that use the libraries. The lock files are especially not useful in
tree-sitter grammar repos, since tree-sitter grammars should not have
dependencies. The lock files are just a source of merge conflicts and
spurious CI failures.
2025-05-29 11:33:49 +02:00
Max Brunsfeld
460118b4c8 0.25.5 2025-05-27 18:01:08 -07:00
Max Brunsfeld
42ca484b6b Fix hang in npm install script 2025-05-27 17:36:43 -07:00
tree-sitter-ci-bot[bot]
75550c8e2c
Fix crash w/ goto_previous_sibling when parent node has leading extra child (#4472) (#4473)
* Fix crash w/ goto_previous_sibling when parent node has leading extra child
* Fix lint

(cherry picked from commit f91255a201)
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Smit Barmase <heysmitbarmase@gmail.com>
2025-05-27 17:35:57 -07:00
Haoxiang Fei
02f9c1502b fix: wasi has endian.h
(cherry picked from commit 06537fda83)
2025-05-24 13:08:42 +02:00
Mike Zeller
d6701c68d3 illumos has endian.h
(cherry picked from commit 4339b0fe05)
2025-05-15 10:24:32 +02:00
Will Lillis
726dcd1e87 0.25.4 2025-05-11 16:21:17 +02:00
Will Lillis
b0a6bde2fb fix(lib): return early for empty predicate step slice
(cherry picked from commit 31b9717ca3)
2025-05-11 15:23:39 +02:00
Will Lillis
69723ca40e fix(query): correct last_child_step_index in cases where a new step
wasn't created.

This fixes an OOB access to `self.steps` when a last child anchor
immediately follows a predicate.

(cherry picked from commit b1d2b7cfb8)
2025-05-04 00:08:10 +02:00
Will Lillis
97131b4a73 fix(rust): address new clippy lint
(cherry picked from commit cc634236b1)
2025-05-03 22:00:55 +02:00
Will Lillis
a3f86b1fa9 fix(rust): ignore obfuscated_if_else lint
(cherry picked from commit 91274f47e4)
2025-05-03 22:00:55 +02:00
Amaan Qureshi
41413e7a71 fix(generate): mark url as a Windows-only dependency
(cherry picked from commit 3056dc5be4)
2025-04-29 09:20:34 +02:00
Amaan Qureshi
d7d0d9fef3 fix(lib): do not access the alias sequence for the end subtree in ts_subtree_summarize_children
(cherry picked from commit 21c658a12c)
2025-04-29 09:19:37 +02:00
Will Lillis
a876fff5ba fix(parse): explicitly move temporaries in the logger callback
This fixes problems where these stack-local temporaries are used after their scope ends.

(cherry picked from commit dcdd5bc372)
2025-04-28 10:12:37 +02:00
Will Lillis
7ddcc7b20b perf(highlight): use BTreeMap over IndexMap for highlight configs
(cherry picked from commit c7475e4bf3)
2025-04-20 07:30:24 -04:00
Daniel Jalkut
779d613941 docs(cli): improve documentation for the edits argument when parsing code
(cherry picked from commit 4514751803)
2025-04-19 12:22:46 +02:00
Tamir Bahar
0d360a1831 fix(web): replace dynamic require with import
(cherry picked from commit 27fa1088b9)
2025-04-19 12:19:20 +02:00
MichiRecRoom
d44d0f94da docs(rust): improve bindings' crate doc
(cherry picked from commit 853ca46899)
2025-04-19 12:01:25 +02:00
vemoo
69e857b387 feat(web): export wasm files to better support bundling use cases
(cherry picked from commit 4dffb818e2)
2025-04-19 11:59:23 +02:00
WillLillis
42624511cf fix(ci): increase timeouts for flaky tests
(cherry picked from commit eee41925aa)
2025-04-19 11:37:12 +02:00
Riley Bruins
20a5d46b50 fix(web): correct childWithDescendant() functionality
This fix allows for more granular address control when marshalling nodes
across WASM. This is necessary for node methods which accept another
node as a parameter (i.e., `childWithDescendant()`)

(cherry picked from commit 21390af2dd)
2025-04-18 19:13:51 -04:00
Riley Bruins
62cc419262 fix(lib): reset parser options after use
**Problem:** After `ts_parser_parse_with_options()`, the parser options
are still stored in the parser object, meaning that a successive call to
`ts_parser_parse()` will actually behave like
`ts_parser_parse_with_options()`, which is not obvious and can have
unintended consequences.

**Solution:** Reset to empty options state after
`ts_parser_parse_with_options()`.

(cherry picked from commit 733d7513af)
2025-04-15 09:37:45 +02:00
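The parser-options bug above is a general hazard for any object that stashes per-call options in its own state: a later plain call silently inherits them. A hypothetical Rust sketch of the failure and the fix (this is not the real C API, just the pattern):

```rust
#[derive(Default, Clone)]
struct Options {
    timeout_micros: Option<u64>,
}

#[derive(Default)]
struct Parser {
    options: Options, // per-call options stored on the parser itself
}

impl Parser {
    fn parse_with_options(&mut self, _src: &str, opts: Options) -> bool {
        self.options = opts;
        let ok = true; // ...parse using self.options...
        self.options = Options::default(); // the fix: reset after use
        ok
    }

    fn parse(&mut self, _src: &str) -> bool {
        // With the reset above, a plain parse sees default options,
        // instead of silently reusing the previous call's options.
        self.options.timeout_micros.is_none()
    }
}

fn main() {
    let mut p = Parser::default();
    p.parse_with_options("a", Options { timeout_micros: Some(1000) });
    assert!(p.parse("b")); // would fail without the reset
    println!("ok");
}
```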
Paul Gey
264684d31d Make highlighting more deterministic when themes are ambiguous
(cherry picked from commit b341073192)
2025-04-11 10:20:43 +02:00
Jon Shea
e295c99eca fix(rust): clarify error message for non-token reserved words
Improve the `NonTokenReservedWord` error message by including the
specific reserved word that was not used as a token.

(cherry picked from commit 92c5d3b8e2)
2025-04-10 01:10:25 -04:00
Jason Boatman
9fda3e417e Fix WASI build by not calling a non-existent function. (#4343)
(cherry picked from commit abc5c6bc50)
2025-04-08 19:14:46 +02:00
Edgar Onghena
d2914ca243 chore(generate): add @generated to parser.c header (#4338)
This makes `parser.c` follow the https://generated.at/ convention for generated files. This potentially allows any compatible IDE to discourage editing it directly.

(cherry picked from commit 52d2865365)
2025-04-08 11:20:25 +02:00
Will Lillis
4619261da0 fix(cli): display "N/A" in parse stats where appropriate when no parsing
took place

(cherry picked from commit 0f949168ef)
2025-04-06 17:13:43 +02:00
Will Lillis
14d930d131 fix(highlight): account for multiple rows in highlight testing assertions
(cherry picked from commit 71941d8bda)
2025-04-06 17:13:43 +02:00
Amaan Qureshi
ff8bf05def fix(rust): adapt to new clippy lints
(cherry picked from commit 74d7ca8582)
2025-04-06 16:12:21 +02:00
Amaan Qureshi
150cd12b66 fix: add generate crate to workspace members
(cherry picked from commit 1a80a1f413)
2025-04-06 16:12:21 +02:00
WillLillis
fae24b6da6 fix(rust): address new nightly lint for pointer comparisons
(cherry picked from commit 521da2b0a7)
2025-03-28 09:41:19 +01:00
Simon Willshire
ed69a74463 fix(rust): use core crates for no_std
also add `no_std` build to CI
2025-03-25 15:02:14 +01:00
WillLillis
acc9cafc7c fix(rust): address new clippy lint for pointer comparisons
(cherry picked from commit dac6300558)
2025-03-25 14:11:21 +01:00
Peter Oliver
d25e5d48ea fix(build): make install shouldn’t fail when a parser bundles no queries (#4284)
(cherry picked from commit 17471bdfcc)
2025-03-14 10:06:40 +01:00
WillLillis
774eebdf6b fix(xtask): error if new version supplied to bump-version is less than
or equal to current version

(cherry picked from commit 5985690d45)
2025-03-14 10:06:27 +01:00
WillLillis
979e5ecec0 fix(cli): properly escape invisible characters in parse error output
(cherry picked from commit efd212ee46)
2025-03-12 11:33:28 +01:00
dependabot[bot]
b1a9a827d6 build(deps): bump emscripten to 4.0.4
(cherry picked from commit 12aff698b9)
2025-03-12 10:57:58 +01:00
dependabot[bot]
e413947cc5 build(deps): bump ring from 0.17.8 to 0.17.13
Bumps [ring](https://github.com/briansmith/ring) from 0.17.8 to 0.17.13.
- [Changelog](https://github.com/briansmith/ring/blob/main/RELEASES.md)
- [Commits](https://github.com/briansmith/ring/commits)

---
updated-dependencies:
- dependency-name: ring
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
(cherry picked from commit 276accc210)
2025-03-12 10:57:58 +01:00
WillLillis
c313be63b2 fix(rust): adapt to new nightly lint
(cherry picked from commit 11071ed682)
2025-03-06 18:25:24 -05:00
NOT XVilka
4adcebe284 fix(lib): remove duplicate TSLanguageMetadata typedef (#4268)
(cherry picked from commit a00fab7dc4)
2025-03-06 23:48:22 +01:00
Max Brunsfeld
2a835ee029 0.25.3 2025-03-04 16:03:16 -08:00
tree-sitter-ci-bot[bot]
3ad1c7d4e1
Fix cases where error recovery could infinite loop (#4257) (#4262)
* Rename corpus test functions to allow easy filtering by language

* Use usize for seed argument

* Avoid retaining useless stack versions when reductions merge

We found this problem when debugging an infinite loop that happened
during error recovery when using the Zig grammar. The large number of
unnecessary paused stack versions were preventing the correct recovery
strategy from being tried.

* Fix leaked lookahead token when reduction results in a merged stack

* Enable running PHP tests in CI

* Fix possible infinite loop during error recovery at EOF

* Account for external scanner state changes when detecting changed ranges in subtrees

(cherry picked from commit 066fd77d39)

Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
2025-03-04 15:38:59 -08:00
polazarus
b1a7074010 fix(generate): mark TSCharacterRange as static (#4255)
Problem: Linking different parsers into one executable fails due to duplicate symbols.

Solution: Mark `TSCharacterRange` as `static` when generating parsers.

fixes #4209

(cherry picked from commit 8138dba800)
2025-03-04 16:52:58 +01:00
tree-sitter-ci-bot[bot]
6f2dbaab5f
build: do not define _POSIX_C_SOURCE on NetBSD (#4196)
It leads to missing symbols, see #4180.

(cherry picked from commit 2bf04d1f04)

Co-authored-by: Thomas Klausner <wiz@gatalith.at>
2025-03-02 23:46:23 +01:00
WillLillis
781dc0570d ci: separate nightly lints to separate job
(cherry picked from commit 1fdd1d250c)
2025-03-02 23:20:08 +01:00
WillLillis
1f64036d87 fix(test): update expected tree-sitter-rust supertypes
(cherry picked from commit 998fb34d15)
2025-03-02 23:20:08 +01:00
WillLillis
4eb46b493f fix(rust): adapt to some new nightly lints
(cherry picked from commit cb30ec5b17)
2025-03-02 23:20:08 +01:00
Roberto Huertas
d73126d582 fix(web): provide type in the exports
In TypeScript projects using module settings other than CommonJS, the types were not correctly exposed and compilation failed.

This adds the types path to the exports so compilation works for `module: NodeNext` and other variants.

(cherry picked from commit f95e0e3a56)
2025-02-28 19:11:40 +01:00
Will Lillis
637a3e111b fix(wasm): restore passing in ERROR to descendantsOfType (#4226)
(cherry picked from commit 3b67861def)
2025-02-20 16:08:19 +01:00
Max Brunsfeld
8b5c63bffa tree-sitter-language 0.1.5 2025-02-17 19:47:40 -08:00
Max Brunsfeld
6e0618704a 0.25.2 2025-02-17 18:54:23 -08:00
tree-sitter-ci-bot[bot]
64665ec462
Decrease the MSRV for the tree-sitter-language crate (#4221) (#4222)
(cherry picked from commit b26b7f8d62)

Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
2025-02-17 18:54:06 -08:00
tree-sitter-ci-bot[bot]
1925a70f7e
Reset result_symbol field of lexer in wasm memory in between invocations (#4218) (#4220)
(cherry picked from commit 2bd400dcee)

Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
2025-02-17 18:52:32 -08:00
tree-sitter-ci-bot[bot]
02625fc959
Ignore external tokens that are zero-length and extra (#4213) (#4216)
Co-authored-by: Anthony <anthony@zed.dev>
(cherry picked from commit dedcc5255a)

Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
2025-02-17 17:38:13 -08:00
Max Brunsfeld
d799b78663 Fix crash when loading languages w/ old ABI via wasm (#4210)
(cherry picked from commit 14b8ead412)
2025-02-17 23:48:59 +01:00
126 changed files with 3406 additions and 1557 deletions

@@ -10,6 +10,9 @@ insert_final_newline = true
[*.rs]
indent_size = 4
[*.{zig,zon}]
indent_size = 4
[Makefile]
indent_style = tab
indent_size = 8

@@ -139,6 +139,8 @@ jobs:
[[ -n $runner ]] && printf 'CROSS_RUNNER=%s\n' "$runner" >> $GITHUB_ENV
fi
# TODO: Remove RUSTFLAGS="--cap-lints allow" once we use a wasmtime release that addresses
# the `mismatched-lifetime-syntaxes` lint
- name: Build wasmtime library
if: ${{ !matrix.use-cross && contains(matrix.features, 'wasm') }}
run: |
@@ -156,6 +158,7 @@ jobs:
printf 'CMAKE_PREFIX_PATH=%s\n' "$PWD/artifacts" >> $GITHUB_ENV
env:
WASMTIME_REPO: https://github.com/bytecodealliance/wasmtime
RUSTFLAGS: "--cap-lints allow"
- name: Build C library (make)
if: ${{ runner.os != 'Windows' }}
@@ -195,6 +198,13 @@ jobs:
npm run build
npm run build:debug
- name: Check no_std builds
if: ${{ !matrix.no-run && inputs.run-test }}
shell: bash
run: |
cd lib
$BUILD_CMD check --no-default-features
- name: Build target
run: $BUILD_CMD build --release --target=${{ matrix.target }} --features=${{ matrix.features }}

@@ -32,11 +32,6 @@ jobs:
uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
- name: Set up nightly Rust toolchain
uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: nightly
components: clippy, rustfmt
- name: Lint files

Cargo.lock generated

@@ -10,14 +10,14 @@ checksum = "512761e0bb2578dd7380c6baaa0f4ce03e84f95e960231d1dec8bf4d7d6e2627"
[[package]]
name = "ahash"
version = "0.8.11"
version = "0.8.12"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e89da841a80418a9b391ebaea17f5c112ffaaa96f621d2c285b5174da76b9011"
checksum = "5a15f179cd60c4584b8a8c596927aadc462e27f2ca70c04e0071964a73ba7a75"
dependencies = [
"cfg-if",
"once_cell",
"version_check",
"zerocopy",
"zerocopy 0.8.27",
]
[[package]]
@@ -1118,9 +1118,9 @@ checksum = "884e2677b40cc8c339eaefcb701c32ef1fd2493d71118dc0ca4b6a736c93bd67"
[[package]]
name = "libc"
version = "0.2.169"
version = "0.2.175"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b5aba8db14291edd000dfcc4d620c7ebfb122c613afb886ca8803fa4e128a20a"
checksum = "6a82ae493e598baaea5209805c49bbf2ea7de956d50d7da0da1164f9c6d28543"
[[package]]
name = "libgit2-sys"
@@ -1460,7 +1460,7 @@ version = "0.2.20"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "77957b295656769bb8ad2b6a6b09d897d94f05c41b069aede1fcdaa675eaea04"
dependencies = [
"zerocopy",
"zerocopy 0.7.35",
]
[[package]]
@@ -1563,9 +1563,9 @@ dependencies = [
[[package]]
name = "regalloc2"
version = "0.11.1"
version = "0.11.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "145c1c267e14f20fb0f88aa76a1c5ffec42d592c1d28b3cd9148ae35916158d3"
checksum = "dc06e6b318142614e4a48bc725abbf08ff166694835c43c9dae5a9009704639a"
dependencies = [
"allocator-api2",
"bumpalo",
@@ -1615,15 +1615,14 @@ dependencies = [
[[package]]
name = "ring"
version = "0.17.8"
version = "0.17.13"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c17fa4cb658e3583423e915b9f3acc01cceaee1860e33d59ebae66adc3a2dc0d"
checksum = "70ac5d832aa16abd7d1def883a8545280c20a60f523a370aa3a9617c2b8550ee"
dependencies = [
"cc",
"cfg-if",
"getrandom",
"libc",
"spin",
"untrusted",
"windows-sys 0.52.0",
]
@@ -1787,12 +1786,6 @@ dependencies = [
"serde",
]
[[package]]
name = "spin"
version = "0.9.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6980e8d7511241f8acf4aebddbb1ff938df5eebe98691418c4468d0b72a96a67"
[[package]]
name = "sptr"
version = "0.3.2"
@@ -1885,11 +1878,11 @@ dependencies = [
[[package]]
name = "thiserror"
version = "2.0.11"
version = "2.0.16"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d452f284b73e6d76dd36758a0c8684b1d5be31f92b89d07fd5822175732206fc"
checksum = "3467d614147380f2e4e374161426ff399c91084acd2363eaf549172b3d5e60c0"
dependencies = [
"thiserror-impl 2.0.11",
"thiserror-impl 2.0.16",
]
[[package]]
@@ -1905,9 +1898,9 @@ dependencies = [
[[package]]
name = "thiserror-impl"
version = "2.0.11"
version = "2.0.16"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "26afc1baea8a989337eeb52b6e72a039780ce45c3edfcc9c5b9d112feeb173c2"
checksum = "6c5e1be1c48b9172ee610da68fd9cd2770e7a4056cb3fc98710ee6906f0c7960"
dependencies = [
"proc-macro2",
"quote",
@@ -2011,6 +2004,12 @@ dependencies = [
"winnow",
]
[[package]]
name = "topological-sort"
version = "0.2.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ea68304e134ecd095ac6c3574494fc62b909f416c4fca77e440530221e549d3d"
[[package]]
name = "tracing"
version = "0.1.41"
@@ -2044,7 +2043,7 @@ dependencies = [
[[package]]
name = "tree-sitter"
version = "0.25.1"
version = "0.25.10"
dependencies = [
"bindgen",
"cc",
@@ -2058,7 +2057,7 @@ dependencies = [
[[package]]
name = "tree-sitter-cli"
version = "0.25.1"
version = "0.25.10"
dependencies = [
"ansi_colours",
"anstyle",
@@ -2093,6 +2092,7 @@ dependencies = [
"streaming-iterator",
"tempfile",
"tiny_http",
"topological-sort",
"tree-sitter",
"tree-sitter-config",
"tree-sitter-generate",
@@ -2110,7 +2110,7 @@ dependencies = [
[[package]]
name = "tree-sitter-config"
version = "0.25.1"
version = "0.25.10"
dependencies = [
"anyhow",
"etcetera",
@@ -2120,7 +2120,7 @@ dependencies = [
[[package]]
name = "tree-sitter-generate"
version = "0.25.1"
version = "0.25.10"
dependencies = [
"anyhow",
"heck",
@@ -2134,28 +2134,29 @@ dependencies = [
"serde",
"serde_json",
"smallbitvec",
"thiserror 2.0.11",
"thiserror 2.0.16",
"topological-sort",
"tree-sitter",
"url",
]
[[package]]
name = "tree-sitter-highlight"
version = "0.25.1"
version = "0.25.10"
dependencies = [
"regex",
"streaming-iterator",
"thiserror 2.0.11",
"thiserror 2.0.16",
"tree-sitter",
]
[[package]]
name = "tree-sitter-language"
version = "0.1.4"
version = "0.1.5"
[[package]]
name = "tree-sitter-loader"
version = "0.25.1"
version = "0.25.10"
dependencies = [
"anyhow",
"cc",
@@ -2178,12 +2179,12 @@ dependencies = [
[[package]]
name = "tree-sitter-tags"
version = "0.25.1"
version = "0.25.10"
dependencies = [
"memchr",
"regex",
"streaming-iterator",
"thiserror 2.0.11",
"thiserror 2.0.16",
"tree-sitter",
]
@@ -2391,19 +2392,19 @@ dependencies = [
[[package]]
name = "wasm-encoder"
version = "0.221.2"
version = "0.221.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c17a3bd88f2155da63a1f2fcb8a56377a24f0b6dfed12733bb5f544e86f690c5"
checksum = "dc8444fe4920de80a4fe5ab564fff2ae58b6b73166b89751f8c6c93509da32e5"
dependencies = [
"leb128",
"wasmparser 0.221.2",
"wasmparser 0.221.3",
]
[[package]]
name = "wasmparser"
version = "0.221.2"
version = "0.221.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9845c470a2e10b61dd42c385839cdd6496363ed63b5c9e420b5488b77bd22083"
checksum = "d06bfa36ab3ac2be0dee563380147a5b81ba10dd8885d7fbbc9eb574be67d185"
dependencies = [
"bitflags 2.8.0",
"hashbrown 0.15.2",
@@ -2427,13 +2428,13 @@ dependencies = [
[[package]]
name = "wasmprinter"
version = "0.221.2"
version = "0.221.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a80742ff1b9e6d8c231ac7c7247782c6fc5bce503af760bca071811e5fc9ee56"
checksum = "7343c42a97f2926c7819ff81b64012092ae954c5d83ddd30c9fcdefd97d0b283"
dependencies = [
"anyhow",
"termcolor",
"wasmparser 0.221.2",
"wasmparser 0.221.3",
]
[[package]]
@@ -2465,7 +2466,7 @@ dependencies = [
"smallvec",
"sptr",
"target-lexicon",
"wasmparser 0.221.2",
"wasmparser 0.221.3",
"wasmtime-asm-macros",
"wasmtime-component-macro",
"wasmtime-cranelift",
@@ -2552,7 +2553,7 @@ dependencies = [
"smallvec",
"target-lexicon",
"thiserror 1.0.69",
"wasmparser 0.221.2",
"wasmparser 0.221.3",
"wasmtime-environ",
"wasmtime-versioned-export-macros",
]
@@ -2576,7 +2577,7 @@ dependencies = [
"smallvec",
"target-lexicon",
"wasm-encoder",
"wasmparser 0.221.2",
"wasmparser 0.221.3",
"wasmprinter",
]
@@ -2644,7 +2645,7 @@ dependencies = [
"gimli",
"object",
"target-lexicon",
"wasmparser 0.221.2",
"wasmparser 0.221.3",
"wasmtime-cranelift",
"wasmtime-environ",
"winch-codegen",
@@ -2727,7 +2728,7 @@ dependencies = [
"smallvec",
"target-lexicon",
"thiserror 1.0.69",
"wasmparser 0.221.2",
"wasmparser 0.221.3",
"wasmtime-cranelift",
"wasmtime-environ",
]
@@ -2957,9 +2958,9 @@ dependencies = [
[[package]]
name = "wit-parser"
version = "0.221.2"
version = "0.221.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fbe1538eea6ea5ddbe5defd0dc82539ad7ba751e1631e9185d24a931f0a5adc8"
checksum = "896112579ed56b4a538b07a3d16e562d101ff6265c46b515ce0c701eef16b2ac"
dependencies = [
"anyhow",
"id-arena",
@@ -2970,7 +2971,7 @@ dependencies = [
"serde_derive",
"serde_json",
"unicode-xid",
"wasmparser 0.221.2",
"wasmparser 0.221.3",
]
[[package]]
@@ -3043,7 +3044,16 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1b9b4fd18abc82b8136838da5d50bae7bdea537c574d8dc1a34ed098d6c166f0"
dependencies = [
"byteorder",
"zerocopy-derive",
"zerocopy-derive 0.7.35",
]
[[package]]
name = "zerocopy"
version = "0.8.27"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0894878a5fa3edfd6da3f88c4805f4c8558e2b996227a3d864f47fe11e38282c"
dependencies = [
"zerocopy-derive 0.8.27",
]
[[package]]
@@ -3057,6 +3067,17 @@ dependencies = [
"syn",
]
[[package]]
name = "zerocopy-derive"
version = "0.8.27"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "88d2b8d9c68ad2b9e4340d7832716a4d21a22a1154777ad56ea55c51a9cf3831"
dependencies = [
"proc-macro2",
"quote",
"syn",
]
[[package]]
name = "zerofrom"
version = "0.1.5"


@@ -3,6 +3,7 @@ default-members = ["cli"]
members = [
"cli",
"cli/config",
"cli/generate",
"cli/loader",
"lib",
"lib/language",
@@ -13,7 +14,7 @@ members = [
resolver = "2"
[workspace.package]
version = "0.25.1"
version = "0.25.10"
authors = [
"Max Brunsfeld <maxbrunsfeld@gmail.com>",
"Amaan Qureshi <amaanq12@gmail.com>",
@@ -59,6 +60,8 @@ missing_errors_doc = "allow"
missing_panics_doc = "allow"
module_name_repetitions = "allow"
multiple_crate_versions = "allow"
needless_for_each = "allow"
obfuscated_if_else = "allow"
option_if_let_else = "allow"
or_fun_call = "allow"
range_plus_one = "allow"
@@ -75,6 +78,9 @@ unnecessary_wraps = "allow"
unused_self = "allow"
used_underscore_items = "allow"
[workspace.lints.rust]
mismatched_lifetime_syntaxes = "allow"
[profile.optimize]
inherits = "release"
strip = true # Automatically strip symbols from the binary.
@@ -143,15 +149,16 @@ tempfile = "3.15.0"
thiserror = "2.0.11"
tiny_http = "0.12.0"
toml = "0.8.19"
topological-sort = "0.2.2"
unindent = "0.2.3"
url = { version = "2.5.4", features = ["serde"] }
walkdir = "2.5.0"
wasmparser = "0.224.0"
webbrowser = "1.0.3"
tree-sitter = { version = "0.25.1", path = "./lib" }
tree-sitter-generate = { version = "0.25.1", path = "./cli/generate" }
tree-sitter-loader = { version = "0.25.1", path = "./cli/loader" }
tree-sitter-config = { version = "0.25.1", path = "./cli/config" }
tree-sitter-highlight = { version = "0.25.1", path = "./highlight" }
tree-sitter-tags = { version = "0.25.1", path = "./tags" }
tree-sitter = { version = "0.25.10", path = "./lib" }
tree-sitter-generate = { version = "0.25.10", path = "./cli/generate" }
tree-sitter-loader = { version = "0.25.10", path = "./cli/loader" }
tree-sitter-config = { version = "0.25.10", path = "./cli/config" }
tree-sitter-highlight = { version = "0.25.10", path = "./highlight" }
tree-sitter-tags = { version = "0.25.10", path = "./tags" }


@@ -2,7 +2,7 @@ ifeq ($(OS),Windows_NT)
$(error Windows is not supported)
endif
VERSION := 0.25.1
VERSION := 0.25.10
DESCRIPTION := An incremental parsing system for programming tools
HOMEPAGE_URL := https://tree-sitter.github.io/tree-sitter/
@@ -27,7 +27,7 @@ OBJ := $(SRC:.c=.o)
ARFLAGS := rcs
CFLAGS ?= -O3 -Wall -Wextra -Wshadow -Wpedantic -Werror=incompatible-pointer-types
override CFLAGS += -std=c11 -fPIC -fvisibility=hidden
override CFLAGS += -D_POSIX_C_SOURCE=200112L -D_DEFAULT_SOURCE
override CFLAGS += -D_POSIX_C_SOURCE=200112L -D_DEFAULT_SOURCE -D_DARWIN_C_SOURCE
override CFLAGS += -Ilib/src -Ilib/src/wasm -Ilib/include
# ABI versioning
@@ -106,15 +106,15 @@ test-wasm:
lint:
cargo update --workspace --locked --quiet
cargo check --workspace --all-targets
cargo +nightly fmt --all --check
cargo +nightly clippy --workspace --all-targets -- -D warnings
cargo fmt --all --check
cargo clippy --workspace --all-targets -- -D warnings
lint-web:
npm --prefix lib/binding_web ci
npm --prefix lib/binding_web run lint
format:
cargo +nightly fmt --all
cargo fmt --all
changelog:
@git-cliff --config .github/cliff.toml --prepend CHANGELOG.md --latest --github-token $(shell gh auth token)


@@ -27,6 +27,7 @@ let package = Package(
.headerSearchPath("src"),
.define("_POSIX_C_SOURCE", to: "200112L"),
.define("_DEFAULT_SOURCE"),
.define("_DARWIN_C_SOURCE"),
]),
],
cLanguageStandard: .c11

build.zig

@@ -1,116 +1,122 @@
const std = @import("std");
pub fn build(b: *std.Build) !void {
const target = b.standardTargetOptions(.{});
const optimize = b.standardOptimizeOption(.{});
const target = b.standardTargetOptions(.{});
const optimize = b.standardOptimizeOption(.{});
const wasm = b.option(bool, "enable-wasm", "Enable Wasm support") orelse false;
const shared = b.option(bool, "build-shared", "Build a shared library") orelse false;
const amalgamated = b.option(bool, "amalgamated", "Build using an amalgamated source") orelse false;
const wasm = b.option(bool, "enable-wasm", "Enable Wasm support") orelse false;
const shared = b.option(bool, "build-shared", "Build a shared library") orelse false;
const amalgamated = b.option(bool, "amalgamated", "Build using an amalgamated source") orelse false;
const lib: *std.Build.Step.Compile = if (!shared) b.addStaticLibrary(.{
.name = "tree-sitter",
.target = target,
.optimize = optimize,
.link_libc = true,
}) else b.addSharedLibrary(.{
.name = "tree-sitter",
.pic = true,
.target = target,
.optimize = optimize,
.link_libc = true,
});
if (amalgamated) {
lib.addCSourceFile(.{
.file = b.path("lib/src/lib.c"),
.flags = &.{"-std=c11"},
const lib: *std.Build.Step.Compile = b.addLibrary(.{
.name = "tree-sitter",
.linkage = if (shared) .dynamic else .static,
.root_module = b.createModule(.{
.target = target,
.optimize = optimize,
.link_libc = true,
.pic = if (shared) true else null,
}),
});
} else {
lib.addCSourceFiles(.{
.root = b.path("lib/src"),
.files = try findSourceFiles(b),
.flags = &.{"-std=c11"},
});
}
lib.addIncludePath(b.path("lib/include"));
lib.addIncludePath(b.path("lib/src"));
lib.addIncludePath(b.path("lib/src/wasm"));
lib.root_module.addCMacro("_POSIX_C_SOURCE", "200112L");
lib.root_module.addCMacro("_DEFAULT_SOURCE", "");
if (wasm) {
if (b.lazyDependency(wasmtimeDep(target.result), .{})) |wasmtime| {
lib.root_module.addCMacro("TREE_SITTER_FEATURE_WASM", "");
lib.addSystemIncludePath(wasmtime.path("include"));
lib.addLibraryPath(wasmtime.path("lib"));
lib.linkSystemLibrary("wasmtime");
if (amalgamated) {
lib.addCSourceFile(.{
.file = b.path("lib/src/lib.c"),
.flags = &.{"-std=c11"},
});
} else {
const files = try findSourceFiles(b);
defer b.allocator.free(files);
lib.addCSourceFiles(.{
.root = b.path("lib/src"),
.files = files,
.flags = &.{"-std=c11"},
});
}
}
lib.installHeadersDirectory(b.path("lib/include"), ".", .{});
lib.addIncludePath(b.path("lib/include"));
lib.addIncludePath(b.path("lib/src"));
lib.addIncludePath(b.path("lib/src/wasm"));
b.installArtifact(lib);
lib.root_module.addCMacro("_POSIX_C_SOURCE", "200112L");
lib.root_module.addCMacro("_DEFAULT_SOURCE", "");
lib.root_module.addCMacro("_DARWIN_C_SOURCE", "");
if (wasm) {
if (b.lazyDependency(wasmtimeDep(target.result), .{})) |wasmtime| {
lib.root_module.addCMacro("TREE_SITTER_FEATURE_WASM", "");
lib.addSystemIncludePath(wasmtime.path("include"));
lib.addLibraryPath(wasmtime.path("lib"));
if (shared) lib.linkSystemLibrary("wasmtime");
}
}
lib.installHeadersDirectory(b.path("lib/include"), ".", .{});
b.installArtifact(lib);
}
fn wasmtimeDep(target: std.Target) []const u8 {
const arch = target.cpu.arch;
const os = target.os.tag;
const abi = target.abi;
return switch (os) {
.linux => switch (arch) {
.x86_64 => switch (abi) {
.gnu => "wasmtime_c_api_x86_64_linux",
.musl => "wasmtime_c_api_x86_64_musl",
.android => "wasmtime_c_api_x86_64_android",
else => null
},
.aarch64 => switch (abi) {
.gnu => "wasmtime_c_api_aarch64_linux",
.android => "wasmtime_c_api_aarch64_android",
else => null
},
.s390x => "wasmtime_c_api_s390x_linux",
.riscv64 => "wasmtime_c_api_riscv64gc_linux",
else => null
},
.windows => switch (arch) {
.x86_64 => switch (abi) {
.gnu => "wasmtime_c_api_x86_64_mingw",
.msvc => "wasmtime_c_api_x86_64_windows",
else => null
},
else => null
},
.macos => switch (arch) {
.x86_64 => "wasmtime_c_api_x86_64_macos",
.aarch64 => "wasmtime_c_api_aarch64_macos",
else => null
},
else => null
} orelse std.debug.panic(
"Unsupported target for wasmtime: {s}-{s}-{s}",
.{ @tagName(arch), @tagName(os), @tagName(abi) }
);
/// Get the name of the wasmtime dependency for this target.
pub fn wasmtimeDep(target: std.Target) []const u8 {
const arch = target.cpu.arch;
const os = target.os.tag;
const abi = target.abi;
return switch (os) {
.linux => switch (arch) {
.x86_64 => switch (abi) {
.gnu => "wasmtime_c_api_x86_64_linux",
.musl => "wasmtime_c_api_x86_64_musl",
.android => "wasmtime_c_api_x86_64_android",
else => null,
},
.aarch64 => switch (abi) {
.gnu => "wasmtime_c_api_aarch64_linux",
.android => "wasmtime_c_api_aarch64_android",
else => null,
},
.s390x => "wasmtime_c_api_s390x_linux",
.riscv64 => "wasmtime_c_api_riscv64gc_linux",
else => null,
},
.windows => switch (arch) {
.x86_64 => switch (abi) {
.gnu => "wasmtime_c_api_x86_64_mingw",
.msvc => "wasmtime_c_api_x86_64_windows",
else => null,
},
.aarch64 => switch (abi) {
.msvc => "wasmtime_c_api_aarch64_windows",
else => null,
},
else => null,
},
.macos => switch (arch) {
.x86_64 => "wasmtime_c_api_x86_64_macos",
.aarch64 => "wasmtime_c_api_aarch64_macos",
else => null,
},
else => null,
} orelse std.debug.panic(
"Unsupported target for wasmtime: {s}-{s}-{s}",
.{ @tagName(arch), @tagName(os), @tagName(abi) },
);
}
fn findSourceFiles(b: *std.Build) ![]const []const u8 {
var sources = std.ArrayList([]const u8).init(b.allocator);
var sources: std.ArrayListUnmanaged([]const u8) = .empty;
var dir = try b.build_root.handle.openDir("lib/src", .{ .iterate = true });
var iter = dir.iterate();
defer dir.close();
var dir = try b.build_root.handle.openDir("lib/src", .{ .iterate = true });
var iter = dir.iterate();
defer dir.close();
while (try iter.next()) |entry| {
if (entry.kind != .file) continue;
const file = entry.name;
const ext = std.fs.path.extension(file);
if (std.mem.eql(u8, ext, ".c") and !std.mem.eql(u8, file, "lib.c")) {
try sources.append(b.dupe(file));
while (try iter.next()) |entry| {
if (entry.kind != .file) continue;
const file = entry.name;
const ext = std.fs.path.extension(file);
if (std.mem.eql(u8, ext, ".c") and !std.mem.eql(u8, file, "lib.c")) {
try sources.append(b.allocator, b.dupe(file));
}
}
}
return sources.items;
return sources.toOwnedSlice(b.allocator);
}


@@ -1,69 +1,76 @@
.{
.name = "tree-sitter",
.version = "0.25.1",
.paths = .{
"build.zig",
"build.zig.zon",
"lib/src",
"lib/include",
"README.md",
"LICENSE",
},
.dependencies = .{
.wasmtime_c_api_aarch64_android = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-aarch64-android-c-api.tar.xz",
.hash = "12204c77979ad8291c6e395d695a824fb053ffdfeb2cc21de95fffb09f77d77188d1",
.lazy = true,
.name = .tree_sitter,
.fingerprint = 0x841224b447ac0d4f,
.version = "0.25.9",
.minimum_zig_version = "0.14.1",
.paths = .{
"build.zig",
"build.zig.zon",
"lib/src",
"lib/include",
"README.md",
"LICENSE",
},
.wasmtime_c_api_aarch64_linux = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-aarch64-linux-c-api.tar.xz",
.hash = "12203a8e3d823490186fb1e230d54f575148713088e914926305ee5678790b731bba",
.lazy = true,
.dependencies = .{
.wasmtime_c_api_aarch64_android = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-aarch64-android-c-api.tar.xz",
.hash = "N-V-__8AAC3KCQZMd5ea2CkcbjldaVqCT7BT_9_rLMId6V__",
.lazy = true,
},
.wasmtime_c_api_aarch64_linux = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-aarch64-linux-c-api.tar.xz",
.hash = "N-V-__8AAGUY3gU6jj2CNJAYb7HiMNVPV1FIcTCI6RSSYwXu",
.lazy = true,
},
.wasmtime_c_api_aarch64_macos = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-aarch64-macos-c-api.tar.xz",
.hash = "N-V-__8AAM1GMARD6LGQebhVsSZ0uePUoo3Fw5nEO2L764vf",
.lazy = true,
},
.wasmtime_c_api_aarch64_windows = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-aarch64-windows-c-api.zip",
.hash = "N-V-__8AAH8a_wQ7oAeVVsaJcoOZhKTMkHIBc_XjDyLlHp2x",
.lazy = true,
},
.wasmtime_c_api_riscv64gc_linux = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-riscv64gc-linux-c-api.tar.xz",
.hash = "N-V-__8AAN2cuQadBwMc8zJxv0sMY99Ae1Nc1dZcZAK9b4DZ",
.lazy = true,
},
.wasmtime_c_api_s390x_linux = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-s390x-linux-c-api.tar.xz",
.hash = "N-V-__8AAPevngYz99mwT0KQY9my2ax1p6APzgLEJeV4II9U",
.lazy = true,
},
.wasmtime_c_api_x86_64_android = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-x86_64-android-c-api.tar.xz",
.hash = "N-V-__8AABHIEgaTyzPfjgnnCy0dwJiXoDiJFblCkYOJsQvy",
.lazy = true,
},
.wasmtime_c_api_x86_64_linux = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-x86_64-linux-c-api.tar.xz",
.hash = "N-V-__8AALUN5AWSEDRulL9u-OJJ-l0_GoT5UFDtGWZayEIq",
.lazy = true,
},
.wasmtime_c_api_x86_64_macos = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-x86_64-macos-c-api.tar.xz",
.hash = "N-V-__8AANUeXwSPh13TqJCSSFdi87GEcHs8zK6FqE4v_TjB",
.lazy = true,
},
.wasmtime_c_api_x86_64_mingw = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-x86_64-mingw-c-api.zip",
.hash = "N-V-__8AALundgW-p1ffOnd7bsYyL8SY5OziDUZu7cXio2EL",
.lazy = true,
},
.wasmtime_c_api_x86_64_musl = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-x86_64-musl-c-api.tar.xz",
.hash = "N-V-__8AALMZ5wXJWW5qY-3MMjTAYR0MusckvzCsmg-69ALH",
.lazy = true,
},
.wasmtime_c_api_x86_64_windows = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-x86_64-windows-c-api.zip",
.hash = "N-V-__8AAG-uVQVEDMsB1ymJzxpHcoiXo1_I3TFnPM5Zjy1i",
.lazy = true,
},
},
.wasmtime_c_api_aarch64_macos = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-aarch64-macos-c-api.tar.xz",
.hash = "122043e8b19079b855b12674b9e3d4a28dc5c399c43b62fbeb8bdf0fdb4ef2d1d38c",
.lazy = true,
},
.wasmtime_c_api_riscv64gc_linux = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-riscv64gc-linux-c-api.tar.xz",
.hash = "12209d07031cf33271bf4b0c63df407b535cd5d65c6402bd6f80d99de439d6feb89b",
.lazy = true,
},
.wasmtime_c_api_s390x_linux = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-s390x-linux-c-api.tar.xz",
.hash = "122033f7d9b04f429063d9b2d9ac75a7a00fce02c425e578208f54ddc40edaa1e355",
.lazy = true,
},
.wasmtime_c_api_x86_64_android = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-x86_64-android-c-api.tar.xz",
.hash = "122093cb33df8e09e70b2d1dc09897a0388915b942918389b10bf23f9684bdb6f047",
.lazy = true,
},
.wasmtime_c_api_x86_64_linux = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-x86_64-linux-c-api.tar.xz",
.hash = "12209210346e94bf6ef8e249fa5d3f1a84f95050ed19665ac8422a15b5f2246d83af",
.lazy = true,
},
.wasmtime_c_api_x86_64_macos = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-x86_64-macos-c-api.tar.xz",
.hash = "12208f875dd3a89092485762f3b184707b3cccae85a84e2ffd38c138cc3a3fd90447",
.lazy = true,
},
.wasmtime_c_api_x86_64_mingw = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-x86_64-mingw-c-api.zip",
.hash = "1220bea757df3a777b6ec6322fc498e4ece20d466eedc5e2a3610b338849553cd94d",
.lazy = true,
},
.wasmtime_c_api_x86_64_musl = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-x86_64-musl-c-api.tar.xz",
.hash = "1220c9596e6a63edcc3234c0611d0cbac724bf30ac9a0fbaf402c7da649b278b1322",
.lazy = true,
},
.wasmtime_c_api_x86_64_windows = .{
.url = "https://github.com/bytecodealliance/wasmtime/releases/download/v29.0.1/wasmtime-v29.0.1-x86_64-windows-c-api.zip",
.hash = "1220440ccb01d72989cf1a47728897a35fc8dd31673cce598f2d62c58e2c3228b0ed",
.lazy = true,
},
}
}


@@ -59,6 +59,7 @@ similar.workspace = true
smallbitvec.workspace = true
streaming-iterator.workspace = true
tiny_http.workspace = true
topological-sort.workspace = true
url.workspace = true
walkdir.workspace = true
wasmparser.workspace = true


@@ -112,7 +112,7 @@ fn main() {
parse(path, max_path_length, |source| {
Query::new(&language, str::from_utf8(source).unwrap())
.with_context(|| format!("Query file path: {path:?}"))
.with_context(|| format!("Query file path: {}", path.display()))
.expect("Failed to parse query");
});
}
@@ -201,7 +201,7 @@ fn parse(path: &Path, max_path_length: usize, mut action: impl FnMut(&[u8])) ->
);
let source_code = fs::read(path)
.with_context(|| format!("Failed to read {path:?}"))
.with_context(|| format!("Failed to read {}", path.display()))
.unwrap();
let time = Instant::now();
for _ in 0..*REPETITION_COUNT {
@@ -221,6 +221,6 @@ fn get_language(path: &Path) -> Language {
let src_path = GRAMMARS_DIR.join(path).join("src");
TEST_LOADER
.load_language_at_path(CompileConfig::new(&src_path, None, None))
.with_context(|| format!("Failed to load language at path {src_path:?}"))
.with_context(|| format!("Failed to load language at path {}", src_path.display()))
.unwrap()
}
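The two `with_context` changes above swap `Debug` formatting (`{path:?}`) for `Path::display()`, which prints the path without the surrounding quotes and escapes that `Debug` adds to error messages. A standalone illustration of the difference (not code from this repository):

```rust
use std::path::Path;

fn main() {
    let path = Path::new("grammars/rust/src");

    // Debug formatting wraps the path in quotes (and escapes special
    // characters), which is noisy inside a user-facing error message.
    let debug_form = format!("{path:?}");

    // `Path::display` renders the path as-is, replacing any
    // non-UTF-8 bytes lossily.
    let display_form = format!("{}", path.display());

    assert_eq!(debug_form, "\"grammars/rust/src\"");
    assert_eq!(display_form, "grammars/rust/src");
}
```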


@@ -60,8 +60,6 @@ fn web_playground_files_present() -> bool {
paths.iter().all(|p| Path::new(p).exists())
}
// When updating this function, don't forget to also update generate/build.rs which has a
// near-identical function.
fn read_git_sha() -> Option<String> {
let crate_path = PathBuf::from(env::var("CARGO_MANIFEST_DIR").unwrap());


@@ -4,7 +4,8 @@
"description": "Eslint configuration for Tree-sitter grammar files",
"repository": {
"type": "git",
"url": "git+https://github.com/tree-sitter/tree-sitter.git"
"url": "git+https://github.com/tree-sitter/tree-sitter.git",
"directory": "crates/cli/eslint"
},
"license": "MIT",
"author": "Amaan Qureshi <amaanq12@gmail.com>",


@@ -29,6 +29,9 @@ serde.workspace = true
serde_json.workspace = true
smallbitvec.workspace = true
thiserror.workspace = true
url.workspace = true
topological-sort.workspace = true
tree-sitter.workspace = true
[target.'cfg(windows)'.dependencies]
url.workspace = true


@@ -1,32 +0,0 @@
use std::{env, path::PathBuf, process::Command};
fn main() {
if let Some(git_sha) = read_git_sha() {
println!("cargo:rustc-env=BUILD_SHA={git_sha}");
}
}
// This is copied from the build.rs in parent directory. This should be updated if the
// parent build.rs gets fixes.
fn read_git_sha() -> Option<String> {
let crate_path = PathBuf::from(env::var("CARGO_MANIFEST_DIR").unwrap());
if !crate_path
.parent()?
.parent()
.is_some_and(|p| p.join(".git").exists())
{
return None;
}
Command::new("git")
.args(["rev-parse", "HEAD"])
.current_dir(crate_path)
.output()
.map_or(None, |output| {
if !output.status.success() {
return None;
}
Some(String::from_utf8_lossy(&output.stdout).to_string())
})
}


@@ -312,6 +312,12 @@ impl<'a> ParseTableBuilder<'a> {
}
}
let non_terminal_sets_len = non_terminal_extra_item_sets_by_first_terminal.len();
self.non_terminal_extra_states
.reserve(non_terminal_sets_len);
self.parse_state_info_by_id.reserve(non_terminal_sets_len);
self.parse_table.states.reserve(non_terminal_sets_len);
self.parse_state_queue.reserve(non_terminal_sets_len);
// Add a state for each starting terminal of a non-terminal extra rule.
for (terminal, item_set) in non_terminal_extra_item_sets_by_first_terminal {
if terminal.is_non_terminal() {
@@ -320,9 +326,10 @@
))?;
}
self.non_terminal_extra_states
.push((terminal, self.parse_table.states.len()));
self.add_parse_state(&Vec::new(), &Vec::new(), item_set);
// Add the parse state, and *then* push the terminal and the state id into the
// list of nonterminal extra states
let state_id = self.add_parse_state(&Vec::new(), &Vec::new(), item_set);
self.non_terminal_extra_states.push((terminal, state_id));
}
while let Some(entry) = self.parse_state_queue.pop_front() {
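The hunk above fixes an ordering bug: the builder used to record `self.parse_table.states.len()` as the new state's id *before* calling `add_parse_state`, but a function like that can deduplicate and return the id of an existing state rather than appending. A minimal std-only sketch of the pattern, using a hypothetical `Interner` (not the real `ParseTableBuilder`):

```rust
use std::collections::HashMap;

/// Hypothetical interner standing in for `add_parse_state`: it returns the
/// id of an existing entry when one matches, instead of always appending.
struct Interner {
    ids: HashMap<String, usize>,
    items: Vec<String>,
}

impl Interner {
    fn new() -> Self {
        Self { ids: HashMap::new(), items: Vec::new() }
    }

    fn intern(&mut self, item: &str) -> usize {
        if let Some(&id) = self.ids.get(item) {
            return id; // deduplicated: no new slot is created
        }
        let id = self.items.len();
        self.items.push(item.to_string());
        self.ids.insert(item.to_string(), id);
        id
    }
}

fn main() {
    let mut interner = Interner::new();
    interner.intern("state_a");

    // Buggy pattern: record `items.len()` *before* interning. When the
    // item is deduplicated, the recorded id points past the real entry.
    let assumed_id = interner.items.len(); // 1
    let actual_id = interner.intern("state_a"); // 0, deduplicated
    assert_ne!(assumed_id, actual_id);

    // Correct pattern, mirroring the fix above: use the returned id.
    let id = interner.intern("state_b");
    assert_eq!(interner.items[id], "state_b");
}
```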
@@ -908,7 +915,7 @@
let get_rule_names = |items: &[&ParseItem]| -> Vec<String> {
let mut last_rule_id = None;
let mut result = Vec::new();
let mut result = Vec::with_capacity(items.len());
for item in items {
if last_rule_id == Some(item.variable_index) {
continue;


@@ -529,7 +529,7 @@ globalThis.optional = optional;
globalThis.prec = prec;
globalThis.repeat = repeat;
globalThis.repeat1 = repeat1;
global.reserved = reserved;
globalThis.reserved = reserved;
globalThis.seq = seq;
globalThis.sym = sym;
globalThis.token = token;


@@ -27,7 +27,7 @@ mod tables;
use build_tables::build_tables;
pub use build_tables::ParseTableBuilderError;
use grammars::InputGrammar;
pub use node_types::VariableInfoError;
pub use node_types::{SuperTypeCycleError, VariableInfoError};
use parse_grammar::parse_grammar;
pub use parse_grammar::ParseGrammarError;
use prepare_grammar::prepare_grammar;
@@ -70,6 +70,8 @@ pub enum GenerateError {
BuildTables(#[from] ParseTableBuilderError),
#[error(transparent)]
ParseVersion(#[from] ParseVersionError),
#[error(transparent)]
SuperTypeCycle(#[from] SuperTypeCycleError),
}
impl From<std::io::Error> for GenerateError {
@@ -183,7 +185,8 @@ pub fn generate_parser_in_directory(
if grammar_path.file_name().unwrap() != "grammar.json" {
fs::write(src_path.join("grammar.json"), &grammar_json).map_err(|e| {
GenerateError::IO(format!(
"Failed to write grammar.json to {src_path:?} -- {e}"
"Failed to write grammar.json to {} -- {e}",
src_path.display()
))
})?;
}
@@ -249,7 +252,7 @@ fn generate_parser_for_grammar_with_opts(
&lexical_grammar,
&simple_aliases,
&variable_info,
);
)?;
let supertype_symbol_map =
node_types::get_supertype_symbol_map(&syntax_grammar, &simple_aliases, &variable_info);
let tables = build_tables(


@@ -1,7 +1,4 @@
use std::{
cmp::Ordering,
collections::{BTreeMap, HashMap, HashSet},
};
use std::collections::{BTreeMap, HashMap, HashSet};
use anyhow::Result;
use serde::Serialize;
@@ -444,12 +441,33 @@ pub fn get_supertype_symbol_map(
supertype_symbol_map
}
pub type SuperTypeCycleResult<T> = Result<T, SuperTypeCycleError>;
#[derive(Debug, Error, Serialize)]
pub struct SuperTypeCycleError {
items: Vec<String>,
}
impl std::fmt::Display for SuperTypeCycleError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "Dependency cycle detected in node types:")?;
for (i, item) in self.items.iter().enumerate() {
write!(f, " {item}")?;
if i < self.items.len() - 1 {
write!(f, ",")?;
}
}
Ok(())
}
}
pub fn generate_node_types_json(
syntax_grammar: &SyntaxGrammar,
lexical_grammar: &LexicalGrammar,
default_aliases: &AliasMap,
variable_info: &[VariableInfo],
) -> Vec<NodeInfoJSON> {
) -> SuperTypeCycleResult<Vec<NodeInfoJSON>> {
let mut node_types_json = BTreeMap::new();
let child_type_to_node_type = |child_type: &ChildType| match child_type {
@@ -507,6 +525,31 @@
let aliases_by_symbol = get_aliases_by_symbol(syntax_grammar, default_aliases);
let empty = HashSet::new();
let extra_names = syntax_grammar
.extra_symbols
.iter()
.flat_map(|symbol| {
aliases_by_symbol
.get(symbol)
.unwrap_or(&empty)
.iter()
.map(|alias| {
alias.as_ref().map_or(
match symbol.kind {
SymbolType::NonTerminal => &syntax_grammar.variables[symbol.index].name,
SymbolType::Terminal => &lexical_grammar.variables[symbol.index].name,
SymbolType::External => {
&syntax_grammar.external_tokens[symbol.index].name
}
_ => unreachable!(),
},
|alias| &alias.value,
)
})
})
.collect::<HashSet<_>>();
let mut subtype_map = Vec::new();
for (i, info) in variable_info.iter().enumerate() {
let symbol = Symbol::non_terminal(i);
@@ -519,7 +562,7 @@
kind: variable.name.clone(),
named: true,
root: false,
extra: false,
extra: extra_names.contains(&variable.name),
fields: None,
children: None,
subtypes: None,
@@ -563,7 +606,7 @@
kind: kind.clone(),
named: is_named,
root: i == 0,
extra: false,
extra: extra_names.contains(&kind),
fields: Some(BTreeMap::new()),
children: None,
subtypes: None,
@@ -602,15 +645,33 @@
}
}
// Sort the subtype map so that subtypes are listed before their supertypes.
subtype_map.sort_by(|a, b| {
if b.1.contains(&a.0) {
Ordering::Less
} else if a.1.contains(&b.0) {
Ordering::Greater
} else {
Ordering::Equal
// Sort the subtype map topologically so that subtypes are listed before their supertypes.
let mut sorted_kinds = Vec::with_capacity(subtype_map.len());
let mut top_sort = topological_sort::TopologicalSort::<String>::new();
for (supertype, subtypes) in &subtype_map {
for subtype in subtypes {
top_sort.add_dependency(subtype.kind.clone(), supertype.kind.clone());
}
}
loop {
let mut next_kinds = top_sort.pop_all();
match (next_kinds.is_empty(), top_sort.is_empty()) {
(true, true) => break,
(true, false) => {
let mut items = top_sort.collect::<Vec<String>>();
items.sort();
return Err(SuperTypeCycleError { items });
}
(false, _) => {
next_kinds.sort();
sorted_kinds.extend(next_kinds);
}
}
}
subtype_map.sort_by(|a, b| {
let a_idx = sorted_kinds.iter().position(|n| n.eq(&a.0.kind)).unwrap();
let b_idx = sorted_kinds.iter().position(|n| n.eq(&b.0.kind)).unwrap();
a_idx.cmp(&b_idx)
});
for node_type_json in node_types_json.values_mut() {
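The replacement above orders subtypes before their supertypes with the `topological-sort` crate: each `pop_all` drains every kind whose dependencies are all resolved, and an empty pop with work remaining signals a cycle. The same pop-all-or-fail loop can be sketched with a std-only Kahn's algorithm (an illustration of the technique, not the crate's implementation):

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Kahn's algorithm, layer by layer: each round yields every node whose
/// dependencies are all resolved; if no node is ready but nodes remain,
/// the graph contains a cycle and the leftover nodes are reported.
fn topo_layers(edges: &[(&str, &str)]) -> Result<Vec<String>, Vec<String>> {
    let mut deps: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for &(before, after) in edges {
        deps.entry(before).or_default();
        deps.entry(after).or_default().insert(before);
    }
    let mut sorted = Vec::new();
    while !deps.is_empty() {
        // Equivalent of `pop_all`: everything with no unresolved deps.
        let ready: Vec<&str> = deps
            .iter()
            .filter(|(_, d)| d.is_empty())
            .map(|(&n, _)| n)
            .collect();
        if ready.is_empty() {
            // Cycle: report the remaining nodes (BTreeMap keeps them sorted).
            return Err(deps.keys().map(|n| n.to_string()).collect());
        }
        for node in &ready {
            deps.remove(node);
            for d in deps.values_mut() {
                d.remove(node);
            }
            sorted.push(node.to_string());
        }
    }
    Ok(sorted)
}

fn main() {
    // Subtypes must come before their supertype.
    let order =
        topo_layers(&[("subtype_a", "supertype"), ("subtype_b", "supertype")]).unwrap();
    assert_eq!(order, ["subtype_a", "subtype_b", "supertype"]);

    // A supertype cycle is reported instead of looping forever.
    assert!(topo_layers(&[("a", "b"), ("b", "a")]).is_err());
}
```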
@@ -634,7 +695,6 @@
let mut anonymous_node_types = Vec::new();
let empty = HashSet::new();
let regular_tokens = lexical_grammar
.variables
.iter()
@@ -668,29 +728,6 @@
})
})
});
let extra_names = syntax_grammar
.extra_symbols
.iter()
.flat_map(|symbol| {
aliases_by_symbol
.get(symbol)
.unwrap_or(&empty)
.iter()
.map(|alias| {
alias.as_ref().map_or(
match symbol.kind {
SymbolType::NonTerminal => &syntax_grammar.variables[symbol.index].name,
SymbolType::Terminal => &lexical_grammar.variables[symbol.index].name,
SymbolType::External => {
&syntax_grammar.external_tokens[symbol.index].name
}
_ => unreachable!(),
},
|alias| &alias.value,
)
})
})
.collect::<HashSet<_>>();
for (name, kind) in regular_tokens.chain(external_tokens) {
match kind {
@@ -743,7 +780,7 @@
.then_with(|| a.kind.cmp(&b.kind))
});
result.dedup();
result
Ok(result)
}
fn process_supertypes(info: &mut FieldInfoJSON, subtype_map: &[(NodeTypeJSON, Vec<NodeTypeJSON>)]) {
@@ -829,7 +866,8 @@ mod tests {
},
],
..Default::default()
});
})
.unwrap();
assert_eq!(node_types.len(), 3);
@@ -918,7 +956,9 @@
},
// This rule is not reachable from the start symbol, but
// it is reachable from the 'extra_symbols' so it
// should be present in the node_types
// should be present in the node_types.
// But because it's only a literal, it will get replaced by
// a lexical variable.
Variable {
name: "v3".to_string(),
kind: VariableType::Named,
@ -926,7 +966,8 @@ mod tests {
},
],
..Default::default()
});
})
.unwrap();
assert_eq!(node_types.len(), 4);
@@ -1007,6 +1048,118 @@
);
}
#[test]
fn test_node_types_deeper_extras() {
let node_types = get_node_types(&InputGrammar {
extra_symbols: vec![Rule::named("v3")],
variables: vec![
Variable {
name: "v1".to_string(),
kind: VariableType::Named,
rule: Rule::seq(vec![
Rule::field("f1".to_string(), Rule::named("v2")),
Rule::field("f2".to_string(), Rule::string(";")),
]),
},
Variable {
name: "v2".to_string(),
kind: VariableType::Named,
rule: Rule::string("x"),
},
// This rule is not reachable from the start symbol, but
// it is reachable from the 'extra_symbols' so it
// should be present in the node_types.
// Because it is not just a literal, it won't get replaced
// by a lexical variable.
Variable {
name: "v3".to_string(),
kind: VariableType::Named,
rule: Rule::seq(vec![Rule::string("y"), Rule::repeat(Rule::string("z"))]),
},
],
..Default::default()
})
.unwrap();
assert_eq!(node_types.len(), 6);
assert_eq!(
node_types[0],
NodeInfoJSON {
kind: "v1".to_string(),
named: true,
root: true,
extra: false,
subtypes: None,
children: None,
fields: Some(
vec![
(
"f1".to_string(),
FieldInfoJSON {
multiple: false,
required: true,
types: vec![NodeTypeJSON {
kind: "v2".to_string(),
named: true,
}]
}
),
(
"f2".to_string(),
FieldInfoJSON {
multiple: false,
required: true,
types: vec![NodeTypeJSON {
kind: ";".to_string(),
named: false,
}]
}
),
]
.into_iter()
.collect()
)
}
);
assert_eq!(
node_types[1],
NodeInfoJSON {
kind: "v3".to_string(),
named: true,
root: false,
extra: true,
subtypes: None,
children: None,
fields: Some(BTreeMap::default())
}
);
assert_eq!(
node_types[2],
NodeInfoJSON {
kind: ";".to_string(),
named: false,
root: false,
extra: false,
subtypes: None,
children: None,
fields: None
}
);
assert_eq!(
node_types[3],
NodeInfoJSON {
kind: "v2".to_string(),
named: true,
root: false,
extra: false,
subtypes: None,
children: None,
fields: None
}
);
}
#[test]
fn test_node_types_with_supertypes() {
let node_types = get_node_types(&InputGrammar {
@@ -1038,7 +1191,8 @@ mod tests {
},
],
..Default::default()
});
})
.unwrap();
assert_eq!(
node_types[0],
@@ -1127,7 +1281,8 @@ mod tests {
},
],
..Default::default()
});
})
.unwrap();
assert_eq!(
node_types[0],
@@ -1212,7 +1367,8 @@ mod tests {
},
],
..Default::default()
});
})
.unwrap();
assert_eq!(
node_types[0],
@@ -1286,7 +1442,8 @@ mod tests {
},
],
..Default::default()
});
})
.unwrap();
assert_eq!(node_types.iter().find(|t| t.kind == "foo_identifier"), None);
assert_eq!(
@@ -1342,7 +1499,8 @@ mod tests {
},
],
..Default::default()
});
})
.unwrap();
assert_eq!(
node_types[0],
@@ -1391,7 +1549,8 @@ mod tests {
]),
}],
..Default::default()
});
})
.unwrap();
assert_eq!(
node_types,
@@ -1439,7 +1598,8 @@ mod tests {
},
],
..Default::default()
});
})
.unwrap();
assert_eq!(
&node_types
@@ -1558,7 +1718,8 @@ mod tests {
},
],
..Default::default()
});
})
.unwrap();
assert_eq!(
node_types.iter().map(|n| &n.kind).collect::<Vec<_>>(),
@@ -1885,7 +2046,7 @@ mod tests {
);
}
fn get_node_types(grammar: &InputGrammar) -> Vec<NodeInfoJSON> {
fn get_node_types(grammar: &InputGrammar) -> SuperTypeCycleResult<Vec<NodeInfoJSON>> {
let (syntax_grammar, lexical_grammar, _, default_aliases) =
prepare_grammar(grammar).unwrap();
let variable_info =


@@ -1,6 +1,7 @@
use std::collections::HashSet;
use anyhow::Result;
use regex::Regex;
use serde::{Deserialize, Serialize};
use serde_json::{Map, Value};
use thiserror::Error;
@@ -238,13 +239,14 @@ pub(crate) fn parse_grammar(input: &str) -> ParseGrammarResult<InputGrammar> {
let mut in_progress = HashSet::new();
for (name, rule) in &rules {
if !variable_is_used(
&rules,
&extra_symbols,
&external_tokens,
name,
&mut in_progress,
) && grammar_json.word.as_ref().is_none_or(|w| w != name)
if grammar_json.word.as_ref().is_none_or(|w| w != name)
&& !variable_is_used(
&rules,
&extra_symbols,
&external_tokens,
name,
&mut in_progress,
)
{
grammar_json.conflicts.retain(|r| !r.contains(name));
grammar_json.supertypes.retain(|r| r != name);
@@ -261,6 +263,27 @@ pub(crate) fn parse_grammar(input: &str) -> ParseGrammarResult<InputGrammar> {
});
continue;
}
if extra_symbols
.iter()
.any(|r| rule_is_referenced(r, name, false))
{
let inner_rule = if let Rule::Metadata { rule, .. } = rule {
rule
} else {
rule
};
let matches_empty = match inner_rule {
Rule::String(rule_str) => rule_str.is_empty(),
Rule::Pattern(ref value, _) => Regex::new(value)
.map(|reg| reg.is_match(""))
.unwrap_or(false),
_ => false,
};
if matches_empty {
eprintln!("Warning: Named extra rule `{name}` matches the empty string. Inline this to avoid infinite loops while parsing.");
}
}
variables.push(Variable {
name: name.clone(),
kind: VariableType::Named,
@@ -272,12 +295,11 @@ pub(crate) fn parse_grammar(input: &str) -> ParseGrammarResult<InputGrammar> {
.reserved
.into_iter()
.map(|(name, rule_values)| {
let mut reserved_words = Vec::new();
let Value::Array(rule_values) = rule_values else {
Err(ParseGrammarError::InvalidReservedWordSet)?
};
let mut reserved_words = Vec::with_capacity(rule_values.len());
for value in rule_values {
reserved_words.push(parse_rule(serde_json::from_value(value)?, false)?);
}
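The empty-match warning in the hunk above guards against named extras that consume no input. A reduced, hypothetical sketch of that check, covering only the string case (the real code also asks the `regex` crate whether a pattern matches `""`):

```rust
// Reduced Rule type for illustration; the real enum also has
// Pattern, Metadata, and other variants.
enum Rule {
    String(String),
    Other,
}

// True when the rule trivially matches the empty string, which would
// let an extra be inserted forever without consuming any input.
fn matches_empty(rule: &Rule) -> bool {
    match rule {
        Rule::String(s) => s.is_empty(),
        Rule::Other => false,
    }
}

fn main() {
    assert!(matches_empty(&Rule::String(String::new())));
    assert!(!matches_empty(&Rule::String("x".to_string())));
}
```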


@@ -90,7 +90,7 @@ pub fn expand_tokens(mut grammar: ExtractedLexicalGrammar) -> ExpandTokensResult
Rule::repeat(Rule::choice(grammar.separators))
};
let mut variables = Vec::new();
let mut variables = Vec::with_capacity(grammar.variables.len());
for (i, variable) in grammar.variables.into_iter().enumerate() {
if variable.rule.is_empty() {
Err(ExpandTokensError::EmptyString(variable.name.clone()))?;
@@ -195,7 +195,7 @@ impl NfaBuilder {
Ok(!s.is_empty())
}
Rule::Choice(elements) => {
let mut alternative_state_ids = Vec::new();
let mut alternative_state_ids = Vec::with_capacity(elements.len());
for element in elements {
if self.expand_rule(element, next_state_id)? {
alternative_state_ids.push(self.nfa.last_state_id());
@@ -338,7 +338,7 @@ impl NfaBuilder {
Ok(result)
}
HirKind::Alternation(alternations) => {
let mut alternative_state_ids = Vec::new();
let mut alternative_state_ids = Vec::with_capacity(alternations.len());
for hir in alternations {
if self.expand_regex(hir, next_state_id)? {
alternative_state_ids.push(self.nfa.last_state_id());


@@ -26,10 +26,34 @@ unless they are used only as the grammar's start rule.
ExternalTokenNonTerminal(String),
#[error("Non-symbol rules cannot be used as external tokens")]
NonSymbolExternalToken,
#[error("Non-terminal symbol '{0}' cannot be used as the word token, because its rule is duplicated in '{1}'")]
NonTerminalWordToken(String, String),
#[error("Reserved words must be tokens")]
NonTokenReservedWord,
#[error(transparent)]
WordToken(NonTerminalWordTokenError),
#[error("Reserved word '{0}' must be a token")]
NonTokenReservedWord(String),
}
#[derive(Debug, Error, Serialize)]
pub struct NonTerminalWordTokenError {
pub symbol_name: String,
pub conflicting_symbol_name: Option<String>,
}
impl std::fmt::Display for NonTerminalWordTokenError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(
f,
"Non-terminal symbol '{}' cannot be used as the word token",
self.symbol_name
)?;
if let Some(conflicting_name) = &self.conflicting_symbol_name {
writeln!(
f,
", because its rule is duplicated in '{conflicting_name}'",
)
} else {
writeln!(f)
}
}
}
pub(super) fn extract_tokens(
@@ -62,7 +86,7 @@ pub(super) fn extract_tokens(
// that pointed to that variable will need to be updated to point to the
// variable in the lexical grammar. Symbols that pointed to later variables
// will need to have their indices decremented.
let mut variables = Vec::new();
let mut variables = Vec::with_capacity(grammar.variables.len());
let mut symbol_replacer = SymbolReplacer {
replacements: HashMap::new(),
};
@@ -162,23 +186,23 @@ pub(super) fn extract_tokens(
let token = symbol_replacer.replace_symbol(token);
if token.is_non_terminal() {
let word_token_variable = &variables[token.index];
let conflicting_variable = variables
let conflicting_symbol_name = variables
.iter()
.enumerate()
.find(|(i, v)| *i != token.index && v.rule == word_token_variable.rule)
.expect("Failed to find a variable with the same rule as the word token");
.map(|(_, v)| v.name.clone());
Err(ExtractTokensError::NonTerminalWordToken(
word_token_variable.name.clone(),
conflicting_variable.1.name.clone(),
))?;
Err(ExtractTokensError::WordToken(NonTerminalWordTokenError {
symbol_name: word_token_variable.name.clone(),
conflicting_symbol_name,
}))?;
}
word_token = Some(token);
}
let mut reserved_word_contexts = Vec::new();
let mut reserved_word_contexts = Vec::with_capacity(grammar.reserved_word_sets.len());
for reserved_word_context in grammar.reserved_word_sets {
let mut reserved_words = Vec::new();
let mut reserved_words = Vec::with_capacity(reserved_word_contexts.len());
for reserved_rule in reserved_word_context.reserved_words {
if let Rule::Symbol(symbol) = reserved_rule {
reserved_words.push(symbol_replacer.replace_symbol(symbol));
@@ -188,7 +212,12 @@ pub(super) fn extract_tokens(
{
reserved_words.push(Symbol::terminal(index));
} else {
Err(ExtractTokensError::NonTokenReservedWord)?;
let token_name = match &reserved_rule {
Rule::String(s) => s.clone(),
Rule::Pattern(p, _) => p.clone(),
_ => "unknown".to_string(),
};
Err(ExtractTokensError::NonTokenReservedWord(token_name))?;
}
}
reserved_word_contexts.push(ReservedWordContext {


@@ -57,8 +57,9 @@ impl RuleFlattener {
}
fn flatten_variable(&mut self, variable: Variable) -> FlattenGrammarResult<SyntaxVariable> {
let mut productions = Vec::new();
for rule in extract_choices(variable.rule) {
let choices = extract_choices(variable.rule);
let mut productions = Vec::with_capacity(choices.len());
for rule in choices {
let production = self.flatten_rule(rule)?;
if !productions.contains(&production) {
productions.push(production);
@@ -195,7 +196,7 @@ fn extract_choices(rule: Rule) -> Vec<Rule> {
let mut result = vec![Rule::Blank];
for element in elements {
let extraction = extract_choices(element);
let mut next_result = Vec::new();
let mut next_result = Vec::with_capacity(result.len());
for entry in result {
for extraction_entry in &extraction {
next_result.push(Rule::Seq(vec![entry.clone(), extraction_entry.clone()]));
@@ -206,7 +207,7 @@ fn extract_choices(rule: Rule) -> Vec<Rule> {
result
}
Rule::Choice(elements) => {
let mut result = Vec::new();
let mut result = Vec::with_capacity(elements.len());
for element in elements {
for rule in extract_choices(element) {
result.push(rule);
@@ -262,9 +263,10 @@ pub(super) fn flatten_grammar(
for (i, variable) in variables.iter().enumerate() {
let symbol = Symbol::non_terminal(i);
let used = symbol_is_used(&variables, symbol);
for production in &variable.productions {
if production.steps.is_empty() && symbol_is_used(&variables, symbol) {
if used && production.steps.is_empty() {
Err(FlattenGrammarError::EmptyString(variable.name.clone()))?;
}
@@ -533,7 +535,7 @@ mod tests {
assert_eq!(
result.unwrap_err().to_string(),
"Rule `test` cannot be inlined because it contains a reference to itself.",
"Rule `test` cannot be inlined because it contains a reference to itself",
);
}
}


@@ -65,7 +65,7 @@ pub(super) fn intern_symbols(grammar: &InputGrammar) -> InternSymbolsResult<Inte
let mut reserved_words = Vec::with_capacity(grammar.reserved_words.len());
for reserved_word_set in &grammar.reserved_words {
let mut interned_set = Vec::new();
let mut interned_set = Vec::with_capacity(reserved_word_set.reserved_words.len());
for rule in &reserved_word_set.reserved_words {
interned_set.push(interner.intern_rule(rule, None)?);
}
@@ -75,7 +75,7 @@ pub(super) fn intern_symbols(grammar: &InputGrammar) -> InternSymbolsResult<Inte
});
}
let mut expected_conflicts = Vec::new();
let mut expected_conflicts = Vec::with_capacity(grammar.expected_conflicts.len());
for conflict in &grammar.expected_conflicts {
let mut interned_conflict = Vec::with_capacity(conflict.len());
for name in conflict {


@@ -8,7 +8,7 @@ mod process_inlines;
use std::{
cmp::Ordering,
collections::{hash_map, HashMap, HashSet},
collections::{hash_map, BTreeSet, HashMap, HashSet},
mem,
};
@@ -16,6 +16,7 @@ use anyhow::Result;
pub use expand_tokens::ExpandTokensError;
pub use extract_tokens::ExtractTokensError;
pub use flatten_grammar::FlattenGrammarError;
use indexmap::IndexMap;
pub use intern_symbols::InternSymbolsError;
pub use process_inlines::ProcessInlinesError;
use serde::Serialize;
@@ -80,6 +81,7 @@ pub type PrepareGrammarResult<T> = Result<T, PrepareGrammarError>;
#[error(transparent)]
pub enum PrepareGrammarError {
ValidatePrecedences(#[from] ValidatePrecedenceError),
ValidateIndirectRecursion(#[from] IndirectRecursionError),
InternSymbols(#[from] InternSymbolsError),
ExtractTokens(#[from] ExtractTokensError),
FlattenGrammar(#[from] FlattenGrammarError),
@@ -96,6 +98,22 @@ pub enum ValidatePrecedenceError {
Ordering(#[from] ConflictingPrecedenceOrderingError),
}
#[derive(Debug, Error, Serialize)]
pub struct IndirectRecursionError(pub Vec<String>);
impl std::fmt::Display for IndirectRecursionError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "Grammar contains an indirectly recursive rule: ")?;
for (i, symbol) in self.0.iter().enumerate() {
if i > 0 {
write!(f, " -> ")?;
}
write!(f, "{symbol}")?;
}
Ok(())
}
}
#[derive(Debug, Error, Serialize)]
pub struct UndeclaredPrecedenceError {
pub precedence: String,
@@ -141,6 +159,7 @@ pub fn prepare_grammar(
AliasMap,
)> {
validate_precedences(input_grammar)?;
validate_indirect_recursion(input_grammar)?;
let interned_grammar = intern_symbols(input_grammar)?;
let (syntax_grammar, lexical_grammar) = extract_tokens(interned_grammar)?;
@@ -152,6 +171,83 @@ pub fn prepare_grammar(
Ok((syntax_grammar, lexical_grammar, inlines, default_aliases))
}
/// Check for indirect recursion cycles in the grammar that can cause infinite loops while
/// parsing. An indirect recursion cycle occurs when a non-terminal can derive itself through
/// a chain of single-symbol productions (e.g., A -> B, B -> A).
fn validate_indirect_recursion(grammar: &InputGrammar) -> Result<(), IndirectRecursionError> {
let mut epsilon_transitions: IndexMap<&str, BTreeSet<String>> = IndexMap::new();
for variable in &grammar.variables {
let productions = get_single_symbol_productions(&variable.rule);
// Filter out rules that *directly* reference themselves, as this doesn't
// cause a parsing loop.
let filtered: BTreeSet<String> = productions
.into_iter()
.filter(|s| s != &variable.name)
.collect();
epsilon_transitions.insert(variable.name.as_str(), filtered);
}
for start_symbol in epsilon_transitions.keys() {
let mut visited = BTreeSet::new();
let mut path = Vec::new();
if let Some((start_idx, end_idx)) =
get_cycle(start_symbol, &epsilon_transitions, &mut visited, &mut path)
{
let cycle_symbols = path[start_idx..=end_idx]
.iter()
.map(|s| (*s).to_string())
.collect();
return Err(IndirectRecursionError(cycle_symbols));
}
}
Ok(())
}
fn get_single_symbol_productions(rule: &Rule) -> BTreeSet<String> {
match rule {
Rule::NamedSymbol(name) => BTreeSet::from([name.clone()]),
Rule::Choice(choices) => choices
.iter()
.flat_map(get_single_symbol_productions)
.collect(),
Rule::Metadata { rule, .. } => get_single_symbol_productions(rule),
_ => BTreeSet::new(),
}
}
/// Perform a depth-first search to detect cycles in single state transitions.
fn get_cycle<'a>(
current: &'a str,
transitions: &'a IndexMap<&'a str, BTreeSet<String>>,
visited: &mut BTreeSet<&'a str>,
path: &mut Vec<&'a str>,
) -> Option<(usize, usize)> {
if let Some(first_idx) = path.iter().position(|s| *s == current) {
path.push(current);
return Some((first_idx, path.len() - 1));
}
if visited.contains(current) {
return None;
}
path.push(current);
visited.insert(current);
if let Some(next_symbols) = transitions.get(current) {
for next in next_symbols {
if let Some(cycle) = get_cycle(next, transitions, visited, path) {
return Some(cycle);
}
}
}
path.pop();
None
}
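The DFS in `get_cycle` can be sketched as a standalone program. This is a minimal version, assuming owned `String` keys instead of the borrowed `&str` keys in the hunk above, that returns the first cycle found for an `A -> B, B -> A` grammar:

```rust
use std::collections::{BTreeMap, BTreeSet};

// Depth-first search over single-symbol transitions, mirroring the
// get_cycle helper above. Returns the cycle path when one exists.
fn find_cycle(
    current: &str,
    transitions: &BTreeMap<String, BTreeSet<String>>,
    visited: &mut BTreeSet<String>,
    path: &mut Vec<String>,
) -> Option<Vec<String>> {
    if let Some(first) = path.iter().position(|s| s == current) {
        // The current symbol already appears on the path: cycle found.
        let mut cycle = path[first..].to_vec();
        cycle.push(current.to_string());
        return Some(cycle);
    }
    if !visited.insert(current.to_string()) {
        return None; // already fully explored, no cycle through here
    }
    path.push(current.to_string());
    if let Some(next_symbols) = transitions.get(current) {
        for next in next_symbols {
            if let Some(cycle) = find_cycle(next, transitions, visited, path) {
                return Some(cycle);
            }
        }
    }
    path.pop();
    None
}

// A -> B and B -> A: the indirect recursion the validation rejects.
fn demo() -> Option<Vec<String>> {
    let mut transitions = BTreeMap::new();
    transitions.insert("A".to_string(), BTreeSet::from(["B".to_string()]));
    transitions.insert("B".to_string(), BTreeSet::from(["A".to_string()]));
    find_cycle("A", &transitions, &mut BTreeSet::new(), &mut Vec::new())
}

fn main() {
    assert_eq!(
        demo(),
        Some(vec!["A".to_string(), "B".to_string(), "A".to_string()])
    );
}
```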
/// Check that all of the named precedences used in the grammar are declared
/// within the `precedences` lists, and also that there are no conflicting
/// precedence orderings declared in those lists.


@@ -24,7 +24,6 @@ pub const ABI_VERSION_MIN: usize = 14;
pub const ABI_VERSION_MAX: usize = tree_sitter::LANGUAGE_VERSION;
const ABI_VERSION_WITH_RESERVED_WORDS: usize = 15;
const BUILD_VERSION: &str = env!("CARGO_PKG_VERSION");
const BUILD_SHA: Option<&'static str> = option_env!("BUILD_SHA");
macro_rules! add {
($this: tt, $($arg: tt)*) => {{
@@ -322,13 +321,9 @@ impl Generator {
}
fn add_header(&mut self) {
let version = BUILD_SHA.map_or_else(
|| BUILD_VERSION.to_string(),
|build_sha| format!("{BUILD_VERSION} ({build_sha})"),
);
add_line!(
self,
"/* Automatically generated by tree-sitter v{version} */",
"/* Automatically @generated by tree-sitter v{BUILD_VERSION} */",
);
add_line!(self, "");
}
@@ -683,12 +678,12 @@ impl Generator {
&mut next_flat_field_map_index,
);
let mut field_map_ids = Vec::new();
let mut field_map_ids = Vec::with_capacity(self.parse_table.production_infos.len());
for production_info in &self.parse_table.production_infos {
if production_info.field_map.is_empty() {
field_map_ids.push((0, 0));
} else {
let mut flat_field_map = Vec::new();
let mut flat_field_map = Vec::with_capacity(production_info.field_map.len());
for (field_name, locations) in &production_info.field_map {
for location in locations {
flat_field_map.push((field_name.clone(), *location));
@@ -1111,7 +1106,11 @@ impl Generator {
return;
}
add_line!(self, "const TSCharacterRange {}[] = {{", info.constant_name);
add_line!(
self,
"static const TSCharacterRange {}[] = {{",
info.constant_name
);
indent!(self);
for (ix, range) in characters.ranges().enumerate() {
@@ -1351,7 +1350,12 @@ impl Generator {
indent!(self);
let mut next_table_index = 0;
let mut small_state_indices = Vec::new();
let mut small_state_indices = Vec::with_capacity(
self.parse_table
.states
.len()
.saturating_sub(self.large_state_count),
);
let mut symbols_by_value = HashMap::<(usize, SymbolType), Vec<Symbol>>::new();
for state in self.parse_table.states.iter().skip(self.large_state_count) {
small_state_indices.push(next_table_index);
@@ -1847,11 +1851,11 @@ impl Generator {
'\u{007F}' => "DEL",
'\u{FEFF}' => "BOM",
'\u{0080}'..='\u{FFFF}' => {
result.push_str(&format!("u{:04x}", c as u32));
write!(result, "u{:04x}", c as u32).unwrap();
break 'special_chars;
}
'\u{10000}'..='\u{10FFFF}' => {
result.push_str(&format!("U{:08x}", c as u32));
write!(result, "U{:08x}", c as u32).unwrap();
break 'special_chars;
}
'0'..='9' | 'a'..='z' | 'A'..='Z' | '_' => unreachable!(),
@@ -1882,11 +1886,9 @@ impl Generator {
'\r' => result += "\\r",
'\t' => result += "\\t",
'\0' => result += "\\0",
'\u{0001}'..='\u{001f}' => result += &format!("\\x{:02x}", c as u32),
'\u{007F}'..='\u{FFFF}' => result += &format!("\\u{:04x}", c as u32),
'\u{10000}'..='\u{10FFFF}' => {
result.push_str(&format!("\\U{:08x}", c as u32));
}
'\u{0001}'..='\u{001f}' => write!(result, "\\x{:02x}", c as u32).unwrap(),
'\u{007F}'..='\u{FFFF}' => write!(result, "\\u{:04x}", c as u32).unwrap(),
'\u{10000}'..='\u{10FFFF}' => write!(result, "\\U{:08x}", c as u32).unwrap(),
_ => result.push(c),
}
}
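The `format!`-to-`write!` change above avoids allocating a temporary `String` per character: `write!` appends straight into the existing buffer once `std::fmt::Write` is in scope. A reduced sketch of the escaping (the generator's full set handles more cases):

```rust
use std::fmt::Write; // brings write! for String targets into scope

// Append one escaped character directly into the output buffer,
// instead of building an intermediate String with format!.
fn push_escaped(result: &mut String, c: char) {
    match c {
        '\n' => result.push_str("\\n"),
        '\u{0001}'..='\u{001f}' => write!(result, "\\x{:02x}", c as u32).unwrap(),
        '\u{007F}'..='\u{FFFF}' => write!(result, "\\u{:04x}", c as u32).unwrap(),
        '\u{10000}'..='\u{10FFFF}' => write!(result, "\\U{:08x}", c as u32).unwrap(),
        _ => result.push(c),
    }
}

fn main() {
    let mut s = String::new();
    for c in ['a', '\u{0007}', '\u{00e9}'] {
        push_escaped(&mut s, c);
    }
    assert_eq!(s, "a\\x07\\u00e9");
}
```

Writing into a `String` can never fail, which is why unwrapping the `fmt::Result` here is conventional.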


@@ -306,7 +306,6 @@ impl Symbol {
}
impl From<Symbol> for Rule {
#[must_use]
fn from(symbol: Symbol) -> Self {
Self::Symbol(symbol)
}


@@ -1 +1 @@
4.0.1
4.0.4


@@ -11,6 +11,7 @@ use std::{
ffi::{OsStr, OsString},
fs,
io::{BufRead, BufReader},
marker::PhantomData,
mem,
path::{Path, PathBuf},
process::Command,
@@ -18,9 +19,7 @@ use std::{
time::SystemTime,
};
#[cfg(any(feature = "tree-sitter-highlight", feature = "tree-sitter-tags"))]
use anyhow::Error;
use anyhow::{anyhow, Context, Result};
use anyhow::{anyhow, Context, Error, Result};
use etcetera::BaseStrategy as _;
use fs4::fs_std::FileExt;
use indoc::indoc;
@@ -327,6 +326,7 @@ pub struct LanguageConfiguration<'a> {
highlight_names: &'a Mutex<Vec<String>>,
#[cfg(feature = "tree-sitter-highlight")]
use_all_highlight_names: bool,
_phantom: PhantomData<&'a ()>,
}
pub struct Loader {
@@ -561,8 +561,8 @@ impl Loader {
// If multiple language configurations match, then determine which
// one to use by applying the configurations' content regexes.
else {
let file_contents =
fs::read(path).with_context(|| format!("Failed to read path {path:?}"))?;
let file_contents = fs::read(path)
.with_context(|| format!("Failed to read path {}", path.display()))?;
let file_contents = String::from_utf8_lossy(&file_contents);
let mut best_score = -2isize;
let mut best_configuration_id = None;
@@ -780,8 +780,8 @@ impl Loader {
if recompile {
fs::create_dir_all(lock_path.parent().unwrap()).with_context(|| {
format!(
"Failed to create directory {:?}",
lock_path.parent().unwrap()
"Failed to create directory {}",
lock_path.parent().unwrap().display()
)
})?;
let lock_file = fs::OpenOptions::new()
@@ -799,7 +799,7 @@ impl Loader {
}
let library = unsafe { Library::new(&output_path) }
.with_context(|| format!("Error opening dynamic library {output_path:?}"))?;
.with_context(|| format!("Error opening dynamic library {}", output_path.display()))?;
let language = unsafe {
let language_fn = library
.get::<Symbol<unsafe extern "C" fn() -> Language>>(language_fn_name.as_bytes())
@@ -1214,6 +1214,7 @@ impl Loader {
highlight_names: &self.highlight_names,
#[cfg(feature = "tree-sitter-highlight")]
use_all_highlight_names: self.use_all_highlight_names,
_phantom: PhantomData,
};
for file_type in &configuration.file_types {
@@ -1283,6 +1284,7 @@ impl Loader {
highlight_names: &self.highlight_names,
#[cfg(feature = "tree-sitter-highlight")]
use_all_highlight_names: self.use_all_highlight_names,
_phantom: PhantomData,
};
self.language_configurations.push(unsafe {
mem::transmute::<LanguageConfiguration<'_>, LanguageConfiguration<'static>>(
@@ -1564,7 +1566,7 @@ impl LanguageConfiguration<'_> {
error.row = source[range.start..offset_within_section]
.matches('\n')
.count();
Error::from(error).context(format!("Error in query file {path:?}"))
Error::from(error).context(format!("Error in query file {}", path.display()))
}
#[allow(clippy::type_complexity)]
@ -1581,7 +1583,7 @@ impl LanguageConfiguration<'_> {
let abs_path = self.root_path.join(path);
let prev_query_len = query.len();
query += &fs::read_to_string(&abs_path)
.with_context(|| format!("Failed to read query file {path:?}"))?;
.with_context(|| format!("Failed to read query file {}", path.display()))?;
path_ranges.push((path.clone(), prev_query_len..query.len()));
}
} else {
@@ -1599,7 +1601,7 @@ impl LanguageConfiguration<'_> {
let path = queries_path.join(default_path);
if path.exists() {
query = fs::read_to_string(&path)
.with_context(|| format!("Failed to read query file {path:?}"))?;
.with_context(|| format!("Failed to read query file {}", path.display()))?;
path_ranges.push((PathBuf::from(default_path), 0..query.len()));
}
}
@@ -1612,8 +1614,8 @@ fn needs_recompile(lib_path: &Path, paths_to_check: &[PathBuf]) -> Result<bool>
if !lib_path.exists() {
return Ok(true);
}
let lib_mtime =
mtime(lib_path).with_context(|| format!("Failed to read mtime of {lib_path:?}"))?;
let lib_mtime = mtime(lib_path)
.with_context(|| format!("Failed to read mtime of {}", lib_path.display()))?;
for path in paths_to_check {
if mtime(path)? > lib_mtime {
return Ok(true);

cli/npm/dsl.d.ts vendored

@@ -10,6 +10,7 @@ type PrecRightRule = { type: 'PREC_RIGHT'; content: Rule; value: number };
type PrecRule = { type: 'PREC'; content: Rule; value: number };
type Repeat1Rule = { type: 'REPEAT1'; content: Rule };
type RepeatRule = { type: 'REPEAT'; content: Rule };
type ReservedRule = { type: 'RESERVED'; content: Rule; context_name: string };
type SeqRule = { type: 'SEQ'; members: Rule[] };
type StringRule = { type: 'STRING'; value: string };
type SymbolRule<Name extends string> = { type: 'SYMBOL'; name: Name };
@@ -33,12 +34,10 @@ type Rule =
| SymbolRule<string>
| TokenRule;
class RustRegex {
declare class RustRegex {
value: string;
constructor(pattern: string) {
this.value = pattern;
}
constructor(pattern: string);
}
type RuleOrLiteral = Rule | RegExp | RustRegex | string;
@@ -167,6 +166,17 @@ interface Grammar<
* @see https://tree-sitter.github.io/tree-sitter/creating-parsers/3-writing-the-grammar#keyword-extraction
*/
word?: ($: GrammarSymbols<RuleName | BaseGrammarRuleName>) => RuleOrLiteral;
/**
* Mapping of names to reserved word sets. The first reserved word set is the
* global word set, meaning it applies to every rule in every parse state.
* The other word sets can be used with the `reserved` function.
*/
reserved?: Record<
string,
($: GrammarSymbols<RuleName | BaseGrammarRuleName>) => RuleOrLiteral[]
>;
}
type GrammarSchema<RuleName extends string> = {
@@ -251,7 +261,7 @@ declare function optional(rule: RuleOrLiteral): ChoiceRule;
* @see https://docs.oracle.com/cd/E19504-01/802-5880/6i9k05dh3/index.html
*/
declare const prec: {
(value: String | number, rule: RuleOrLiteral): PrecRule;
(value: string | number, rule: RuleOrLiteral): PrecRule;
/**
* Marks the given rule as left-associative (and optionally applies a
@@ -267,7 +277,7 @@ declare const prec: {
* @see https://docs.oracle.com/cd/E19504-01/802-5880/6i9k05dh3/index.html
*/
left(rule: RuleOrLiteral): PrecLeftRule;
left(value: String | number, rule: RuleOrLiteral): PrecLeftRule;
left(value: string | number, rule: RuleOrLiteral): PrecLeftRule;
/**
* Marks the given rule as right-associative (and optionally applies a
@@ -283,7 +293,7 @@ declare const prec: {
* @see https://docs.oracle.com/cd/E19504-01/802-5880/6i9k05dh3/index.html
*/
right(rule: RuleOrLiteral): PrecRightRule;
right(value: String | number, rule: RuleOrLiteral): PrecRightRule;
right(value: string | number, rule: RuleOrLiteral): PrecRightRule;
/**
* Marks the given rule with a numerical precedence which will be used to
@@ -300,7 +310,7 @@ declare const prec: {
*
* @see https://www.gnu.org/software/bison/manual/html_node/Generalized-LR-Parsing.html
*/
dynamic(value: String | number, rule: RuleOrLiteral): PrecDynamicRule;
dynamic(value: string | number, rule: RuleOrLiteral): PrecDynamicRule;
};
/**
@@ -320,6 +330,15 @@ declare function repeat(rule: RuleOrLiteral): RepeatRule;
*/
declare function repeat1(rule: RuleOrLiteral): Repeat1Rule;
/**
* Overrides the global reserved word set for a given rule. The word set name
* should be defined in the `reserved` field in the grammar.
*
* @param wordset name of the reserved word set
* @param rule rule that will use the reserved word set
*/
declare function reserved(wordset: string, rule: RuleOrLiteral): ReservedRule;
/**
* Creates a rule that matches any number of other rules, one after another.
* It is analogous to simply writing multiple symbols next to each other
@@ -338,7 +357,7 @@ declare function sym<Name extends string>(name: Name): SymbolRule<Name>;
/**
* Marks the given rule as producing only a single token. Tree-sitter's
* default is to treat each String or RegExp literal in the grammar as a
* default is to treat each string or RegExp literal in the grammar as a
* separate token. Each token is matched separately by the lexer and
* returned as its own leaf node in the tree. The token function allows
* you to express a complex rule using the DSL functions (rather

cli/npm/install.js Executable file → Normal file

@@ -6,7 +6,8 @@ const http = require('http');
const https = require('https');
const packageJSON = require('./package.json');
// Look to a results table in https://github.com/tree-sitter/tree-sitter/issues/2196
https.globalAgent.keepAlive = false;
const matrix = {
platform: {
'darwin': {


@@ -1,6 +1,6 @@
{
"name": "tree-sitter-cli",
"version": "0.25.1",
"version": "0.25.10",
"author": {
"name": "Max Brunsfeld",
"email": "maxbrunsfeld@gmail.com"
@@ -14,14 +14,14 @@
"license": "MIT",
"repository": {
"type": "git",
"url": "https://github.com/tree-sitter/tree-sitter.git"
"url": "git+https://github.com/tree-sitter/tree-sitter.git",
"directory": "crates/cli/npm"
},
"description": "CLI for generating fast incremental parsers",
"keywords": [
"parser",
"lexer"
],
"main": "lib/api/index.js",
"engines": {
"node": ">=12.0.0"
},


@@ -109,7 +109,7 @@ unsafe extern "C" fn ts_record_realloc(ptr: *mut c_void, size: usize) -> *mut c_
let result = realloc(ptr, size);
if ptr.is_null() {
record_alloc(result);
} else if ptr != result {
} else if !core::ptr::eq(ptr, result) {
record_dealloc(ptr);
record_alloc(result);
}
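The `ptr != result` → `!core::ptr::eq(ptr, result)` change satisfies Clippy's pointer-comparison lints: `ptr::eq` unambiguously compares addresses only. A small safe-Rust illustration of the same check the realloc hook performs (did the allocator hand back the same block?):

```rust
// Address-only pointer comparison, as Clippy recommends over `a != b`.
fn same_address(a: *const u8, b: *const u8) -> bool {
    core::ptr::eq(a, b)
}

fn main() {
    let buf = [1u8, 2, 3];
    // A slice's as_ptr() and the address of its first element coincide.
    assert!(same_address(buf.as_ptr(), &buf[0]));
    // Adjacent elements live at different addresses.
    assert!(!same_address(&buf[0], &buf[1]));
}
```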


@@ -56,7 +56,9 @@ fn regex_env_var(name: &'static str) -> Option<Regex> {
pub fn new_seed() -> usize {
int_env_var("TREE_SITTER_SEED").unwrap_or_else(|| {
let mut rng = rand::thread_rng();
rng.gen::<usize>()
let seed = rng.gen::<usize>();
eprintln!("Seed: {seed}");
seed
})
}
@@ -213,8 +215,9 @@ pub fn fuzz_language_corpus(
}
// Perform a random series of edits and reparse.
let mut undo_stack = Vec::new();
for _ in 0..=rand.unsigned(*EDIT_COUNT) {
let edit_count = rand.unsigned(*EDIT_COUNT);
let mut undo_stack = Vec::with_capacity(edit_count);
for _ in 0..=edit_count {
let edit = get_random_edit(&mut rand, &input);
undo_stack.push(invert_edit(&input, &edit));
perform_edit(&mut tree, &mut input, &edit).unwrap();


@@ -20,8 +20,8 @@ impl Rand {
}
pub fn words(&mut self, max_count: usize) -> Vec<u8> {
let mut result = Vec::new();
let word_count = self.unsigned(max_count);
let mut result = Vec::with_capacity(2 * word_count);
for i in 0..word_count {
if i > 0 {
if self.unsigned(5) == 0 {


@@ -1,5 +1,5 @@
use std::{
collections::{HashMap, HashSet},
collections::{BTreeMap, HashSet},
fmt::Write,
fs,
io::{self, Write as _},
@@ -82,9 +82,9 @@ impl<'de> Deserialize<'de> for Theme {
{
let mut styles = Vec::new();
let mut highlight_names = Vec::new();
if let Ok(colors) = HashMap::<String, Value>::deserialize(deserializer) {
highlight_names.reserve(colors.len());
if let Ok(colors) = BTreeMap::<String, Value>::deserialize(deserializer) {
styles.reserve(colors.len());
highlight_names.reserve(colors.len());
for (name, style_value) in colors {
let mut style = Style::default();
parse_style(&mut style, style_value);
@@ -127,7 +127,7 @@ impl Serialize for Theme {
|| effects.contains(Effects::ITALIC)
|| effects.contains(Effects::UNDERLINE)
{
let mut style_json = HashMap::new();
let mut style_json = BTreeMap::new();
if let Some(color) = color {
style_json.insert("color", color);
}
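Swapping the theme's `HashMap`s for `BTreeMap`s makes iteration, and therefore serialization, deterministic: `BTreeMap` always yields keys in sorted order, while `HashMap` order varies between runs. A sketch with hypothetical highlight names and colors:

```rust
use std::collections::BTreeMap;

// Collect theme keys in the order a serializer would visit them.
fn theme_keys() -> Vec<&'static str> {
    let mut colors = BTreeMap::new();
    colors.insert("keyword", "#c678dd");
    colors.insert("comment", "#5c6370");
    colors.insert("string", "#98c379");
    colors.keys().copied().collect()
}

fn main() {
    // Sorted on every run, so serialized output is byte-for-byte stable.
    assert_eq!(theme_keys(), ["comment", "keyword", "string"]);
}
```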


@@ -98,6 +98,7 @@ const TESTS_SWIFT_TEMPLATE: &str = include_str!("./templates/tests.swift");
const BUILD_ZIG_TEMPLATE: &str = include_str!("./templates/build.zig");
const BUILD_ZIG_ZON_TEMPLATE: &str = include_str!("./templates/build.zig.zon");
const ROOT_ZIG_TEMPLATE: &str = include_str!("./templates/root.zig");
const TEST_ZIG_TEMPLATE: &str = include_str!("./templates/test.zig");
const TREE_SITTER_JSON_SCHEMA: &str =
"https://tree-sitter.github.io/tree-sitter/assets/schemas/config.schema.json";
@@ -301,14 +302,36 @@ pub fn generate_grammar_files(
};
// Create package.json
missing_path(repo_path.join("package.json"), |path| {
generate_file(
path,
PACKAGE_JSON_TEMPLATE,
dashed_language_name.as_str(),
&generate_opts,
)
})?;
missing_path_else(
repo_path.join("package.json"),
allow_update,
|path| {
generate_file(
path,
PACKAGE_JSON_TEMPLATE,
dashed_language_name.as_str(),
&generate_opts,
)
},
|path| {
let contents = fs::read_to_string(path)?
.replace(
r#""node-addon-api": "^8.3.1"#,
r#""node-addon-api": "^8.5.0""#,
)
.replace(
indoc! {r#"
"prebuildify": "^6.0.1",
"tree-sitter-cli":"#},
indoc! {r#"
"prebuildify": "^6.0.1",
"tree-sitter": "^0.22.4",
"tree-sitter-cli":"#},
);
write_file(path, contents)?;
Ok(())
},
)?;
// Do not create a grammar.js file in a repo with multiple language configs
if !tree_sitter_config.has_multiple_language_configs() {
@@ -371,14 +394,25 @@ pub fn generate_grammar_files(
generate_file(path, BUILD_RS_TEMPLATE, language_name, &generate_opts)
})?;
missing_path(repo_path.join("Cargo.toml"), |path| {
generate_file(
path,
CARGO_TOML_TEMPLATE,
dashed_language_name.as_str(),
&generate_opts,
)
})?;
missing_path_else(
repo_path.join("Cargo.toml"),
allow_update,
|path| {
generate_file(
path,
CARGO_TOML_TEMPLATE,
dashed_language_name.as_str(),
&generate_opts,
)
},
|path| {
let contents = fs::read_to_string(path)?;
if contents.contains("\"LICENSE\"") {
write_file(path, contents.replace("\"LICENSE\"", "\"/LICENSE\""))?;
}
Ok(())
},
)?;
Ok(())
})?;
@@ -394,6 +428,7 @@ pub fn generate_grammar_files(
|path| {
let contents = fs::read_to_string(path)?;
if !contents.contains("bun") {
eprintln!("Replacing index.js");
generate_file(path, INDEX_JS_TEMPLATE, language_name, &generate_opts)?;
}
Ok(())
@@ -597,14 +632,32 @@ pub fn generate_grammar_files(
})?;
missing_path(path.join("tests"), create_dir)?.apply(|path| {
missing_path(path.join("test_binding.py"), |path| {
generate_file(
path,
TEST_BINDING_PY_TEMPLATE,
language_name,
&generate_opts,
)
})?;
missing_path_else(
path.join("test_binding.py"),
allow_update,
|path| {
generate_file(
path,
TEST_BINDING_PY_TEMPLATE,
language_name,
&generate_opts,
)
},
|path| {
let mut contents = fs::read_to_string(path)?;
if !contents.contains("Parser(Language(") {
contents = contents
.replace("tree_sitter.Language(", "Parser(Language(")
.replace(".language())\n", ".language()))\n")
.replace(
"import tree_sitter\n",
"from tree_sitter import Language, Parser\n",
);
write_file(path, contents)?;
}
Ok(())
},
)?;
Ok(())
})?;
@ -614,7 +667,7 @@ pub fn generate_grammar_files(
|path| generate_file(path, SETUP_PY_TEMPLATE, language_name, &generate_opts),
|path| {
let contents = fs::read_to_string(path)?;
if !contents.contains("egg_info") || !contents.contains("Py_GIL_DISABLED") {
if !contents.contains("build_ext") {
eprintln!("Replacing setup.py");
generate_file(path, SETUP_PY_TEMPLATE, language_name, &generate_opts)?;
}
@ -653,22 +706,17 @@ pub fn generate_grammar_files(
// Generate Swift bindings
if tree_sitter_config.bindings.swift {
missing_path(bindings_dir.join("swift"), create_dir)?.apply(|path| {
let lang_path = path.join(format!("TreeSitter{camel_name}"));
let lang_path = path.join(&class_name);
missing_path(&lang_path, create_dir)?;
missing_path(lang_path.join(format!("{language_name}.h")), |path| {
generate_file(path, PARSER_NAME_H_TEMPLATE, language_name, &generate_opts)
})?;
missing_path(
path.join(format!("TreeSitter{camel_name}Tests")),
create_dir,
)?
.apply(|path| {
missing_path(
path.join(format!("TreeSitter{camel_name}Tests.swift")),
|path| generate_file(path, TESTS_SWIFT_TEMPLATE, language_name, &generate_opts),
)?;
missing_path(path.join(format!("{class_name}Tests")), create_dir)?.apply(|path| {
missing_path(path.join(format!("{class_name}Tests.swift")), |path| {
generate_file(path, TESTS_SWIFT_TEMPLATE, language_name, &generate_opts)
})?;
Ok(())
})?;
@ -679,10 +727,13 @@ pub fn generate_grammar_files(
|path| generate_file(path, PACKAGE_SWIFT_TEMPLATE, language_name, &generate_opts),
|path| {
let mut contents = fs::read_to_string(path)?;
contents = contents.replace(
"https://github.com/ChimeHQ/SwiftTreeSitter",
"https://github.com/tree-sitter/swift-tree-sitter",
);
contents = contents
.replace(
"https://github.com/ChimeHQ/SwiftTreeSitter",
"https://github.com/tree-sitter/swift-tree-sitter",
)
.replace("version: \"0.8.0\")", "version: \"0.9.0\")")
.replace("(url:", "(name: \"SwiftTreeSitter\", url:");
write_file(path, contents)?;
Ok(())
},
@ -694,17 +745,54 @@ pub fn generate_grammar_files(
// Generate Zig bindings
if tree_sitter_config.bindings.zig {
missing_path(repo_path.join("build.zig"), |path| {
generate_file(path, BUILD_ZIG_TEMPLATE, language_name, &generate_opts)
})?;
missing_path_else(
repo_path.join("build.zig"),
allow_update,
|path| generate_file(path, BUILD_ZIG_TEMPLATE, language_name, &generate_opts),
|path| {
let contents = fs::read_to_string(path)?;
if !contents.contains("b.pkg_hash.len") {
eprintln!("Replacing build.zig");
generate_file(path, BUILD_ZIG_TEMPLATE, language_name, &generate_opts)
} else {
Ok(())
}
},
)?;
missing_path(repo_path.join("build.zig.zon"), |path| {
generate_file(path, BUILD_ZIG_ZON_TEMPLATE, language_name, &generate_opts)
})?;
missing_path_else(
repo_path.join("build.zig.zon"),
allow_update,
|path| generate_file(path, BUILD_ZIG_ZON_TEMPLATE, language_name, &generate_opts),
|path| {
let contents = fs::read_to_string(path)?;
if !contents.contains(".name = .tree_sitter_") {
eprintln!("Replacing build.zig.zon");
generate_file(path, BUILD_ZIG_ZON_TEMPLATE, language_name, &generate_opts)
} else {
Ok(())
}
},
)?;
missing_path(bindings_dir.join("zig"), create_dir)?.apply(|path| {
missing_path(path.join("root.zig"), |path| {
generate_file(path, ROOT_ZIG_TEMPLATE, language_name, &generate_opts)
missing_path_else(
path.join("root.zig"),
allow_update,
|path| generate_file(path, ROOT_ZIG_TEMPLATE, language_name, &generate_opts),
|path| {
let contents = fs::read_to_string(path)?;
if contents.contains("ts.Language") {
eprintln!("Replacing root.zig");
generate_file(path, ROOT_ZIG_TEMPLATE, language_name, &generate_opts)
} else {
Ok(())
}
},
)?;
missing_path(path.join("test.zig"), |path| {
generate_file(path, TEST_ZIG_TEMPLATE, language_name, &generate_opts)
})?;
Ok(())

View file

@ -89,8 +89,8 @@ pub fn get_input(
let Some(path_str) = path.to_str() else {
bail!("Invalid path: {}", path.display());
};
let paths =
glob(path_str).with_context(|| format!("Invalid glob pattern {path:?}"))?;
let paths = glob(path_str)
.with_context(|| format!("Invalid glob pattern {}", path.display()))?;
for path in paths {
incorporate_path(path?, positive);
}

View file

@ -206,7 +206,8 @@ struct Parse {
#[arg(long, short)]
pub quiet: bool,
#[allow(clippy::doc_markdown)]
/// Apply edits in the format: \"row, col delcount insert_text\"
/// Apply edits in the format: \"row,col|position delcount insert_text\"; can be supplied
/// multiple times
#[arg(
long,
num_args = 1..,
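The edit flag accepts either a `row,col` point or a byte `position`, followed by a delete count and the text to insert. A hypothetical sketch of parsing that format (illustrative names only, not the CLI's actual internals):

```rust
// Parse "row,col delcount text" or "position delcount text" into
// (optional point, byte position, delete count, inserted text).
fn parse_edit(spec: &str) -> Option<(Option<(usize, usize)>, usize, usize, String)> {
    let mut parts = spec.splitn(3, ' ');
    let pos = parts.next()?;
    let delcount: usize = parts.next()?.parse().ok()?;
    let text = parts.next().unwrap_or("").to_string();
    if let Some((row, col)) = pos.split_once(',') {
        // Point form: "row,col"
        Some((Some((row.parse().ok()?, col.parse().ok()?)), 0, delcount, text))
    } else {
        // Byte-offset form: "position"
        Some((None, pos.parse().ok()?, delcount, text))
    }
}

fn main() {
    let (point, _, del, text) = parse_edit("1,2 3 hello").unwrap();
    assert_eq!(point, Some((1, 2)));
    assert_eq!(del, 3);
    assert_eq!(text, "hello");
    println!("ok");
}
```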
@ -964,8 +965,11 @@ impl Parse {
for path in &paths {
let path = Path::new(&path);
let language =
loader.select_language(path, current_dir, self.scope.as_deref())?;
let language = loader
.select_language(path, current_dir, self.scope.as_deref())
.with_context(|| {
anyhow!("Failed to load language for path \"{}\"", path.display())
})?;
parse::parse_file_at_path(
&mut parser,

View file

@ -29,18 +29,28 @@ pub struct Stats {
impl fmt::Display for Stats {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
let duration_us = self.total_duration.as_micros();
let success_rate = if self.total_parses > 0 {
format!(
"{:.2}%",
((self.successful_parses as f64) / (self.total_parses as f64)) * 100.0,
)
} else {
"N/A".to_string()
};
let duration_str = match (self.total_parses, duration_us) {
(0, _) => "N/A".to_string(),
(_, 0) => "0 bytes/ms".to_string(),
(_, _) => format!(
"{} bytes/ms",
((self.total_bytes as u128) * 1_000) / duration_us
),
};
writeln!(
f,
"Total parses: {}; successful parses: {}; failed parses: {}; success percentage: {:.2}%; average speed: {} bytes/ms",
"Total parses: {}; successful parses: {}; failed parses: {}; success percentage: {success_rate}; average speed: {duration_str}",
self.total_parses,
self.successful_parses,
self.total_parses - self.successful_parses,
((self.successful_parses as f64) / (self.total_parses as f64)) * 100.0,
if duration_us != 0 {
((self.total_bytes as u128) * 1_000) / duration_us
} else {
0
}
)
}
}
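The `Display` rewrite above guards both divisions: a zero parse count no longer yields `NaN%`, and a zero duration no longer panics. A minimal sketch of the same guards, with illustrative helper names:

```rust
// Report "N/A" instead of NaN% when nothing was parsed.
fn success_rate(successful: u64, total: u64) -> String {
    if total > 0 {
        format!("{:.2}%", (successful as f64) / (total as f64) * 100.0)
    } else {
        "N/A".to_string()
    }
}

// Avoid dividing by a zero elapsed duration.
fn speed(total_parses: u64, total_bytes: u128, duration_us: u128) -> String {
    match (total_parses, duration_us) {
        (0, _) => "N/A".to_string(),
        (_, 0) => "0 bytes/ms".to_string(),
        (_, us) => format!("{} bytes/ms", total_bytes * 1_000 / us),
    }
}

fn main() {
    assert_eq!(success_rate(0, 0), "N/A");
    assert_eq!(success_rate(1, 2), "50.00%");
    assert_eq!(speed(3, 4_000, 2_000), "2000 bytes/ms");
    println!("ok");
}
```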
@ -225,7 +235,7 @@ pub struct ParseStats {
pub cumulative_stats: Stats,
}
#[derive(Serialize, ValueEnum, Debug, Clone, Default, Eq, PartialEq)]
#[derive(Serialize, ValueEnum, Debug, Copy, Clone, Default, Eq, PartialEq)]
pub enum ParseDebugType {
#[default]
Quiet,
@ -273,10 +283,11 @@ pub fn parse_file_at_path(
}
// Log to stderr if `--debug` was passed
else if opts.debug != ParseDebugType::Quiet {
let mut curr_version: usize = 0usize;
let mut curr_version: usize = 0;
let use_color = std::env::var("NO_COLOR").map_or(true, |v| v != "1");
parser.set_logger(Some(Box::new(|log_type, message| {
if opts.debug == ParseDebugType::Normal {
let debug = opts.debug;
parser.set_logger(Some(Box::new(move |log_type, message| {
if debug == ParseDebugType::Normal {
if log_type == LogType::Lex {
write!(&mut io::stderr(), " ").unwrap();
}
@ -686,19 +697,23 @@ pub fn parse_file_at_path(
if let Some(node) = first_error {
let start = node.start_position();
let end = node.end_position();
let mut node_text = String::new();
for c in node.kind().chars() {
if let Some(escaped) = escape_invisible(c) {
node_text += escaped;
} else {
node_text.push(c);
}
}
write!(&mut stdout, "\t(")?;
if node.is_missing() {
if node.is_named() {
write!(&mut stdout, "MISSING {}", node.kind())?;
write!(&mut stdout, "MISSING {node_text}")?;
} else {
write!(
&mut stdout,
"MISSING \"{}\"",
node.kind().replace('\n', "\\n")
)?;
write!(&mut stdout, "MISSING \"{node_text}\"")?;
}
} else {
write!(&mut stdout, "{}", node.kind())?;
write!(&mut stdout, "{node_text}")?;
}
write!(
&mut stdout,

View file

@ -34,7 +34,7 @@ pub fn query_file_at_path(
let mut stdout = stdout.lock();
let query_source = fs::read_to_string(query_path)
.with_context(|| format!("Error reading query file {query_path:?}"))?;
.with_context(|| format!("Error reading query file {}", query_path.display()))?;
let query = Query::new(language, &query_source).with_context(|| "Query compilation failed")?;
let mut query_cursor = QueryCursor::new();
@ -55,7 +55,7 @@ pub fn query_file_at_path(
}
let source_code =
fs::read(path).with_context(|| format!("Error reading source file {path:?}"))?;
fs::read(path).with_context(|| format!("Error reading source file {}", path.display()))?;
let tree = parser.parse(&source_code, None).unwrap();
let start = Instant::now();

View file

@ -18,7 +18,7 @@ include = [
"queries/*",
"src/*",
"tree-sitter.json",
"LICENSE",
"/LICENSE",
]
[lib]

View file

@ -7,24 +7,24 @@ pub fn build(b: *std.Build) !void {
const shared = b.option(bool, "build-shared", "Build a shared library") orelse true;
const reuse_alloc = b.option(bool, "reuse-allocator", "Reuse the library allocator") orelse false;
const lib: *std.Build.Step.Compile = if (shared) b.addSharedLibrary(.{
.name = "tree-sitter-PARSER_NAME",
.pic = true,
.target = target,
.optimize = optimize,
.link_libc = true,
}) else b.addStaticLibrary(.{
.name = "tree-sitter-PARSER_NAME",
.target = target,
.optimize = optimize,
.link_libc = true,
const library_name = "tree-sitter-PARSER_NAME";
const lib: *std.Build.Step.Compile = b.addLibrary(.{
.name = library_name,
.linkage = if (shared) .dynamic else .static,
.root_module = b.createModule(.{
.target = target,
.optimize = optimize,
.link_libc = true,
.pic = if (shared) true else null,
}),
});
lib.addCSourceFile(.{
.file = b.path("src/parser.c"),
.flags = &.{"-std=c11"},
});
if (hasScanner(b.build_root.handle)) {
if (fileExists(b, "src/scanner.c")) {
lib.addCSourceFile(.{
.file = b.path("src/scanner.c"),
.flags = &.{"-std=c11"},
@ -42,38 +42,52 @@ pub fn build(b: *std.Build) !void {
b.installArtifact(lib);
b.installFile("src/node-types.json", "node-types.json");
b.installDirectory(.{ .source_dir = b.path("queries"), .install_dir = .prefix, .install_subdir = "queries", .include_extensions = &.{"scm"} });
const module = b.addModule("tree-sitter-PARSER_NAME", .{
if (fileExists(b, "queries")) {
b.installDirectory(.{
.source_dir = b.path("queries"),
.install_dir = .prefix,
.install_subdir = "queries",
.include_extensions = &.{"scm"},
});
}
const module = b.addModule(library_name, .{
.root_source_file = b.path("bindings/zig/root.zig"),
.target = target,
.optimize = optimize,
});
module.linkLibrary(lib);
const ts_dep = b.dependency("tree-sitter", .{});
const ts_mod = ts_dep.module("tree-sitter");
module.addImport("tree-sitter", ts_mod);
//
// Tests
//
const tests = b.addTest(.{
.root_source_file = b.path("bindings/zig/root.zig"),
.target = target,
.optimize = optimize,
.root_module = b.createModule(.{
.root_source_file = b.path("bindings/zig/test.zig"),
.target = target,
.optimize = optimize,
}),
});
tests.linkLibrary(lib);
tests.root_module.addImport("tree-sitter", ts_mod);
tests.root_module.addImport(library_name, module);
// HACK: fetch tree-sitter dependency only when testing this module
if (b.pkg_hash.len == 0) {
var args = try std.process.argsWithAllocator(b.allocator);
defer args.deinit();
while (args.next()) |a| {
if (std.mem.eql(u8, a, "test")) {
const ts_dep = b.lazyDependency("tree_sitter", .{}) orelse continue;
tests.root_module.addImport("tree-sitter", ts_dep.module("tree-sitter"));
break;
}
}
}
const run_tests = b.addRunArtifact(tests);
const test_step = b.step("test", "Run unit tests");
test_step.dependOn(&run_tests.step);
}
inline fn hasScanner(dir: std.fs.Dir) bool {
dir.access("src/scanner.c", .{}) catch return false;
inline fn fileExists(b: *std.Build, filename: []const u8) bool {
const dir = b.build_root.handle;
dir.access(filename, .{}) catch return false;
return true;
}

View file

@ -1,10 +1,13 @@
.{
.name = "tree-sitter-PARSER_NAME",
.name = .tree_sitter_PARSER_NAME,
.version = "PARSER_VERSION",
.dependencies = .{ .@"tree-sitter" = .{
.url = "https://github.com/tree-sitter/zig-tree-sitter/archive/refs/tags/v0.25.0.tar.gz",
.hash = "12201a8d5e840678bbbf5128e605519c4024af422295d68e2ba2090e675328e5811d",
} },
.dependencies = .{
.tree_sitter = .{
.url = "git+https://github.com/tree-sitter/zig-tree-sitter#b4b72c903e69998fc88e27e154a5e3cc9166551b",
.hash = "tree_sitter-0.25.0-8heIf51vAQConvVIgvm-9mVIbqh7yabZYqPXfOpS3YoG",
.lazy = true,
},
},
.paths = .{
"build.zig",
"build.zig.zon",

View file

@ -15,6 +15,8 @@ if(NOT ${TREE_SITTER_ABI_VERSION} MATCHES "^[0-9]+$")
message(FATAL_ERROR "TREE_SITTER_ABI_VERSION must be an integer")
endif()
include(GNUInstallDirs)
find_program(TREE_SITTER_CLI tree-sitter DOC "Tree-sitter CLI")
add_custom_command(OUTPUT "${CMAKE_CURRENT_SOURCE_DIR}/src/parser.c"
@ -47,13 +49,11 @@ set_target_properties(tree-sitter-KEBAB_PARSER_NAME
configure_file(bindings/c/tree-sitter-KEBAB_PARSER_NAME.pc.in
"${CMAKE_CURRENT_BINARY_DIR}/tree-sitter-KEBAB_PARSER_NAME.pc" @ONLY)
include(GNUInstallDirs)
install(DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}/bindings/c/tree_sitter"
DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}"
FILES_MATCHING PATTERN "*.h")
install(FILES "${CMAKE_CURRENT_BINARY_DIR}/tree-sitter-KEBAB_PARSER_NAME.pc"
DESTINATION "${CMAKE_INSTALL_DATAROOTDIR}/pkgconfig")
DESTINATION "${CMAKE_INSTALL_LIBDIR}/pkgconfig")
install(TARGETS tree-sitter-KEBAB_PARSER_NAME
LIBRARY DESTINATION "${CMAKE_INSTALL_LIBDIR}")

View file

@ -37,5 +37,6 @@ Package.swift linguist-generated
Package.resolved linguist-generated
# Zig bindings
bindings/zig/* linguist-generated
build.zig linguist-generated
build.zig.zon linguist-generated

View file

@ -1,13 +1,16 @@
# Rust artifacts
target/
Cargo.lock
# Node artifacts
build/
prebuilds/
node_modules/
package-lock.json
# Swift artifacts
.build/
Package.resolved
# Go artifacts
_obj/

View file

@ -1,7 +1,7 @@
//! This crate provides CAMEL_PARSER_NAME language support for the [tree-sitter][] parsing library.
//! This crate provides TITLE_PARSER_NAME language support for the [tree-sitter] parsing library.
//!
//! Typically, you will use the [LANGUAGE][] constant to add this language to a
//! tree-sitter [Parser][], and then use the parser to parse some code:
//! Typically, you will use the [`LANGUAGE`] constant to add this language to a
//! tree-sitter [`Parser`], and then use the parser to parse some code:
//!
//! ```
//! let code = r#"
@ -15,7 +15,7 @@
//! assert!(!tree.root_node().has_error());
//! ```
//!
//! [Parser]: https://docs.rs/tree-sitter/*/tree_sitter/struct.Parser.html
//! [`Parser`]: https://docs.rs/tree-sitter/RUST_BINDING_VERSION/tree_sitter/struct.Parser.html
//! [tree-sitter]: https://tree-sitter.github.io/
use tree_sitter_language::LanguageFn;
@ -24,12 +24,10 @@ extern "C" {
fn tree_sitter_PARSER_NAME() -> *const ();
}
/// The tree-sitter [`LanguageFn`][LanguageFn] for this grammar.
///
/// [LanguageFn]: https://docs.rs/tree-sitter-language/*/tree_sitter_language/struct.LanguageFn.html
/// The tree-sitter [`LanguageFn`] for this grammar.
pub const LANGUAGE: LanguageFn = unsafe { LanguageFn::from_raw(tree_sitter_PARSER_NAME) };
/// The content of the [`node-types.json`][] file for this grammar.
/// The content of the [`node-types.json`] file for this grammar.
///
/// [`node-types.json`]: https://tree-sitter.github.io/tree-sitter/using-parsers/6-static-node-types
pub const NODE_TYPES: &str = include_str!("../../src/node-types.json");

View file

@ -77,7 +77,9 @@ install: all
install -m755 lib$(LANGUAGE_NAME).$(SOEXT) '$(DESTDIR)$(LIBDIR)'/lib$(LANGUAGE_NAME).$(SOEXTVER)
ln -sf lib$(LANGUAGE_NAME).$(SOEXTVER) '$(DESTDIR)$(LIBDIR)'/lib$(LANGUAGE_NAME).$(SOEXTVER_MAJOR)
ln -sf lib$(LANGUAGE_NAME).$(SOEXTVER_MAJOR) '$(DESTDIR)$(LIBDIR)'/lib$(LANGUAGE_NAME).$(SOEXT)
ifneq ($(wildcard queries/*.scm),)
install -m644 queries/*.scm '$(DESTDIR)$(DATADIR)'/tree-sitter/queries/KEBAB_PARSER_NAME
endif
uninstall:
$(RM) '$(DESTDIR)$(LIBDIR)'/lib$(LANGUAGE_NAME).a \

View file

@ -29,15 +29,16 @@
"*.wasm"
],
"dependencies": {
"node-addon-api": "^8.2.1",
"node-gyp-build": "^4.8.2"
"node-addon-api": "^8.5.0",
"node-gyp-build": "^4.8.4"
},
"devDependencies": {
"prebuildify": "^6.0.1",
"tree-sitter": "^0.22.4",
"tree-sitter-cli": "^CLI_VERSION"
},
"peerDependencies": {
"tree-sitter": "^0.21.1"
"tree-sitter": "^0.22.4"
},
"peerDependenciesMeta": {
"tree-sitter": {

View file

@ -14,7 +14,7 @@ let package = Package(
.library(name: "PARSER_CLASS_NAME", targets: ["PARSER_CLASS_NAME"]),
],
dependencies: [
.package(url: "https://github.com/tree-sitter/swift-tree-sitter", from: "0.8.0"),
.package(name: "SwiftTreeSitter", url: "https://github.com/tree-sitter/swift-tree-sitter", from: "0.9.0"),
],
targets: [
.target(

View file

@ -1,5 +1,5 @@
[build-system]
requires = ["setuptools>=42", "wheel"]
requires = ["setuptools>=62.4.0", "wheel"]
build-backend = "setuptools.build_meta"
[project]

View file

@ -1,19 +1,5 @@
const testing = @import("std").testing;
extern fn tree_sitter_PARSER_NAME() callconv(.c) *const anyopaque;
const ts = @import("tree-sitter");
const Language = ts.Language;
const Parser = ts.Parser;
pub extern fn tree_sitter_PARSER_NAME() callconv(.C) *const Language;
pub export fn language() *const Language {
pub fn language() *const anyopaque {
return tree_sitter_PARSER_NAME();
}
test "can load grammar" {
const parser = Parser.create();
defer parser.destroy();
try testing.expectEqual(parser.setLanguage(language()), void{});
try testing.expectEqual(parser.getLanguage(), tree_sitter_PARSER_NAME());
}

View file

@ -1,31 +1,12 @@
from os import path
from platform import system
from sysconfig import get_config_var
from setuptools import Extension, find_packages, setup
from setuptools.command.build import build
from setuptools.command.build_ext import build_ext
from setuptools.command.egg_info import egg_info
from wheel.bdist_wheel import bdist_wheel
sources = [
"bindings/python/tree_sitter_LOWER_PARSER_NAME/binding.c",
"src/parser.c",
]
if path.exists("src/scanner.c"):
sources.append("src/scanner.c")
macros: list[tuple[str, str | None]] = [
("PY_SSIZE_T_CLEAN", None),
("TREE_SITTER_HIDE_SYMBOLS", None),
]
if limited_api := not get_config_var("Py_GIL_DISABLED"):
macros.append(("Py_LIMITED_API", "0x030A0000"))
if system() != "Windows":
cflags = ["-std=c11", "-fvisibility=hidden"]
else:
cflags = ["/std:c11", "/utf-8"]
class Build(build):
def run(self):
@ -35,6 +16,19 @@ class Build(build):
super().run()
class BuildExt(build_ext):
def build_extension(self, ext: Extension):
if self.compiler.compiler_type != "msvc":
ext.extra_compile_args = ["-std=c11", "-fvisibility=hidden"]
else:
ext.extra_compile_args = ["/std:c11", "/utf-8"]
if path.exists("src/scanner.c"):
ext.sources.append("src/scanner.c")
if ext.py_limited_api:
ext.define_macros.append(("Py_LIMITED_API", "0x030A0000"))
super().build_extension(ext)
class BdistWheel(bdist_wheel):
def get_tag(self):
python, abi, platform = super().get_tag()
@ -61,15 +55,21 @@ setup(
ext_modules=[
Extension(
name="_binding",
sources=sources,
extra_compile_args=cflags,
define_macros=macros,
sources=[
"bindings/python/tree_sitter_LOWER_PARSER_NAME/binding.c",
"src/parser.c",
],
define_macros=[
("PY_SSIZE_T_CLEAN", None),
("TREE_SITTER_HIDE_SYMBOLS", None),
],
include_dirs=["src"],
py_limited_api=limited_api,
py_limited_api=not get_config_var("Py_GIL_DISABLED"),
)
],
cmdclass={
"build": Build,
"build_ext": BuildExt,
"bdist_wheel": BdistWheel,
"egg_info": EggInfo,
},

View file

@ -0,0 +1,17 @@
const testing = @import("std").testing;
const ts = @import("tree-sitter");
const root = @import("tree-sitter-PARSER_NAME");
const Language = ts.Language;
const Parser = ts.Parser;
test "can load grammar" {
const parser = Parser.create();
defer parser.destroy();
const lang: *const ts.Language = @ptrCast(root.language());
defer lang.destroy();
try testing.expectEqual(void{}, parser.setLanguage(lang));
try testing.expectEqual(lang, parser.getLanguage());
}

View file

@ -1,12 +1,12 @@
from unittest import TestCase
import tree_sitter
from tree_sitter import Language, Parser
import tree_sitter_LOWER_PARSER_NAME
class TestLanguage(TestCase):
def test_can_load_grammar(self):
try:
tree_sitter.Language(tree_sitter_LOWER_PARSER_NAME.language())
Parser(Language(tree_sitter_LOWER_PARSER_NAME.language()))
except Exception:
self.fail("Error loading TITLE_PARSER_NAME grammar")

View file

@ -172,7 +172,7 @@ pub fn iterate_assertions(
let mut j = i;
while let (false, Some(highlight)) = (passed, highlights.get(j)) {
end_column = position.column + length - 1;
if highlight.0.column > end_column {
if highlight.0.row >= position.row && highlight.0.column > end_column {
break 'highlight_loop;
}
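The fix above only breaks out of the loop when the next highlight is on the same or a later row, not merely at a larger column on an earlier row. The comparison can be sketched as (names illustrative):

```rust
// A highlight is "past" the assertion only if its row is not earlier
// and its column lies beyond the assertion's end column.
fn is_past(highlight: (usize, usize), row: usize, end_column: usize) -> bool {
    highlight.0 >= row && highlight.1 > end_column
}

fn main() {
    assert!(is_past((2, 10), 2, 5));
    assert!(!is_past((1, 10), 2, 5)); // earlier row: not past, despite the column
    println!("ok");
}
```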

View file

@ -238,7 +238,7 @@ async fn yield_now() {
SimpleYieldNow { yielded: false }.await;
}
pub fn noop_waker() -> Waker {
pub const fn noop_waker() -> Waker {
const VTABLE: RawWakerVTable = RawWakerVTable::new(
// Cloning just returns a new no-op raw waker
|_| RAW,

View file

@ -23,7 +23,7 @@ use crate::{
};
#[test_with_seed(retry=10, seed=*START_SEED, seed_fn=new_seed)]
fn test_corpus_for_bash(seed: usize) {
fn test_corpus_for_bash_language(seed: usize) {
test_language_corpus(
"bash",
seed,
@ -39,73 +39,77 @@ fn test_corpus_for_bash(seed: usize) {
}
#[test_with_seed(retry=10, seed=*START_SEED, seed_fn=new_seed)]
fn test_corpus_for_c(seed: usize) {
fn test_corpus_for_c_language(seed: usize) {
test_language_corpus("c", seed, None, None);
}
#[test_with_seed(retry=10, seed=*START_SEED, seed_fn=new_seed)]
fn test_corpus_for_cpp(seed: usize) {
fn test_corpus_for_cpp_language(seed: usize) {
test_language_corpus("cpp", seed, None, None);
}
#[test_with_seed(retry=10, seed=*START_SEED, seed_fn=new_seed)]
fn test_corpus_for_embedded_template(seed: usize) {
fn test_corpus_for_embedded_template_language(seed: usize) {
test_language_corpus("embedded-template", seed, None, None);
}
#[test_with_seed(retry=10, seed=*START_SEED, seed_fn=new_seed)]
fn test_corpus_for_go(seed: usize) {
fn test_corpus_for_go_language(seed: usize) {
test_language_corpus("go", seed, None, None);
}
#[test_with_seed(retry=10, seed=*START_SEED, seed_fn=new_seed)]
fn test_corpus_for_html(seed: usize) {
fn test_corpus_for_html_language(seed: usize) {
test_language_corpus("html", seed, None, None);
}
#[test_with_seed(retry=10, seed=*START_SEED, seed_fn=new_seed)]
fn test_corpus_for_java(seed: usize) {
test_language_corpus("java", seed, None, None);
fn test_corpus_for_java_language(seed: usize) {
test_language_corpus(
"java",
seed,
Some(&["java - corpus - expressions - switch with unnamed pattern variable"]),
None,
);
}
#[test_with_seed(retry=10, seed=*START_SEED, seed_fn=new_seed)]
fn test_corpus_for_javascript(seed: usize) {
fn test_corpus_for_javascript_language(seed: usize) {
test_language_corpus("javascript", seed, None, None);
}
#[test_with_seed(retry=10, seed=*START_SEED, seed_fn=new_seed)]
fn test_corpus_for_json(seed: usize) {
fn test_corpus_for_json_language(seed: usize) {
test_language_corpus("json", seed, None, None);
}
#[ignore]
#[test_with_seed(retry=10, seed=*START_SEED, seed_fn=new_seed)]
fn test_corpus_for_php(seed: usize) {
test_language_corpus("php", seed, None, None);
fn test_corpus_for_php_language(seed: usize) {
test_language_corpus("php", seed, None, Some("php"));
}
#[test_with_seed(retry=10, seed=*START_SEED, seed_fn=new_seed)]
fn test_corpus_for_python(seed: usize) {
fn test_corpus_for_python_language(seed: usize) {
test_language_corpus("python", seed, None, None);
}
#[test_with_seed(retry=10, seed=*START_SEED, seed_fn=new_seed)]
fn test_corpus_for_ruby(seed: usize) {
fn test_corpus_for_ruby_language(seed: usize) {
test_language_corpus("ruby", seed, None, None);
}
#[test_with_seed(retry=10, seed=*START_SEED, seed_fn=new_seed)]
fn test_corpus_for_rust(seed: usize) {
fn test_corpus_for_rust_language(seed: usize) {
test_language_corpus("rust", seed, None, None);
}
#[test_with_seed(retry=10, seed=*START_SEED, seed_fn=new_seed)]
fn test_corpus_for_typescript(seed: usize) {
fn test_corpus_for_typescript_language(seed: usize) {
test_language_corpus("typescript", seed, None, Some("typescript"));
}
#[test_with_seed(retry=10, seed=*START_SEED, seed_fn=new_seed)]
fn test_corpus_for_tsx(seed: usize) {
fn test_corpus_for_tsx_language(seed: usize) {
test_language_corpus("typescript", seed, None, Some("tsx"));
}
@ -239,8 +243,9 @@ pub fn test_language_corpus(
}
// Perform a random series of edits and reparse.
let mut undo_stack = Vec::new();
for _ in 0..=rand.unsigned(*EDIT_COUNT) {
let edit_count = rand.unsigned(*EDIT_COUNT);
let mut undo_stack = Vec::with_capacity(edit_count);
for _ in 0..=edit_count {
let edit = get_random_edit(&mut rand, &input);
undo_stack.push(invert_edit(&input, &edit));
perform_edit(&mut tree, &mut input, &edit).unwrap();
@ -376,7 +381,7 @@ fn test_feature_corpus_files() {
let actual_message = e.to_string().replace("\r\n", "\n");
if expected_message != actual_message {
eprintln!(
"Unexpected error message.\n\nExpected:\n\n{expected_message}\nActual:\n\n{actual_message}\n",
"Unexpected error message.\n\nExpected:\n\n`{expected_message}`\nActual:\n\n`{actual_message}`\n",
);
failure_count += 1;
}

View file

@ -108,7 +108,7 @@ unsafe extern "C" fn ts_record_realloc(ptr: *mut c_void, size: usize) -> *mut c_
let result = realloc(ptr, size);
if ptr.is_null() {
record_alloc(result);
} else if ptr != result {
} else if !core::ptr::eq(ptr, result) {
record_dealloc(ptr);
record_alloc(result);
}
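The change swaps `ptr != result` for `!core::ptr::eq(ptr, result)`, making the raw-pointer address comparison explicit (and satisfying clippy's pointer-comparison lints) without changing behavior. A minimal demonstration:

```rust
fn main() {
    let a = [1u8, 2, 3];
    let p1: *const u8 = &a[0];
    let p2: *const u8 = &a[1];
    assert!(core::ptr::eq(p1, p1));  // same address
    assert!(!core::ptr::eq(p1, p2)); // different addresses
    println!("ok");
}
```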

View file

@ -6,11 +6,13 @@ use std::{
use anyhow::Context;
use tree_sitter::Language;
use tree_sitter_generate::{ALLOC_HEADER, ARRAY_HEADER};
use tree_sitter_generate::{load_grammar_file, ALLOC_HEADER, ARRAY_HEADER};
use tree_sitter_highlight::HighlightConfiguration;
use tree_sitter_loader::{CompileConfig, Loader};
use tree_sitter_tags::TagsConfiguration;
use crate::tests::generate_parser;
include!("./dirs.rs");
static TEST_LOADER: LazyLock<Loader> = LazyLock::new(|| {
@ -21,6 +23,9 @@ static TEST_LOADER: LazyLock<Loader> = LazyLock::new(|| {
loader
});
#[cfg(feature = "wasm")]
pub static ENGINE: LazyLock<tree_sitter::wasmtime::Engine> = LazyLock::new(Default::default);
pub fn test_loader() -> &'static Loader {
&TEST_LOADER
}
@ -40,6 +45,22 @@ pub fn get_language(name: &str) -> Language {
TEST_LOADER.load_language_at_path(config).unwrap()
}
pub fn get_test_fixture_language(name: &str) -> Language {
get_test_fixture_language_internal(name, false)
}
#[cfg(feature = "wasm")]
pub fn get_test_fixture_language_wasm(name: &str) -> Language {
get_test_fixture_language_internal(name, true)
}
fn get_test_fixture_language_internal(name: &str, wasm: bool) -> Language {
let grammar_dir_path = fixtures_dir().join("test_grammars").join(name);
let grammar_json = load_grammar_file(&grammar_dir_path.join("grammar.js"), None).unwrap();
let (parser_name, parser_code) = generate_parser(&grammar_json).unwrap();
get_test_language_internal(&parser_name, &parser_code, Some(&grammar_dir_path), wasm)
}
pub fn get_language_queries_path(language_name: &str) -> PathBuf {
GRAMMARS_DIR.join(language_name).join("queries")
}
@ -78,6 +99,15 @@ pub fn get_tags_config(language_name: &str) -> TagsConfiguration {
}
pub fn get_test_language(name: &str, parser_code: &str, path: Option<&Path>) -> Language {
get_test_language_internal(name, parser_code, path, false)
}
fn get_test_language_internal(
name: &str,
parser_code: &str,
path: Option<&Path>,
wasm: bool,
) -> Language {
let src_dir = scratch_dir().join("src").join(name);
fs::create_dir_all(&src_dir).unwrap();
@ -127,5 +157,21 @@ pub fn get_test_language(name: &str, parser_code: &str, path: Option<&Path>) ->
config.header_paths = vec![&HEADER_DIR];
config.name = name.to_string();
TEST_LOADER.load_language_at_path_with_name(config).unwrap()
if wasm {
#[cfg(feature = "wasm")]
{
let mut loader = Loader::with_parser_lib_path(SCRATCH_DIR.clone());
loader.use_wasm(&ENGINE);
if env::var("TREE_SITTER_GRAMMAR_DEBUG").is_ok() {
loader.debug_build(true);
}
loader.load_language_at_path_with_name(config).unwrap()
}
#[cfg(not(feature = "wasm"))]
{
unimplemented!("Wasm feature is not enabled")
}
} else {
TEST_LOADER.load_language_at_path_with_name(config).unwrap()
}
}

View file

@ -350,12 +350,11 @@ fn test_highlighting_empty_lines() {
fn test_highlighting_carriage_returns() {
let source = "a = \"a\rb\"\r\nb\r";
// FIXME(amaanq): figure out why this changed w/ JS's grammar changes
assert_eq!(
&to_html(source, &JS_HIGHLIGHT).unwrap(),
&[
"<span class=variable>a</span> <span class=operator>=</span> <span class=string>&quot;a<span class=variable>b</span>&quot;</span>\n",
"<span class=variable>b</span>\n",
"<span class=variable>a</span> <span class=operator>=</span> <span class=string>&quot;a<span class=carriage-return></span><span class=variable>b</span>&quot;</span>\n",
"<span class=variable>b</span><span class=carriage-return></span>\n",
],
);
}
@ -598,7 +597,7 @@ fn test_highlighting_via_c_api() {
let output_line_offsets =
unsafe { slice::from_raw_parts(output_line_offsets, output_line_count as usize) };
let mut lines = Vec::new();
let mut lines = Vec::with_capacity(output_line_count as usize);
for i in 0..(output_line_count as usize) {
let line_start = output_line_offsets[i] as usize;
let line_end = output_line_offsets

View file

@ -152,6 +152,7 @@ fn test_supertypes() {
"_literal_pattern",
"captured_pattern",
"const_block",
"generic_pattern",
"identifier",
"macro_invocation",
"mut_pattern",

View file

@ -6,7 +6,10 @@ use super::{
helpers::fixtures::{fixtures_dir, get_language, get_test_language},
Rand,
};
use crate::{parse::perform_edit, tests::generate_parser};
use crate::{
parse::perform_edit,
tests::{generate_parser, helpers::fixtures::get_test_fixture_language},
};
const JSON_EXAMPLE: &str = r#"
@ -308,19 +311,8 @@ fn test_parent_of_zero_width_node() {
#[test]
fn test_next_sibling_of_zero_width_node() {
let grammar_json = load_grammar_file(
&fixtures_dir()
.join("test_grammars")
.join("next_sibling_from_zwt")
.join("grammar.js"),
None,
)
.unwrap();
let (parser_name, parser_code) = generate_parser(&grammar_json).unwrap();
let mut parser = Parser::new();
let language = get_test_language(&parser_name, &parser_code, None);
let language = get_test_fixture_language("next_sibling_from_zwt");
parser.set_language(&language).unwrap();
let tree = parser.parse("abdef", None).unwrap();

View file

@ -6,7 +6,6 @@ use std::{
use tree_sitter::{
Decode, IncludedRangesError, InputEdit, LogType, ParseOptions, ParseState, Parser, Point, Range,
};
use tree_sitter_generate::load_grammar_file;
use tree_sitter_proc_macro::retry;
use super::helpers::{
@ -17,7 +16,7 @@ use super::helpers::{
use crate::{
fuzz::edits::Edit,
parse::perform_edit,
tests::{generate_parser, helpers::fixtures::fixtures_dir, invert_edit},
tests::{generate_parser, helpers::fixtures::get_test_fixture_language, invert_edit},
};
#[test]
@ -88,7 +87,6 @@ fn test_parsing_with_logging() {
}
#[test]
#[cfg(unix)]
fn test_parsing_with_debug_graph_enabled() {
use std::io::{BufRead, BufReader, Seek};
@ -482,15 +480,9 @@ fn test_parsing_empty_file_with_reused_tree() {
#[test]
fn test_parsing_after_editing_tree_that_depends_on_column_values() {
let dir = fixtures_dir()
.join("test_grammars")
.join("uses_current_column");
let grammar_json = load_grammar_file(&dir.join("grammar.js"), None).unwrap();
let (grammar_name, parser_code) = generate_parser(&grammar_json).unwrap();
let mut parser = Parser::new();
parser
.set_language(&get_test_language(&grammar_name, &parser_code, Some(&dir)))
.set_language(&get_test_fixture_language("uses_current_column"))
.unwrap();
let mut code = b"
@ -559,16 +551,9 @@ h + i
#[test]
fn test_parsing_after_editing_tree_that_depends_on_column_position() {
let dir = fixtures_dir()
.join("test_grammars")
.join("depends_on_column");
let grammar_json = load_grammar_file(&dir.join("grammar.js"), None).unwrap();
let (grammar_name, parser_code) = generate_parser(grammar_json.as_str()).unwrap();
let mut parser = Parser::new();
parser
.set_language(&get_test_language(&grammar_name, &parser_code, Some(&dir)))
.set_language(&get_test_fixture_language("depends_on_column"))
.unwrap();
let mut code = b"\n x".to_vec();
@ -1702,13 +1687,9 @@ if foo && bar || baz {}
#[test]
fn test_parsing_with_scanner_logging() {
let dir = fixtures_dir().join("test_grammars").join("external_tokens");
let grammar_json = load_grammar_file(&dir.join("grammar.js"), None).unwrap();
let (grammar_name, parser_code) = generate_parser(&grammar_json).unwrap();
let mut parser = Parser::new();
parser
.set_language(&get_test_language(&grammar_name, &parser_code, Some(&dir)))
.set_language(&get_test_fixture_language("external_tokens"))
.unwrap();
let mut found = false;
@ -1726,13 +1707,9 @@ fn test_parsing_with_scanner_logging() {
#[test]
fn test_parsing_get_column_at_eof() {
let dir = fixtures_dir().join("test_grammars").join("get_col_eof");
let grammar_json = load_grammar_file(&dir.join("grammar.js"), None).unwrap();
let (grammar_name, parser_code) = generate_parser(&grammar_json).unwrap();
let mut parser = Parser::new();
parser
.set_language(&get_test_language(&grammar_name, &parser_code, Some(&dir)))
.set_language(&get_test_fixture_language("get_col_eof"))
.unwrap();
parser.parse("a", None).unwrap();


@ -17,7 +17,10 @@ use super::helpers::{
};
use crate::tests::{
generate_parser,
helpers::query_helpers::{collect_captures, collect_matches},
helpers::{
fixtures::get_test_fixture_language,
query_helpers::{collect_captures, collect_matches},
},
ITERATION_COUNT,
};
@ -330,6 +333,16 @@ fn test_query_errors_on_invalid_symbols() {
message: "alternatives".to_string()
}
);
assert_eq!(
Query::new(&language, "fakefield: (identifier)").unwrap_err(),
QueryError {
row: 0,
offset: 0,
column: 0,
kind: QueryErrorKind::Field,
message: "fakefield".to_string()
}
);
});
}
@ -2978,6 +2991,61 @@ fn test_query_matches_with_deeply_nested_patterns_with_fields() {
});
}
#[test]
fn test_query_matches_with_alternations_and_predicates() {
allocations::record(|| {
let language = get_language("java");
let query = Query::new(
&language,
"
(block
[
(local_variable_declaration
(variable_declarator
(identifier) @def.a
(string_literal) @lit.a
)
)
(local_variable_declaration
(variable_declarator
(identifier) @def.b
(null_literal) @lit.b
)
)
]
(expression_statement
(method_invocation [
(argument_list
(identifier) @ref.a
(string_literal)
)
(argument_list
(null_literal)
(identifier) @ref.b
)
])
)
(#eq? @def.a @ref.a )
(#eq? @def.b @ref.b )
)
",
)
.unwrap();
assert_query_matches(
&language,
&query,
r#"
void test() {
int a = "foo";
f(null, b);
}
"#,
&[],
);
});
}
#[test]
fn test_query_matches_with_indefinite_step_containing_no_captures() {
allocations::record(|| {
@ -5621,3 +5689,63 @@ const foo = [
assert_eq!(matches.len(), 1);
assert_eq!(matches[0].1, captures);
}
#[test]
fn test_query_with_predicate_causing_oob_access() {
let language = get_language("rust");
let query = "(call_expression
function: (scoped_identifier
path: (scoped_identifier (identifier) @_regex (#any-of? @_regex \"Regex\" \"RegexBuilder\") .))
(#set! injection.language \"regex\"))";
Query::new(&language, query).unwrap();
}
#[test]
fn test_query_with_anonymous_error_node() {
let language = get_test_fixture_language("anonymous_error");
let mut parser = Parser::new();
parser.set_language(&language).unwrap();
let source = "ERROR";
let tree = parser.parse(source, None).unwrap();
let query = Query::new(
&language,
r#"
"ERROR" @error
(document "ERROR" @error)
"#,
)
.unwrap();
let mut cursor = QueryCursor::new();
let matches = cursor.matches(&query, tree.root_node(), source.as_bytes());
let matches = collect_matches(matches, &query, source);
assert_eq!(
matches,
vec![(1, vec![("error", "ERROR")]), (0, vec![("error", "ERROR")])]
);
}
#[test]
fn test_query_allows_error_nodes_with_children() {
allocations::record(|| {
let language = get_language("cpp");
let code = "SomeStruct foo{.bar{}};";
let mut parser = Parser::new();
parser.set_language(&language).unwrap();
let tree = parser.parse(code, None).unwrap();
let root = tree.root_node();
let query = Query::new(&language, "(initializer_list (ERROR) @error)").unwrap();
let mut cursor = QueryCursor::new();
let matches = cursor.matches(&query, root, code.as_bytes());
let matches = collect_matches(matches, &query, code);
assert_eq!(matches, &[(0, vec![("error", ".bar")])]);
});
}


@ -401,8 +401,11 @@ fn test_tags_via_c_api() {
let syntax_types = unsafe {
let mut len = 0;
let ptr =
c::ts_tagger_syntax_kinds_for_scope_name(tagger, c_scope_name.as_ptr(), &mut len);
let ptr = c::ts_tagger_syntax_kinds_for_scope_name(
tagger,
c_scope_name.as_ptr(),
&raw mut len,
);
slice::from_raw_parts(ptr, len as usize)
.iter()
.map(|i| CStr::from_ptr(*i).to_str().unwrap())


@ -107,6 +107,19 @@ fn test_text_provider_for_arc_of_bytes_slice() {
check_parsing(text.clone(), text.as_ref());
}
#[test]
fn test_text_provider_for_vec_utf16_text() {
let source_text = "你好".encode_utf16().collect::<Vec<_>>();
let language = get_language("c");
let mut parser = Parser::new();
parser.set_language(&language).unwrap();
let tree = parser.parse_utf16_le(&source_text, None).unwrap();
let tree_text = tree.root_node().utf16_text(&source_text);
assert_eq!(source_text, tree_text);
}
#[test]
fn test_text_provider_callback_with_str_slice() {
let text: &str = "// comment";


@ -3,7 +3,11 @@ use std::str;
use tree_sitter::{InputEdit, Parser, Point, Range, Tree};
use super::helpers::fixtures::get_language;
use crate::{fuzz::edits::Edit, parse::perform_edit, tests::invert_edit};
use crate::{
fuzz::edits::Edit,
parse::perform_edit,
tests::{helpers::fixtures::get_test_fixture_language, invert_edit},
};
#[test]
fn test_tree_edit() {
@ -377,6 +381,40 @@ fn test_tree_cursor() {
assert_eq!(copy.node().kind(), "struct_item");
}
#[test]
fn test_tree_cursor_previous_sibling_with_aliases() {
let mut parser = Parser::new();
parser
.set_language(&get_test_fixture_language("aliases_in_root"))
.unwrap();
let text = "# comment\n# \nfoo foo";
let tree = parser.parse(text, None).unwrap();
let mut cursor = tree.walk();
assert_eq!(cursor.node().kind(), "document");
cursor.goto_first_child();
assert_eq!(cursor.node().kind(), "comment");
assert!(cursor.goto_next_sibling());
assert_eq!(cursor.node().kind(), "comment");
assert!(cursor.goto_next_sibling());
assert_eq!(cursor.node().kind(), "bar");
assert!(cursor.goto_previous_sibling());
assert_eq!(cursor.node().kind(), "comment");
assert!(cursor.goto_previous_sibling());
assert_eq!(cursor.node().kind(), "comment");
assert!(cursor.goto_next_sibling());
assert_eq!(cursor.node().kind(), "comment");
assert!(cursor.goto_next_sibling());
assert_eq!(cursor.node().kind(), "bar");
}
#[test]
fn test_tree_cursor_previous_sibling() {
let mut parser = Parser::new();


@ -1,14 +1,13 @@
use std::{fs, sync::LazyLock};
use std::fs;
use streaming_iterator::StreamingIterator;
use tree_sitter::{
wasmtime::Engine, Parser, Query, QueryCursor, WasmError, WasmErrorKind, WasmStore,
use tree_sitter::{Parser, Query, QueryCursor, WasmError, WasmErrorKind, WasmStore};
use crate::tests::helpers::{
allocations,
fixtures::{get_test_fixture_language_wasm, ENGINE, WASM_DIR},
};
use crate::tests::helpers::{allocations, fixtures::WASM_DIR};
static ENGINE: LazyLock<Engine> = LazyLock::new(Engine::default);
#[test]
fn test_wasm_stdlib_symbols() {
let symbols = tree_sitter::wasm_stdlib_symbols().collect::<Vec<_>>();
@ -92,6 +91,33 @@ fn test_load_wasm_javascript_language() {
});
}
#[test]
fn test_load_wasm_python_language() {
allocations::record(|| {
let mut store = WasmStore::new(&ENGINE).unwrap();
let mut parser = Parser::new();
let wasm = fs::read(WASM_DIR.join("tree-sitter-python.wasm")).unwrap();
let language = store.load_language("python", &wasm).unwrap();
parser.set_wasm_store(store).unwrap();
parser.set_language(&language).unwrap();
let tree = parser.parse("a = b\nc = d", None).unwrap();
assert_eq!(tree.root_node().to_sexp(), "(module (expression_statement (assignment left: (identifier) right: (identifier))) (expression_statement (assignment left: (identifier) right: (identifier))))");
});
}
#[test]
fn test_load_fixture_language_wasm() {
allocations::record(|| {
let store = WasmStore::new(&ENGINE).unwrap();
let mut parser = Parser::new();
let language = get_test_fixture_language_wasm("epsilon_external_tokens");
parser.set_wasm_store(store).unwrap();
parser.set_language(&language).unwrap();
let tree = parser.parse("hello", None).unwrap();
assert_eq!(tree.root_node().to_sexp(), "(document (zero_width))");
});
}
#[test]
fn test_load_multiple_wasm_languages() {
allocations::record(|| {


@ -5,7 +5,7 @@ use std::{
use anyhow::{anyhow, Context, Result};
use tree_sitter::wasm_stdlib_symbols;
use tree_sitter_generate::parse_grammar::GrammarJSON;
use tree_sitter_generate::{load_grammar_file, parse_grammar::GrammarJSON};
use tree_sitter_loader::Loader;
use wasmparser::Parser;
@ -23,10 +23,18 @@ pub fn load_language_wasm_file(language_dir: &Path) -> Result<(String, Vec<u8>)>
pub fn get_grammar_name(language_dir: &Path) -> Result<String> {
let src_dir = language_dir.join("src");
let grammar_json_path = src_dir.join("grammar.json");
let grammar_json = fs::read_to_string(&grammar_json_path)
.with_context(|| format!("Failed to read grammar file {grammar_json_path:?}"))?;
let grammar: GrammarJSON = serde_json::from_str(&grammar_json)
.with_context(|| format!("Failed to parse grammar file {grammar_json_path:?}"))?;
let grammar_json = fs::read_to_string(&grammar_json_path).with_context(|| {
format!(
"Failed to read grammar file {}",
grammar_json_path.display()
)
})?;
let grammar: GrammarJSON = serde_json::from_str(&grammar_json).with_context(|| {
format!(
"Failed to parse grammar file {}",
grammar_json_path.display()
)
})?;
Ok(grammar.name)
}
@ -38,7 +46,8 @@ pub fn compile_language_to_wasm(
output_file: Option<PathBuf>,
force_docker: bool,
) -> Result<()> {
let grammar_name = get_grammar_name(language_dir)?;
let grammar_name = get_grammar_name(language_dir)
.or_else(|_| load_grammar_file(&language_dir.join("grammar.js"), None))?;
let output_filename =
output_file.unwrap_or_else(|| output_dir.join(format!("tree-sitter-{grammar_name}.wasm")));
let src_path = language_dir.join("src");


@ -22,8 +22,15 @@
"examples": [
"Rust",
"HTML"
],
"$comment": "This is used in the description and the class names."
]
},
"title": {
"type": "string",
"description": "The title of the language.",
"examples": [
"Rust",
"HTML"
]
},
"scope": {
"type": "string",
@ -237,9 +244,7 @@
"properties": {
"c": {
"type": "boolean",
"default": true,
"const": true,
"$comment": "Always generated"
"default": true
},
"go": {
"type": "boolean",
@ -255,9 +260,7 @@
},
"node": {
"type": "boolean",
"default": true,
"const": true,
"$comment": "Always generated (for now)"
"default": true
},
"python": {
"type": "boolean",
@ -265,9 +268,7 @@
},
"rust": {
"type": "boolean",
"default": true,
"const": true,
"$comment": "Always generated"
"default": true
},
"swift": {
"type": "boolean",


@ -246,6 +246,21 @@
"required": ["type", "content"]
},
"reserved-rule": {
"type": "object",
"properties": {
"type": {
"type": "string",
"const": "RESERVED"
},
"context_name": { "type": "string" },
"content": {
"$ref": "#/definitions/rule"
}
},
"required": ["type", "context_name", "content"]
},
"token-rule": {
"type": "object",
"properties": {
@ -313,6 +328,7 @@
{ "$ref": "#/definitions/choice-rule" },
{ "$ref": "#/definitions/repeat1-rule" },
{ "$ref": "#/definitions/repeat-rule" },
{ "$ref": "#/definitions/reserved-rule" },
{ "$ref": "#/definitions/token-rule" },
{ "$ref": "#/definitions/field-rule" },
{ "$ref": "#/definitions/prec-rule" }


@ -66,7 +66,7 @@ Suppress main output.
### `--edits <EDITS>...`
Apply edits after parsing the file. Edits are in the form of `row, col delcount insert_text` where row and col are 0-indexed.
Apply edits after parsing the file. Edits are in the form of `row,col|position delcount insert_text` where row and col, or position are 0-indexed.
### `--encoding <ENCODING>`


@ -143,6 +143,8 @@ pub struct HtmlRenderer {
pub html: Vec<u8>,
pub line_offsets: Vec<u32>,
carriage_return_highlight: Option<Highlight>,
// The offset in `self.html` of the last carriage return.
last_carriage_return: Option<usize>,
}
#[derive(Debug)]
@ -1090,6 +1092,7 @@ impl HtmlRenderer {
html: Vec::with_capacity(BUFFER_HTML_RESERVE_CAPACITY),
line_offsets: Vec::with_capacity(BUFFER_LINES_RESERVE_CAPACITY),
carriage_return_highlight: None,
last_carriage_return: None,
};
result.line_offsets.push(0);
result
@ -1131,6 +1134,9 @@ impl HtmlRenderer {
Err(a) => return Err(a),
}
}
if let Some(offset) = self.last_carriage_return.take() {
self.add_carriage_return(offset, attribute_callback);
}
if self.html.last() != Some(&b'\n') {
self.html.push(b'\n');
}
@ -1155,14 +1161,21 @@ impl HtmlRenderer {
})
}
fn add_carriage_return<F>(&mut self, attribute_callback: &F)
fn add_carriage_return<F>(&mut self, offset: usize, attribute_callback: &F)
where
F: Fn(Highlight, &mut Vec<u8>),
{
if let Some(highlight) = self.carriage_return_highlight {
// If a CR is the last character in a `HighlightEvent::Source`
// region, then we don't know until the next `Source` event or EOF
// whether it is part of CRLF or on its own. To avoid unbounded
// lookahead, save the offset of the CR and insert there now that we
// know.
let rest = self.html.split_off(offset);
self.html.extend(b"<span ");
(attribute_callback)(highlight, &mut self.html);
self.html.extend(b"></span>");
self.html.extend(rest);
}
}
@ -1194,19 +1207,17 @@ impl HtmlRenderer {
}
}
let mut last_char_was_cr = false;
for c in LossyUtf8::new(src).flat_map(|p| p.bytes()) {
// Don't render carriage return characters, but allow lone carriage returns (not
// followed by line feeds) to be styled via the attribute callback.
if c == b'\r' {
last_char_was_cr = true;
self.last_carriage_return = Some(self.html.len());
continue;
}
if last_char_was_cr {
if let Some(offset) = self.last_carriage_return.take() {
if c != b'\n' {
self.add_carriage_return(attribute_callback);
self.add_carriage_return(offset, attribute_callback);
}
last_char_was_cr = false;
}
// At line boundaries, close and re-open all of the open tags.

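The comment in the `HtmlRenderer` hunk above describes a deferred-insertion trick: when a `\r` ends a `Source` region, whether it belongs to a CRLF pair or stands alone is unknown until the next byte (or end of input) arrives, so the renderer saves the CR's offset in the output buffer and splices markup in there later. A minimal standalone sketch of that pattern (simplified: a hypothetical `<CR>` marker stands in for the real `<span>` markup, and the loop runs over one contiguous byte slice):

```rust
// Deferred insertion: when a '\r' may or may not be part of "\r\n",
// record where it occurred and splice a marker in later, once the
// next byte (or the end of the input) resolves the ambiguity.
fn render(src: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut last_cr: Option<usize> = None;
    for &c in src {
        if c == b'\r' {
            last_cr = Some(out.len()); // remember the position, emit nothing yet
            continue;
        }
        if let Some(offset) = last_cr.take() {
            if c != b'\n' {
                // Lone CR: splice a marker at the saved offset.
                let rest = out.split_off(offset);
                out.extend(b"<CR>");
                out.extend(rest);
            }
            // CRLF: the CR is simply dropped.
        }
        out.push(c);
    }
    if let Some(offset) = last_cr.take() {
        // A CR at end of input is also a lone CR.
        let rest = out.split_off(offset);
        out.extend(b"<CR>");
        out.extend(rest);
    }
    out
}

fn main() {
    assert_eq!(render(b"a\r\nb"), b"a\nb");  // CRLF: CR dropped
    assert_eq!(render(b"a\rb"), b"a<CR>b");  // lone CR: marker spliced in
}
```

The saved offset is what avoids unbounded lookahead: the decision is made retroactively once the following byte is known.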

@ -1,7 +1,7 @@
cmake_minimum_required(VERSION 3.13)
project(tree-sitter
VERSION "0.25.1"
VERSION "0.25.10"
DESCRIPTION "An incremental parsing system for programming tools"
HOMEPAGE_URL "https://tree-sitter.github.io/tree-sitter/"
LANGUAGES C)
@ -81,15 +81,15 @@ set_target_properties(tree-sitter
SOVERSION "${PROJECT_VERSION_MAJOR}.${PROJECT_VERSION_MINOR}"
DEFINE_SYMBOL "")
target_compile_definitions(tree-sitter PRIVATE _POSIX_C_SOURCE=200112L _DEFAULT_SOURCE)
configure_file(tree-sitter.pc.in "${CMAKE_CURRENT_BINARY_DIR}/tree-sitter.pc" @ONLY)
target_compile_definitions(tree-sitter PRIVATE _POSIX_C_SOURCE=200112L _DEFAULT_SOURCE _DARWIN_C_SOURCE)
include(GNUInstallDirs)
configure_file(tree-sitter.pc.in "${CMAKE_CURRENT_BINARY_DIR}/tree-sitter.pc" @ONLY)
install(FILES include/tree_sitter/api.h
DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/tree_sitter")
install(FILES "${CMAKE_CURRENT_BINARY_DIR}/tree-sitter.pc"
DESTINATION "${CMAKE_INSTALL_DATAROOTDIR}/pkgconfig")
DESTINATION "${CMAKE_INSTALL_LIBDIR}/pkgconfig")
install(TARGETS tree-sitter
LIBRARY DESTINATION "${CMAKE_INSTALL_LIBDIR}")


@ -43,6 +43,7 @@ fn main() {
.include(&include_path)
.define("_POSIX_C_SOURCE", "200112L")
.define("_DEFAULT_SOURCE", None)
.define("_DARWIN_C_SOURCE", None)
.warnings(false)
.file(src_path.join("lib.c"))
.compile("tree-sitter");
@ -112,7 +113,10 @@ fn generate_bindings(out_dir: &std::path::Path) {
.expect("Failed to generate bindings");
let bindings_rs = out_dir.join("bindings.rs");
bindings
.write_to_file(&bindings_rs)
.unwrap_or_else(|_| panic!("Failed to write bindings into path: {bindings_rs:?}"));
bindings.write_to_file(&bindings_rs).unwrap_or_else(|_| {
panic!(
"Failed to write bindings into path: {}",
bindings_rs.display()
)
});
}


@ -16,6 +16,7 @@ extern "C" {
}
#[cfg(windows)]
#[cfg(feature = "std")]
extern "C" {
pub(crate) fn _ts_dup(handle: *mut std::os::raw::c_void) -> std::os::raw::c_int;
}


@ -699,8 +699,7 @@ impl Parser {
drop(unsafe { Box::from_raw(prev_logger.payload.cast::<Logger>()) });
}
let c_logger;
if let Some(logger) = logger {
let c_logger = if let Some(logger) = logger {
let container = Box::new(logger);
unsafe extern "C" fn log(
@ -721,16 +720,16 @@ impl Parser {
let raw_container = Box::into_raw(container);
c_logger = ffi::TSLogger {
ffi::TSLogger {
payload: raw_container.cast::<c_void>(),
log: Some(log),
};
}
} else {
c_logger = ffi::TSLogger {
ffi::TSLogger {
payload: ptr::null_mut(),
log: None,
};
}
}
};
unsafe { ffi::ts_parser_set_logger(self.0.as_ptr(), c_logger) };
}
@ -1222,7 +1221,7 @@ impl Parser {
len: u32,
code_point: *mut i32,
) -> u32 {
let (c, len) = D::decode(std::slice::from_raw_parts(data, len as usize));
let (c, len) = D::decode(core::slice::from_raw_parts(data, len as usize));
if let Some(code_point) = code_point.as_mut() {
*code_point = c;
}
@ -1422,7 +1421,7 @@ impl Parser {
if let Some(flag) = flag {
ffi::ts_parser_set_cancellation_flag(
self.0.as_ptr(),
std::ptr::from_ref::<AtomicUsize>(flag).cast::<usize>(),
core::ptr::from_ref::<AtomicUsize>(flag).cast::<usize>(),
);
} else {
ffi::ts_parser_set_cancellation_flag(self.0.as_ptr(), ptr::null());
@ -1432,12 +1431,21 @@ impl Parser {
impl Drop for Parser {
fn drop(&mut self) {
self.stop_printing_dot_graphs();
#[cfg(feature = "std")]
#[cfg(not(target_os = "wasi"))]
{
self.stop_printing_dot_graphs();
}
self.set_logger(None);
unsafe { ffi::ts_parser_delete(self.0.as_ptr()) }
}
}
#[cfg(windows)]
extern "C" {
fn _open_osfhandle(osfhandle: isize, flags: core::ffi::c_int) -> core::ffi::c_int;
}
impl Tree {
/// Get the root node of the syntax tree.
#[doc(alias = "ts_tree_root_node")]
@ -1547,7 +1555,8 @@ impl Tree {
#[cfg(windows)]
{
let handle = file.as_raw_handle();
unsafe { ffi::ts_tree_print_dot_graph(self.0.as_ptr(), handle as i32) }
let fd = unsafe { _open_osfhandle(handle as isize, 0) };
unsafe { ffi::ts_tree_print_dot_graph(self.0.as_ptr(), fd) }
}
}
}
@ -2058,7 +2067,7 @@ impl<'tree> Node<'tree> {
#[must_use]
pub fn utf16_text<'a>(&self, source: &'a [u16]) -> &'a [u16] {
&source[self.start_byte()..self.end_byte()]
&source[self.start_byte() / 2..self.end_byte() / 2]
}
/// Create a new [`TreeCursor`] starting from this node.
@ -2087,7 +2096,7 @@ impl<'tree> Node<'tree> {
impl PartialEq for Node<'_> {
fn eq(&self, other: &Self) -> bool {
self.0.id == other.0.id
core::ptr::eq(self.0.id, other.0.id)
}
}
@ -2440,7 +2449,7 @@ impl Query {
// Error types that report names
ffi::TSQueryErrorNodeType | ffi::TSQueryErrorField | ffi::TSQueryErrorCapture => {
let suffix = source.split_at(offset).1;
let in_quotes = source.as_bytes()[offset - 1] == b'"';
let in_quotes = offset > 0 && source.as_bytes()[offset - 1] == b'"';
let mut backslashes = 0;
let end_offset = suffix
.find(|c| {
@ -3349,9 +3358,11 @@ impl<'tree> QueryMatch<'_, 'tree> {
.iter()
.all(|predicate| match predicate {
TextPredicateCapture::EqCapture(i, j, is_positive, match_all_nodes) => {
let mut nodes_1 = self.nodes_for_capture_index(*i);
let mut nodes_2 = self.nodes_for_capture_index(*j);
while let (Some(node1), Some(node2)) = (nodes_1.next(), nodes_2.next()) {
let mut nodes_1 = self.nodes_for_capture_index(*i).peekable();
let mut nodes_2 = self.nodes_for_capture_index(*j).peekable();
while nodes_1.peek().is_some() && nodes_2.peek().is_some() {
let node1 = nodes_1.next().unwrap();
let node2 = nodes_2.next().unwrap();
let mut text1 = text_provider.text(node1);
let mut text2 = text_provider.text(node2);
let text1 = node_text1.get_text(&mut text1);

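The `utf16_text` fix in the hunk above works because tree-sitter node offsets count bytes even when the source was parsed as UTF-16, while a `&[u16]` buffer is indexed in 16-bit code units, so each offset must be halved. A minimal sketch of the corrected indexing (a hypothetical free function, not the real method):

```rust
// Node byte offsets are byte counts; a &[u16] slice is indexed in
// 16-bit code units, so divide each offset by two before slicing.
fn utf16_text(source: &[u16], start_byte: usize, end_byte: usize) -> &[u16] {
    &source[start_byte / 2..end_byte / 2]
}

fn main() {
    // "你好" is two BMP characters: 2 UTF-16 code units, 4 bytes.
    let source: Vec<u16> = "你好".encode_utf16().collect();
    assert_eq!(utf16_text(&source, 0, 4), &source[..]); // whole text
    assert_eq!(utf16_text(&source, 2, 4), &source[1..]); // second character
}
```

Without the division, a node spanning bytes 0..4 would try to take four `u16` elements from a two-element buffer and panic, which is what the accompanying `test_text_provider_for_vec_utf16_text` test exercises.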

@ -44,17 +44,22 @@ static inline void marshal_node(const void **buffer, TSNode node) {
buffer[4] = (const void *)node.context[3];
}
static inline TSNode unmarshal_node(const TSTree *tree) {
static inline TSNode unmarshal_node_at(const TSTree *tree, uint32_t index) {
TSNode node;
node.id = TRANSFER_BUFFER[0];
node.context[0] = code_unit_to_byte((uint32_t)TRANSFER_BUFFER[1]);
node.context[1] = (uint32_t)TRANSFER_BUFFER[2];
node.context[2] = code_unit_to_byte((uint32_t)TRANSFER_BUFFER[3]);
node.context[3] = (uint32_t)TRANSFER_BUFFER[4];
const void **buffer = TRANSFER_BUFFER + index * SIZE_OF_NODE;
node.id = buffer[0];
node.context[0] = code_unit_to_byte((uint32_t)buffer[1]);
node.context[1] = (uint32_t)buffer[2];
node.context[2] = code_unit_to_byte((uint32_t)buffer[3]);
node.context[3] = (uint32_t)buffer[4];
node.tree = tree;
return node;
}
static inline TSNode unmarshal_node(const TSTree *tree) {
return unmarshal_node_at(tree, 0);
}
static inline void marshal_cursor(const TSTreeCursor *cursor) {
TRANSFER_BUFFER[0] = cursor->id;
TRANSFER_BUFFER[1] = (const void *)cursor->context[0];
@ -616,7 +621,7 @@ void ts_node_parent_wasm(const TSTree *tree) {
void ts_node_child_with_descendant_wasm(const TSTree *tree) {
TSNode node = unmarshal_node(tree);
TSNode descendant = unmarshal_node(tree);
TSNode descendant = unmarshal_node_at(tree, 1);
marshal_node(TRANSFER_BUFFER, ts_node_child_with_descendant(node, descendant));
}

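The wasm marshalling change above lets two nodes share one transfer buffer by giving each an index-addressed region of `SIZE_OF_NODE` slots (id plus four context words), so `ts_node_child_with_descendant_wasm` can unmarshal the descendant from slot 1 instead of re-reading slot 0. The addressing can be sketched as a simplified Rust analogue of the C code (hypothetical buffer contents for illustration):

```rust
const SIZE_OF_NODE: usize = 5; // id + four context words

// Node `index` occupies slots [index * SIZE_OF_NODE, (index + 1) * SIZE_OF_NODE)
// of the shared transfer buffer.
fn node_slots(buffer: &[u64], index: usize) -> &[u64] {
    &buffer[index * SIZE_OF_NODE..(index + 1) * SIZE_OF_NODE]
}

fn main() {
    // A stand-in transfer buffer holding two marshalled nodes.
    let buffer: Vec<u64> = (0..2 * SIZE_OF_NODE as u64).collect();
    assert_eq!(node_slots(&buffer, 0), &[0, 1, 2, 3, 4]);
    assert_eq!(node_slots(&buffer, 1), &[5, 6, 7, 8, 9]);
}
```

The previous code always read from offset 0, so marshalling a second node overwrote the first; indexed slots fix `childWithDescendant`, whose caller marshals both `this` and `descendant` before one C call.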

@ -50,7 +50,7 @@ declare namespace RuntimeExports {
function setValue(ptr: number, value: number, type?: string): void;
let currentParseCallback: ((index: number, position: {row: number, column: number}) => string | undefined) | null;
let currentLogCallback: ((message: string, isLex: boolean) => void) | null;
let currentProgressCallback: ((state: {currentOffset: number}) => void) | null;
let currentProgressCallback: ((state: {currentOffset: number, hasError: boolean}) => void) | null;
let currentQueryProgressCallback: ((state: {currentOffset: number}) => void) | null;
let HEAPF32: Float32Array;
let HEAPF64: Float64Array;

File diff suppressed because it is too large.


@ -1,9 +1,12 @@
{
"name": "web-tree-sitter",
"version": "0.25.1",
"version": "0.25.10",
"description": "Tree-sitter bindings for the web",
"repository": "https://github.com/tree-sitter/tree-sitter",
"homepage": "https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_web",
"repository": {
"type": "git",
"url": "git+https://github.com/tree-sitter/tree-sitter.git",
"directory": "lib/binding_web"
},
"license": "MIT",
"author": {
"name": "Max Brunsfeld",
@ -19,12 +22,16 @@
"exports": {
".": {
"import": "./tree-sitter.js",
"require": "./tree-sitter.cjs"
"require": "./tree-sitter.cjs",
"types": "./web-tree-sitter.d.ts"
},
"./tree-sitter.wasm": "./tree-sitter.wasm",
"./debug": {
"import": "./debug/tree-sitter.js",
"require": "./debug/tree-sitter.cjs"
}
"require": "./debug/tree-sitter.cjs",
"types": "./web-tree-sitter.d.ts"
},
"./debug/tree-sitter.wasm": "./debug/tree-sitter.wasm"
},
"types": "web-tree-sitter.d.ts",
"keywords": [
@ -54,10 +61,9 @@
"lib/*.h"
],
"devDependencies": {
"@eslint/js": "^9.19.0",
"@types/emscripten": "^1.40.0",
"@types/node": "^22.12.0",
"@vitest/coverage-v8": "^3.0.4",
"@eslint/js": "^9.20.0",
"@types/node": "^22.13.1",
"@vitest/coverage-v8": "^3.0.5",
"dts-buddy": "^0.5.4",
"esbuild": "^0.24.2",
"eslint": "^9.19.0",
@ -67,8 +73,16 @@
"typescript-eslint": "^8.22.0",
"vitest": "^3.0.4"
},
"peerDependencies": {
"@types/emscripten": "^1.40.0"
},
"peerDependenciesMeta": {
"@types/emscripten": {
"optional": true
}
},
"scripts": {
"build:ts": "node script/build.js",
"build:ts": "tsc -b . && node script/build.js",
"build:wasm": "cd ../../ && cargo xtask build-wasm",
"build:wasm:debug": "cd ../../ && cargo xtask build-wasm --debug",
"build": "npm run build:wasm && npm run build:ts",


@ -1,4 +1,4 @@
export {
export type {
Point,
Range,
Edit,
@ -7,8 +7,8 @@ export {
LogCallback,
} from './constants';
export {
ParseOptions,
ParseState,
type ParseOptions,
type ParseState,
LANGUAGE_VERSION,
MIN_COMPATIBLE_VERSION,
Parser,
@ -18,14 +18,14 @@ export { Tree } from './tree';
export { Node } from './node';
export { TreeCursor } from './tree_cursor';
export {
QueryOptions,
QueryState,
QueryProperties,
QueryPredicate,
QueryCapture,
QueryMatch,
type QueryOptions,
type QueryState,
type QueryProperties,
type QueryPredicate,
type QueryCapture,
type QueryMatch,
CaptureQuantifier,
PredicateStep,
type PredicateStep,
Query,
} from './query';
} from './query';
export { LookaheadIterator } from './lookahead_iterator';


@ -6,7 +6,7 @@ import { Query } from './query';
const LANGUAGE_FUNCTION_REGEX = /^tree_sitter_\w+$/;
export class LanguageMetadata {
export interface LanguageMetadata {
readonly major_version: number;
readonly minor_version: number;
readonly patch_version: number;
@ -261,8 +261,7 @@ export class Language {
} else {
// eslint-disable-next-line @typescript-eslint/no-unnecessary-condition
if (globalThis.process?.versions.node) {
// eslint-disable-next-line @typescript-eslint/no-unsafe-assignment, @typescript-eslint/no-require-imports
const fs: typeof import('fs/promises') = require('fs/promises');
const fs: typeof import('fs/promises') = await import('fs/promises');
bytes = fs.readFile(input);
} else {
bytes = fetch(input)


@ -34,8 +34,8 @@ export function unmarshalCaptures(
*
* Marshals a {@link Node} to the transfer buffer.
*/
export function marshalNode(node: Node) {
let address = TRANSFER_BUFFER;
export function marshalNode(node: Node, index = 0) {
let address = TRANSFER_BUFFER + index * SIZE_OF_NODE;
C.setValue(address, node.id, 'i32');
address += SIZE_OF_INT;
C.setValue(address, node.startIndex, 'i32');
@ -168,10 +168,9 @@ export function marshalEdit(edit: Edit, address = TRANSFER_BUFFER) {
*
* Unmarshals a {@link LanguageMetadata} from the transfer buffer.
*/
export function unmarshalLanguageMetadata(address: number): LanguageMetadata {
const result = {} as LanguageMetadata;
result.major_version = C.getValue(address, 'i32'); address += SIZE_OF_INT;
result.minor_version = C.getValue(address, 'i32'); address += SIZE_OF_INT;
result.field_count = C.getValue(address, 'i32');
return result;
export function unmarshalLanguageMetadata(address: number): LanguageMetadata {
const major_version = C.getValue(address, 'i32');
const minor_version = C.getValue(address += SIZE_OF_INT, 'i32');
const patch_version = C.getValue(address += SIZE_OF_INT, 'i32');
return { major_version, minor_version, patch_version };
}


@ -9,7 +9,8 @@ import { TRANSFER_BUFFER } from './parser';
/** A single node within a syntax {@link Tree}. */
export class Node {
/** @internal */
private [0] = 0; // Internal handle for WASM
// @ts-expect-error: never read
private [0] = 0; // Internal handle for Wasm
/** @internal */
private _children?: (Node | null)[];
@ -416,6 +417,11 @@ export class Node {
// Convert the type strings to numeric type symbols
const symbols: number[] = [];
const typesBySymbol = this.tree.language.types;
for (const node_type of types) {
if (node_type == "ERROR") {
symbols.push(65535); // Internally, ts_builtin_sym_error is -1, which is UINT_16MAX
}
}
for (let i = 0, n = typesBySymbol.length; i < n; i++) {
if (types.includes(typesBySymbol[i])) {
symbols.push(i);
@ -517,7 +523,7 @@ export class Node {
*/
childWithDescendant(descendant: Node): Node | null {
marshalNode(this);
marshalNode(descendant);
marshalNode(descendant, 1);
C._ts_node_child_with_descendant_wasm(this.tree[0]);
return unmarshalNode(this.tree);
}
@ -630,7 +636,7 @@ export class Node {
}
/** Get the S-expression representation of this node. */
toString() {
toString(): string {
marshalNode(this);
const address = C._ts_node_to_string_wasm(this.tree[0]);
const result = C.AsciiToString(address);

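The `descendantsOfType` change above maps the string `"ERROR"` to the numeric symbol 65535 because, as the code comment notes, `ts_builtin_sym_error` is `-1`, and a `TSSymbol` is a `uint16_t`, so the value wraps to `u16::MAX`. The wraparound itself:

```rust
fn main() {
    // ts_builtin_sym_error is -1 as a C int; truncated to a 16-bit
    // symbol (TSSymbol is uint16_t) it becomes 65535 == u16::MAX.
    let error_symbol = -1i32 as u16;
    assert_eq!(error_symbol, 65535);
    assert_eq!(error_symbol, u16::MAX);
}
```

That is why the built-in `ERROR` node type never appears in the language's ordinary symbol table (`typesBySymbol`) and must be special-cased before the table lookup loop.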

@ -7,16 +7,20 @@ import { getText, Tree } from './tree';
/** A stateful object for walking a syntax {@link Tree} efficiently. */
export class TreeCursor {
/** @internal */
private [0] = 0; // Internal handle for WASM
// @ts-expect-error: never read
private [0] = 0; // Internal handle for Wasm
/** @internal */
private [1] = 0; // Internal handle for WASM
// @ts-expect-error: never read
private [1] = 0; // Internal handle for Wasm
/** @internal */
private [2] = 0; // Internal handle for WASM
// @ts-expect-error: never read
private [2] = 0; // Internal handle for Wasm
/** @internal */
private [3] = 0; // Internal handle for WASM
// @ts-expect-error: never read
private [3] = 0; // Internal handle for Wasm
/** @internal */
private tree: Tree;


@ -89,6 +89,7 @@ describe('Language', () => {
'_literal_pattern',
'captured_pattern',
'const_block',
'generic_pattern',
'identifier',
'macro_invocation',
'mut_pattern',


@ -63,7 +63,7 @@ describe('Node', () => {
tree = parser.parse('x10 + 1000')!;
expect(tree.rootNode.children).toHaveLength(1);
const sumNode = tree.rootNode.firstChild!.firstChild!;
expect(sumNode.children.map(child => child!.type)).toEqual(['identifier', '+', 'number' ]);
expect(sumNode.children.map(child => child!.type)).toEqual(['identifier', '+', 'number']);
});
});
@ -189,6 +189,21 @@ describe('Node', () => {
});
});
describe('.childWithDescendant()', () => {
it('correctly retrieves immediate children', () => {
const sourceCode = 'let x = 1; console.log(x);';
tree = parser.parse(sourceCode)!;
const root = tree.rootNode;
const child = root.children[0]!.children[0]!;
const a = root.childWithDescendant(child);
expect(a!.startIndex).toBe(0);
const b = a!.childWithDescendant(child);
expect(b).toEqual(child);
const c = b!.childWithDescendant(child);
expect(c).toBeNull();
});
});
describe('.nextSibling and .previousSibling', () => {
it('returns the node\'s next and previous sibling', () => {
tree = parser.parse('x10 + 1000')!;
@ -449,6 +464,24 @@ describe('Node', () => {
});
});
describe('.descendantsOfType("ERROR")', () => {
it('finds all of the descendants of an ERROR node', () => {
tree = parser.parse(
`if ({a: 'b'} {c: 'd'}) {
// ^ ERROR
x = function(a) { b; } function(c) { d; }
}`
)!;
const errorNode = tree.rootNode;
const descendants = errorNode.descendantsOfType('ERROR');
expect(
descendants.map((node) => node!.startIndex)
).toEqual(
[4]
);
});
});
describe('.descendantsOfType', () => {
it('finds all descendants of a given type in the given range', () => {
tree = parser.parse('a + 1 * b * 2 + c + 3')!;


@ -256,7 +256,7 @@ describe('Parser', () => {
expect(() => parser.parse({})).toThrow('Argument must be a string or a function');
});
it('handles long input strings', { timeout: 5000 }, () => {
it('handles long input strings', { timeout: 10000 }, () => {
const repeatCount = 10000;
const inputString = `[${Array(repeatCount).fill('0').join(',')}]`;


@ -64,7 +64,7 @@ describe('Query', () => {
});
describe('.matches', () => {
it('returns all of the matches for the given query', () => {
it('returns all of the matches for the given query', { timeout: 10000 }, () => {
tree = parser.parse('function one() { two(); function three() {} }')!;
query = new Query(JavaScript, `
(function_declaration name: (identifier) @fn-def)
@ -462,7 +462,7 @@ describe('Query', () => {
});
describe('Set a timeout', () => {
it('returns less than the expected matches', () => {
it('returns less than the expected matches', { timeout: 10000 }, () => {
tree = parser.parse('function foo() while (true) { } }\n'.repeat(1000))!;
query = new Query(JavaScript, '(function_declaration name: (identifier) @function)');
const matches = query.matches(tree.rootNode, { timeoutMicros: 1000 });
@ -538,7 +538,7 @@ describe('Query', () => {
});
});
describe('Executes with a timeout', () => {
describe('Executes with a timeout', { timeout: 10000 }, () => {
it('Returns less than the expected matches', () => {
tree = parser.parse('function foo() while (true) { } }\n'.repeat(1000))!;
query = new Query(JavaScript, '(function_declaration) @function');


@ -25,11 +25,14 @@
"esModuleInterop": true,
"forceConsistentCasingInFileNames": true,
"skipLibCheck": true,
"composite": true,
"isolatedModules": true,
},
"include": [
"src/**/*",
"script/**/*",
"test/**/*",
"src/*.ts",
"script/*",
"test/*",
"lib/*.ts"
],
"exclude": [
"node_modules",

@@ -42,7 +42,6 @@ typedef uint16_t TSStateId;
typedef uint16_t TSSymbol;
typedef uint16_t TSFieldId;
typedef struct TSLanguage TSLanguage;
typedef struct TSLanguageMetadata TSLanguageMetadata;
typedef struct TSParser TSParser;
typedef struct TSTree TSTree;
typedef struct TSQuery TSQuery;

@@ -1,10 +1,10 @@
[package]
name = "tree-sitter-language"
description = "The tree-sitter Language type, used by the library and by language implementations"
version = "0.1.4"
version = "0.1.5"
authors.workspace = true
edition.workspace = true
rust-version.workspace = true
rust-version = "1.76"
readme = "README.md"
homepage.workspace = true
repository.workspace = true

@@ -34,7 +34,7 @@ bool ts_range_array_intersects(
uint32_t end_byte
) {
for (unsigned i = start_index; i < self->size; i++) {
TSRange *range = &self->contents[i];
TSRange *range = array_get(self, i);
if (range->end_byte > start_byte) {
if (range->start_byte >= end_byte) break;
return true;
@@ -108,6 +108,7 @@ typedef struct {
const TSLanguage *language;
unsigned visible_depth;
bool in_padding;
Subtree prev_external_token;
} Iterator;
static Iterator iterator_new(
@@ -127,6 +128,7 @@ static Iterator iterator_new(
.language = language,
.visible_depth = 1,
.in_padding = false,
.prev_external_token = NULL_SUBTREE,
};
}
@@ -157,7 +159,7 @@ static bool iterator_tree_is_visible(const Iterator *self) {
TreeCursorEntry entry = *array_back(&self->cursor.stack);
if (ts_subtree_visible(*entry.subtree)) return true;
if (self->cursor.stack.size > 1) {
Subtree parent = *self->cursor.stack.contents[self->cursor.stack.size - 2].subtree;
Subtree parent = *array_get(&self->cursor.stack, self->cursor.stack.size - 2)->subtree;
return ts_language_alias_at(
self->language,
parent.ptr->production_id,
@@ -181,10 +183,10 @@ static void iterator_get_visible_state(
}
for (; i + 1 > 0; i--) {
TreeCursorEntry entry = self->cursor.stack.contents[i];
TreeCursorEntry entry = *array_get(&self->cursor.stack, i);
if (i > 0) {
const Subtree *parent = self->cursor.stack.contents[i - 1].subtree;
const Subtree *parent = array_get(&self->cursor.stack, i - 1)->subtree;
*alias_symbol = ts_language_alias_at(
self->language,
parent->ptr->production_id,
@@ -244,6 +246,10 @@ static bool iterator_descend(Iterator *self, uint32_t goal_position) {
position = child_right;
if (!ts_subtree_extra(*child)) structural_child_index++;
Subtree last_external_token = ts_subtree_last_external_token(*child);
if (last_external_token.ptr) {
self->prev_external_token = last_external_token;
}
}
} while (did_descend);
@@ -268,6 +274,10 @@ static void iterator_advance(Iterator *self) {
const Subtree *parent = array_back(&self->cursor.stack)->subtree;
uint32_t child_index = entry.child_index + 1;
Subtree last_external_token = ts_subtree_last_external_token(*entry.subtree);
if (last_external_token.ptr) {
self->prev_external_token = last_external_token;
}
if (ts_subtree_child_count(*parent) > child_index) {
Length position = length_add(entry.position, ts_subtree_total_size(*entry.subtree));
uint32_t structural_child_index = entry.structural_child_index;
@@ -313,29 +323,41 @@ static IteratorComparison iterator_compare(
TSSymbol new_alias_symbol = 0;
iterator_get_visible_state(old_iter, &old_tree, &old_alias_symbol, &old_start);
iterator_get_visible_state(new_iter, &new_tree, &new_alias_symbol, &new_start);
TSSymbol old_symbol = ts_subtree_symbol(old_tree);
TSSymbol new_symbol = ts_subtree_symbol(new_tree);
if (!old_tree.ptr && !new_tree.ptr) return IteratorMatches;
if (!old_tree.ptr || !new_tree.ptr) return IteratorDiffers;
if (old_alias_symbol != new_alias_symbol || old_symbol != new_symbol) return IteratorDiffers;
uint32_t old_size = ts_subtree_size(old_tree).bytes;
uint32_t new_size = ts_subtree_size(new_tree).bytes;
TSStateId old_state = ts_subtree_parse_state(old_tree);
TSStateId new_state = ts_subtree_parse_state(new_tree);
bool old_has_external_tokens = ts_subtree_has_external_tokens(old_tree);
bool new_has_external_tokens = ts_subtree_has_external_tokens(new_tree);
uint32_t old_error_cost = ts_subtree_error_cost(old_tree);
uint32_t new_error_cost = ts_subtree_error_cost(new_tree);
if (
old_alias_symbol == new_alias_symbol &&
ts_subtree_symbol(old_tree) == ts_subtree_symbol(new_tree)
old_start != new_start ||
old_symbol == ts_builtin_sym_error ||
old_size != new_size ||
old_state == TS_TREE_STATE_NONE ||
new_state == TS_TREE_STATE_NONE ||
((old_state == ERROR_STATE) != (new_state == ERROR_STATE)) ||
old_error_cost != new_error_cost ||
old_has_external_tokens != new_has_external_tokens ||
ts_subtree_has_changes(old_tree) ||
(
old_has_external_tokens &&
!ts_subtree_external_scanner_state_eq(old_iter->prev_external_token, new_iter->prev_external_token)
)
) {
if (old_start == new_start &&
!ts_subtree_has_changes(old_tree) &&
ts_subtree_symbol(old_tree) != ts_builtin_sym_error &&
ts_subtree_size(old_tree).bytes == ts_subtree_size(new_tree).bytes &&
ts_subtree_parse_state(old_tree) != TS_TREE_STATE_NONE &&
ts_subtree_parse_state(new_tree) != TS_TREE_STATE_NONE &&
(ts_subtree_parse_state(old_tree) == ERROR_STATE) ==
(ts_subtree_parse_state(new_tree) == ERROR_STATE)) {
return IteratorMatches;
} else {
return IteratorMayDiffer;
}
return IteratorMayDiffer;
}
return IteratorDiffers;
return IteratorMatches;
}
#ifdef DEBUG_GET_CHANGED_RANGES
@@ -348,8 +370,8 @@ static inline void iterator_print_state(Iterator *self) {
"(%-25s %s\t depth:%u [%u, %u] - [%u, %u])",
name, self->in_padding ? "(p)" : " ",
self->visible_depth,
start.row + 1, start.column,
end.row + 1, end.column
start.row, start.column,
end.row, end.column
);
}
#endif
@@ -380,7 +402,7 @@ unsigned ts_subtree_get_changed_ranges(
do {
#ifdef DEBUG_GET_CHANGED_RANGES
printf("At [%-2u, %-2u] Compare ", position.extent.row + 1, position.extent.column);
printf("At [%-2u, %-2u] Compare ", position.extent.row, position.extent.column);
iterator_print_state(&old_iter);
printf("\tvs\t");
iterator_print_state(&new_iter);
@@ -475,9 +497,9 @@ unsigned ts_subtree_get_changed_ranges(
// Keep track of the current position in the included range differences
// array in order to avoid scanning the entire array on each iteration.
while (included_range_difference_index < included_range_differences->size) {
const TSRange *range = &included_range_differences->contents[
const TSRange *range = array_get(included_range_differences,
included_range_difference_index
];
);
if (range->end_byte <= position.bytes) {
included_range_difference_index++;
} else {

@@ -186,7 +186,7 @@ TSSymbol ts_language_symbol_for_name(
uint32_t length,
bool is_named
) {
if (!strncmp(string, "ERROR", length)) return ts_builtin_sym_error;
if (is_named && !strncmp(string, "ERROR", length)) return ts_builtin_sym_error;
uint16_t count = (uint16_t)ts_language_symbol_count(self);
for (TSSymbol i = 0; i < count; i++) {
TSSymbolMetadata metadata = ts_language_symbol_metadata(self, i);

@@ -193,7 +193,7 @@ static bool ts_parser__breakdown_top_of_stack(
did_break_down = true;
pending = false;
for (uint32_t i = 0; i < pop.size; i++) {
StackSlice slice = pop.contents[i];
StackSlice slice = *array_get(&pop, i);
TSStateId state = ts_stack_state(self->stack, slice.version);
Subtree parent = *array_front(&slice.subtrees);
@@ -212,7 +212,7 @@ }
}
for (uint32_t j = 1; j < slice.subtrees.size; j++) {
Subtree tree = slice.subtrees.contents[j];
Subtree tree = *array_get(&slice.subtrees, j);
ts_stack_push(self->stack, slice.version, tree, false, state);
}
@@ -396,20 +396,24 @@ static void ts_parser__external_scanner_destroy(
static unsigned ts_parser__external_scanner_serialize(
TSParser *self
) {
uint32_t length;
if (ts_language_is_wasm(self->language)) {
return ts_wasm_store_call_scanner_serialize(
length = ts_wasm_store_call_scanner_serialize(
self->wasm_store,
(uintptr_t)self->external_scanner_payload,
self->lexer.debug_buffer
);
if (ts_wasm_store_has_error(self->wasm_store)) {
self->has_scanner_error = true;
}
} else {
uint32_t length = self->language->external_scanner.serialize(
length = self->language->external_scanner.serialize(
self->external_scanner_payload,
self->lexer.debug_buffer
);
ts_assert(length <= TREE_SITTER_SERIALIZATION_BUFFER_SIZE);
return length;
}
ts_assert(length <= TREE_SITTER_SERIALIZATION_BUFFER_SIZE);
return length;
}
static void ts_parser__external_scanner_deserialize(
@@ -556,27 +560,29 @@ static Subtree ts_parser__lex(
external_scanner_state_len
);
// When recovering from an error, ignore any zero-length external tokens
// unless they have changed the external scanner's state. This helps to
// avoid infinite loops which could otherwise occur, because the lexer is
// looking for any possible token, instead of looking for the specific set of
// tokens that are valid in some parse state.
// Avoid infinite loops caused by the external scanner returning empty tokens.
// Empty tokens are needed in some circumstances, e.g. indent/dedent tokens
// in Python. Ignore the following classes of empty tokens:
//
// Note that it's possible that the token end position may be *before* the
// original position of the lexer because of the way that tokens are positioned
// at included range boundaries: when a token is terminated at the start of
// an included range, it is marked as ending at the *end* of the preceding
// included range.
// * Tokens produced during error recovery. When recovering from an error,
// all tokens are allowed, so it's easy to accidentally return unwanted
// empty tokens.
// * Tokens that are marked as 'extra' in the grammar. These don't change
// the parse state, so they would definitely cause an infinite loop.
if (
self->lexer.token_end_position.bytes <= current_position.bytes &&
(error_mode || !ts_stack_has_advanced_since_error(self->stack, version)) &&
!external_scanner_state_changed
) {
LOG(
"ignore_empty_external_token symbol:%s",
SYM_NAME(self->language->external_scanner.symbol_map[self->lexer.data.result_symbol])
)
found_token = false;
TSSymbol symbol = self->language->external_scanner.symbol_map[self->lexer.data.result_symbol];
TSStateId next_parse_state = ts_language_next_state(self->language, parse_state, symbol);
bool token_is_extra = (next_parse_state == parse_state);
if (error_mode || !ts_stack_has_advanced_since_error(self->stack, version) || token_is_extra) {
LOG(
"ignore_empty_external_token symbol:%s",
SYM_NAME(self->language->external_scanner.symbol_map[self->lexer.data.result_symbol])
);
found_token = false;
}
}
}
@@ -947,20 +953,22 @@ static StackVersion ts_parser__reduce(
// children.
StackSliceArray pop = ts_stack_pop_count(self->stack, version, count);
uint32_t removed_version_count = 0;
uint32_t halted_version_count = ts_stack_halted_version_count(self->stack);
for (uint32_t i = 0; i < pop.size; i++) {
StackSlice slice = pop.contents[i];
StackSlice slice = *array_get(&pop, i);
StackVersion slice_version = slice.version - removed_version_count;
// This is where new versions are added to the parse stack. The versions
// will all be sorted and truncated at the end of the outer parsing loop.
// Allow the maximum version count to be temporarily exceeded, but only
// by a limited threshold.
if (slice_version > MAX_VERSION_COUNT + MAX_VERSION_COUNT_OVERFLOW) {
if (slice_version > MAX_VERSION_COUNT + MAX_VERSION_COUNT_OVERFLOW + halted_version_count) {
ts_stack_remove_version(self->stack, slice_version);
ts_subtree_array_delete(&self->tree_pool, &slice.subtrees);
removed_version_count++;
while (i + 1 < pop.size) {
StackSlice next_slice = pop.contents[i + 1];
LOG("aborting reduce with too many versions")
StackSlice next_slice = *array_get(&pop, i + 1);
if (next_slice.version != slice.version) break;
ts_subtree_array_delete(&self->tree_pool, &next_slice.subtrees);
i++;
@@ -983,7 +991,7 @@ static StackVersion ts_parser__reduce(
// choose one of the arrays of trees to be the parent node's children, and
// delete the rest of the tree arrays.
while (i + 1 < pop.size) {
StackSlice next_slice = pop.contents[i + 1];
StackSlice next_slice = *array_get(&pop, i + 1);
if (next_slice.version != slice.version) break;
i++;
@@ -1025,7 +1033,7 @@ static StackVersion ts_parser__reduce(
// were previously on top of the stack.
ts_stack_push(self->stack, slice_version, ts_subtree_from_mut(parent), false, next_state);
for (uint32_t j = 0; j < self->trailing_extras.size; j++) {
ts_stack_push(self->stack, slice_version, self->trailing_extras.contents[j], false, next_state);
ts_stack_push(self->stack, slice_version, *array_get(&self->trailing_extras, j), false, next_state);
}
for (StackVersion j = 0; j < slice_version; j++) {
@@ -1053,11 +1061,11 @@ static void ts_parser__accept(
StackSliceArray pop = ts_stack_pop_all(self->stack, version);
for (uint32_t i = 0; i < pop.size; i++) {
SubtreeArray trees = pop.contents[i].subtrees;
SubtreeArray trees = array_get(&pop, i)->subtrees;
Subtree root = NULL_SUBTREE;
for (uint32_t j = trees.size - 1; j + 1 > 0; j--) {
Subtree tree = trees.contents[j];
Subtree tree = *array_get(&trees, j);
if (!ts_subtree_extra(tree)) {
ts_assert(!tree.data.is_inline);
uint32_t child_count = ts_subtree_child_count(tree);
@@ -1092,7 +1100,7 @@ static void ts_parser__accept(
}
}
ts_stack_remove_version(self->stack, pop.contents[0].version);
ts_stack_remove_version(self->stack, array_get(&pop, 0)->version);
ts_stack_halt(self->stack, version);
}
@@ -1158,7 +1166,7 @@ static bool ts_parser__do_all_potential_reductions(
StackVersion reduction_version = STACK_VERSION_NONE;
for (uint32_t j = 0; j < self->reduce_actions.size; j++) {
ReduceAction action = self->reduce_actions.contents[j];
ReduceAction action = *array_get(&self->reduce_actions, j);
reduction_version = ts_parser__reduce(
self, version, action.symbol, action.count,
@@ -1196,7 +1204,7 @@ static bool ts_parser__recover_to_state(
StackVersion previous_version = STACK_VERSION_NONE;
for (unsigned i = 0; i < pop.size; i++) {
StackSlice slice = pop.contents[i];
StackSlice slice = *array_get(&pop, i);
if (slice.version == previous_version) {
ts_subtree_array_delete(&self->tree_pool, &slice.subtrees);
@@ -1214,12 +1222,12 @@ static bool ts_parser__recover_to_state(
SubtreeArray error_trees = ts_stack_pop_error(self->stack, slice.version);
if (error_trees.size > 0) {
ts_assert(error_trees.size == 1);
Subtree error_tree = error_trees.contents[0];
Subtree error_tree = *array_get(&error_trees, 0);
uint32_t error_child_count = ts_subtree_child_count(error_tree);
if (error_child_count > 0) {
array_splice(&slice.subtrees, 0, 0, error_child_count, ts_subtree_children(error_tree));
for (unsigned j = 0; j < error_child_count; j++) {
ts_subtree_retain(slice.subtrees.contents[j]);
ts_subtree_retain(*array_get(&slice.subtrees, j));
}
}
ts_subtree_array_delete(&self->tree_pool, &error_trees);
@@ -1235,7 +1243,7 @@ static bool ts_parser__recover_to_state(
}
for (unsigned j = 0; j < self->trailing_extras.size; j++) {
Subtree tree = self->trailing_extras.contents[j];
Subtree tree = *array_get(&self->trailing_extras, j);
ts_stack_push(self->stack, slice.version, tree, false, goal_state);
}
@@ -1271,7 +1279,7 @@ static void ts_parser__recover(
// if the current lookahead token would be valid in that state.
if (summary && !ts_subtree_is_error(lookahead)) {
for (unsigned i = 0; i < summary->size; i++) {
StackSummaryEntry entry = summary->contents[i];
StackSummaryEntry entry = *array_get(summary, i);
if (entry.state == ERROR_STATE) continue;
if (entry.position.bytes == position.bytes) continue;
@@ -1316,10 +1324,23 @@ static void ts_parser__recover(
// and subsequently halted. Remove those versions.
for (unsigned i = previous_version_count; i < ts_stack_version_count(self->stack); i++) {
if (!ts_stack_is_active(self->stack, i)) {
LOG("removed paused version:%u", i);
ts_stack_remove_version(self->stack, i--);
LOG_STACK();
}
}
// If the parser is still in the error state at the end of the file, just wrap everything
// in an ERROR node and terminate.
if (ts_subtree_is_eof(lookahead)) {
LOG("recover_eof");
SubtreeArray children = array_new();
Subtree parent = ts_subtree_new_error_node(&children, false, self->language);
ts_stack_push(self->stack, version, parent, false, 1);
ts_parser__accept(self, version, lookahead);
return;
}
// If strategy 1 succeeded, a new stack version will have been created which is able to handle
// the current lookahead token. Now, in addition, try strategy 2 described above: skip the
// current lookahead token by wrapping it in an ERROR node.
@@ -1340,17 +1361,6 @@ static void ts_parser__recover(
return;
}
// If the parser is still in the error state at the end of the file, just wrap everything
// in an ERROR node and terminate.
if (ts_subtree_is_eof(lookahead)) {
LOG("recover_eof");
SubtreeArray children = array_new();
Subtree parent = ts_subtree_new_error_node(&children, false, self->language);
ts_stack_push(self->stack, version, parent, false, 1);
ts_parser__accept(self, version, lookahead);
return;
}
// Do not recover if the result would clearly be worse than some existing stack version.
unsigned new_cost =
current_error_cost + ERROR_COST_PER_SKIPPED_TREE +
@@ -1396,18 +1406,18 @@ static void ts_parser__recover(
// arbitrarily and discard the rest.
if (pop.size > 1) {
for (unsigned i = 1; i < pop.size; i++) {
ts_subtree_array_delete(&self->tree_pool, &pop.contents[i].subtrees);
ts_subtree_array_delete(&self->tree_pool, &array_get(&pop, i)->subtrees);
}
while (ts_stack_version_count(self->stack) > pop.contents[0].version + 1) {
ts_stack_remove_version(self->stack, pop.contents[0].version + 1);
while (ts_stack_version_count(self->stack) > array_get(&pop, 0)->version + 1) {
ts_stack_remove_version(self->stack, array_get(&pop, 0)->version + 1);
}
}
ts_stack_renumber_version(self->stack, pop.contents[0].version, version);
array_push(&pop.contents[0].subtrees, ts_subtree_from_mut(error_repeat));
ts_stack_renumber_version(self->stack, array_get(&pop, 0)->version, version);
array_push(&array_get(&pop, 0)->subtrees, ts_subtree_from_mut(error_repeat));
error_repeat = ts_subtree_new_node(
ts_builtin_sym_error_repeat,
&pop.contents[0].subtrees,
&array_get(&pop, 0)->subtrees,
0,
self->language
);
@@ -1534,7 +1544,7 @@ static bool ts_parser__check_progress(TSParser *self, Subtree *lookahead, const
if (self->operation_count >= OP_COUNT_PER_PARSER_TIMEOUT_CHECK) {
self->operation_count = 0;
}
if (self->parse_options.progress_callback && position != NULL) {
if (position != NULL) {
self->parse_state.current_byte_offset = *position;
self->parse_state.has_error = self->has_error;
}
@@ -1616,6 +1626,7 @@ static bool ts_parser__advance(
// an ambiguous state. REDUCE actions always create a new stack
// version, whereas SHIFT actions update the existing stack version
// and terminate this loop.
bool did_reduce = false;
StackVersion last_reduction_version = STACK_VERSION_NONE;
for (uint32_t i = 0; i < table_entry.action_count; i++) {
TSParseAction action = table_entry.actions[i];
@@ -1651,6 +1662,7 @@ static bool ts_parser__advance(
action.reduce.dynamic_precedence, action.reduce.production_id,
is_fragile, end_of_non_terminal_extra
);
did_reduce = true;
if (reduction_version != STACK_VERSION_NONE) {
last_reduction_version = reduction_version;
}
@@ -1702,9 +1714,12 @@ static bool ts_parser__advance(
continue;
}
// A non-terminal extra rule was reduced and merged into an existing
// stack version. This version can be discarded.
if (!lookahead.ptr) {
// A reduction was performed, but was merged into an existing stack version.
// This version can be discarded.
if (did_reduce) {
if (lookahead.ptr) {
ts_subtree_release(&self->tree_pool, lookahead);
}
ts_stack_halt(self->stack, version);
return true;
}
@@ -1753,7 +1768,7 @@ static bool ts_parser__advance(
// versions that exist. If some other version advances successfully, then
// this version can simply be removed. But if all versions end up paused,
// then error recovery is needed.
LOG("detect_error");
LOG("detect_error lookahead:%s", TREE_NAME(lookahead));
ts_stack_pause(self->stack, version, lookahead);
return true;
}
@@ -1842,6 +1857,7 @@ static unsigned ts_parser__condense_stack(TSParser *self) {
has_unpaused_version = true;
} else {
ts_stack_remove_version(self->stack, i);
made_changes = true;
i--;
n--;
}
@@ -1877,9 +1893,9 @@ static bool ts_parser__balance_subtree(TSParser *self) {
return false;
}
MutableSubtree tree = self->tree_pool.tree_stack.contents[
MutableSubtree tree = *array_get(&self->tree_pool.tree_stack,
self->tree_pool.tree_stack.size - 1
];
);
if (tree.ptr->repeat_depth > 0) {
Subtree child1 = ts_subtree_children(tree)[0];
@@ -2128,7 +2144,7 @@ TSTree *ts_parser_parse(
LOG("parse_after_edit");
LOG_TREE(self->old_tree);
for (unsigned i = 0; i < self->included_range_differences.size; i++) {
TSRange *range = &self->included_range_differences.contents[i];
TSRange *range = array_get(&self->included_range_differences, i);
LOG("different_included_range %u - %u", range->start_byte, range->end_byte);
}
} else {
@@ -2185,7 +2201,7 @@ TSTree *ts_parser_parse(
}
while (self->included_range_difference_index < self->included_range_differences.size) {
TSRange *range = &self->included_range_differences.contents[self->included_range_difference_index];
TSRange *range = array_get(&self->included_range_differences, self->included_range_difference_index);
if (range->end_byte <= position) {
self->included_range_difference_index++;
} else {
@@ -2226,6 +2242,8 @@ TSTree *ts_parser_parse_with_options(
self->parse_options = parse_options;
self->parse_state.payload = parse_options.payload;
TSTree *result = ts_parser_parse(self, old_tree, input);
// Reset parser options before further parse calls.
self->parse_options = (TSParseOptions) {0};
return result;
}

@@ -18,7 +18,6 @@ typedef uint16_t TSStateId;
typedef uint16_t TSSymbol;
typedef uint16_t TSFieldId;
typedef struct TSLanguage TSLanguage;
typedef struct TSLanguageMetadata TSLanguageMetadata;
typedef struct TSLanguageMetadata {
uint8_t major_version;
uint8_t minor_version;

@@ -18,16 +18,24 @@
#if defined(HAVE_ENDIAN_H) || \
defined(__linux__) || \
defined(__GNU__) || \
defined(__HAIKU__) || \
defined(__illumos__) || \
defined(__NetBSD__) || \
defined(__OpenBSD__) || \
defined(__CYGWIN__) || \
defined(__MSYS__) || \
defined(__EMSCRIPTEN__)
defined(__EMSCRIPTEN__) || \
defined(__wasi__) || \
defined(__wasm__)
#if defined(__NetBSD__)
#define _NETBSD_SOURCE 1
#endif
# include <endian.h>
#elif defined(HAVE_SYS_ENDIAN_H) || \
defined(__FreeBSD__) || \
defined(__NetBSD__) || \
defined(__DragonFly__)
# include <sys/endian.h>

Some files were not shown because too many files have changed in this diff.