This patch updates the Rust binding's build script to output [build
metadata][links]. This makes it easier for downstream crates to
determine the include path, in case they need to compile their own C
code that requires the tree-sitter headers.
[links]: https://doc.rust-lang.org/cargo/reference/build-scripts.html#the-links-manifest-key
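A build script that publishes its include directory through the `links` mechanism could look roughly like the sketch below. It assumes the manifest declares `links = "tree-sitter"` and that the metadata key is named `include`. Both are assumptions, so the actual key and header layout in this patch may differ. With the legacy `cargo:KEY=VALUE` output, Cargo treats any key it does not recognize as metadata. The build scripts of direct dependents can then read that metadata from `DEP_<LINKS>_<KEY>` environment variables.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Formats a legacy `cargo:KEY=VALUE` line. For a package with a `links`
/// key, unrecognized keys are passed to dependents as metadata.
fn metadata_line(key: &str, value: &Path) -> String {
    format!("cargo:{}={}", key, value.display())
}

/// Environment variable name a direct dependent's build script sees:
/// the `links` value uppercased with hyphens mapped to underscores,
/// followed by the uppercased key.
fn dep_env_var(links: &str, key: &str) -> String {
    format!(
        "DEP_{}_{}",
        links.to_uppercase().replace('-', "_"),
        key.to_uppercase()
    )
}

fn main() {
    // Hypothetical layout: headers live in `<manifest dir>/include`.
    // Cargo sets CARGO_MANIFEST_DIR for build scripts; fall back to "."
    // so the sketch also runs outside Cargo.
    let manifest_dir = env::var("CARGO_MANIFEST_DIR").unwrap_or_else(|_| ".".to_string());
    let include_dir: PathBuf = Path::new(&manifest_dir).join("include");
    println!("{}", metadata_line("include", &include_dir));
}
```

Under those assumptions, a downstream build script would read `DEP_TREE_SITTER_INCLUDE` and add it to its own C compiler's include path.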
We have several test cases defined in the `cli` crate that depend on the
`lib` crate's `allocation-tracking` feature. The implementation of the
actual allocation tracker used to live in the `cli` crate, close to the
test cases that use it. The `allocation-tracking` feature in the `lib`
crate was only used to tell the tree-sitter implementation to expect
the allocation tracker to exist, and to use it.
That pattern meant that we had a circular dependency: `cli` depends on
`lib`, but `lib` required some code that was implemented in `cli`.
That, in turn, caused linker errors, but only when compiling in certain
configurations! [1]
This patch moves all of the allocation tracking implementation into the
`lib` crate, gated on the existing `allocation-tracking` feature, which
fixes the circular dependency.
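The layout after this change can be sketched as a tracker that lives entirely in the library crate. The `AllocationTracker` type, its methods, and the `TRACKER` static are hypothetical and much simpler than tree-sitter's real tracker. In the `lib` crate, this kind of module would sit behind `#[cfg(feature = "allocation-tracking")]`. That gate is left out here so the sketch compiles on its own. What matters is that the bookkeeping and the code that reports to it now live in the same crate. As a result, `lib` no longer needs any symbol from `cli`.

```rust
use std::collections::BTreeSet;
use std::sync::Mutex;

/// Hypothetical allocation tracker. In the real crate this would be gated
/// with `#[cfg(feature = "allocation-tracking")]`; the gate is omitted so
/// this sketch builds standalone.
pub struct AllocationTracker {
    // Addresses recorded as allocated but not yet freed.
    live: Mutex<BTreeSet<usize>>,
}

impl AllocationTracker {
    pub const fn new() -> Self {
        AllocationTracker { live: Mutex::new(BTreeSet::new()) }
    }

    /// Record a new allocation at `addr`.
    pub fn record_alloc(&self, addr: usize) {
        self.live.lock().unwrap().insert(addr);
    }

    /// Record that `addr` was freed. Returns false if it was never recorded,
    /// e.g. a double free or a free of untracked memory.
    pub fn record_free(&self, addr: usize) -> bool {
        self.live.lock().unwrap().remove(&addr)
    }

    /// Addresses still outstanding, i.e. potential leaks.
    pub fn outstanding(&self) -> Vec<usize> {
        self.live.lock().unwrap().iter().copied().collect()
    }
}

// One process-wide tracker that allocation wrappers in the same crate could
// report to (hypothetical).
pub static TRACKER: AllocationTracker = AllocationTracker::new();

fn main() {
    TRACKER.record_alloc(0x1000);
    TRACKER.record_alloc(0x2000);
    TRACKER.record_free(0x1000);
    println!("outstanding: {:?}", TRACKER.outstanding());
}
```

With this arrangement, the `cli` tests would only need to enable the feature and inspect the tracker. They would no longer have to provide the tracker's implementation themselves.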
Note that this patch does **not** fix the fact that feature unification
causes the `lib` crate to be built with the `allocation-tracking`
feature enabled, even though it is not a default feature. Fixing that
requires either the forthcoming version 2 feature resolver [2] or, in
the meantime, the `dev_dep` workaround [3].
[1] https://github.com/tree-sitter/tree-sitter/issues/919
[2] https://doc.rust-lang.org/nightly/cargo/reference/features.html#feature-resolver-version-2
[3] https://github.com/tree-sitter/tree-sitter/issues/919#issuecomment-777107086
This is a minimal set of dependency changes.
Support for macOS on aarch64 was only introduced in `cc` version 1.0.58, so requiring that version allows tree-sitter to build natively on M1 computers.
* Remove dependency on utf8proc
This removes utf8proc, the only external dependency, which was used for UTF-8 decoding. In its place, tree-sitter now implements its own UTF-8 decoder, which is faster and has a simpler API.
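A bounds-checked UTF-8 decoder with a small API of this kind might look like the sketch below. The function name `decode_utf8`, the `(code_point, bytes_consumed)` return value, and the use of `-1` as an error marker are all assumptions for illustration. None of them is copied from `lib/src/utf8.c`.

```rust
/// Hypothetical error marker for malformed or truncated input.
const DECODE_ERROR: i32 = -1;

/// Decodes the first UTF-8 scalar value in `bytes`.
/// Returns `(code_point, bytes_consumed)`. Malformed or truncated input
/// yields `(DECODE_ERROR, 1)` so the caller can skip one byte; empty input
/// yields `(DECODE_ERROR, 0)`.
fn decode_utf8(bytes: &[u8]) -> (i32, u32) {
    let first = match bytes.first() {
        Some(&b) => b,
        None => return (DECODE_ERROR, 0),
    };
    // Sequence length and payload bits of the lead byte.
    let (len, mut cp) = match first {
        0x00..=0x7F => return (first as i32, 1),
        0xC2..=0xDF => (2, (first & 0x1F) as u32),
        0xE0..=0xEF => (3, (first & 0x0F) as u32),
        0xF0..=0xF4 => (4, (first & 0x07) as u32),
        _ => return (DECODE_ERROR, 1), // stray continuation byte or invalid lead
    };
    // Check the buffer length before touching continuation bytes.
    if bytes.len() < len {
        return (DECODE_ERROR, 1);
    }
    for &b in &bytes[1..len] {
        if b & 0xC0 != 0x80 {
            return (DECODE_ERROR, 1);
        }
        cp = (cp << 6) | (b & 0x3F) as u32;
    }
    // Reject overlong forms, UTF-16 surrogates, and values above U+10FFFF.
    let min = [0, 0, 0x80, 0x800, 0x10000][len];
    if cp < min || (0xD800..=0xDFFF).contains(&cp) || cp > 0x10FFFF {
        return (DECODE_ERROR, 1);
    }
    (cp as i32, len as u32)
}

fn main() {
    let text = "aé€😀".as_bytes();
    let mut i = 0;
    while i < text.len() {
        let (cp, n) = decode_utf8(&text[i..]);
        println!("U+{:04X} ({} bytes)", cp, n);
        i += n as usize;
    }
}
```

Returning the consumed length alongside the code point lets a caller advance without decoding twice. Consuming a single byte on error keeps the lexer moving through invalid input.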
* .gitmodules: remove utf8proc submodule
* docs/section-2-using-parsers.md: remove requirement for utf8proc submodule
* docs/section-6-contributing.md: likewise
* lib/Cargo.toml: remove utf8proc subdirectory package include
* lib/README.md: remove utf8proc subdirectory description
* lib/binding_rust/build.rs: remove utf8proc compiler include directory
* lib/src/lexer.c: remove utf8proc dependencies and types
* lib/src/lib.c: remove utf8proc dependency
* lib/src/unicode.h: define types for Unicode decoders
* lib/src/utf16.{c,h}: implement more readable UTF-16 decoder
* lib/src/utf8.{c,h}: implement fast UTF-8 decoder
* lib/utf8proc: remove utf8proc submodule directory
* script/build-lib: remove utf8proc compiler include directory
* script/build-wasm: likewise
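The main job of a readable UTF-16 decoder is combining surrogate pairs, as this sketch shows. It assumes little-endian input bytes and a hypothetical `(code_point, bytes_consumed)` return convention with `-1` as the error value. Neither detail is taken from `lib/src/utf16.c`.

```rust
/// Hypothetical error marker for unpaired surrogates or truncated input.
const DECODE_ERROR: i32 = -1;

/// Reads one little-endian UTF-16 code unit at `offset`, if two bytes remain.
fn unit_at(bytes: &[u8], offset: usize) -> Option<u16> {
    let pair = bytes.get(offset..offset + 2)?;
    Some(u16::from_le_bytes([pair[0], pair[1]]))
}

/// Decodes the first code point from little-endian UTF-16 bytes.
/// Returns `(code_point, bytes_consumed)`. An unpaired surrogate consumes one
/// code unit (2 bytes); fewer than 2 bytes of input are consumed whole.
fn decode_utf16(bytes: &[u8]) -> (i32, u32) {
    let first = match unit_at(bytes, 0) {
        Some(u) => u,
        None => return (DECODE_ERROR, bytes.len() as u32),
    };
    match first {
        // High (leading) surrogate: must be followed by a low surrogate.
        0xD800..=0xDBFF => match unit_at(bytes, 2) {
            Some(second @ 0xDC00..=0xDFFF) => {
                let high = (first as u32 - 0xD800) << 10;
                let low = second as u32 - 0xDC00;
                ((0x10000 + high + low) as i32, 4)
            }
            _ => (DECODE_ERROR, 2),
        },
        // Low surrogate with no preceding high surrogate.
        0xDC00..=0xDFFF => (DECODE_ERROR, 2),
        // Any other unit is a complete BMP code point.
        _ => (first as i32, 2),
    }
}

fn main() {
    let bytes: Vec<u8> = "h€😀".encode_utf16().flat_map(|u| u.to_le_bytes()).collect();
    let mut i = 0;
    while i < bytes.len() {
        let (cp, n) = decode_utf16(&bytes[i..]);
        println!("U+{:04X} ({} bytes)", cp, n);
        i += n as usize;
    }
}
```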
* Optimize ts_lexer__get_lookahead.
Favor the non-failure code path and assign lookahead values directly to the lexer.
* lib/src/lexer.c: optimize for non-failure code path
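The idea of favoring the non-failure path can be illustrated with a simplified lexer. The `Lexer` struct, its fields, and its methods are hypothetical stand-ins for the C `ts_lexer__get_lookahead`. To keep the sketch short, it decodes with the standard `std::str::from_utf8` rather than a custom decoder. In the common case, a complete character sits at the current position, and the result is written straight into the lexer's fields. End of input and invalid bytes are handled as rarer branches.

```rust
/// Hypothetical, simplified lexer state; not tree-sitter's real `Lexer`.
struct Lexer<'a> {
    chunk: &'a [u8],
    position: usize,
    lookahead: i32,      // current code point; 0 at end of input, -1 on error
    lookahead_size: u32, // bytes occupied by the lookahead character
}

impl<'a> Lexer<'a> {
    fn get_lookahead(&mut self) {
        let rest = &self.chunk[self.position..];
        // Rare path: end of input. Size 0 keeps `advance` from running past the end.
        if rest.is_empty() {
            self.lookahead = 0;
            self.lookahead_size = 0;
            return;
        }
        // Look at no more than 4 bytes; `valid_up_to` marks the valid UTF-8 prefix.
        let window = &rest[..rest.len().min(4)];
        let valid = match std::str::from_utf8(window) {
            Ok(s) => s,
            Err(e) => std::str::from_utf8(&window[..e.valid_up_to()]).unwrap(),
        };
        if let Some(c) = valid.chars().next() {
            // Common path: assign the decoded character directly.
            self.lookahead = c as u32 as i32;
            self.lookahead_size = c.len_utf8() as u32;
        } else {
            // Failure path: invalid or truncated sequence, skip one byte.
            self.lookahead = -1;
            self.lookahead_size = 1;
        }
    }

    fn advance(&mut self) {
        self.position += self.lookahead_size as usize;
        self.get_lookahead();
    }
}

fn main() {
    let mut lexer = Lexer { chunk: "x≠y".as_bytes(), position: 0, lookahead: 0, lookahead_size: 0 };
    lexer.get_lookahead();
    while lexer.lookahead != 0 {
        println!("{:#X} ({} bytes)", lexer.lookahead, lexer.lookahead_size);
        lexer.advance();
    }
}
```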
* Fix some compiler errors
* lib/src/lexer.c: cast from signed to unsigned for decode_next result
* lib/src/utf16.c: fix non-constant initializers for older compilers
* Remove some missed remnants of utf8proc
* docs/section-2-using-parsers.md: only two include paths necessary now
* lib/src/lib.c: no need to define UTF8PROC_STATIC
* Use ICU's utf8 and utf16 decoding routines
* Remove unnecessary casts when calling ICU macros
* Check buffer length before attempting to decode a Unicode character
* Use the new Unicode function when parsing queries
Co-Authored-By: Matthew Krupcale <mkrupcale@matthewkrupcale.com>
* Mark libicu files as vendored for GitHub's stats