Since Cargo 1.63, `CARGO_PKG_RUST_VERSION` is set in the build
environment to the value of the `rust-version` field in Cargo.toml.
This removes the need to manually invoke cargo from build.rs during a
build of the tree-sitter crate with the bindgen feature enabled.
Removing the cargo invocation also ensures the build doesn't write to
the current directory when the target directory has been redirected
elsewhere. "cargo metadata" will attempt to update Cargo.lock, which
will fail if the source tree is read-only.
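
As a rough sketch of what reading the variable from a build script could
look like (the helper names `rust_version_from` and
`declared_rust_version` are hypothetical and not taken from this patch;
treating an empty value as "no version declared" is an assumption):

```rust
use std::env;

/// Hypothetical helper: normalize the raw value of the variable.
/// An unset or empty variable is treated as "no rust-version declared".
fn rust_version_from(raw: Option<String>) -> Option<String> {
    raw.map(|v| v.trim().to_string()).filter(|v| !v.is_empty())
}

/// Hypothetical helper: read the declared rust-version at build-script
/// run time, without spawning `cargo metadata`.
fn declared_rust_version() -> Option<String> {
    rust_version_from(env::var("CARGO_PKG_RUST_VERSION").ok())
}
```

A build script could call `declared_rust_version()` from its `main` and
fall back to a default when it returns `None`.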
The `CARGO_MANIFEST_DIR` environment variable should be read by
`build.rs` at run time rather than at compile time. Reading it at
compile time caused issues, for example, when importing `tree-sitter` via
[`rules_rust`](https://github.com/bazelbuild/rules_rust) in Bazel,
where the build script is compiled and run in separate environments.
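
The distinction can be sketched as follows. This is a minimal
illustration, not the patch itself: `manifest_dir` and `header_dir` are
hypothetical helpers, and the `include` subdirectory is an assumed
layout.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Hypothetical helper: look the directory up when the build script runs.
/// `env!("CARGO_MANIFEST_DIR")` would instead embed the path seen while
/// build.rs was being compiled, which may not exist at run time.
fn manifest_dir() -> Option<PathBuf> {
    env::var_os("CARGO_MANIFEST_DIR").map(PathBuf::from)
}

/// Hypothetical helper: derive a path relative to the manifest directory.
/// The `include` directory name is an assumption for illustration.
fn header_dir(manifest_dir: &Path) -> PathBuf {
    manifest_dir.join("include")
}
```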
This patch updates the Rust binding's build script to output [build
metadata][links]. This makes it easier for downstream crates to
determine the include path, in case they need to compile their own C
code that requires the tree-sitter headers.
[links]: https://doc.rust-lang.org/cargo/reference/build-scripts.html#the-links-manifest-key
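
A sketch of how the two sides of this mechanism fit together, assuming
the crate declares `links = "tree-sitter"` and publishes a key named
`include` (both names are assumptions here, as are the helper functions):

```rust
use std::path::Path;

/// Hypothetical helper: the line a build script prints to publish a
/// metadata key to the build scripts of dependent crates.
fn metadata_line(key: &str, value: &Path) -> String {
    format!("cargo:{}={}", key, value.display())
}

/// Hypothetical helper: the environment variable a dependent's build
/// script would read. Cargo exposes metadata as DEP_<LINKS>_<KEY>,
/// uppercased, with `-` replaced by `_`.
fn dep_var_name(links: &str, key: &str) -> String {
    format!("DEP_{}_{}", links, key)
        .to_uppercase()
        .replace('-', "_")
}
```

Under those assumptions, a downstream build script would read
`DEP_TREE_SITTER_INCLUDE` with `std::env::var_os` and pass it to its C
compiler as an include directory.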
Production builds shouldn't include -Werror by default since that could
cause spurious build failures when there are toolchain updates.
CI uses -Werror to prevent warnings, so that should be sufficient.
* Remove dependency on utf8proc
This removes the external dependency on utf8proc for UTF-8 decoding by
implementing a built-in UTF-8 decoder. The new decoder is both faster
and has a simpler API.
* .gitmodules: remove utf8proc submodule
* docs/section-2-using-parsers.md: remove requirement for utf8proc submodule
* docs/section-6-contributing.md: likewise
* lib/Cargo.toml: remove utf8proc subdirectory package include
* lib/README.md: remove utf8proc subdirectory description
* lib/binding_rust/build.rs: remove utf8proc compiler include directory
* lib/src/lexer.c: remove utf8proc dependencies and types
* lib/src/lib.c: remove utf8proc dependency
* lib/src/unicode.h: define types for Unicode decoders
* lib/src/utf16.{c,h}: implement more readable UTF-16 decoder
* lib/src/utf8.{c,h}: implement fast UTF-8 decoder
* lib/utf8proc: remove utf8proc submodule directory
* script/build-lib: remove utf8proc compiler include directory
* script/build-wasm: likewise
* Optimize ts_lexer__get_lookahead.
Favor the non-failure code path and assign lookahead values directly to
the lexer.
* lib/src/lexer.c: optimize for non-failure code path
* Fix some compiler errors
* lib/src/lexer.c: cast from signed to unsigned for decode_next result
* lib/src/utf16.c: fix non-constant initializers for older compilers
* Remove some missed remnants of utf8proc
* docs/section-2-using-parsers.md: only two include paths necessary now
* lib/src/lib.c: no need to define UTF8PROC_STATIC
* Use ICU's UTF-8 and UTF-16 decoding routines
* Remove unnecessary casts when calling ICU macros
* Check buffer length before attempting to decode a Unicode character
* Use new Unicode function when parsing queries
Co-Authored-By: Matthew Krupcale <mkrupcale@matthewkrupcale.com>
* Mark libicu files as vendored for GitHub's stats
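
The "check buffer length before decoding" step can be illustrated with a
small sketch. It is written in Rust for illustration only. It is neither
the C code in lib/src nor ICU's macros, and `decode_utf8` is a
hypothetical name.

```rust
/// Hypothetical helper: decode one code point from the start of `buf`.
/// Returns the code point and the number of bytes consumed, or `None`
/// if the buffer is empty, truncated, or malformed.
fn decode_utf8(buf: &[u8]) -> Option<(u32, usize)> {
    let first = *buf.first()?;
    // Sequence length, lead-byte payload, and smallest valid code point.
    let (len, init, min) = match first {
        0x00..=0x7F => (1usize, first as u32, 0u32),
        0xC2..=0xDF => (2, (first & 0x1F) as u32, 0x80),
        0xE0..=0xEF => (3, (first & 0x0F) as u32, 0x800),
        0xF0..=0xF4 => (4, (first & 0x07) as u32, 0x1_0000),
        _ => return None, // stray continuation byte or invalid lead byte
    };
    // Check the buffer length before reading any continuation byte.
    if buf.len() < len {
        return None;
    }
    let mut cp = init;
    for &b in &buf[1..len] {
        if b & 0xC0 != 0x80 {
            return None; // not a continuation byte
        }
        cp = (cp << 6) | u32::from(b & 0x3F);
    }
    if cp < min {
        return None; // overlong encoding
    }
    char::from_u32(cp)?; // rejects surrogates and values above U+10FFFF
    Some((cp, len))
}
```

A truncated sequence such as `[0xE2, 0x82]` returns `None` instead of
reading past the end of the buffer.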