# ICU Parts
This directory contains a small subset of files from the Unicode organization's [ICU repository](https://github.com/unicode-org/icu).
### License
The license for these files is contained in the `LICENSE` file within this directory.
### Contents
* Source files taken from the [`icu4c/source/common/unicode`](https://github.com/unicode-org/icu/tree/552b01f61127d30d6589aa4bf99468224979b661/icu4c/source/common/unicode) directory:
  * `utf8.h`
  * `utf16.h`
  * `umachine.h`
* Empty source files that are referenced by the above source files, but whose original contents in `libicu` are not needed:
  * `ptypes.h`
  * `urename.h`
  * `utf.h`
* `ICU_SHA` - File containing the Git SHA of the commit in the `icu` repository from which the files were obtained.
* `LICENSE` - The license file from the [`icu4c`](https://github.com/unicode-org/icu/tree/552b01f61127d30d6589aa4bf99468224979b661/icu4c) directory of the `icu` repository.
* `README.md` - This text file.
### Updating ICU
To incorporate changes from the upstream `icu` repository:
* Update `ICU_SHA` with the new Git SHA.
* Update `LICENSE` with the license text from the `icu4c` directory at the new commit.
* Update `utf8.h`, `utf16.h`, and `umachine.h` with their contents at the new commit in the `icu` repository.
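The update steps above can be scripted. The following is a minimal sketch, not a tool that ships with tree-sitter: the function names `upstream_urls` and `update` are hypothetical, it assumes the upstream file layout described in this README (`icu4c/LICENSE` and `icu4c/source/common/unicode/*.h`), and it assumes `ICU_SHA` holds the bare SHA followed by a newline.

```python
"""Hypothetical helper for refreshing the vendored ICU files in this directory.

A sketch only: it assumes the upstream layout described in README.md and
downloads files from GitHub's raw-content host at a given commit.
"""
import urllib.request
from pathlib import Path

RAW_BASE = "https://raw.githubusercontent.com/unicode-org/icu"

# Headers copied verbatim from icu4c/source/common/unicode.
# The empty stubs (ptypes.h, urename.h, utf.h) are not downloaded.
HEADERS = ("utf8.h", "utf16.h", "umachine.h")


def upstream_urls(sha: str) -> dict[str, str]:
    """Map each local file name to its raw download URL at commit `sha`."""
    urls = {
        name: f"{RAW_BASE}/{sha}/icu4c/source/common/unicode/{name}"
        for name in HEADERS
    }
    # The license comes from the icu4c directory of the same commit.
    urls["LICENSE"] = f"{RAW_BASE}/{sha}/icu4c/LICENSE"
    return urls


def update(sha: str, dest: Path) -> None:
    """Download the files at `sha` into `dest` and record the SHA in ICU_SHA."""
    for name, url in upstream_urls(sha).items():
        with urllib.request.urlopen(url) as resp:
            (dest / name).write_bytes(resp.read())
    # Assumed ICU_SHA format: the bare commit SHA plus a trailing newline.
    (dest / "ICU_SHA").write_text(sha + "\n")
```

For example, `update("<new sha>", Path("lib/src/unicode"))` would overwrite the three headers and `LICENSE` and rewrite `ICU_SHA`. Reviewing the resulting diff before committing is still worthwhile, since a newer upstream header may reference additional files that need their own empty stubs.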