Commit graph

35 commits

Author SHA1 Message Date
ObserverOfTime
b75196bb81 feat(c): rename DecodeFunction to TSDecodeFunction
Keep a typedef for backwards compatibility until ABI 16.
2025-09-01 03:17:44 -04:00
Amaan Qureshi
6e88672dac chore: cleanup unused code 2025-01-21 01:17:03 -05:00
Amaan Qureshi
694d636322 fix(lib): correct fix for parsing hang with ranges containing empty points
It's more correct to check the bytes of the `size` length, rather than
use the point as a condition for resetting the lexer's token start
position
2024-12-25 04:49:39 -05:00
Amaan Qureshi
500f4326d5 feat: add the ability to specify a custom decode function 2024-10-31 22:51:40 -04:00
Amaan Qureshi
aaba7cd2f9 feat: implement a cache for get_column 2024-10-30 18:35:38 -04:00
Amaan Qureshi
fe92e978f9 fix(lib): properly reset the lexer's start position 2024-10-11 19:02:41 -04:00
Amaan Qureshi
538a197976 fix(lib): correct unexpected side effect in get_column when the lexer is at EOF 2024-10-08 23:27:42 -04:00
Amaan Qureshi
8943983df6 feat!: properly handle UTF-16 endianness encoding 2024-10-05 21:12:48 -04:00
Ron Panduwana
2bb20fe2fe feat: allow external scanners to use the logger
Co-authored-by: Amaan Qureshi <amaanq12@gmail.com>
2024-08-17 14:46:28 -04:00
Amaan Qureshi
4c083252ec fix(lib): advance the lookahead end byte by 4 when there's an invalid code point
This helps in the case where an edit was made in the middle of a code
point, but bytes 1-3 are valid, thus we could advance by at most 4 bytes
2024-04-30 20:55:43 -04:00
Amaan Qureshi
a4ea4737ac fix: do not increment current_included_range_index past included_range_count in __do_advance 2023-08-27 14:16:18 +03:00
Amaan Qureshi
13f6ec2b0c fix: rename shadowed variables from -Wshadow warnings and apply some useful clang-tidy warnings 2023-07-19 18:12:26 -04:00
Andrew Hlynskyi
63d9f7146f Fix get_column() segfault on EOF, don't do lookahead without EOF check first 2023-04-22 12:11:26 +03:00
Max Brunsfeld
efd22e452b Fix suppression of empty tokens during error handling at included range boundaries 2022-11-14 12:20:39 -08:00
Max Brunsfeld
d07f864815 Fix parse error when reusing a node at the end of an included range 2022-11-11 16:34:57 -08:00
Andrew Helwer
5a6530a413 Added tests 2022-01-11 12:05:37 -05:00
Andrew Helwer
80c34d62ab Fixed rust build, updated docs 2022-01-07 10:36:25 -05:00
Andrew Helwer
3ab6d1b937 Improve diff further 2022-01-07 10:17:53 -05:00
Andrew Helwer
bfb692d2f7 Improve diff 2022-01-07 10:16:20 -05:00
Andrew Helwer
ace81f6267 Don't log when counting codepoints 2022-01-07 10:13:57 -05:00
Andrew Helwer
0a52e90b01 Fixed pointer type 2022-01-07 10:13:57 -05:00
Andrew Helwer
75aa295b66 get_column now counts codepoints 2022-01-07 10:13:57 -05:00
Cameron Forbis
9182ebef86 update set_included_ranges to modify extent if the current position is at the very beginning of the included range 2021-06-17 16:42:25 -07:00
Max Brunsfeld
a40045a419 When editing, properly invalidate trees that depend on get_column 2021-03-11 14:46:13 -08:00
Max Brunsfeld
e29d3714f7 Fix behavior of Lexer.get_column when at EOF 2021-03-11 12:11:33 -08:00
Hansraj Das
000455ee79 Multiple typo fixes
* This is a patch from neovim PR: https://github.com/neovim/neovim/pull/13063
2020-10-11 13:02:40 +05:30
Björn Linse
00c470ab2a Fix a few cases of Clang 10 with UBSAN detecting undefined behavior
Clang 10 considers adding any offset, including 0, to the null pointer
to be undefined behavior. `(void *)NULL + 0 = kaboom`.
2020-08-25 19:34:44 +02:00
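Note on 00c470ab2a: the undefined behavior described in that message can be avoided by never letting a null pointer enter pointer arithmetic. The following is a minimal sketch of that guard, not the code from the commit; `offset_or_null` is a hypothetical helper name introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not taken from the tree-sitter source).
// Returns base + offset, but only performs the addition when base is
// non-null, so NULL is never an operand of pointer arithmetic.
static const uint8_t *offset_or_null(const uint8_t *base, size_t offset) {
  if (base == NULL) {
    return NULL;  // skip the addition entirely, even when offset == 0
  }
  return base + offset;
}
```

A caller would use `offset_or_null(chunk, position)` in place of `chunk + position` wherever `chunk` may legitimately be null, for example when the input is empty.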
Max Brunsfeld
9f63139a10 Fix error when set_included_ranges is called with an invalid range list 2020-01-17 10:31:28 -08:00
Max Brunsfeld
d3b7caa565 Add a TSLexer.eof() API, use it in generated parsers 2019-10-31 14:11:52 -07:00
Max Brunsfeld
a62b7a70f3 Lexer: track EOF state without relying on null character as lookahead 2019-10-31 14:11:52 -07:00
Max Brunsfeld
077cd4970c Handle empty list of included ranges w/ non-null pointer 2019-10-29 13:45:04 -07:00
Matthew Krupcale
ee9a3c0ebb lib: remove utf8proc dependency (#436)
* Remove dependency on utf8proc

This removes the only external dependency on utf8proc for UTF-8 decoding. It does so by implementing its own UTF-8 decoder. This decoder is both faster and has a simpler API.

 * .gitmodules: remove utf8proc submodule
 * docs/section-2-using-parsers.md: remove requirement for utf8proc submodule
 * docs/section-6-contributing.md: likewise
 * lib/Cargo.toml: remove utf8proc subdirectory package include
 * lib/README.md: remove utf8proc subdirectory description
 * lib/binding_rust/build.rs: remove utf8proc compiler include directory
 * lib/src/lexer.c: remove utf8proc dependencies and types
 * lib/src/lib.c: remove utf8proc dependency
 * lib/src/unicode.h: define types for Unicode decoders
 * lib/src/utf16.{c,h}: implement more readable UTF-16 decoder
 * lib/src/utf8.{c,h}: implement fast UTF-8 decoder
 * lib/utf8proc: remove utf8proc submodule directory
 * script/build-lib: remove utf8proc compiler include directory
 * script/build-wasm: likewise

* Optimize ts_lexer__get_lookahead.

Try to favor non-failure code path and assign lookahead values directly to lexer

 * lib/src/lexer.c: optimize for non-failure code path

* Fix some compiler errors

 * lib/src/lexer.c: cast from signed to unsigned for decode_next result
 * lib/src/utf16.c: fix non-constant initializers for older compilers

* Remove some missed remnants of utf8proc

 * docs/section-2-using-parsers.md: only two include paths necessary now
 * lib/src/lib.c: no need to define UTF8PROC_STATIC

* Use ICU's utf8 and utf16 decoding routines

* Remove unnecessary casts when calling icu macros

* Check buffer length before attempting to decode a unicode character

* Use new unicode function when parsing Queries

Co-Authored-By: Matthew Krupcale <mkrupcale@matthewkrupcale.com>

* Mark libicu files as vendored for GitHub's stats
2019-10-14 11:18:39 -07:00
Max Brunsfeld
0afbc31789 Automatically skip BOM characters at beginnings of files
Refs tree-sitter/tree-sitter-python#48
2019-08-02 12:03:04 -07:00
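Note on 0afbc31789: skipping a byte order mark amounts to checking the first bytes of the input and starting past them when they match. The sketch below assumes UTF-8 input only and is not the code from the commit; `utf8_bom_length` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not taken from the tree-sitter source).
// Returns the number of leading bytes to skip: 3 if the input begins
// with the UTF-8 encoding of U+FEFF (EF BB BF), otherwise 0.
static size_t utf8_bom_length(const uint8_t *input, size_t length) {
  if (length >= 3 &&
      input[0] == 0xEF && input[1] == 0xBB && input[2] == 0xBF) {
    return 3;
  }
  return 0;
}
```

A lexer following this approach would apply the check once, at byte offset 0, and begin tokenizing at the returned offset. UTF-16 input would need a separate two-byte check, which this sketch leaves out.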
Max Brunsfeld
dd416b0955 Update include paths to not reference 'runtime' directory 2019-01-04 17:33:34 -08:00
Max Brunsfeld
47607cecf4 Reorganize repo, add rust CLI and binding code, 2019-01-04 17:31:49 -08:00