tree-sitter

Author	SHA1	Message	Date
Max Brunsfeld	666dfb76d2	Remove document parameter from ts_node_type, ts_node_string Co-Authored-By: Rick Winfrey <rewinfrey@github.com>	2018-05-09 16:47:47 -07:00
Max Brunsfeld	92255bbfdd	Remove document parameter from ts_node_type, ts_node_string Co-Authored-By: Rick Winfrey <rewinfrey@github.com>	2018-05-09 15:28:28 -07:00
Max Brunsfeld	b06747b6ca	Remove stale unit tests Co-Authored-By: Rick Winfrey <rewinfrey@github.com>	2018-05-09 14:14:42 -07:00
Max Brunsfeld	973e4a44f0	Start work on removing parent pointers Co-Authored-By: Rick Winfrey <rewinfrey@github.com>	2018-05-09 12:22:19 -07:00
Max Brunsfeld	d5cfc06fa2	Fix unit test for invalid utf8 at EOF	2018-04-17 17:33:45 -07:00
Max Brunsfeld	09be0b6ef5	Store trees' children in TreeArrays, not w/ separate pointer and length	2018-04-06 13:26:18 -07:00
Max Brunsfeld	a6cf2e87e7	Fix halt_on_error tests	2018-04-06 13:26:18 -07:00
Max Brunsfeld	dbe77e7199	Simplify testing-only ts_stack_iterate function	2018-03-29 17:50:07 -07:00
Max Brunsfeld	5520983144	Clean up Stack API * Remove StackPopResult * Rename top_state() -> state() * Rename top_position() -> position() * Improve docs	2018-03-29 17:37:54 -07:00
Max Brunsfeld	ee995c3d6b	Avoid redundant retains/releases by giving ts_stack_push move semantics	2018-03-29 17:18:43 -07:00
Max Brunsfeld	0810971f3e	🔥 symbol iterator API This idea was never fully baked.	2018-03-08 14:16:37 -08:00
Max Brunsfeld	e927d02f43	Allow reusing leaf nodes unless the next leaf has changes	2018-03-07 17:44:54 -08:00
Max Brunsfeld	52087de4f0	Remove the concept of fragile reductions They were a vestige of when Tree-sitter did sentential form-based incremental parsing (as opposed to simply state matching). This was elegant but not compatible with GLR as far as I could tell.	2018-03-02 14:51:54 -08:00
Max Brunsfeld	82c7e170b3	Fix case where loop was created in the parse stack Fixes #133	2018-03-02 09:05:20 -08:00
Max Brunsfeld	46dcd53090	Do not insert missing tokens if halt_on_error option is passed	2018-01-24 14:04:55 -08:00
Max Brunsfeld	315dff3285	Add an API for getting a node's child index	2018-01-09 14:01:36 -08:00
Max Brunsfeld	f653f2b3bb	Add ts_node_first_{child,named_child}_for_byte methods	2018-01-09 13:44:59 -08:00
Max Brunsfeld	addeb6c4c1	Allocate and free trees using an object pool	2017-12-27 10:34:29 -08:00
Max Brunsfeld	0e69da37a5	Return a character count from the lexer's get_column method	2017-12-20 16:26:38 -08:00
Max Brunsfeld	36c2b685b9	Always invalidate old chunk of text when parsing after an edit	2017-10-04 15:09:46 -07:00
Max Brunsfeld	d291af9a31	Refactor error comparisons * Deal with mergeability outside of error comparison function * Make `better_version_exists` function pure (don't halt other versions as a side effect). * Tweak error comparison logic Signed-off-by: Rick Winfrey <rewinfrey@github.com>	2017-09-13 16:38:15 -07:00
Max Brunsfeld	99d048e016	Simplify error recovery; eliminate recovery states The previous approach to error recovery relied on special error-recovery states in the parse table. For each token T, there was an error recovery state in which the parser looked for any token that could follow T. Unfortunately, sometimes the set of tokens that could follow T contained conflicts. For example, in JS, the token '}' can be followed by the open-ended 'template_chars' token, but also by ordinary tokens like 'identifier'. So with the old algorithm, when recovering from an unexpected '}' token, the lexer had no way to distinguish identifiers from template_chars. This commit drops the error recovery states. Instead, when we encounter an unexpected token T, we recover from the error by finding a previous state S in the stack in which T would be valid, popping all of the nodes after S, and wrapping them in an error. This way, the lexer is always invoked in a normal parse state, in which it is looking for a non-conflicting set of tokens. Eliminating the error recovery states also shrinks the lex state machine significantly. Signed-off-by: Rick Winfrey <rewinfrey@github.com>	2017-09-11 15:22:52 -07:00
Max Brunsfeld	f6325746aa	Provide symbol metadata with dummy language in stack test	2017-08-08 17:47:24 -07:00
Max Brunsfeld	cc7277fd7d	Avoid using IsNull bandit assertion	2017-08-08 12:52:35 -07:00
Max Brunsfeld	94dc703bfc	Require that grammars' start rules be visible	2017-08-04 17:07:37 -07:00
Max Brunsfeld	e5c3bf742d	Update fixture grammars	2017-08-03 16:32:39 -07:00
Max Brunsfeld	09f4796f6b	Get tests passing w/ new alias API	2017-08-01 14:35:34 -07:00
Max Brunsfeld	cb5fe80348	Rename RENAME rule to ALIAS, allow it to create anonymous nodes	2017-07-31 16:41:11 -07:00
Max Brunsfeld	cbdfd89675	Mark reductions as fragile based on their final properties We previously maintained a set of individual productions that were involved in conflicts, but that was subtly incorrect because we don't compare productions themselves when comparing parse items; we only compare the parse items' properties that could affect the final reduce actions.	2017-07-21 09:54:24 -07:00
Max Brunsfeld	f33421c53e	Fix incorrect node renames in the presence of extra tokens	2017-07-18 21:24:34 -07:00
Max Brunsfeld	10d28d4b56	Merge pull request #92 from tree-sitter/utf16-oob Add test for UTF16 out-of-bound read	2017-07-18 17:24:31 -07:00
Phil Turnbull	52cec9ed39	Rework SpyInput buffer handling SpyInput uses a fixed-size buffer and explicitly zeros memory which is good for catching logic errors but defeats valgrind's memory tracking. Use a separate buffer of exactly the correct size for each request. This correctly catches the problem under valgrind: ``` ==8694== Invalid read of size 2 ==8694== at 0x54EFFB: utf16_iterate (utf16.c:10) ==8694== by 0x551126: ts_lexer__get_lookahead (lexer.c:54) ==8694== by 0x5515CD: ts_lexer_start (lexer.c:154) ==8694== by 0x54699F: parser(long,...)(long long) (parser.c:297) ==8694== by 0x54788A: parser__get_lookahead (parser.c:439) ==8694== by 0x54B2D3: parser__advance (parser.c:1150) ==8694== by 0x54C2AA: parser_parse (parser.c:1348) ==8694== by 0x53F063: ts_document_parse_with_options (document.c:136) ==8694== by 0x53EF43: ts_document_parse (document.c:107) ==8694== by 0x4AED11: {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}::operator()() const (document_test.cc:82) ==8694== by 0x4B56B6: std::_Function_handler<void (), {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}>::_M_invoke(std::_Any_data const&) (functional:1871) ==8694== by 0x40F8C5: std::function<void ()>::operator()() const (functional:2267) ==8694== Address 0x5d08be0 is 0 bytes inside a block of size 1 alloc'd ==8694== at 0x4C2E80F: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==8694== by 0x507C3E: SpyInput::read(void, unsigned int) (spy_input.cc:66) ==8694== by 0x55103D: ts_lexer__get_chunk (lexer.c:29) ==8694== by 0x5515B6: ts_lexer_start (lexer.c:152) ==8694== by 0x54699F: parser(long,...)(long long) (parser.c:297) ==8694== by 0x54788A: parser__get_lookahead (parser.c:439) ==8694== by 0x54B2D3: parser__advance (parser.c:1150) ==8694== by 0x54C2AA: parser_parse (parser.c:1348) ==8694== by 0x53F063: ts_document_parse_with_options (document.c:136) ==8694== by 0x53EF43: ts_document_parse (document.c:107) ==8694== by 0x4AED11: {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}::operator()() const (document_test.cc:82) ==8694== by 0x4B56B6: std::_Function_handler<void (), {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}>::_M_invoke(std::_Any_data const&) (functional:1871) ```	2017-07-18 12:16:37 -07:00
Max Brunsfeld	afb499bf2e	Handle rename symbols in ts_language APIs	2017-07-18 12:01:52 -07:00
Max Brunsfeld	de17c92462	Fix setup in stack test	2017-07-18 08:21:35 -07:00
Max Brunsfeld	9a04231ab1	Remove length restriction in external scanner serialization API	2017-07-17 17:12:36 -07:00
Phil Turnbull	e7662c2213	Handle out-of-bound read in utf16_iterate Also simplify the test so we call `utf16_iterate` directly. Calling `utf16_iterate` via `SpyInput` and `ts_document_parse` doesn't seem to reliably trigger the problem using valgrind. valgrind also doesn't detect the problem if we use a string literal like: `utf16_iterate("", 1, &code_point);`	2017-07-17 13:57:12 -07:00
Phil Turnbull	035abc1e15	Add test for UTF16 out-of-bound read utf16_iterate does not check that 'length' is a multiple of two which leads to an out-of-bound read: ==105293== Conditional jump or move depends on uninitialised value(s) ==105293== at 0x54F014: utf16_iterate (utf16.c:7) ==105293== by 0x539251: string_iterate(TSInputEncoding, unsigned char const, unsigned long, int) (encoding_helpers.cc:15) ==105293== by 0x53939D: string_byte_for_character(TSInputEncoding, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, unsigned long) (encoding_helpers.cc:43) ==105293== by 0x507BAD: SpyInput::read(void, unsigned int) (spy_input.cc:47) ==105293== by 0x551049: ts_lexer__get_chunk (lexer.c:29) ==105293== by 0x5515C2: ts_lexer_start (lexer.c:152) ==105293== by 0x5469AB: parser(long,...)(long long) (parser.c:297) ==105293== by 0x547896: parser__get_lookahead (parser.c:439) ==105293== by 0x54B2DF: parser__advance (parser.c:1150) ==105293== by 0x54C2B6: parser_parse (parser.c:1348) ==105293== by 0x53F06F: ts_document_parse_with_options (document.c:136) ==105293== by 0x53EF4F: ts_document_parse (document.c:107)	2017-07-17 12:34:39 -07:00
Max Brunsfeld	4b40a1ed6c	Support anonymous tokens inside of RENAME rules	2017-07-14 10:19:58 -07:00
Max Brunsfeld	8f028ebf68	Avoid deep tree comparison when both trees have errors	2017-07-05 17:33:35 -07:00
Max Brunsfeld	d322f0b6a7	🎨	2017-07-04 21:59:54 -07:00
Max Brunsfeld	a89322c5f1	Remove unneeded parameters from public interface of stack_iterate callback	2017-06-29 16:43:56 -07:00
Max Brunsfeld	66be393b78	Stack - consider empty external token state identical to NULL	2017-06-29 15:00:20 -07:00
Max Brunsfeld	0143bfdad4	Avoid use-after-free of external token states Previously, it was possible for references to external token states to outlive the trees to which those states belonged. Now, instead of storing references to external token states in the Stack and in the Lexer, we store references to the external token trees themselves, and we retain the trees to prevent use-after-free.	2017-06-27 14:54:27 -07:00
Max Brunsfeld	f62ee5a0f3	Fix OOB reads at ends of chunks Signed-off-by: Philip Turnbull <philipturnbull@github.com>	2017-06-23 12:09:16 -07:00
Max Brunsfeld	513edec7c1	Merge pull request #77 from philipturnbull/scan-build-fixes Fix errors found by scan-build	2017-06-20 10:15:20 -07:00
Max Brunsfeld	c66fddd3aa	Add TSInput option to measure columns in bytes not characters	2017-06-15 16:35:34 -07:00
Max Brunsfeld	b862db766e	Merge remote-tracking branch 'origin/master' into update-fixture-grammars	2017-06-14 17:11:44 -07:00
Phil Turnbull	18f261ad51	Initialise all fields of TSParseOptions in tests This should prevent any confusing failures in the unit tests: test/runtime/document_test.cc:381:7: warning: Passed-by-value struct argument contains uninitialized data (e.g., field: 'changed_range_count') ts_document_parse_with_options(document, options); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ test/runtime/document_test.cc:408:7: warning: Passed-by-value struct argument contains uninitialized data (e.g., field: 'changed_range_count') ts_document_parse_with_options(document, options); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~	2017-06-14 11:12:06 -04:00
Max Brunsfeld	74f5ceddf7	Fix parsing of valid code with halt_on_error flag set Signed-off-by: Tim Clem <timothy.clem@gmail.com>	2017-05-01 14:25:25 -07:00
Max Brunsfeld	a98d449d88	Add an option to immediately halt on syntax error	2017-05-01 13:50:49 -07:00
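
Note on 99d048e016: the recovery strategy described in that commit message (find a previous state S on the stack in which the unexpected token T would be valid, pop everything after S, wrap the popped nodes in an error) can be illustrated with a small sketch. This is not tree-sitter's code. The `StackEntry` type, the `recover_depth` function, and the `toy_is_valid` predicate are hypothetical names invented for this illustration, and a flat array stands in for the real graph-structured parse stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stack entry; not tree-sitter's real Stack. */
typedef struct {
  int state;   /* parse state reached after pushing this entry */
  int symbol;  /* symbol of the node that was pushed */
} StackEntry;

/* Caller-supplied predicate: would `token` be valid in `state`? */
typedef bool (*TokenValidFn)(int state, int token);

/*
 * Scan from the top of the stack downward for the nearest state S in
 * which `token` is valid. Returns the number of entries above S, i.e.
 * how many nodes would be popped and wrapped in an error node.
 * Returns -1 if no state on the stack accepts the token.
 */
int recover_depth(const StackEntry *stack, size_t size, int token,
                  TokenValidFn is_valid) {
  assert(stack != NULL || size == 0);
  for (size_t i = size; i > 0; i--) {
    if (is_valid(stack[i - 1].state, token)) {
      return (int)(size - i);
    }
  }
  return -1;
}

/* Toy rule for demonstration only: token t is valid in state t * 10. */
bool toy_is_valid(int state, int token) {
  return state == token * 10;
}
```

With a stack whose states are 10, 20, 30 (bottom to top) and the toy rule above, token 2 is accepted by state 20, so one entry would be popped. Because the lexer is only ever run in an ordinary state such as S, it never has to choose between conflicting tokens, which is the point the commit message makes.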


59 commits
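
Note on 035abc1e15 and e7662c2213: those commits concern an out-of-bound read when the byte length handed to `utf16_iterate` is not a multiple of two. The sketch below shows one way such a length check might look. It is not the actual fix. `utf16le_decode_checked` is a hypothetical helper, and little-endian input is an assumption made here for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not tree-sitter's utf16_iterate.
 * Decodes one code point from little-endian UTF-16 bytes.
 * Returns the number of bytes consumed (2 or 4), or 0 if fewer than
 * two bytes are available, in which case *code_point is set to -1.
 */
size_t utf16le_decode_checked(const uint8_t *bytes, size_t length,
                              int32_t *code_point) {
  assert(code_point != NULL);
  if (length < 2) {
    /* A trailing odd byte: refuse to read past the end of the buffer. */
    *code_point = -1;
    return 0;
  }
  uint16_t first = (uint16_t)(bytes[0] | (bytes[1] << 8));
  /* Only look at a second unit if four bytes are really available. */
  if (first >= 0xD800 && first <= 0xDBFF && length >= 4) {
    uint16_t second = (uint16_t)(bytes[2] | (bytes[3] << 8));
    if (second >= 0xDC00 && second <= 0xDFFF) {
      *code_point = 0x10000 + (((int32_t)first - 0xD800) << 10) +
                    ((int32_t)second - 0xDC00);
      return 4;
    }
  }
  /* BMP character, or an unpaired surrogate passed through as-is. */
  *code_point = first;
  return 2;
}
```

Every read is guarded by a comparison against `length`, so a one-byte buffer like the one in the valgrind report yields 0 instead of touching memory beyond the allocation.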