tree-sitter

Author	SHA1	Message	Date
Max Brunsfeld	99d048e016	Simplify error recovery; eliminate recovery states. The previous approach to error recovery relied on special error-recovery states in the parse table. For each token T, there was an error recovery state in which the parser looked for any token that could follow T. Unfortunately, sometimes the set of tokens that could follow T contained conflicts. For example, in JS, the token '}' can be followed by the open-ended 'template_chars' token, but also by ordinary tokens like 'identifier'. So with the old algorithm, when recovering from an unexpected '}' token, the lexer had no way to distinguish identifiers from template_chars. This commit drops the error recovery states. Instead, when we encounter an unexpected token T, we recover from the error by finding a previous state S in the stack in which T would be valid, popping all of the nodes after S, and wrapping them in an error. This way, the lexer is always invoked in a normal parse state, in which it is looking for a non-conflicting set of tokens. Eliminating the error recovery states also shrinks the lex state machine significantly. Signed-off-by: Rick Winfrey <rewinfrey@github.com>	2017-09-11 15:22:52 -07:00
Max Brunsfeld	f6325746aa	Provide symbol metadata with dummy language in stack test	2017-08-08 17:47:24 -07:00
Max Brunsfeld	cc7277fd7d	Avoid using IsNull bandit assertion	2017-08-08 12:52:35 -07:00
Max Brunsfeld	94dc703bfc	Require that grammars' start rules be visible	2017-08-04 17:07:37 -07:00
Max Brunsfeld	e5c3bf742d	Update fixture grammars	2017-08-03 16:32:39 -07:00
Max Brunsfeld	09f4796f6b	Get tests passing w/ new alias API	2017-08-01 14:35:34 -07:00
Max Brunsfeld	cb5fe80348	Rename RENAME rule to ALIAS, allow it to create anonymous nodes	2017-07-31 16:41:11 -07:00
Max Brunsfeld	cbdfd89675	Mark reductions as fragile based on their final properties. We previously maintained a set of individual productions that were involved in conflicts, but that was subtly incorrect because we don't compare productions themselves when comparing parse items; we only compare the parse items' properties that could affect the final reduce actions.	2017-07-21 09:54:24 -07:00
Max Brunsfeld	f33421c53e	Fix incorrect node renames in the presence of extra tokens	2017-07-18 21:24:34 -07:00
Max Brunsfeld	10d28d4b56	Merge pull request #92 from tree-sitter/utf16-oob: Add test for UTF16 out-of-bound read	2017-07-18 17:24:31 -07:00
Phil Turnbull	52cec9ed39	Rework SpyInput buffer handling. SpyInput uses a fixed-size buffer and explicitly zeros memory which is good for catching logic errors but defeats valgrind's memory tracking. Use a separate buffer of exactly the correct size for each request. This correctly catches the problem under valgrind: ``` ==8694== Invalid read of size 2 ==8694== at 0x54EFFB: utf16_iterate (utf16.c:10) ==8694== by 0x551126: ts_lexer__get_lookahead (lexer.c:54) ==8694== by 0x5515CD: ts_lexer_start (lexer.c:154) ==8694== by 0x54699F: parser(long,...)(long long) (parser.c:297) ==8694== by 0x54788A: parser__get_lookahead (parser.c:439) ==8694== by 0x54B2D3: parser__advance (parser.c:1150) ==8694== by 0x54C2AA: parser_parse (parser.c:1348) ==8694== by 0x53F063: ts_document_parse_with_options (document.c:136) ==8694== by 0x53EF43: ts_document_parse (document.c:107) ==8694== by 0x4AED11: {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}::operator()() const (document_test.cc:82) ==8694== by 0x4B56B6: std::_Function_handler<void (), {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}>::_M_invoke(std::_Any_data const&) (functional:1871) ==8694== by 0x40F8C5: std::function<void ()>::operator()() const (functional:2267) ==8694== Address 0x5d08be0 is 0 bytes inside a block of size 1 alloc'd ==8694== at 0x4C2E80F: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==8694== by 0x507C3E: SpyInput::read(void, unsigned int) (spy_input.cc:66) ==8694== by 0x55103D: ts_lexer__get_chunk (lexer.c:29) ==8694== by 0x5515B6: ts_lexer_start (lexer.c:152) ==8694== by 0x54699F: parser(long,...)(long long) (parser.c:297) ==8694== by 0x54788A: parser__get_lookahead (parser.c:439) ==8694== by 0x54B2D3: parser__advance (parser.c:1150) ==8694== by 0x54C2AA: parser_parse (parser.c:1348) ==8694== by 0x53F063: ts_document_parse_with_options (document.c:136) ==8694== by 0x53EF43: ts_document_parse (document.c:107) ==8694== by 0x4AED11: {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}::operator()() const (document_test.cc:82) ==8694== by 0x4B56B6: std::_Function_handler<void (), {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}>::_M_invoke(std::_Any_data const&) (functional:1871) ```	2017-07-18 12:16:37 -07:00
Max Brunsfeld	afb499bf2e	Handle rename symbols in ts_language APIs	2017-07-18 12:01:52 -07:00
Max Brunsfeld	de17c92462	Fix setup in stack test	2017-07-18 08:21:35 -07:00
Max Brunsfeld	9a04231ab1	Remove length restriction in external scanner serialization API	2017-07-17 17:12:36 -07:00
Phil Turnbull	e7662c2213	Handle out-of-bound read in utf16_iterate. Also simplify the test so we call `utf16_iterate` directly. Calling `utf16_iterate` via `SpyInput` and `ts_document_parse` doesn't seem to reliably trigger the problem using valgrind. valgrind also doesn't detect the problem if we use a string literal like: `utf16_iterate("", 1, &code_point);`	2017-07-17 13:57:12 -07:00
Phil Turnbull	035abc1e15	Add test for UTF16 out-of-bound read. utf16_iterate does not check that 'length' is a multiple of two, which leads to an out-of-bound read: ==105293== Conditional jump or move depends on uninitialised value(s) ==105293== at 0x54F014: utf16_iterate (utf16.c:7) ==105293== by 0x539251: string_iterate(TSInputEncoding, unsigned char const, unsigned long, int) (encoding_helpers.cc:15) ==105293== by 0x53939D: string_byte_for_character(TSInputEncoding, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, unsigned long) (encoding_helpers.cc:43) ==105293== by 0x507BAD: SpyInput::read(void, unsigned int) (spy_input.cc:47) ==105293== by 0x551049: ts_lexer__get_chunk (lexer.c:29) ==105293== by 0x5515C2: ts_lexer_start (lexer.c:152) ==105293== by 0x5469AB: parser(long,...)(long long) (parser.c:297) ==105293== by 0x547896: parser__get_lookahead (parser.c:439) ==105293== by 0x54B2DF: parser__advance (parser.c:1150) ==105293== by 0x54C2B6: parser_parse (parser.c:1348) ==105293== by 0x53F06F: ts_document_parse_with_options (document.c:136) ==105293== by 0x53EF4F: ts_document_parse (document.c:107)	2017-07-17 12:34:39 -07:00
Max Brunsfeld	4b40a1ed6c	Support anonymous tokens inside of RENAME rules	2017-07-14 10:19:58 -07:00
Max Brunsfeld	8f028ebf68	Avoid deep tree comparison when both trees have errors	2017-07-05 17:33:35 -07:00
Max Brunsfeld	d322f0b6a7	🎨	2017-07-04 21:59:54 -07:00
Max Brunsfeld	a89322c5f1	Remove unneeded parameters from public interface of stack_iterate callback	2017-06-29 16:43:56 -07:00
Max Brunsfeld	66be393b78	Stack - consider empty external token state identical to NULL	2017-06-29 15:00:20 -07:00
Max Brunsfeld	0143bfdad4	Avoid use-after-free of external token states. Previously, it was possible for references to external token states to outlive the trees to which those states belonged. Now, instead of storing references to external token states in the Stack and in the Lexer, we store references to the external token trees themselves, and we retain the trees to prevent use-after-free.	2017-06-27 14:54:27 -07:00
Max Brunsfeld	f62ee5a0f3	Fix OOB reads at ends of chunks. Signed-off-by: Philip Turnbull <philipturnbull@github.com>	2017-06-23 12:09:16 -07:00
Max Brunsfeld	513edec7c1	Merge pull request #77 from philipturnbull/scan-build-fixes: Fix errors found by scan-build	2017-06-20 10:15:20 -07:00
Max Brunsfeld	c66fddd3aa	Add TSInput option to measure columns in bytes not characters	2017-06-15 16:35:34 -07:00
Max Brunsfeld	b862db766e	Merge remote-tracking branch 'origin/master' into update-fixture-grammars	2017-06-14 17:11:44 -07:00
Phil Turnbull	18f261ad51	Initialise all fields of TSParseOptions in tests. This should prevent any confusing failures in the unit tests: test/runtime/document_test.cc:381:7: warning: Passed-by-value struct argument contains uninitialized data (e.g., field: 'changed_range_count') ts_document_parse_with_options(document, options); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ test/runtime/document_test.cc:408:7: warning: Passed-by-value struct argument contains uninitialized data (e.g., field: 'changed_range_count') ts_document_parse_with_options(document, options); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~	2017-06-14 11:12:06 -04:00
Max Brunsfeld	74f5ceddf7	Fix parsing of valid code with halt_on_error flag set. Signed-off-by: Tim Clem <timothy.clem@gmail.com>	2017-05-01 14:25:25 -07:00
Max Brunsfeld	a98d449d88	Add an option to immediately halt on syntax error	2017-05-01 13:50:49 -07:00
Max Brunsfeld	03a555a86e	Finish test for invalid UTF8 handling. Signed-off-by: Tim Clem <timothy.clem@gmail.com>	2017-04-27 14:48:16 -07:00
Timothy Clem	37f2a4745f	Test demonstrating non-UTF8 input failure	2017-04-27 14:46:36 -07:00
Max Brunsfeld	a15e974150	Make clearer assertions about SpyInput's read strings	2017-03-21 12:14:04 -07:00
Max Brunsfeld	ca943f09a4	Update expected trees in error recovery test	2017-03-21 11:41:01 -07:00
Max Brunsfeld	f032da198e	Finish test for invalid UTF8 handling. Signed-off-by: Tim Clem <timothy.clem@gmail.com>	2017-03-21 11:05:32 -07:00
Timothy Clem	7092d4522a	Test demonstrating non-UTF8 input failure	2017-03-21 09:58:35 -07:00
Max Brunsfeld	42b05b4b5e	Add simple unit test for invalidating trees preceding an edit due to lookahead	2017-03-13 17:34:31 -07:00
Max Brunsfeld	d222dbb9fd	Allow lexer to accept tokens that ended at previous positions: track lookahead in each tree; add 'mark_end' API that external scanners can use	2017-03-13 17:06:52 -07:00
Max Brunsfeld	6dc0ff359d	Rename spec -> test. 'Test' is a much more straightforward name.	2017-03-09 20:40:01 -08:00

38 commits
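
The recovery strategy described in commit 99d048e016 (find a previous state S in the stack in which the unexpected token T would be valid, pop everything after S, and wrap it in an error) can be illustrated with a toy model. This is only a sketch under stated assumptions, not tree-sitter's actual code: the names `ToyStack`, `ToyRecovery`, `toy_valid`, and `toy_recover` are hypothetical, the validity table is invented for illustration, and the real parser works on a graph-structured stack of tree nodes rather than a flat array of state ids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_STATE_COUNT 4
#define TOY_TOKEN_COUNT 3

/* Hypothetical table: toy_valid[s][t] is true when token t is valid in state s. */
static const bool toy_valid[TOY_STATE_COUNT][TOY_TOKEN_COUNT] = {
  /* state 0 */ {true,  false, false},
  /* state 1 */ {false, true,  true },
  /* state 2 */ {true,  false, false},
  /* state 3 */ {true,  false, false},
};

/* Toy parse stack: states[0] is the bottom, states[size - 1] is the top. */
typedef struct {
  int states[16];
  size_t size;
} ToyStack;

typedef struct {
  bool recovered;       /* true if some state on the stack accepts the token */
  size_t wrapped_count; /* entries popped; a real parser would wrap their nodes in an error */
} ToyRecovery;

/* Walk from the top of the stack toward the bottom, looking for the most
 * recent state in which `token` is valid. If one is found, pop every entry
 * above it and report how many entries were popped. If none is found, the
 * stack is left unchanged. */
static ToyRecovery toy_recover(ToyStack *stack, int token) {
  assert(token >= 0 && token < TOY_TOKEN_COUNT);
  ToyRecovery result = {false, 0};
  for (size_t i = stack->size; i > 0; i--) {
    int state = stack->states[i - 1];
    if (toy_valid[state][token]) {
      result.recovered = true;
      result.wrapped_count = stack->size - i;
      stack->size = i; /* state S is now on top of the stack */
      return result;
    }
  }
  return result;
}
```

With a stack holding states 0, 1, 2, 3 (bottom to top) and unexpected token 1, the walk skips states 3 and 2, stops at state 1, and pops the two entries above it. Because the parser resumes in an ordinary state, the lexer is only ever asked for that state's normal token set, which is the property the commit message emphasizes.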