They were a vestige of when Tree-sitter did sentential form-based
incremental parsing (as opposed to simply state matching). This was
elegant but not compatible with GLR as far as I could tell.
* Deal with mergeability outside of the error comparison function
* Make the `better_version_exists` function pure (don't halt other versions
as a side effect).
* Tweak error comparison logic
Signed-off-by: Rick Winfrey <rewinfrey@github.com>
The previous approach to error recovery relied on special error-recovery
states in the parse table. For each token T, there was an error recovery
state in which the parser looked for *any* token that could follow T.
Unfortunately, sometimes the set of tokens that could follow T contained
conflicts. For example, in JS, the token '}' can be followed by the
open-ended 'template_chars' token, but also by ordinary tokens like
'identifier'. So with the old algorithm, when recovering from an
unexpected '}' token, the lexer had no way to distinguish identifiers
from template_chars.
This commit drops the error recovery states. Instead, when we encounter
an unexpected token T, we recover from the error by finding a previous
state S in the stack in which T would be valid, popping all of the nodes
after S, and wrapping them in an error.
This way, the lexer is always invoked in a normal parse state, in which
it is looking for a non-conflicting set of tokens. Eliminating the error
recovery states also shrinks the lex state machine significantly.
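A minimal sketch of that recovery loop, with invented names and a flat array
standing in for the real GLR stack (the actual logic lives in parser.c and is
more involved):

```
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Invented, flattened stand-in for the parser's GLR stack: each entry is a
 * parse state together with the symbol of the node that was pushed in it. */
typedef struct {
  uint16_t states[64];
  uint16_t node_symbols[64];
  uint32_t size;
} ParseStack;

/* Stand-in for a parse-table lookup: is `token` valid in `state`? */
static bool state_accepts_token(uint16_t state, uint16_t token) {
  return (state + token) % 3 == 0;
}

/* On an unexpected token, walk back down the stack to a state S in which the
 * token is valid, then pop every node above S; the popped nodes become the
 * children of a single error node. Returns false if no such state exists. */
static bool recover_from_error(ParseStack *stack, uint16_t lookahead) {
  for (uint32_t i = stack->size; i > 0; i--) {
    uint16_t state = stack->states[i - 1];
    if (state_accepts_token(state, lookahead)) {
      uint32_t popped = stack->size - i;
      printf("resuming in state %u, wrapping %u nodes in an error\n",
             state, popped);
      stack->size = i;
      return true;
    }
  }
  return false;
}

int main(void) {
  ParseStack stack = {{1, 4, 7, 9}, {20, 21, 22, 23}, 4};
  recover_from_error(&stack, 2);  /* 2 stands in for the unexpected token */
  return 0;
}
```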
Signed-off-by: Rick Winfrey <rewinfrey@github.com>
We previously maintained a set of individual productions that were
involved in conflicts, but that was subtly incorrect: we don't compare
productions themselves when comparing parse items; we only compare the
properties of parse items that could affect the final reduce actions.
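As an illustration of that comparison (the field names below are invented;
the real parse-item structure in the table generator differs), the idea is to
treat two items as equivalent when the reduce-relevant properties match,
regardless of which production they came from:

```
#include <stdbool.h>
#include <stdint.h>

/* Invented, simplified parse item: only the properties that can influence the
 * generated reduce action are listed, plus a production pointer that the
 * comparison deliberately ignores. */
typedef struct {
  uint16_t variable_index;  /* left-hand side of the production */
  uint32_t step_index;      /* position of the dot within the production */
  int32_t precedence;       /* precedence at the current step */
  int8_t associativity;     /* associativity at the current step */
  const void *production;   /* not consulted below */
} ParseItem;

/* Two items count as the same for conflict-detection purposes when the
 * reduce-relevant properties agree, even if their productions differ. */
static bool parse_items_equivalent(const ParseItem *a, const ParseItem *b) {
  return a->variable_index == b->variable_index &&
         a->step_index == b->step_index &&
         a->precedence == b->precedence &&
         a->associativity == b->associativity;
}

int main(void) {
  int production_1, production_2;
  ParseItem a = {3, 2, 1, 0, &production_1};
  ParseItem b = {3, 2, 1, 0, &production_2};  /* different production */
  return parse_items_equivalent(&a, &b) ? 0 : 1;
}
```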
SpyInput uses a fixed-size buffer and explicitly zeros memory, which is good
for catching logic errors but defeats valgrind's memory tracking. Use a
separate buffer of exactly the correct size for each request (a sketch of this
approach follows the trace below). This correctly catches the problem under
valgrind:
```
==8694== Invalid read of size 2
==8694== at 0x54EFFB: utf16_iterate (utf16.c:10)
==8694== by 0x551126: ts_lexer__get_lookahead (lexer.c:54)
==8694== by 0x5515CD: ts_lexer_start (lexer.c:154)
==8694== by 0x54699F: parser(long,...)(long long) (parser.c:297)
==8694== by 0x54788A: parser__get_lookahead (parser.c:439)
==8694== by 0x54B2D3: parser__advance (parser.c:1150)
==8694== by 0x54C2AA: parser_parse (parser.c:1348)
==8694== by 0x53F063: ts_document_parse_with_options (document.c:136)
==8694== by 0x53EF43: ts_document_parse (document.c:107)
==8694== by 0x4AED11: {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}::operator()() const (document_test.cc:82)
==8694== by 0x4B56B6: std::_Function_handler<void (), {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}>::_M_invoke(std::_Any_data const&) (functional:1871)
==8694== by 0x40F8C5: std::function<void ()>::operator()() const (functional:2267)
==8694== Address 0x5d08be0 is 0 bytes inside a block of size 1 alloc'd
==8694== at 0x4C2E80F: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8694== by 0x507C3E: SpyInput::read(void*, unsigned int*) (spy_input.cc:66)
==8694== by 0x55103D: ts_lexer__get_chunk (lexer.c:29)
==8694== by 0x5515B6: ts_lexer_start (lexer.c:152)
==8694== by 0x54699F: parser(long,...)(long long) (parser.c:297)
==8694== by 0x54788A: parser__get_lookahead (parser.c:439)
==8694== by 0x54B2D3: parser__advance (parser.c:1150)
==8694== by 0x54C2AA: parser_parse (parser.c:1348)
==8694== by 0x53F063: ts_document_parse_with_options (document.c:136)
==8694== by 0x53EF43: ts_document_parse (document.c:107)
==8694== by 0x4AED11: {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}::operator()() const (document_test.cc:82)
==8694== by 0x4B56B6: std::_Function_handler<void (), {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}>::_M_invoke(std::_Any_data const&) (functional:1871)
```
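The change itself lives in the C++ test helper, but the idea translates to a
small C sketch: hand the lexer a freshly allocated buffer of exactly the number
of bytes being returned, so any read past the end lands outside the allocation
and valgrind can flag it. The names here (`SketchInput`, `sketch_read`) are
invented for illustration and are not the real `SpyInput` API.

```
#include <stdlib.h>
#include <string.h>

/* Invented stand-in for the test's input object. On every read it frees the
 * previous chunk and allocates a new one of exactly the number of bytes being
 * returned, so a read past the end of the chunk lands outside any allocation
 * and valgrind reports it. */
typedef struct {
  const char *content;
  size_t length;
  size_t position;
  size_t chunk_size;
  char *last_chunk;
} SketchInput;

static const char *sketch_read(SketchInput *input, unsigned *bytes_read) {
  size_t remaining = input->length - input->position;
  size_t count = remaining < input->chunk_size ? remaining : input->chunk_size;

  free(input->last_chunk);
  input->last_chunk = malloc(count);  /* exactly `count` bytes, no slack */
  if (count > 0)
    memcpy(input->last_chunk, input->content + input->position, count);

  input->position += count;
  *bytes_read = (unsigned)count;
  return input->last_chunk;
}

int main(void) {
  SketchInput input = {"hello world", 11, 0, 4, NULL};
  unsigned n = 0;
  do { sketch_read(&input, &n); } while (n > 0);
  free(input.last_chunk);
  return 0;
}
```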
Also simplify the test so we call `utf16_iterate` directly. Calling
`utf16_iterate` via `SpyInput` and `ts_document_parse` doesn't seem to reliably
trigger the problem under valgrind.
Valgrind also doesn't detect the problem if we use a string literal like:
`utf16_iterate("", 1, &code_point);`
utf16_iterate does not check that 'length' is a multiple of two, which leads to
an out-of-bounds read (a sketch of the missing check follows the trace):
```
==105293== Conditional jump or move depends on uninitialised value(s)
==105293==    at 0x54F014: utf16_iterate (utf16.c:7)
==105293==    by 0x539251: string_iterate(TSInputEncoding, unsigned char const*, unsigned long, int*) (encoding_helpers.cc:15)
==105293==    by 0x53939D: string_byte_for_character(TSInputEncoding, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, unsigned long) (encoding_helpers.cc:43)
==105293==    by 0x507BAD: SpyInput::read(void*, unsigned int*) (spy_input.cc:47)
==105293==    by 0x551049: ts_lexer__get_chunk (lexer.c:29)
==105293==    by 0x5515C2: ts_lexer_start (lexer.c:152)
==105293==    by 0x5469AB: parser(long,...)(long long) (parser.c:297)
==105293==    by 0x547896: parser__get_lookahead (parser.c:439)
==105293==    by 0x54B2DF: parser__advance (parser.c:1150)
==105293==    by 0x54C2B6: parser_parse (parser.c:1348)
==105293==    by 0x53F06F: ts_document_parse_with_options (document.c:136)
==105293==    by 0x53EF4F: ts_document_parse (document.c:107)
```
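A sketch of the missing check, assuming a signature along the lines of the call
quoted above (a byte pointer, a byte length, and an out parameter for the
decoded code point); the real decoder lives in utf16.c and also handles
surrogate pairs, which this sketch omits:

```
#include <stdint.h>
#include <string.h>

/* Invented stand-in for the decoder: returns the number of bytes consumed and
 * writes the decoded code point (or -1 on failure) to `code_point`. */
static uint32_t utf16_iterate_sketch(const uint8_t *string, uint32_t length,
                                     int32_t *code_point) {
  /* A UTF-16 code unit is two bytes, so fewer than two remaining bytes cannot
   * be decoded; without this check the read below would run past the buffer. */
  if (length < 2) {
    *code_point = -1;
    return 0;
  }

  uint16_t unit;
  memcpy(&unit, string, sizeof unit);  /* avoids an unaligned access */
  *code_point = unit;
  return 2;  /* bytes consumed */
}

int main(void) {
  int32_t code_point;
  const uint8_t buffer[1] = {0};
  /* With the guard, a one-byte buffer is rejected instead of over-read. */
  return utf16_iterate_sketch(buffer, 1, &code_point) == 0 ? 0 : 1;
}
```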