Commit graph

59 commits

Author SHA1 Message Date
Max Brunsfeld
666dfb76d2 Remove document parameter from ts_node_type, ts_node_string
Co-Authored-By: Rick Winfrey <rewinfrey@github.com>
2018-05-09 16:47:47 -07:00
Max Brunsfeld
92255bbfdd Remove document parameter from ts_node_type, ts_node_string
Co-Authored-By: Rick Winfrey <rewinfrey@github.com>
2018-05-09 15:28:28 -07:00
Max Brunsfeld
b06747b6ca Remove stale unit tests
Co-Authored-By: Rick Winfrey <rewinfrey@github.com>
2018-05-09 14:14:42 -07:00
Max Brunsfeld
973e4a44f0 Start work on removing parent pointers
Co-Authored-By: Rick Winfrey <rewinfrey@github.com>
2018-05-09 12:22:19 -07:00
Max Brunsfeld
d5cfc06fa2 Fix unit test for invalid utf8 at EOF 2018-04-17 17:33:45 -07:00
Max Brunsfeld
09be0b6ef5 Store trees' children in TreeArrays, not w/ separate pointer and length 2018-04-06 13:26:18 -07:00
Max Brunsfeld
a6cf2e87e7 Fix halt_on_error tests 2018-04-06 13:26:18 -07:00
Max Brunsfeld
dbe77e7199 Simplify testing-only ts_stack_iterate function 2018-03-29 17:50:07 -07:00
Max Brunsfeld
5520983144 Clean up Stack API
* Remove StackPopResult
* Rename top_state() -> state()
* Rename top_position() -> position()
* Improve docs
2018-03-29 17:37:54 -07:00
Max Brunsfeld
ee995c3d6b Avoid redundant retains/releases by giving ts_stack_push move semantics 2018-03-29 17:18:43 -07:00
Max Brunsfeld
0810971f3e 🔥 symbol iterator API
This idea was never fully baked.
2018-03-08 14:16:37 -08:00
Max Brunsfeld
e927d02f43 Allow reusing leaf nodes unless the next leaf has changes 2018-03-07 17:44:54 -08:00
Max Brunsfeld
52087de4f0 Remove the concept of fragile reductions
They were a vestige of when Tree-sitter did sentential form-based
incremental parsing (as opposed to simply state matching). This was
elegant but not compatible with GLR as far as I could tell.
2018-03-02 14:51:54 -08:00
Max Brunsfeld
82c7e170b3 Fix case where loop was created in the parse stack
Fixes #133
2018-03-02 09:05:20 -08:00
Max Brunsfeld
46dcd53090 Do not insert missing tokens if halt_on_error option is passed 2018-01-24 14:04:55 -08:00
Max Brunsfeld
315dff3285 Add an API for getting a node's child index 2018-01-09 14:01:36 -08:00
Max Brunsfeld
f653f2b3bb Add ts_node_first_{child,named_child}_for_byte methods 2018-01-09 13:44:59 -08:00
Max Brunsfeld
addeb6c4c1 Allocate and free trees using an object pool 2017-12-27 10:34:29 -08:00
Max Brunsfeld
0e69da37a5 Return a character count from the lexer's get_column method 2017-12-20 16:26:38 -08:00
Max Brunsfeld
36c2b685b9 Always invalidate old chunk of text when parsing after an edit 2017-10-04 15:09:46 -07:00
Max Brunsfeld
d291af9a31 Refactor error comparisons
* Deal with mergeability outside of error comparison function
* Make `better_version_exists` function pure (don't halt other versions
as a side effect).
* Tweak error comparison logic

Signed-off-by: Rick Winfrey <rewinfrey@github.com>
2017-09-13 16:38:15 -07:00
Max Brunsfeld
99d048e016 Simplify error recovery; eliminate recovery states
The previous approach to error recovery relied on special error-recovery
states in the parse table. For each token T, there was an error recovery
state in which the parser looked for *any* token that could follow T.
Unfortunately, sometimes the set of tokens that could follow T contained
conflicts. For example, in JS, the token '}' can be followed by the
open-ended 'template_chars' token, but also by ordinary tokens like
'identifier'. So with the old algorithm, when recovering from an
unexpected '}' token, the lexer had no way to distinguish identifiers
from template_chars.

This commit drops the error recovery states. Instead, when we encounter
an unexpected token T, we recover from the error by finding a previous
state S in the stack in which T would be valid, popping all of the nodes
after S, and wrapping them in an error.

This way, the lexer is always invoked in a normal parse state, in which
it is looking for a non-conflicting set of tokens. Eliminating the error
recovery states also shrinks the lex state machine significantly.

Signed-off-by: Rick Winfrey <rewinfrey@github.com>
2017-09-11 15:22:52 -07:00
Max Brunsfeld
f6325746aa Provide symbol metadata with dummy language in stack test 2017-08-08 17:47:24 -07:00
Max Brunsfeld
cc7277fd7d Avoid using IsNull bandit assertion 2017-08-08 12:52:35 -07:00
Max Brunsfeld
94dc703bfc Require that grammars' start rules be visible 2017-08-04 17:07:37 -07:00
Max Brunsfeld
e5c3bf742d Update fixture grammars 2017-08-03 16:32:39 -07:00
Max Brunsfeld
09f4796f6b Get tests passing w/ new alias API 2017-08-01 14:35:34 -07:00
Max Brunsfeld
cb5fe80348 Rename RENAME rule to ALIAS, allow it to create anonymous nodes 2017-07-31 16:41:11 -07:00
Max Brunsfeld
cbdfd89675 Mark reductions as fragile based on their final properties
We previously maintained a set of individual productions that were
involved in conflicts, but that was subtly incorrect because
we don't compare productions themselves when comparing parse items;
we only compare the parse items properties that could affect the
final reduce actions.
2017-07-21 09:54:24 -07:00
Max Brunsfeld
f33421c53e Fix incorrect node renames in the presence of extra tokens 2017-07-18 21:24:34 -07:00
Max Brunsfeld
10d28d4b56 Merge pull request #92 from tree-sitter/utf16-oob
Add test for UTF16 out-of-bound read
2017-07-18 17:24:31 -07:00
Phil Turnbull
52cec9ed39 Rework SpyInput buffer handling
SpyInput uses a fixed-size buffer and explicitly zeros memory which is good for
catching logic errors but defeats valgrind's memory tracking. Use a separate
buffer of exactly the correct size for each request. This correctly catches the
problem under valgrind:

```
==8694== Invalid read of size 2
==8694==    at 0x54EFFB: utf16_iterate (utf16.c:10)
==8694==    by 0x551126: ts_lexer__get_lookahead (lexer.c:54)
==8694==    by 0x5515CD: ts_lexer_start (lexer.c:154)
==8694==    by 0x54699F: parser(long,...)(long long) (parser.c:297)
==8694==    by 0x54788A: parser__get_lookahead (parser.c:439)
==8694==    by 0x54B2D3: parser__advance (parser.c:1150)
==8694==    by 0x54C2AA: parser_parse (parser.c:1348)
==8694==    by 0x53F063: ts_document_parse_with_options (document.c:136)
==8694==    by 0x53EF43: ts_document_parse (document.c:107)
==8694==    by 0x4AED11: {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}::operator()() const (document_test.cc:82)
==8694==    by 0x4B56B6: std::_Function_handler<void (), {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}>::_M_invoke(std::_Any_data const&) (functional:1871)
==8694==    by 0x40F8C5: std::function<void ()>::operator()() const (functional:2267)
==8694==  Address 0x5d08be0 is 0 bytes inside a block of size 1 alloc'd
==8694==    at 0x4C2E80F: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8694==    by 0x507C3E: SpyInput::read(void*, unsigned int*) (spy_input.cc:66)
==8694==    by 0x55103D: ts_lexer__get_chunk (lexer.c:29)
==8694==    by 0x5515B6: ts_lexer_start (lexer.c:152)
==8694==    by 0x54699F: parser(long,...)(long long) (parser.c:297)
==8694==    by 0x54788A: parser__get_lookahead (parser.c:439)
==8694==    by 0x54B2D3: parser__advance (parser.c:1150)
==8694==    by 0x54C2AA: parser_parse (parser.c:1348)
==8694==    by 0x53F063: ts_document_parse_with_options (document.c:136)
==8694==    by 0x53EF43: ts_document_parse (document.c:107)
==8694==    by 0x4AED11: {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}::operator()() const (document_test.cc:82)
==8694==    by 0x4B56B6: std::_Function_handler<void (), {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}>::_M_invoke(std::_Any_data const&) (functional:1871)
```
2017-07-18 12:16:37 -07:00
Max Brunsfeld
afb499bf2e Handle rename symbols in ts_language APIs 2017-07-18 12:01:52 -07:00
Max Brunsfeld
de17c92462 Fix setup in stack test 2017-07-18 08:21:35 -07:00
Max Brunsfeld
9a04231ab1 Remove length restriction in external scanner serialization API 2017-07-17 17:12:36 -07:00
Phil Turnbull
e7662c2213 Handle out-of-bound read in utf16_iterate
Also simplify the test so we call `utf16_iterate` directly. Calling
`utf16_iterate` via `SpyInput` and `ts_document_parse` doesn't seem to reliably
trigger the problem using valgrind.

valgrind also doesn't detect the problem if we use a string literal like:
  `utf16_iterate("", 1, &code_point);`
2017-07-17 13:57:12 -07:00
Phil Turnbull
035abc1e15 Add test for UTF16 out-of-bound read
utf16_iterate does not check that 'length' is a multiple of two which leads to
an out-of-bound read:

==105293== Conditional jump or move depends on uninitialised value(s)
==105293==    at 0x54F014: utf16_iterate (utf16.c:7)
==105293==    by 0x539251: string_iterate(TSInputEncoding, unsigned char const*, unsigned long, int*) (encoding_helpers.cc:15)
==105293==    by 0x53939D: string_byte_for_character(TSInputEncoding, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, unsigned long) (encoding_helpers.cc:43)
==105293==    by 0x507BAD: SpyInput::read(void*, unsigned int*) (spy_input.cc:47)
==105293==    by 0x551049: ts_lexer__get_chunk (lexer.c:29)
==105293==    by 0x5515C2: ts_lexer_start (lexer.c:152)
==105293==    by 0x5469AB: parser(long,...)(long long) (parser.c:297)
==105293==    by 0x547896: parser__get_lookahead (parser.c:439)
==105293==    by 0x54B2DF: parser__advance (parser.c:1150)
==105293==    by 0x54C2B6: parser_parse (parser.c:1348)
==105293==    by 0x53F06F: ts_document_parse_with_options (document.c:136)
==105293==    by 0x53EF4F: ts_document_parse (document.c:107)
2017-07-17 12:34:39 -07:00
Max Brunsfeld
4b40a1ed6c Support anonymous tokens inside of RENAME rules 2017-07-14 10:19:58 -07:00
Max Brunsfeld
8f028ebf68 Avoid deep tree comparison when both trees have errors 2017-07-05 17:33:35 -07:00
Max Brunsfeld
d322f0b6a7 🎨 2017-07-04 21:59:54 -07:00
Max Brunsfeld
a89322c5f1 Remove unneeded parameters from public interface of stack_iterate callback 2017-06-29 16:43:56 -07:00
Max Brunsfeld
66be393b78 Stack - consider empty external token state identical to NULL 2017-06-29 15:00:20 -07:00
Max Brunsfeld
0143bfdad4 Avoid use-after-free of external token states
Previously, it was possible for references to external token states to
outlive the trees to which those states belonged.

Now, instead of storing references to external token states in the Stack
and in the Lexer, we store references to the external token trees
themselves, and we retain the trees to prevent use-after-free.
2017-06-27 14:54:27 -07:00
Max Brunsfeld
f62ee5a0f3 Fix OOB reads at ends of chunks
Signed-off-by: Philip Turnbull <philipturnbull@github.com>
2017-06-23 12:09:16 -07:00
Max Brunsfeld
513edec7c1 Merge pull request #77 from philipturnbull/scan-build-fixes
Fix errors found by scan-build
2017-06-20 10:15:20 -07:00
Max Brunsfeld
c66fddd3aa Add TSInput option to measure columns in bytes not characters 2017-06-15 16:35:34 -07:00
Max Brunsfeld
b862db766e Merge remote-tracking branch 'origin/master' into update-fixture-grammars 2017-06-14 17:11:44 -07:00
Phil Turnbull
18f261ad51 Initialise all fields of TSParseOptions in tests
This should prevent any confusing failures in the unit tests:

test/runtime/document_test.cc:381:7: warning: Passed-by-value struct argument contains uninitialized data (e.g., field: 'changed_range_count')
      ts_document_parse_with_options(document, options);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test/runtime/document_test.cc:408:7: warning: Passed-by-value struct argument contains uninitialized data (e.g., field: 'changed_range_count')
      ts_document_parse_with_options(document, options);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2017-06-14 11:12:06 -04:00
Max Brunsfeld
74f5ceddf7 Fix parsing of valid code with halt_on_error flag set
Signed-off-by: Tim Clem <timothy.clem@gmail.com>
2017-05-01 14:25:25 -07:00
Max Brunsfeld
a98d449d88 Add an option to immediately halt on syntax error 2017-05-01 13:50:49 -07:00