Previously, it was possible for references to external token states to
outlive the trees to which those states belonged.
Now, instead of storing references to external token states in the Stack
and in the Lexer, we store references to the external token trees
themselves, and we retain the trees to prevent use-after-free.
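The retain-the-tree idea can be sketched as follows. This is a minimal illustration, not the actual runtime code: `DemoTree`, `DemoHolder`, and all `demo_*` functions are hypothetical names, and the real tree and stack structures are assumed to be considerably richer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified tree: NOT the real runtime struct. */
typedef struct {
  unsigned ref_count;
  int external_token_state; /* stands in for the scanner's saved state */
} DemoTree;

/* Hypothetical holder (think: a stack entry or the lexer). */
typedef struct {
  DemoTree *last_external_token; /* the holder owns one reference */
} DemoHolder;

static DemoTree *demo_tree_new(int state) {
  DemoTree *tree = malloc(sizeof *tree);
  if (!tree) return NULL;
  tree->ref_count = 1;
  tree->external_token_state = state;
  return tree;
}

static void demo_tree_retain(DemoTree *tree) {
  assert(tree->ref_count > 0);
  tree->ref_count++;
}

/* Returns 1 if the tree was freed, 0 if other owners remain. */
static int demo_tree_release(DemoTree *tree) {
  assert(tree->ref_count > 0);
  if (--tree->ref_count == 0) {
    free(tree);
    return 1;
  }
  return 0;
}

/* Store the tree itself (retained) instead of a pointer into its state.
 * Retaining the new tree before releasing the old one keeps this safe
 * even when both are the same tree. */
static void demo_holder_set(DemoHolder *holder, DemoTree *tree) {
  if (tree) demo_tree_retain(tree);
  if (holder->last_external_token) demo_tree_release(holder->last_external_token);
  holder->last_external_token = tree;
}
```

Because the holder keeps its own reference, the state stays readable through `holder.last_external_token` even after the original owner has released the tree.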
This fixes some randomized test failures in the C grammar relating to object-like macros.
The object-like macro rule relies on a whitespace token in order to distinguish object-like
macros whose values begin with a '(' from function-like macros. The presence of that
whitespace token means that other nodes should not be reusable in that state.
This fixes a bug in the C++ grammar where the `>>` token was merged into
a state where it was previously not valid, but the `>` token *was*
valid. As a result, nested templates such as
std::vector<std::pair<int, int>>
failed to parse correctly.
This silences a warning about a real, but minor, bug in the external json-parser:
externals/json-parser/json.c:653:37: warning: Value stored to 'b' is never read
b = 0;
^ ~
This should prevent any confusing failures in the unit tests:
test/runtime/document_test.cc:381:7: warning: Passed-by-value struct argument contains uninitialized data (e.g., field: 'changed_range_count')
ts_document_parse_with_options(document, options);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test/runtime/document_test.cc:408:7: warning: Passed-by-value struct argument contains uninitialized data (e.g., field: 'changed_range_count')
ts_document_parse_with_options(document, options);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
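One common way to avoid this class of warning is to zero-initialize the struct before setting individual fields. The sketch below assumes a hypothetical `DemoParseOptions` struct; only the field name `changed_range_count` comes from the warning above, and the other fields and the `demo_*` functions are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical options struct; the real one may have different fields. */
typedef struct {
  bool halt_on_error;
  void **changed_ranges;
  size_t *changed_range_count;
} DemoParseOptions;

/* `= {0}` zero-initializes every member, so fields that the caller
 * does not set are NULL or false instead of indeterminate. */
static DemoParseOptions demo_default_options(void) {
  DemoParseOptions options = {0};
  return options;
}

/* Takes the struct by value, as in the warning: every field is copied,
 * so every field must hold a determinate value. */
static bool demo_wants_changed_ranges(DemoParseOptions options) {
  return options.changed_ranges != NULL && options.changed_range_count != NULL;
}
```

A test would then start from `demo_default_options()` and override only the fields it cares about.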
This is safe but I think it is technically undefined behaviour to use a pointer
after it has been freed:
test/helpers/record_alloc.cc:75:3: warning: Use of memory after it is freed
record_deallocation(pointer);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
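A simple way to sidestep the question is to do the bookkeeping before calling free(). The following is a sketch under assumptions: the `demo_*` names and the counter are hypothetical and much simpler than whatever the real helper records.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bookkeeping: just a count of live allocations. */
static size_t demo_live_count = 0;

static void demo_record_allocation(void *pointer) {
  if (pointer) demo_live_count++;
}

static void demo_record_deallocation(void *pointer) {
  if (pointer) demo_live_count--;
}

static void *demo_record_malloc(size_t size) {
  void *pointer = malloc(size);
  demo_record_allocation(pointer);
  return pointer;
}

static void demo_record_free(void *pointer) {
  /* Record first, while the pointer value is still valid, then free. */
  demo_record_deallocation(pointer);
  free(pointer);
}
```

With this ordering the pointer is never read after the memory it refers to has been released.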
parser__select_tree can return true if 'left != NULL' and 'right == NULL', which
will later cause a NULL pointer dereference:
src/runtime/parser.c:842:14: warning: Access to field 'ref_count' results in a dereference of a null pointer (loaded from variable 'root')
assert(root->ref_count > 0);
^~~~~~~~~~~~~~~
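The invariant being protected is that a selection function must never choose a NULL candidate. The sketch below illustrates that invariant only; `DemoCandidate` and `demo_select_tree` are hypothetical, and the cost comparison is an assumed stand-in for the real selection criteria.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical candidate tree with a single comparison key. */
typedef struct {
  unsigned error_cost;
  unsigned ref_count;
} DemoCandidate;

/* Returns true when `right` should replace `left`.
 * A NULL `right` is never selected, so callers can safely
 * dereference whichever tree ends up chosen (if any). */
static bool demo_select_tree(const DemoCandidate *left, const DemoCandidate *right) {
  if (!right) return false; /* nothing to switch to */
  if (!left) return true;   /* anything beats no tree */
  return right->error_cost < left->error_cost;
}
```

Checking `right` first means the 'left != NULL, right == NULL' case can only return false.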
Because repair_reduction_count is unsigned, the default of '-1' is 0xffffffff
and will cause the loop to be entered even if repair_reductions is NULL:
src/runtime/parser.c:691:11: warning: Dereference of null pointer
if (repair_reductions[j].params.symbol == repair->symbol) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
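The wrap-around can be reproduced in a few lines. This sketch assumes a 32-bit unsigned count, as the 0xffffffff value above implies; `DemoReduction` and `demo_has_reduction` are hypothetical names, and the NULL guard is one possible remedy rather than necessarily the one applied here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified reduction entry. */
typedef struct {
  uint16_t symbol;
} DemoReduction;

/* Searches `reductions` for `symbol`. The NULL check matters because
 * a count initialized to -1 wraps to 0xffffffff when unsigned, so
 * `j < count` alone would not keep the loop from running. */
static bool demo_has_reduction(const DemoReduction *reductions,
                               uint32_t count, uint16_t symbol) {
  if (!reductions) return false;
  for (uint32_t j = 0; j < count; j++) {
    if (reductions[j].symbol == symbol) return true;
  }
  return false;
}
```

An alternative is to default the count to 0 instead of -1, which makes the loop condition sufficient on its own.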
Some ParseActions have a state-id of -1 which can cause an out-of-bounds read
when removing duplicate parse states. This was found by AddressSanitizer:
==90699==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6320000187f8 at pc 0x0001071220a9 bp 0x7fff595fd440 sp 0x7fff595fd438
READ of size 8 at 0x6320000187f8 thread T0
#0 0x1071220a8 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)::operator()(unsigned long*) const build_parse_table.cc:398
#1 0x107121fa5 in void std::__1::__invoke_void_return_wrapper<void>::__call<tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)&, unsigned long*>(tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)&&&, unsigned long*&&) __functional_base:416
...
0x6320000187f8 is located 8 bytes to the left of 88264-byte region [0x632000018800,0x63200002e0c8)
allocated by thread T0 here:
#0 0x107b1576b in wrap__Znwm (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x6076b)
#1 0x10711da2c in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::allocate(unsigned long) new:169
#2 0x10711d8fb in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::vector(unsigned long) vector:1074
#3 0x107112f5c in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::vector(unsigned long) vector:1068
#4 0x1070af381 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states() build_parse_table.cc:378
#5 0x10709d827 in tree_sitter::build_tables::ParseTableBuilder::build() build_parse_table.cc:85
...
SUMMARY: AddressSanitizer: heap-buffer-overflow build_parse_table.cc:398 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)::operator()(unsigned long*) const
Shadow bytes around the buggy address:
0x1c64000030a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x1c64000030b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x1c64000030c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x1c64000030d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x1c64000030e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x1c64000030f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]
0x1c6400003100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1c6400003110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1c6400003120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1c6400003130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1c6400003140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
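The report above is consistent with a table being indexed at -1, since the bad read lands 8 bytes (one unsigned long) before the start of the allocation. A bounds-checked lookup avoids this. The sketch below is in C rather than the C++ of build_parse_table.cc, and `demo_remap_state` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Maps an old state id to its replacement via `new_ids`.
 * Ids outside [0, count) are returned unchanged instead of being used
 * as an index; this covers a -1 id used as a sentinel. */
static long demo_remap_state(const size_t *new_ids, size_t count, long state_id) {
  if (state_id < 0 || (size_t)state_id >= count) return state_id;
  return (long)new_ids[state_id];
}
```

Whether out-of-range ids should pass through unchanged or be treated as an error is a design choice; passing them through is assumed here only to keep the sketch short.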