We previously maintained a set of individual productions that were
involved in conflicts, but that was subtly incorrect because
we don't compare productions themselves when comparing parse items;
we only compare the parse items properties that could affect the
final reduce actions.
For some reason, there was previously some extra logic that prevented
the external scanner from being invoked if the only valid external
token also had an internal definition.
It's surprising to not call the external scanner if an external
token is valid.
This speeds up parser generation by increasing the likelihood that we'll recognize
parse item sets as equivalent in advance, rather than having to merge their states
after the fact.
While generating the parse table, keep track of which tokens can follow one another.
Then use this information to evaluate token conflicts more precisely. This will
result in a smaller parse table than the previous, overly-conservative approach.
This fixes some randomized test failures in the C grammar, relating to Object-like macros.
The object-like macro rule relies on a whitespace token in order to distinguish object-like
macros whose values begin with a '(' from function-like macros. The presence of that
whitespace token means that other nodes should not be reusable in that state.
This fixes a bug in the C++ grammar where the `>>` token was merged into
a state where it was previously not valid, but the `>` token *was*
valid. This caused nested templates like -
std::vector<std::pair<int, int>>
to not parse correctly.
Some ParseActions have a state-id of -1 which can cause an out-of-bounds read
when removing duplicate parse states. This was found by AddressSanitizer:
==90699==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6320000187f8 at pc 0x0001071220a9 bp 0x7fff595fd440 sp 0x7fff595fd438
READ of size 8 at 0x6320000187f8 thread T0
#0 0x1071220a8 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)::operator()(unsigned long*) const build_parse_table.cc:398
#1 0x107121fa5 in void std::__1::__invoke_void_return_wrapper<void>::__call<tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)&, unsigned long*>(tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)&&&, unsigned long*&&) __functional_base:416
...
0x6320000187f8 is located 8 bytes to the left of 88264-byte region [0x632000018800,0x63200002e0c8)
allocated by thread T0 here:
#0 0x107b1576b in wrap__Znwm (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x6076b)
#1 0x10711da2c in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::allocate(unsigned long) new:169
#2 0x10711d8fb in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::vector(unsigned long) vector:1074
#3 0x107112f5c in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::vector(unsigned long) vector:1068
#4 0x1070af381 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states() build_parse_table.cc:378
#5 0x10709d827 in tree_sitter::build_tables::ParseTableBuilder::build() build_parse_table.cc:85
...
SUMMARY: AddressSanitizer: heap-buffer-overflow build_parse_table.cc:398 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)::operator()(unsigned long*) const
Shadow bytes around the buggy address:
0x1c64000030a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x1c64000030b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x1c64000030c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x1c64000030d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x1c64000030e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x1c64000030f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]
0x1c6400003100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1c6400003110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1c6400003120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1c6400003130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1c6400003140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00