* Rewrite the error cost comparison in terms of explicit, discrete
conditions.
* Allow merging stack versions that have different error costs.
* Store, for each stack version, its depth since the last error, and use this
state to prevent incorrect merging.
* Sort the stack versions in order of preference and put a hard limit on
the version count (a rough sketch of these rules follows this list).
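A hedged sketch of what these rules might look like together (the type, field choices,
and the limit value are assumptions, not the actual tree-sitter code):

  #include <stdbool.h>

  typedef struct {
    unsigned error_cost;         /* accumulated cost of the errors on this version */
    unsigned depth_since_error;  /* how far the version has advanced past its last error */
    unsigned node_count;         /* total nodes pushed; used only as a tie-breaker */
  } StackVersionSummary;

  #define MAX_VERSION_COUNT 8    /* hard limit on live stack versions (assumed value) */

  /* Returns true if version `a` should be preferred over version `b`. Each
   * criterion is an explicit, discrete condition, so the precedence between
   * the criteria is obvious, and two versions with different error costs can
   * still be compared and, by extension, merged. */
  static bool version_is_better(const StackVersionSummary *a,
                                const StackVersionSummary *b) {
    if (a->error_cost < b->error_cost) return true;
    if (a->error_cost > b->error_cost) return false;
    if (a->depth_since_error > b->depth_since_error) return true;
    if (a->depth_since_error < b->depth_since_error) return false;
    return a->node_count > b->node_count;
  }

The versions would then be sorted with this comparison and any versions beyond
MAX_VERSION_COUNT dropped.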
Previously, it was possible for references to external token states to
outlive the trees to which those states belonged.
Now, instead of storing references to external token states in the Stack
and in the Lexer, we store references to the external token trees
themselves, and we retain the trees to prevent use-after-free.
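A minimal sketch of the ownership change, using hypothetical types and helpers (the
real runtime's Tree, Stack, and Lexer are more involved): the Lexer now keeps the whole
tree alive via its reference count instead of holding a bare pointer into that tree's
external token state.

  #include <assert.h>
  #include <stdlib.h>

  typedef struct {
    unsigned ref_count;
    char external_token_state[128];  /* owned by, and freed along with, the tree */
  } Tree;

  static void tree_retain(Tree *tree) { tree->ref_count++; }

  static void tree_release(Tree *tree) {
    assert(tree->ref_count > 0);
    if (--tree->ref_count == 0) free(tree);
  }

  typedef struct {
    Tree *last_external_token;  /* a retained tree; previously a raw pointer into its state */
  } Lexer;

  static void lexer_set_last_external_token(Lexer *lexer, Tree *tree) {
    if (tree) tree_retain(tree);  /* keep the new tree (and its token state) alive */
    if (lexer->last_external_token) tree_release(lexer->last_external_token);
    lexer->last_external_token = tree;
  }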
parser__select_tree can return true when 'left != NULL' and 'right == NULL', which
will later cause a NULL pointer dereference:
src/runtime/parser.c:842:14: warning: Access to field 'ref_count' results in a dereference of a null pointer (loaded from variable 'root')
assert(root->ref_count > 0);
^~~~~~~~~~~~~~~
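One way to rule that combination out (a sketch of a possible guard, assuming a true
result means "prefer right"; the real function also takes the parser itself):

  #include <stdbool.h>

  typedef struct Tree Tree;  /* stand-in for the runtime's Tree type */

  static bool parser__select_tree(Tree *left, Tree *right) {
    if (!right) return false;  /* never select a missing tree */
    if (!left) return true;    /* any tree beats a missing one */
    /* ...the existing comparison of error cost, dynamic precedence, etc... */
    return false;
  }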
Because repair_reduction_count is unsigned, the default value of '-1' becomes 0xffffffff,
which causes the loop to be entered even when repair_reductions is NULL:
src/runtime/parser.c:691:11: warning: Dereference of null pointer
if (repair_reductions[j].params.symbol == repair->symbol) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
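A self-contained illustration of the wraparound (the variable names mirror the ones
above, but the snippet itself is hypothetical):

  #include <stdio.h>

  int main(void) {
    const int *repair_reductions = NULL;   /* no repair reductions were collected */
    unsigned repair_reduction_count = -1;  /* wraps around to 0xffffffff */

    printf("count = %u\n", repair_reduction_count);  /* prints 4294967295 */

    /* In the buggy shape, `0 < 0xffffffff` holds, so the first iteration would
     * read repair_reductions[0] through a NULL pointer. Checking the pointer
     * (or defaulting the count to 0) keeps the loop from being entered. */
    if (repair_reductions != NULL) {
      for (unsigned j = 0; j < repair_reduction_count; j++)
        printf("%d\n", repair_reductions[j]);
    }
    return 0;
  }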
If we already have a stack version in which, for example,
a `function_call` is skipped, don't create another stack
version in which that `function_call` is reduced to an
`expression`, and then the `expression` is skipped. That
doesn't improve the error recovery at all, but adds to the
branching factor of the parse stack and makes things harder
to debug.
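A rough sketch of the kind of check this implies, with entirely hypothetical names and
fields (the real stack-version bookkeeping is different):

  #include <stdbool.h>
  #include <stdint.h>

  typedef struct {
    bool has_skipped_node;
    uint32_t skipped_start_byte;  /* byte range of the node this version skipped */
    uint32_t skipped_end_byte;
  } StackVersion;

  /* Before adding a version that skips a node spanning [start_byte, end_byte),
   * check whether an existing version already skipped that exact range; a
   * version that skipped the `function_call` directly makes one that first
   * reduces it to an `expression` and then skips it redundant. */
  static bool range_already_skipped(const StackVersion *versions, uint32_t count,
                                    uint32_t start_byte, uint32_t end_byte) {
    for (uint32_t i = 0; i < count; i++) {
      if (versions[i].has_skipped_node &&
          versions[i].skipped_start_byte == start_byte &&
          versions[i].skipped_end_byte == end_byte)
        return true;
    }
    return false;
  }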
The parser spends the majority of its time allocating and freeing trees and stack nodes.
Also, the memory footprint of the AST is a significant concern when using tree-sitter
with large files. This library is already unlikely to work very well with source files
larger than 4GB, so representing rows, columns, byte lengths and child indices as
unsigned 32-bit integers seems like the right choice.
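As a rough illustration of what the compact representation buys (field names are only
loosely modeled on the library's point/length types; the exact layout is an assumption):

  #include <stdint.h>
  #include <stdio.h>

  typedef struct { uint32_t row; uint32_t column; } Point;
  typedef struct { uint32_t bytes; Point extent; } Length;

  typedef struct {
    Length padding;       /* whitespace and comments preceding the node */
    Length size;          /* the node's own extent */
    uint32_t child_count;
  } NodePosition;

  int main(void) {
    /* A source file larger than 4GB could not be addressed with 32-bit byte
     * offsets anyway, so widening these fields to 64 bits would only make
     * every node bigger without enabling anything new. */
    printf("NodePosition: %zu bytes\n", sizeof(NodePosition));  /* 28 bytes */
    return 0;
  }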