The previous approach to error recovery relied on special error-recovery
states in the parse table. For each token T, there was an error recovery
state in which the parser looked for *any* token that could follow T.
Unfortunately, sometimes the set of tokens that could follow T contained
conflicts. For example, in JS, the token '}' can be followed by the
open-ended 'template_chars' token, but also by ordinary tokens like
'identifier'. So with the old algorithm, when recovering from an
unexpected '}' token, the lexer had no way to distinguish identifiers
from template_chars.
This commit drops the error recovery states. Instead, when we encounter
an unexpected token T, we recover from the error by finding a previous
state S in the stack in which T would be valid, popping all of the nodes
after S, and wrapping them in an error.
This way, the lexer is always invoked in a normal parse state, in which
it is looking for a non-conflicting set of tokens. Eliminating the error
recovery states also shrinks the lex state machine significantly.
Signed-off-by: Rick Winfrey <rewinfrey@github.com>
* Rewrite the error cost comparison in terms of explicit, discrete
conditions.
* Allow merging versions have different error costs.
* Store the depth of each stack version since the last error. Use this
state to prevent incorrect merging.
* Sort the stack versions in order of preference and put a hard limit on
the version count.
Previously, it was possible for references to external token states to
outlive the trees to which those states belonged.
Now, instead of storing references to external token states in the Stack
and in the Lexer, we store references to the external token trees
themselves, and we retain the trees to prevent use-after-free.
parser__select_tree can return true if 'left != NULL' and 'right == NULL' which
will later cause a NULL ptr deref:
src/runtime/parser.c:842:14: warning: Access to field 'ref_count' results in a dereference of a null pointer (loaded from variable 'root')
assert(root->ref_count > 0);
^~~~~~~~~~~~~~~
Because repair_reduction_count is unsigned, the default of '-1' is 0xffffffff
and will cause the loop to be entered if repair_reduction_count is NULL:
src/runtime/parser.c:691:11: warning: Dereference of null pointer
if (repair_reductions[j].params.symbol == repair->symbol) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If we already have a stack version in which, for example,
a `function_call` is skipped, don't create another stack
version in which that `function_call` is reduced to an
`expression`, and then the `expression` is skipped. That
doesn't improve the error recovery at all, but adds to the
branching factor of the parse stack and makes things harder
to debug.