They were a vestige of when Tree-sitter did sentential form-based
incremental parsing (as opposed to simply state matching). This was
elegant but not compatible with GLR as far as I could tell.
Do not merge a token T into a parse state S if S contains
external tokens that can be *followed* by tokens that could
be shadowed by T.
At this point, the only automated test for this logic is via
the bash grammar, in which the `]` token should not be merged
into states in which `_concat` is valid, because `_concat`
can be followed by a `_special_characters` token, and `]`
would shadow `_special_characters`.
The previous approach to error recovery relied on special error-recovery
states in the parse table. For each token T, there was an error recovery
state in which the parser looked for *any* token that could follow T.
Unfortunately, sometimes the set of tokens that could follow T contained
conflicts. For example, in JS, the token '}' can be followed by the
open-ended 'template_chars' token, but also by ordinary tokens like
'identifier'. So with the old algorithm, when recovering from an
unexpected '}' token, the lexer had no way to distinguish identifiers
from template_chars.
This commit drops the error recovery states. Instead, when we encounter
an unexpected token T, we recover from the error by finding a previous
state S in the stack in which T would be valid, popping all of the nodes
after S, and wrapping them in an error.
This way, the lexer is always invoked in a normal parse state, in which
it is looking for a non-conflicting set of tokens. Eliminating the error
recovery states also shrinks the lex state machine significantly.
Signed-off-by: Rick Winfrey <rewinfrey@github.com>