Simplify error recovery; eliminate recovery states
The previous approach to error recovery relied on special error-recovery states in the parse table. For each token T, there was an error recovery state in which the parser looked for *any* token that could follow T.

Unfortunately, sometimes the set of tokens that could follow T contained conflicts. For example, in JS, the token '}' can be followed by the open-ended 'template_chars' token, but also by ordinary tokens like 'identifier'. So with the old algorithm, when recovering from an unexpected '}' token, the lexer had no way to distinguish identifiers from template_chars.

This commit drops the error recovery states. Instead, when we encounter an unexpected token T, we recover from the error by finding a previous state S in the stack in which T would be valid, popping all of the nodes after S, and wrapping them in an error. This way, the lexer is always invoked in a normal parse state, in which it is looking for a non-conflicting set of tokens.

Eliminating the error recovery states also shrinks the lex state machine significantly.

Signed-off-by: Rick Winfrey <rewinfrey@github.com>
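The recovery strategy described in the message above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: `Node`, `StackEntry`, `Parser`, `recover`, `to_sexp`, and the `is_valid` callback are hypothetical names, and the sketch assumes the ERROR node is pushed while the parser stays in state S.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the runtime's types (assumptions, not the real API).
using StateId = int;
using Token = std::string;

struct Node {
  std::string type;
  std::vector<Node> children;
};

struct StackEntry {
  StateId state;  // parse state reached after pushing `node`
  Node node;
};

struct Parser {
  StateId base_state;             // state of the empty stack
  std::vector<StackEntry> stack;  // bottom of the stack is at index 0
};

// State the parser would be in if only the bottom `keep` entries remained.
StateId state_at(const Parser &p, std::size_t keep) {
  assert(keep <= p.stack.size());
  return keep == 0 ? p.base_state : p.stack[keep - 1].state;
}

// Find the most recent state S in which `token` is valid, pop every node
// pushed after S, and wrap the popped nodes in a single ERROR node.
// Returns false if no state on the stack accepts `token`.
bool recover(Parser &p, const Token &token,
             const std::function<bool(StateId, const Token &)> &is_valid) {
  for (std::size_t keep = p.stack.size();; --keep) {
    StateId s = state_at(p, keep);
    if (is_valid(s, token)) {
      Node error{"ERROR", {}};
      for (std::size_t i = keep; i < p.stack.size(); ++i) {
        error.children.push_back(std::move(p.stack[i].node));
      }
      p.stack.resize(keep);
      // Assumption: the ERROR node does not change the state, so the
      // parser resumes in S and the lexer runs in a normal parse state.
      if (!error.children.empty()) p.stack.push_back({s, std::move(error)});
      return true;
    }
    if (keep == 0) return false;
  }
}

// Render a node as an S-expression, in the style of the test assertions.
std::string to_sexp(const Node &n) {
  std::string out = "(" + n.type;
  for (const Node &c : n.children) out += " " + to_sexp(c);
  return out + ")";
}
```

Because the parser resumes in an ordinary state S, the lexer is only ever asked for the tokens valid in S, which is the property the message relies on to avoid the 'template_chars' versus 'identifier' conflict.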
parent 8b3941764f
commit 99d048e016
15 changed files with 327 additions and 639 deletions
@@ -166,7 +166,7 @@ describe("Parser", [&]() {
 ts_document_set_language(document, load_real_language("javascript"));
 set_text("a; ' this string never ends");
 assert_root_node(
-"(ERROR (program (expression_statement (identifier))) (UNEXPECTED EOF))");
+"(program (expression_statement (identifier)) (ERROR (UNEXPECTED EOF)))");
 });
 });
 
@@ -198,7 +198,7 @@ describe("Parser", [&]() {
 
 free(string);
 
-assert_root_node("(ERROR (UNEXPECTED INVALID))");
+assert_root_node("(program (ERROR (UNEXPECTED INVALID)))");
 });
 });
 