Simplify error recovery; eliminate recovery states

The previous approach to error recovery relied on special error-recovery
states in the parse table. For each token T, there was an error recovery
state in which the parser looked for *any* token that could follow T.
Unfortunately, sometimes the set of tokens that could follow T contained
conflicts. For example, in JS, the token '}' can be followed by the
open-ended 'template_chars' token, but also by ordinary tokens like
'identifier'. So with the old algorithm, when recovering from an
unexpected '}' token, the lexer had no way to distinguish identifiers
from template_chars.
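
To make the conflict concrete, here is a minimal sketch, not the actual
lexer. The two regular expressions are simplified, hypothetical stand-ins
for the JS grammar's 'identifier' and 'template_chars' tokens, and
`matching_tokens` is an illustrative helper name. It shows that when both
tokens are in the valid set, the same input matches both of them.

```python
import re

# Hypothetical, simplified token patterns (assumptions, not the real grammar).
TOKEN_PATTERNS = {
    "identifier": re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*"),
    # Open-ended: any run of characters other than a backtick or '$'.
    "template_chars": re.compile(r"[^`$]+"),
}

def matching_tokens(valid_tokens, text):
    """Return the names of the valid tokens that match at the start of text."""
    return sorted(
        name for name in valid_tokens if TOKEN_PATTERNS[name].match(text)
    )

# A set containing everything that could follow '}' is ambiguous for "foo".
ambiguous = matching_tokens({"identifier", "template_chars"}, "foo")

# A set that only expects an identifier is not.
unambiguous = matching_tokens({"identifier"}, "foo")
```

With the combined set, both token names come back for the input "foo";
with the narrower set, only 'identifier' does.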

This commit drops the error recovery states. Instead, when we encounter
an unexpected token T, we recover from the error by finding a previous
state S in the stack in which T would be valid, popping all of the nodes
after S, and wrapping them in an ERROR node.
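
The recovery step can be sketched roughly as follows. This is a toy
model, not the runtime's implementation: the `Node` class, the
`VALID_TOKENS` table, and the `recover` function are all hypothetical
names. The sketch also assumes that the ERROR node is pushed back onto
the stack in state S, which the description above does not spell out.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    symbol: str
    children: list = field(default_factory=list)

# Hypothetical toy table: the tokens that are valid in each parse state.
VALID_TOKENS = {
    0: {"identifier", "if"},
    1: {"("},
    2: {"identifier"},
    3: {")"},
}

def recover(stack, token):
    """Recover from an unexpected token.

    stack is a list of (state, node) entries, bottom first. Search from
    the top for a state S in which token is valid, pop every entry above
    S, and wrap the popped nodes in an ERROR node. Returns the ERROR
    node, or None if no state on the stack accepts the token.
    """
    for i in range(len(stack) - 1, -1, -1):
        state, _ = stack[i]
        if token in VALID_TOKENS[state]:
            skipped = [node for _, node in stack[i + 1:]]
            del stack[i + 1:]
            error = Node("ERROR", skipped)
            # Assumption: the parser stays in state S after pushing the error.
            stack.append((state, error))
            return error
    return None

# The parser is in state 3, which only accepts ")", and sees an identifier.
stack = [
    (0, None),
    (1, Node("if")),
    (2, Node("(")),
    (3, Node("identifier")),
]
error = recover(stack, "identifier")
```

In this run, state 2 is the nearest state that accepts an identifier, so
the single node above it is popped and wrapped, and the lexer is next
invoked in state 2, an ordinary parse state.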

This way, the lexer is always invoked in a normal parse state, in which
it is looking for a non-conflicting set of tokens. Eliminating the error
recovery states also shrinks the lex state machine significantly.

Signed-off-by: Rick Winfrey <rewinfrey@github.com>
Max Brunsfeld 2017-09-11 15:22:52 -07:00 committed by Rick Winfrey
parent 8b3941764f
commit 99d048e016
15 changed files with 327 additions and 639 deletions


@@ -9,9 +9,11 @@ int x // no semicolon
int a;
#ifdef __cplusplus
extern "C"
extern "C" {
#endif
int c() { return 5; }
int b;
#ifdef __cplusplus
@@ -23,20 +25,23 @@ int c;
---
(translation_unit
(preproc_ifdef (identifier)
(preproc_ifdef
(identifier)
(ERROR (type_identifier) (identifier))
(comment))
(declaration (type_identifier) (identifier))
(preproc_ifdef (identifier)
(ERROR (string_literal)))
(declaration (type_identifier) (identifier))
(preproc_ifdef (identifier)
(ERROR))
(preproc_ifdef
(identifier)
(linkage_specification
(string_literal)
(declaration_list
(ERROR)
(function_definition
(type_identifier)
(function_declarator (identifier) (parameter_list))
(compound_statement (return_statement (number_literal))))
(declaration (type_identifier) (identifier))
(ERROR (identifier)))))
(declaration (type_identifier) (identifier)))
========================================
@@ -76,8 +81,8 @@ int main() {
(declaration (type_identifier) (init_declarator
(identifier)
(parenthesized_expression
(ERROR (number_literal))
(number_literal)))))))
(number_literal)
(ERROR (number_literal))))))))
========================================
Errors in declarations
@@ -124,13 +129,15 @@ int b() {
(compound_statement
(declaration
(type_identifier)
(ERROR (identifier))
(init_declarator
(identifier)
(ERROR (identifier) (identifier))
(ERROR (identifier))
(number_literal)))
(declaration
(type_identifier)
(ERROR (identifier))
(init_declarator
(identifier)
(ERROR (identifier) (identifier))
(ERROR (identifier))
(number_literal))))))


@@ -12,12 +12,13 @@ e f;
(program
(if_statement
(parenthesized_expression
(ERROR (identifier))
(identifier))
(identifier)
(ERROR (identifier)))
(statement_block
(ERROR (identifier))
(expression_statement (identifier))))
(expression_statement (ERROR (identifier)) (identifier)))
(ERROR (identifier))
(expression_statement (identifier)))
=======================================================
multiple invalid tokens right after the viable prefix
@@ -33,16 +34,13 @@ h i j k;
(program
(if_statement
(parenthesized_expression
(ERROR (identifier))
(identifier)
(ERROR (identifier)))
(ERROR (identifier) (identifier)))
(statement_block
(expression_statement
(identifier)
(ERROR (jsx_attribute (property_identifier)) (jsx_attribute (property_identifier)) (identifier)))))
(expression_statement
(identifier)
(ERROR (jsx_attribute (property_identifier)) (jsx_attribute (property_identifier)) (identifier))))
(ERROR (identifier) (identifier) (identifier))
(expression_statement (identifier))))
(ERROR (identifier) (identifier) (identifier))
(expression_statement (identifier)))
===================================================
one invalid subtree right after the viable prefix
@@ -136,3 +134,17 @@ var x = !!!
(function (identifier) (formal_parameters) (statement_block))
(function (identifier) (formal_parameters) (statement_block))
(ERROR (identifier)))
=========================================================
Errors inside of a template string substitution
=========================================================
const a = `b c ${d +} f g`
---
(program
(lexical_declaration
(variable_declarator
(identifier)
(template_string (template_substitution (identifier) (ERROR))))))