Commit graph

654 commits

Author SHA1 Message Date
Max Brunsfeld
285f2272fd Move random string helpers into a separate file 2016-07-17 06:22:05 -07:00
Max Brunsfeld
c3a242740b Allow lookahead to be broken down further after performing reductions 2016-07-04 12:20:23 -07:00
Max Brunsfeld
8c26d99353 Store error recovery actions in the normal parse table 2016-06-27 14:07:47 -07:00
Max Brunsfeld
08e47001f1 Silence mismatched delete warning in spec helper 2016-06-27 13:38:49 -07:00
Max Brunsfeld
9538b5b879 Don't count extra trees toward stack versions' error costs 2016-06-26 22:46:40 -07:00
Max Brunsfeld
6fd3edceae Fix logic for inserting leading & trailing extras into root node on acceptance 2016-06-26 11:57:42 -07:00
Max Brunsfeld
9972709e43 Allow error recovery to skip non-terminal nodes after error detection 2016-06-24 10:28:05 -07:00
Max Brunsfeld
09b019c530 Fix test for invalid blank input 2016-06-23 09:24:26 -07:00
Max Brunsfeld
c6e9b32d3f Print all the same parse log messages for both debugging methods 2016-06-22 22:36:11 -07:00
Max Brunsfeld
f425fbad18 Break down reused node on stack whenever lookahead can't be reused 2016-06-22 22:03:27 -07:00
Max Brunsfeld
38c144b4a3 Refine logic for deciding when tokens need to be re-lexed
* While generating the lex table, note which tokens can match the
  same string. A token needs to be relexed when it has possible
  homonyms in the current state.
* Also note which tokens can match substrings of each other tokens.
  A token needs to be relexed when there are viable tokens that
  could match longer strings in the current state and the next
  token has been edited.
* Remove the logic for marking tokens as fragile on creation.
* Store the reusability/non-reusability of symbols off of individual
  actions and onto the entire entry for the state & symbol.
2016-06-21 07:28:04 -07:00
Max Brunsfeld
773e50f26b Update error recovery specs to reflect slightly different recoveries 2016-06-18 20:46:16 -07:00
Max Brunsfeld
94721c7ec0 Rewind and re-tokenize in error mode after detecting an error 2016-06-17 21:26:03 -07:00
Max Brunsfeld
70d3cde775 Remove extra leading newline from corpus spec texts 2016-06-15 10:31:34 -07:00
Max Brunsfeld
ecc7399ed3 Fix stack breakdown procedure when there are trailing extra tokens 2016-06-14 20:25:33 -07:00
Max Brunsfeld
e70547cd11 Allow recoveries that skip leading children of invisible trees
Before this, errors could only be recovered by skipping internal children.
2016-06-14 14:48:35 -07:00
Max Brunsfeld
00a0939504 Abort erroneous parse versions more eagerly 2016-06-02 14:04:48 -07:00
Max Brunsfeld
9b67b21dcd Fix an outdated error corpus entry 2016-06-02 14:04:10 -07:00
Max Brunsfeld
ea47fdc0fe Rework logic for when to abandon parses with errors 2016-05-29 22:36:47 -07:00
Max Brunsfeld
6535704870 Replace stack_merge_new function with two simpler functions
- merge(version1, version2)
- split(version)
2016-05-28 21:22:10 -07:00
Max Brunsfeld
e686478ad2 Rename stack_merge function to stack_merge_all 2016-05-28 20:24:08 -07:00
Max Brunsfeld
e1a3a1daeb Import error corpus entries from grammar repos
Now that error recovery requires no input for the grammar author, it shouldn't
be tested in the individual grammar repos.
2016-05-28 20:12:02 -07:00
Max Brunsfeld
1e353381ff Don't create error node in lexer unless token is completely invalid
Before, any syntax error would cause the lexer to create an error
leaf node. This could happen even with a valid input, if the parse
stack had split and one particular version of the parse stack
failed to parse.

Now, an error leaf node is only created when the lexer cannot understand
part of the input stream at all. When a normal syntax error occurs,
the lexer just returns a token that is outside of the expected token
set, and the parser handles the unexpected token.
2016-05-26 14:15:10 -07:00
Max Brunsfeld
a3679fbb1f Distinguish separators from main tokens via a property on transitions
It was incorrect to store it as a property on the lexical states themselves
2016-05-19 16:27:25 -07:00
Max Brunsfeld
59712ec492 Clean up lex table generation 2016-05-19 13:25:46 -07:00
Max Brunsfeld
88053cf723 In tests, don’t record allocations while printing debug graphs 2016-05-16 10:44:19 -07:00
Max Brunsfeld
22c550c9d6 Discard tokens after error detection to find the best repair
* Use GLR stack-splitting to try all numbers of tokens to
  discard until a repair is found.
* Check the validity of repairs by looking at the child trees,
  rather than the statically-computed 'in-progress symbols' list
2016-05-11 13:49:43 -07:00
Max Brunsfeld
5b74813a5c Refine logic for which tokens to use in error recovery 2016-04-27 14:09:19 -07:00
Max Brunsfeld
fd4c33209e Select ambiguous alternatives by minimizing error size 2016-04-24 00:54:20 -07:00
Max Brunsfeld
0d19f157ed Adjust some spec assertions to reflect finer-grained error recoveries 2016-04-22 10:19:44 -07:00
Max Brunsfeld
cf19b2e58d Make repeat rules left-recursive instead of right recursive 2016-04-18 12:40:14 -07:00
Max Brunsfeld
655d374d0c Recompile test languages if parser.h changes 2016-04-18 11:17:06 -07:00
Max Brunsfeld
cad663b144 Consider multiple error repairs on the same path of the stack
This changes the API to the stack_iterate function so that you can pop
from the stack without stopping iteration
2016-04-15 21:28:00 -07:00
Max Brunsfeld
695be5bc79 Merge equivalent stacks in a separate stage of parsing
* No more automatic merging every time a state is pushed to the stack
* When popping from the stack, the current version is always preserved
2016-04-10 14:12:24 -07:00
Max Brunsfeld
5ba40f15ad Rename stack heads to versions 2016-04-04 12:25:57 -07:00
Max Brunsfeld
6bce6da1e6 Store verifying flag within parse stack 2016-03-31 12:03:21 -07:00
Max Brunsfeld
e7d3d40a59 Explicitly inform stack pop callback when the stack is exhausted
Also, pass non-extra tree count as a single value, rather than keeping
track of the extra count and the total separately.
2016-03-10 11:51:55 -08:00
Max Brunsfeld
240355b04c Make test for allocation failure handling fail more gracefully 2016-03-10 11:36:26 -08:00
Max Brunsfeld
4f726da881 Fix logic for whether to regenerate parsers in specs 2016-03-10 11:35:59 -08:00
Max Brunsfeld
2e35587161 Use new stack_pop_until function for repairing errors 2016-03-07 20:06:46 -08:00
Max Brunsfeld
4348eb89d4 Expose lower stack nodes via pop_until() function
This callback-based API allows the parser to easily visit each interior node
of the stack when searching for an error repair. It also is a better abstraction
over the stack's DAG implementation than having the public functions for
accessing entries and their successor entries.
2016-03-07 16:09:34 -08:00
Max Brunsfeld
bc8df9f5c5 Avoid recompiling test languages when possible 2016-03-03 12:05:04 -08:00
Max Brunsfeld
c0595c21c5 Halt stack pops at all error states, not just error trees 2016-03-03 11:05:37 -08:00
Max Brunsfeld
3d516aeeec Give StackPushResult enumerators shorter names 2016-03-03 10:20:05 -08:00
Max Brunsfeld
8a13b5d120 Rename StackPopResult -> StackSlice 2016-03-03 10:16:10 -08:00
Max Brunsfeld
aef7582a2a Start using the forward move to recover from errors
Some unit tests passing. Corpus tests still failing
2016-03-02 21:08:42 -08:00
Max Brunsfeld
76d072545d Include out-of-context states starting with non-terminals 2016-03-02 20:58:39 -08:00
Max Brunsfeld
e7abfdd373 Prevent string assertion failures from creating later memory leak errors 2016-03-02 20:58:39 -08:00
Max Brunsfeld
8c01b70ce7 Don't skip tokens that are not the start of any non-terminal 2016-03-02 20:56:05 -08:00
Max Brunsfeld
dee1f697c1 Compute the set of variables that can begin with each terminal symbol 2016-02-25 21:51:52 -08:00