The current pretty conservative approach is to avoid merging parse states which
would cause a pair tokens to co-exist for the first time in any parse state,
where the two tokens can start with the same character and at least one of the
tokens can contain a character which is part of the grammar's separators.
* Remove remnants of templatized remove_duplicate_states function
* Rename recovery_tokens function to get_compatible_tokens and augment it
also compute pairs of tokens which could potentially be incompatible
* While generating the lex table, note which tokens can match the
same string. A token needs to be relexed when it has possible
homonyms in the current state.
* Also note which tokens can match substrings of each other tokens.
A token needs to be relexed when there are viable tokens that
could match longer strings in the current state and the next
token has been edited.
* Remove the logic for marking tokens as fragile on creation.
* Store the reusability/non-reusability of symbols off of individual
actions and onto the entire entry for the state & symbol.
* Use GLR stack-splitting to try all numbers of tokens to
discard until a repair is found.
* Check the validity of repairs by looking at the child trees,
rather than the statically-computed 'in-progress symbols' list
A precedence annotation wrapping a sequence of characters now only affects how
tightly those characters bind to *each other*, not how tightly they bind to the
preceding character.
This bug surfaced because a generated lexer was failing to recognize a '\n' character
as a token, instead treating it as ubiquitous whitespace. It made this error
because, even though anonymous ubiquitous tokens have the lowest precedence, the
character immediately *after* the '\n' was part of a normal token, which had
*normal* precedence (0). Advancing into that following token was incorrectly
prioritized above accepting the line-break token.