Timothy Clem
ab00f1b0da
Add support for \W and \D negated character classes too
2017-01-31 15:03:48 -08:00
Timothy Clem
902b7f9745
Allow \S for negated whitespace regex shorthand
2017-01-31 14:45:28 -08:00
Max Brunsfeld
4131e1c16e
Return an error when external token name matches non-terminal rule
2017-01-31 11:36:51 -08:00
Max Brunsfeld
42c41c158c
Refactor logic for handling shared internal/external tokens
2016-12-21 10:49:55 -08:00
Max Brunsfeld
a09409900f
Silence missing intializer warnings in compiler unit tests
2016-12-05 16:37:06 -08:00
Max Brunsfeld
0f8e130687
Call external scanner functions when lexing
2016-12-02 22:03:48 -08:00
Max Brunsfeld
c966af0412
Start work on external tokens
2016-12-02 16:24:19 -08:00
Max Brunsfeld
6cf4ccb840
Represent rule metadata as a struct, not a map
2016-11-19 13:59:34 -08:00
Max Brunsfeld
32387400c6
Rework LR conflict resolution
...
* Unify precedence/associativity-based resolution with the
search for a whitelisted conflict
* Improve conflict error messages
2016-11-18 13:50:55 -08:00
Max Brunsfeld
1118a9142a
Introduce Symbol::Index type alias
2016-11-14 10:25:26 -08:00
Max Brunsfeld
fad7294ba4
Store shift states for non-terminals directly in the main parse table
2016-11-14 08:36:06 -08:00
Max Brunsfeld
8d9c261e3a
Don't include reduce actions for nonterminal lookaheads
2016-11-10 11:33:37 -08:00
Max Brunsfeld
7bcae8f6a8
🎨 flatten_grammar
2016-11-09 20:29:21 -08:00
Timothy Clem
693c6d40dd
Move setup of mergeable_symbols to constructor, use set throughout
2016-10-18 15:18:33 -07:00
Max Brunsfeld
b76574e01c
Handle ambiguities between extra and non-extra tokens using normal GLR splitting
2016-09-06 10:22:16 -07:00
Max Brunsfeld
38c144b4a3
Refine logic for deciding when tokens need to be re-lexed
...
* While generating the lex table, note which tokens can match the
same string. A token needs to be relexed when it has possible
homonyms in the current state.
* Also note which tokens can match substrings of each other tokens.
A token needs to be relexed when there are viable tokens that
could match longer strings in the current state and the next
token has been edited.
* Remove the logic for marking tokens as fragile on creation.
* Store the reusability/non-reusability of symbols off of individual
actions and onto the entire entry for the state & symbol.
2016-06-21 07:28:04 -07:00
Max Brunsfeld
a3679fbb1f
Distinguish separators from main tokens via a property on transitions
...
It was incorrect to store it as a property on the lexical states themselves
2016-05-19 16:27:25 -07:00
Max Brunsfeld
59712ec492
Clean up lex table generation
2016-05-19 13:25:46 -07:00
Max Brunsfeld
22c550c9d6
Discard tokens after error detection to find the best repair
...
* Use GLR stack-splitting to try all numbers of tokens to
discard until a repair is found.
* Check the validity of repairs by looking at the child trees,
rather than the statically-computed 'in-progress symbols' list
2016-05-11 13:49:43 -07:00
Max Brunsfeld
5b74813a5c
Refine logic for which tokens to use in error recovery
2016-04-27 14:09:19 -07:00
Max Brunsfeld
cf19b2e58d
Make repeat rules left-recursive instead of right recursive
2016-04-18 12:40:14 -07:00
Max Brunsfeld
76d072545d
Include out-of-context states starting with non-terminals
2016-03-02 20:58:39 -08:00
Max Brunsfeld
8c01b70ce7
Don't skip tokens that are not the start of any non-terminal
2016-03-02 20:56:05 -08:00
Max Brunsfeld
dee1f697c1
Compute the set of variables that can begin with each terminal symbol
2016-02-25 21:51:52 -08:00
Max Brunsfeld
6401a065ae
Use different types for advance and accept-token actions
...
Unlike with parse actions, lexical actions of different types never appear
in the same places in the table
2016-01-22 22:24:11 -07:00
Max Brunsfeld
0f7dbea9a3
Unify test targets, use externally defined languages as fixtures
2016-01-15 11:19:24 -08:00
Max Brunsfeld
ad4089a4bf
Move anonymous tokens grammar into integration spec
2016-01-14 10:35:03 -08:00
Max Brunsfeld
4a5deda071
Add tests that compile a grammar and use its parser
2016-01-14 10:11:30 -08:00
Max Brunsfeld
d4632ab9a9
Make the compile function plain C and take a JSON grammar
2016-01-11 12:33:48 -08:00
Max Brunsfeld
36870bfced
Make Grammar a simple struct
2016-01-08 15:51:30 -08:00
Max Brunsfeld
1c6ad5f7e4
Rename ubiquitous_tokens -> extra_tokens in compiler API
...
They were already called this in the runtime code.
'Extra' is just easier to say.
2015-12-17 15:50:50 -08:00
Max Brunsfeld
f065eb0480
Remove unused parameter to LexConflictManager
2015-12-17 15:45:47 -08:00
Max Brunsfeld
a8d2585330
Fix resolution of shift-extra vs reduce actions
2015-12-17 15:19:58 -08:00
Max Brunsfeld
351b4f4aaa
Remove unused parameters to ParseConflictManager
2015-12-17 15:19:00 -08:00
Max Brunsfeld
c495076adb
Record in parse table which actions can hide splits
...
Suppose a parse state S has multiple actions for a terminal lookahead symbol A.
Then during incremental parsing, while in state S, the parser should not
reuse a non-terminal lookahead B where FIRST(B) contains A, because reusing B
might prematurely discard one of the possible actions that a batch parser
would have attempted in state S, upon seeing A as a lookahead.
2015-12-17 13:11:56 -08:00
Max Brunsfeld
d713054d61
Record which tokens are fragile when lexing
2015-12-10 21:05:54 -08:00
Max Brunsfeld
75f31a79a3
Treat reduce actions with different production IDs as distinct
2015-12-10 13:00:26 -08:00
Max Brunsfeld
e11515fb74
Escape backslashes and quotes in symbol name strings
2015-11-09 09:33:24 -08:00
Max Brunsfeld
d5ce268074
Fix handling of changing precedence within lexical rules.
...
A precedence annotation wrapping a sequence of characters now only affects how
tightly those characters bind to *each other*, not how tightly they bind to the
preceding character.
This bug surfaced because a generated lexer was failing to recognize a '\n' character
as a token, instead treating it as ubiquitous whitespace. It made this error
because, even though anonymous ubiquitous tokens have the lowest precedence, the
character immediately *after* the '\n' was part of a normal token, which had
*normal* precedence (0). Advancing into that following token was incorrectly
prioritized above accepting the line-break token.
2015-11-08 13:36:15 -08:00
Max Brunsfeld
d7cb48aae7
Fix handling of precedence for repeat rules
2015-11-01 21:00:44 -08:00
Max Brunsfeld
d6ee28abd0
Make precedence more useful within tokens
...
Choose accept-token actions over advance actions if their rule has a higher precedence.
2015-11-01 12:48:27 -08:00
Max Brunsfeld
998ae533da
Make completion_status() a method on LexItem
2015-10-30 16:48:37 -07:00
Max Brunsfeld
c8be143f65
🔥 get_metadata function
2015-10-30 16:22:25 -07:00
Max Brunsfeld
73b3280fbb
Include precedence calculation in LexItemSet::transitions
2015-10-30 16:07:29 -07:00
Max Brunsfeld
e9be0ff24e
Make completion_status() a method on ParseItem
2015-10-30 14:07:33 -07:00
Max Brunsfeld
4850384b78
Include precedence calculation in ParseItemSet::transitions
2015-10-30 13:54:11 -07:00
Max Brunsfeld
433f060a5b
Fix stream overloads for inspecting PrecedenceRange and ParseItem
2015-10-30 10:45:46 -07:00
Max Brunsfeld
58b5a10607
Fix ParseItemSet::transitions spec description
2015-10-29 12:19:44 -07:00
Max Brunsfeld
a8ead10d6f
In lex error state, don't look for tokens that would match *any* line
2015-10-28 17:45:17 -07:00
Max Brunsfeld
b61b27f22f
Handle inline ubiquitous that are used elsewhere in the grammar
2015-10-26 17:19:37 -07:00