Commit graph

157 commits

Author SHA1 Message Date
Max Brunsfeld
c39f0e9ef9 Rename word_rule -> word_token 2018-06-15 09:15:12 -07:00
Max Brunsfeld
6e72c2943d Avoid missing field initializer warnings w/o default field syntax
The default field syntax isn't working on Windows
2018-06-14 11:12:04 -07:00
Max Brunsfeld
e17cd42e47 Perform keyword optimization using explicitly selected word token
rather than trying to infer the word token automatically.

Co-Authored-By: Ashi Krishnan <queerviolet@github.com>
2018-06-14 09:35:54 -07:00
Max Brunsfeld
6bb63f549f Fix unused lambda captures 2018-05-11 14:59:59 -07:00
Max Brunsfeld
1ca261c79b Fix some regex parsing bugs
* Allow escape sequences to be used in ranges
* Don't give special meaning to dashes outside of character classes
2018-04-06 12:46:06 -07:00
Pieter Goetschalckx
3d0ca31cf1 Add error message for TSCompileErrorTypeInvalidTokenContents 2018-03-30 21:12:09 +02:00
Max Brunsfeld
43e14332ed Avoid creating duplicate metadata rules 2018-03-28 10:03:09 -07:00
Max Brunsfeld
b7d0606fbd Be less conservative in merging parse states with external tokens
Also, clean up the internal representation of external tokens
2018-03-16 16:00:40 -07:00
Max Brunsfeld
8c29841adf Represent repetitions with associative structure 2018-02-12 11:41:56 -08:00
Max Brunsfeld
2e4f76c164 Don't allow an epsilon start rule if it is used in other rules 2018-01-23 17:05:28 -08:00
Max Brunsfeld
532bbeca0d Remove wrong handling of \a in a regex 2017-12-12 16:50:53 -08:00
Max Brunsfeld
493db39363 Never move the start rule of a grammar into the lexical grammar
This preserves a useful invariant that the root node of the AST is never
a token.
2017-12-07 11:50:27 -08:00
Max Brunsfeld
5f40adb70c Recur to sub-rules in a deterministic order in expand_repeats 2017-08-08 17:20:04 -07:00
Max Brunsfeld
94dc703bfc Require that grammars' start rules be visible 2017-08-04 17:07:37 -07:00
Max Brunsfeld
cb5fe80348 Rename RENAME rule to ALIAS, allow it to create anonymous nodes 2017-07-31 16:41:11 -07:00
Max Brunsfeld
b3a72954ff Introduce RENAME rule type 2017-07-13 17:17:22 -07:00
Max Brunsfeld
d646889922 Simplify flatten_rule function 2017-07-13 09:59:23 -07:00
Max Brunsfeld
65bf1389e1 Add a way to automatically inline rules 2017-07-11 23:13:44 -07:00
Max Brunsfeld
0de93b3bf2 Allow negative dynamic precedences 2017-07-06 22:21:59 -07:00
Max Brunsfeld
d8e9d04fe7 Add PREC_DYNAMIC rule for resolving runtime ambiguities 2017-07-06 15:24:45 -07:00
Max Brunsfeld
ed8fbff175 Allow anonymous tokens to be used in grammars' external token lists 2017-03-17 16:31:29 -07:00
Max Brunsfeld
b3edd8f749 Remove use of shared_ptr in choice, repeat, and seq factories 2017-03-17 14:28:13 -07:00
Max Brunsfeld
416cbb9def Add missing cassert includes 2017-03-17 13:54:40 -07:00
Max Brunsfeld
db4b9ebc7c Implement Rule as a union rather than an abstract base class 2017-03-17 13:29:31 -07:00
Max Brunsfeld
c79fae6d21 Clean up extract_tokens function 2017-03-09 21:16:20 -08:00
Max Brunsfeld
abf8a4f2c2 🎨 2017-03-01 22:15:26 -08:00
Max Brunsfeld
686dc0997c Avoid introducing certain lexical conflicts during parse state merging
The current, fairly conservative approach is to avoid merging parse states that
would cause a pair of tokens to co-exist for the first time in any parse state,
where the two tokens can start with the same character and at least one of the
tokens can contain a character that is part of the grammar's separators.
2017-02-27 22:54:38 -08:00
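The merge heuristic described in the commit above can be sketched as a small check. This is a minimal illustration, not tree-sitter's implementation. It assumes each token is summarized by two character sets (the characters it can start with and the characters it can contain), that the pairs of tokens already co-existing in some parse state are tracked in a set, and that the grammar's separators are given as a character set. The names `pair_is_risky` and `can_merge` are hypothetical.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical per-token summaries (not tree-sitter's data structures):
# the characters a token can start with, and the characters it can contain.
CharSets = Dict[str, Set[str]]


def pair_is_risky(t1: str, t2: str, first_chars: CharSets,
                  contained_chars: CharSets, separators: Set[str]) -> bool:
    # Both tokens can start with the same character...
    shares_start = bool(first_chars[t1] & first_chars[t2])
    # ...and at least one of them can contain a separator character.
    has_separator = bool(contained_chars[t1] & separators) or bool(
        contained_chars[t2] & separators)
    return shares_start and has_separator


def can_merge(tokens_a: Set[str], tokens_b: Set[str],
              coexisting: Set[FrozenSet[str]], first_chars: CharSets,
              contained_chars: CharSets, separators: Set[str]) -> bool:
    # Only pairs spanning the two states can be new; pairs inside one
    # state already co-exist there.
    for t1 in sorted(tokens_a):
        for t2 in sorted(tokens_b):
            if t1 == t2 or frozenset((t1, t2)) in coexisting:
                continue  # same token, or pair already co-exists somewhere
            if pair_is_risky(t1, t2, first_chars, contained_chars, separators):
                return False  # merging would introduce a new risky pair
    return True


first = {"comment": {"/"}, "slash": {"/"}, "number": set("0123456789")}
contains = {"comment": {"/", " ", "x"}, "slash": {"/"},
            "number": set("0123456789")}
seps = {" ", "\n"}

print(can_merge({"comment"}, {"slash"}, set(), first, contains, seps))   # False
print(can_merge({"comment"}, {"number"}, set(), first, contains, seps))  # True
```

The caller is assumed to add each newly co-existing pair to `coexisting` whenever a merge is accepted, so later checks see the updated set.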
Timothy Clem
ab00f1b0da Add support for \W and \D negated character classes too 2017-01-31 15:03:48 -08:00
Timothy Clem
902b7f9745 Allow \S for negated whitespace regex shorthand 2017-01-31 14:45:28 -08:00
Max Brunsfeld
4131e1c16e Return an error when external token name matches non-terminal rule 2017-01-31 11:36:51 -08:00
Max Brunsfeld
42c41c158c Refactor logic for handling shared internal/external tokens 2016-12-21 10:49:55 -08:00
Max Brunsfeld
a1770ce844 Allow external tokens to be used as extras 2016-12-12 22:06:01 -08:00
Max Brunsfeld
10b51a05a1 Allow external scanners to refer to (and return) internally-defined tokens
Tokens that are defined in the grammar's rules may now be included in the
externals list also, so that external scanners can check if they are valid
lookaheads or not, and if so, can return them to the parser if needed.
2016-12-09 13:32:58 -08:00
Max Brunsfeld
83514293b5 Allow external tokens to be either visible or hidden 2016-12-05 17:26:11 -08:00
Max Brunsfeld
49d25bd0f8 Remove EXTERNAL_TOKEN grammar rule type 2016-12-04 15:02:32 -08:00
Max Brunsfeld
c966af0412 Start work on external tokens 2016-12-02 16:24:19 -08:00
Max Brunsfeld
996ca91e70 Disallow syntax rules that match the empty string (for now) 2016-11-30 23:19:54 -08:00
Max Brunsfeld
6cf4ccb840 Represent rule metadata as a struct, not a map 2016-11-19 13:59:34 -08:00
Max Brunsfeld
32387400c6 Rework LR conflict resolution
* Unify precedence/associativity-based resolution with the
  search for a whitelisted conflict
* Improve conflict error messages
2016-11-18 13:50:55 -08:00
Max Brunsfeld
7bcae8f6a8 🎨 flatten_grammar 2016-11-09 20:29:21 -08:00
Max Brunsfeld
cf19b2e58d Make repeat rules left-recursive instead of right-recursive 2016-04-18 12:40:14 -07:00
Max Brunsfeld
d4632ab9a9 Make the compile function plain C and take a JSON grammar 2016-01-11 12:33:48 -08:00
Max Brunsfeld
36870bfced Make Grammar a simple struct 2016-01-08 15:51:30 -08:00
Max Brunsfeld
1c6ad5f7e4 Rename ubiquitous_tokens -> extra_tokens in compiler API
They were already called this in the runtime code.
'Extra' is just easier to say.
2015-12-17 15:50:50 -08:00
Max Brunsfeld
c495076adb Record in parse table which actions can hide splits
Suppose a parse state S has multiple actions for a terminal lookahead symbol A.
Then during incremental parsing, while in state S, the parser should not
reuse a non-terminal lookahead B where FIRST(B) contains A, because reusing B
might prematurely discard one of the possible actions that a batch parser
would have attempted in state S, upon seeing A as a lookahead.
2015-12-17 13:11:56 -08:00
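The reuse condition described in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions: each parse state maps terminal symbols to a list of actions, and FIRST sets are precomputed per non-terminal. The commit records this information in the parse table ahead of time; for clarity, this sketch instead computes it on demand. The names `can_reuse_lookahead`, `ParseTable`, and the action strings are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical layout: parse_table[state][terminal] -> list of actions.
ParseTable = Dict[int, Dict[str, List[str]]]


def can_reuse_lookahead(parse_table: ParseTable,
                        first_sets: Dict[str, Set[str]],
                        state: int, nonterminal: str) -> bool:
    """Return False if reusing `nonterminal` in `state` could hide a split.

    State S has a split on terminal A when it has more than one action for A.
    If A is in FIRST(B), reusing the whole subtree B would skip the choice a
    batch parser would have explored in S upon seeing A.
    """
    actions = parse_table.get(state, {})
    for terminal in first_sets.get(nonterminal, set()):
        if len(actions.get(terminal, [])) > 1:
            return False
    return True


table: ParseTable = {
    0: {"identifier": ["shift 1", "reduce expression"], "number": ["shift 2"]},
}
first = {"call": {"identifier"}, "literal": {"number"}}
print(can_reuse_lookahead(table, first, 0, "call"))     # False
print(can_reuse_lookahead(table, first, 0, "literal"))  # True
```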
Max Brunsfeld
75f31a79a3 Treat reduce actions with different production IDs as distinct 2015-12-10 13:00:26 -08:00
Max Brunsfeld
53424699e4 Comment all the steps of prepare_grammar 2015-12-02 14:56:59 -08:00
Max Brunsfeld
d5ce268074 Fix handling of changing precedence within lexical rules.
A precedence annotation wrapping a sequence of characters now only affects how
tightly those characters bind to *each other*, not how tightly they bind to the
preceding character.

This bug surfaced because a generated lexer was failing to recognize a '\n' character
as a token, instead treating it as ubiquitous whitespace. It made this error
because, even though anonymous ubiquitous tokens have the lowest precedence, the
character immediately *after* the '\n' was part of a normal token, which had
*normal* precedence (0). Advancing into that following token was incorrectly
prioritized above accepting the line-break token.
2015-11-08 13:36:15 -08:00
Max Brunsfeld
d6ee28abd0 Make precedence more useful within tokens
Choose accept-token actions over advance actions if their rule has a higher precedence.
2015-11-01 12:48:27 -08:00
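The rule in the commit above, preferring an accept-token action over an advance action only when the accepted token's rule has a higher precedence, can be sketched as below. This assumes a lexer state holds at most one accept action and one advance action, each tagged with a precedence. On a tie the sketch keeps advancing (longest match), which is an assumption rather than something the commit states. The `Accept`, `Advance`, and `resolve_lex_conflict` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Accept:
    token: str
    precedence: int


@dataclass
class Advance:
    next_state: int
    precedence: int


def resolve_lex_conflict(accept: Optional[Accept],
                         advance: Optional[Advance]
                         ) -> Optional[Union[Accept, Advance]]:
    # With only one kind of action there is nothing to resolve.
    if accept is None:
        return advance
    if advance is None:
        return accept
    # Accept the token only if its rule has strictly higher precedence than
    # the continuation; otherwise keep consuming characters (longest match).
    if accept.precedence > advance.precedence:
        return accept
    return advance


keyword = Accept(token="keyword", precedence=1)
into_identifier = Advance(next_state=7, precedence=0)
print(resolve_lex_conflict(keyword, into_identifier))
# Accept(token='keyword', precedence=1)
```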
Max Brunsfeld
b61b27f22f Handle inline ubiquitous tokens that are used elsewhere in the grammar 2015-10-26 17:19:37 -07:00