303 commits

Author SHA1 Message Date
Max Brunsfeld
20982fdcb9 Mark tokens as non-reusable in states where shorter tokens take precedence
This fixes some randomized test failures in the C grammar, relating to object-like macros.
The object-like macro rule relies on a whitespace token in order to distinguish object-like
macros whose values begin with a '(' from function-like macros. The presence of that
whitespace token means that other nodes should not be reusable in that state.
2017-06-22 16:04:42 -07:00
Max Brunsfeld
8517313a45 🎨 2017-06-22 15:33:07 -07:00
Max Brunsfeld
8157b81b68 Improve logic for short-circuiting trivial lexing conflict detection 2017-06-22 15:33:01 -07:00
Max Brunsfeld
2c043803f1 Be more conservative about avoiding lexing conflicts when merging states
This fixes a bug in the C++ grammar where the `>>` token was merged into
a state where it was previously not valid, but the `>` token *was*
valid. This caused nested templates like -

std::vector<std::pair<int, int>>

to not parse correctly.
2017-06-22 15:32:13 -07:00
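The `>>` problem described in the commit above can be illustrated with a small longest-match lexer. This is only a sketch, not the project's lexer: `lex_longest` and the idea of passing the valid tokens as a set of strings are assumptions made for illustration. It shows why adding `>>` to the valid tokens of a state that previously only allowed `>` changes how the end of a nested template is tokenized.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper (not from the project): return the longest token in
// `valid_tokens` that matches `input` starting at `pos`, or an empty string
// if no valid token matches there.
std::string lex_longest(const std::string &input, std::size_t pos,
                        const std::set<std::string> &valid_tokens) {
  std::string best;
  for (const std::string &token : valid_tokens) {
    // Prefer strictly longer matches, as a longest-match lexer would.
    if (token.size() > best.size() &&
        input.compare(pos, token.size(), token) == 0) {
      best = token;
    }
  }
  return best;
}
```

With only `>` valid, the two closing angle brackets of `std::vector<std::pair<int, int>>` are lexed one at a time. Once `>>` is also valid in the same state, the longest match wins and the first closing bracket is never produced on its own.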
Phil Turnbull
fdd8792ebc Correctly set is_first
From scan-build: Value stored to 'is_first' is never read
2017-06-14 11:12:06 -04:00
Max Brunsfeld
ed8fbff175 Allow anonymous tokens to be used in grammars' external token lists 2017-03-17 16:31:29 -07:00
Max Brunsfeld
b3edd8f749 Remove use of shared_ptr in choice, repeat, and seq factories 2017-03-17 14:28:13 -07:00
Max Brunsfeld
d9fb863bea Fix build errors w/ gcc 2017-03-17 14:03:49 -07:00
Max Brunsfeld
db4b9ebc7c Implement Rule as a union rather than an abstract base class 2017-03-17 13:29:31 -07:00
Max Brunsfeld
f049d5d94c Make ParseItem a struct, not a class 2017-03-08 21:06:30 -08:00
Max Brunsfeld
64e9230071 Use LexTableBuilder to detect conflicts between tokens more correctly 2017-03-08 12:47:38 -08:00
Max Brunsfeld
abf8a4f2c2 🎨 2017-03-01 22:15:26 -08:00
Max Brunsfeld
686dc0997c Avoid introducing certain lexical conflicts during parse state merging
The current, fairly conservative approach is to avoid merging parse states which
would cause a pair of tokens to co-exist for the first time in any parse state,
where the two tokens can start with the same character and at least one of the
tokens can contain a character which is part of the grammar's separators.
2017-02-27 22:54:38 -08:00
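The merging rule in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: `TokenInfo`, `may_conflict`, `can_merge`, and the representation of tokens as integer indices are all hypothetical names chosen for this example.

```cpp
#include <algorithm>
#include <set>
#include <utility>
#include <vector>

// Hypothetical per-token summary (assumed for this sketch).
struct TokenInfo {
  std::set<char> first_chars;  // characters the token can start with
  bool contains_separator;     // can it contain a separator character?
};

using TokenPair = std::pair<int, int>;  // ordered as (smaller, larger) index

// Two tokens are treated as risky together if they can start with the same
// character and at least one of them can contain a separator character.
bool may_conflict(const TokenInfo &a, const TokenInfo &b) {
  if (!a.contains_separator && !b.contains_separator) return false;
  for (char c : a.first_chars) {
    if (b.first_chars.count(c)) return true;
  }
  return false;
}

// Refuse to merge two states if doing so would bring together a risky pair
// of tokens that does not already co-exist in some parse state.
bool can_merge(const std::set<int> &left, const std::set<int> &right,
               const std::vector<TokenInfo> &tokens,
               const std::set<TokenPair> &coexisting) {
  for (int l : left) {
    for (int r : right) {
      if (l == r) continue;
      TokenPair pair(std::min(l, r), std::max(l, r));
      if (coexisting.count(pair)) continue;  // not a new combination
      if (may_conflict(tokens[l], tokens[r])) return false;
    }
  }
  return true;
}
```

The check is deliberately conservative: it rejects a merge on the mere possibility of a conflict, without examining whether the two tokens would actually match the same input.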
Max Brunsfeld
3c8e6f9987 Restructure parse state merging logic
* Remove remnants of templatized remove_duplicate_states function
* Rename recovery_tokens function to get_compatible_tokens and augment it to
  also compute pairs of tokens which could potentially be incompatible
2017-02-26 12:23:48 -08:00
Max Brunsfeld
0a6e5f9ee6 Fix some build warnings on gcc 2017-01-31 11:46:28 -08:00
Max Brunsfeld
83514293b5 Allow external tokens to be either visible or hidden 2016-12-05 17:26:11 -08:00
Max Brunsfeld
c16b6b2059 Run external scanners during error recovery 2016-12-05 11:50:24 -08:00
Max Brunsfeld
d72b49316b Handle external tokens in apply_transitive_closure 2016-12-04 10:40:32 -08:00
Max Brunsfeld
c966af0412 Start work on external tokens 2016-12-02 16:24:19 -08:00
Max Brunsfeld
be9e79db1b Avoid incorrect application of precedence 2016-12-01 10:24:06 -08:00
Max Brunsfeld
996ca91e70 Disallow syntax rules that match the empty string (for now) 2016-11-30 23:19:54 -08:00
Max Brunsfeld
101e304a8a Avoid unnecessary lookahead set mutations in ParseItemSetBuilder 2016-11-20 21:41:36 -08:00
Max Brunsfeld
06215607d1 Precompute transitive closure contributions by grammar symbol 2016-11-20 11:49:55 -08:00
Max Brunsfeld
5332fd3418 Fix build warnings 2016-11-19 20:47:43 -08:00
Max Brunsfeld
6cf4ccb840 Represent rule metadata as a struct, not a map 2016-11-19 13:59:34 -08:00
Max Brunsfeld
cab1bd3ac5 Make conflict messages explicit about precedence combinations 2016-11-18 17:05:16 -08:00
Max Brunsfeld
5924285e69 🎨 2016-11-18 16:14:05 -08:00
Max Brunsfeld
32387400c6 Rework LR conflict resolution
* Unify precedence/associativity-based resolution with the
  search for a whitelisted conflict
* Improve conflict error messages
2016-11-18 13:50:55 -08:00
Max Brunsfeld
6935f1d26f Use hash_combine everywhere 2016-11-16 11:46:22 -08:00
Max Brunsfeld
6cfd009503 Compute parse state group signature based on the item set 2016-11-16 10:21:30 -08:00
Max Brunsfeld
42d37656ea Optimize remove_duplicate_parse_states method
Signed-off-by: Nathan Sobo <nathan@github.com>
2016-11-15 17:51:52 -08:00
Max Brunsfeld
1118a9142a Introduce Symbol::Index type alias 2016-11-14 10:25:26 -08:00
Max Brunsfeld
fad7294ba4 Store shift states for non-terminals directly in the main parse table 2016-11-14 08:36:06 -08:00
Max Brunsfeld
8d9c261e3a Don't include reduce actions for nonterminal lookaheads 2016-11-10 11:33:37 -08:00
Max Brunsfeld
255bc2427c 🎨 build_parse_table 2016-11-09 20:47:47 -08:00
Timothy Clem
693c6d40dd Move setup of mergeable_symbols to constructor, use set throughout 2016-10-18 15:18:33 -07:00
Timothy Clem
14bae584d4 WIP: New check for mergeable symbols in merge_state 2016-10-18 13:03:41 -07:00
Max Brunsfeld
e149d94ff5 Remove generated parsers' dependency on runtime.h 2016-10-05 14:02:49 -07:00
Max Brunsfeld
b76574e01c Handle ambiguities between extra and non-extra tokens using normal GLR splitting 2016-09-06 10:22:16 -07:00
Max Brunsfeld
0ee1994078 Don't have both shift and shift-extra actions in recovery states 2016-07-17 13:35:58 -07:00
Max Brunsfeld
0e2bbbd7ee Compress parse table by allowing reductions w/ unexpected lookaheads 2016-07-04 12:20:23 -07:00
Max Brunsfeld
8c26d99353 Store error recovery actions in the normal parse table 2016-06-27 14:07:47 -07:00
Max Brunsfeld
38c144b4a3 Refine logic for deciding when tokens need to be re-lexed
* While generating the lex table, note which tokens can match the
  same string. A token needs to be relexed when it has possible
  homonyms in the current state.
* Also note which tokens can match substrings of other tokens.
  A token needs to be relexed when there are viable tokens that
  could match longer strings in the current state and the next
  token has been edited.
* Remove the logic for marking tokens as fragile on creation.
* Move the reusability/non-reusability of symbols off of individual
  actions and onto the entire entry for the state & symbol.
2016-06-21 07:28:04 -07:00
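The two re-lexing conditions listed in the commit above reduce to a small predicate. The sketch below is an assumption-laden illustration, not the project's implementation: `TokenEntry`, its field names, and `needs_relex` are hypothetical, and the flags are assumed to have been computed while generating the lex table.

```cpp
// Hypothetical per-(state, token) flags, assumed to be filled in during
// lex table generation.
struct TokenEntry {
  bool has_homonyms;           // another valid token can match the same string
  bool has_longer_candidates;  // another valid token could match a longer string
};

// Decide whether a previously lexed token must be lexed again rather than
// reused in the current state.
bool needs_relex(const TokenEntry &entry, bool next_token_edited) {
  // Homonyms make the old token ambiguous in this state.
  if (entry.has_homonyms) return true;
  // A longer match only becomes possible if the following text changed.
  if (entry.has_longer_candidates && next_token_edited) return true;
  return false;
}
```

Keeping the flags on the whole entry for a state and symbol, as the last bullet describes, means the decision needs no per-action lookup.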
Max Brunsfeld
45f7cee0c8 Handle extra tokens properly during error recovery 2016-06-18 20:46:25 -07:00
Max Brunsfeld
6d40e317df Ensure that reductions are ordered by child count in parse table 2016-06-10 13:11:52 -07:00
Max Brunsfeld
a3679fbb1f Distinguish separators from main tokens via a property on transitions
It was incorrect to store it as a property on the lexical states themselves
2016-05-19 16:27:25 -07:00
Max Brunsfeld
59712ec492 Clean up lex table generation 2016-05-19 13:25:46 -07:00
Max Brunsfeld
507d5ad9f7 Include shift-extra actions alongside other actions in recovery states 2016-05-16 10:33:18 -07:00
Max Brunsfeld
19bd09b81d Don't include accept actions in recovery states 2016-05-11 14:02:26 -07:00
Max Brunsfeld
22c550c9d6 Discard tokens after error detection to find the best repair
* Use GLR stack-splitting to try all numbers of tokens to
  discard until a repair is found.
* Check the validity of repairs by looking at the child trees,
  rather than the statically-computed 'in-progress symbols' list
2016-05-11 13:49:43 -07:00