Max Brunsfeld
3bcb221379
Include non-terminal lookahead symbols for reduction actions
...
This is necessary for re-using the right subtree after an edit
2014-10-10 12:06:16 -07:00
Max Brunsfeld
070dc76050
Generate correct C literals for non-ascii characters
2014-09-28 18:40:15 -07:00
Max Brunsfeld
cb5ecbd491
Handle string and regex rules w/ non-ascii chars
2014-09-28 18:21:22 -07:00
Max Brunsfeld
a2b80098b2
Fix destination directory in compile_examples spec
2014-09-11 12:46:46 -07:00
Max Brunsfeld
cd8a683229
Improve error messages for invalid ubiquitous tokens
2014-09-10 13:02:16 -07:00
Max Brunsfeld
e9dad529f5
Make descriptions more consistent in compiler specs
2014-09-09 13:01:18 -07:00
Max Brunsfeld
1ff7cedf40
Unify ubiquitous tokens and lexical separators in API
2014-09-07 22:16:45 -07:00
Max Brunsfeld
a46f9d950c
Handle '\s' correctly in regexps
2014-09-07 16:05:43 -07:00
Max Brunsfeld
ed11ef557a
Fix expansion of repeat rules into recursive rules
...
Previously, the way repeat rules were expanded, the auxiliary
rule always needed to be reduced, even if the repeating content
was empty. This caused problems in parse states where some items
contained the repeat rule and some did not. To make those cases
work, the repeat rule had to explicitly be marked as optional.
With this change, that is no longer necessary.
2014-09-07 09:39:14 -07:00
Max Brunsfeld
d3204d3526
Include '_' in '\w' regex character class
2014-09-05 18:41:12 -07:00
Max Brunsfeld
545e575508
Revert "Remove the separator characters construct"
...
This reverts commit 5cd07648fd .
The separators construct is useful as an optimization. It turns out that
constructing a node for every chunk of whitespace in a document causes a
significant performance regression.
Conflicts:
src/compiler/build_tables/build_lex_table.cc
src/compiler/grammar.cc
src/runtime/parser.c
2014-09-02 08:03:51 -07:00
Max Brunsfeld
5cd07648fd
Remove the separator characters construct
...
Now, grammars can handle whitespace by making it another ubiquitous
token, like comments.
For now, this has the side effect of whitespace being included in the
tree that precedes it. This was already an issue for other ubiquitous
tokens though, so it needs to be fixed anyway.
2014-09-01 20:19:43 -07:00
Max Brunsfeld
226ffd6b5b
Fix initializer list deduction warnings in specs
2014-08-27 22:23:45 -07:00
Max Brunsfeld
9338249075
Remove implicit CharacterRange constructors
...
Also fix misc smaller lint errors
2014-08-23 14:52:44 -07:00
Max Brunsfeld
0bb5663f0f
Refactor - represent char sets in terms of inclusions and exclusions
2014-08-23 14:25:45 -07:00
Max Brunsfeld
6f374fddff
Tweak format for pretty printing some classes
2014-08-23 13:52:00 -07:00
Max Brunsfeld
2963b08f79
Eliminate duplicates when constructing choice rules
2014-08-21 20:04:42 -07:00
Max Brunsfeld
b155994491
Fix indentation in specs
2014-08-07 08:11:21 -07:00
Max Brunsfeld
01571da30d
Handle more escaped characters in regexps
2014-08-03 21:57:21 -07:00
Max Brunsfeld
6a8addb84f
Fix segfault when grammar consists of a single token
2014-07-23 18:26:10 -07:00
Max Brunsfeld
02904085c2
Make separate helper scripts for testing compiler and runtime
2014-07-17 22:20:14 -07:00
Max Brunsfeld
6951acb13b
Fix error when grammar contains to error productions
2014-07-13 21:26:21 -07:00
Max Brunsfeld
7dfaba2954
Fix conflict manager spec
2014-07-13 18:10:32 -07:00
Max Brunsfeld
b217cd38fb
Handle built-in symbols correctly in conflict manager
2014-07-13 17:59:57 -07:00
Max Brunsfeld
77df7fe511
In lexer, always prefer the longest match
...
Only use rules' precedence to decide between two tokens
that match the same string
2014-07-03 08:57:35 -07:00
Max Brunsfeld
83a1b9439e
Fix handling of ubiquitous tokens used in grammar rules
2014-07-01 20:47:35 -07:00
Max Brunsfeld
3be648593e
merge_{sym,char}_transitions -> merge_{sym,char}_transition
2014-06-28 17:02:48 -07:00
Max Brunsfeld
9bad5dff3e
Avoid unnecessary std::map construction when merging transition sets
2014-06-26 13:42:42 -07:00
Max Brunsfeld
9686c57e90
Allow ubiquitous tokens to also be used in grammar rules
2014-06-26 08:52:42 -07:00
Max Brunsfeld
a9dff20658
Make grammars' separator characters configurable
2014-06-26 07:31:08 -07:00
Max Brunsfeld
7df35f9b8d
Make separate types for syntax and lexical grammars
...
This way, the separator characters can be added as a field to
lexical grammars only
2014-06-25 13:27:16 -07:00
Max Brunsfeld
d5674d33c4
Add test helper stream method for pairs
2014-06-24 21:54:38 -07:00
Max Brunsfeld
81880e000e
Tweak header include paths in tests
2014-06-23 18:50:03 -07:00
Max Brunsfeld
2c382b7363
Trim trailing whitespace
2014-06-16 21:33:35 -07:00
Max Brunsfeld
1daaf4485f
Refactor item set transition functions
2014-06-16 13:37:34 -07:00
Max Brunsfeld
39c1ab2d50
Refactor item_set_closure
...
Inline unnecessary function
2014-06-16 13:20:39 -07:00
Max Brunsfeld
7a2c2c1c90
Store ParseItemSets as maps, w/ core items as keys
...
ParseItem no longer has a lookahead_sym field; it now represents
the 'core' of a parse item. The lookahead context is stored separately,
as a set per core item. This makes iterating, copying and merging item
sets more efficient, because before, the core items were repeated for each
different lookahead symbol.
Also, the memoization in sym_transitions(ParseItemSet) has been removed.
Maybe I'll add it back later.
2014-06-16 08:35:20 -07:00
Max Brunsfeld
174f306e2a
Fix precedence of comments vs '/' operator
2014-06-11 12:27:58 -07:00
Max Brunsfeld
a42f498c59
Optimize merge_sym_transitions and merge_char_transitions
2014-06-10 14:01:32 -07:00
Max Brunsfeld
e105f5cebc
Remove inheritance link btwn PreparedGrammar and Grammar
2014-06-10 10:34:37 -07:00
Max Brunsfeld
54a555168d
Add accessor methods on Grammar
2014-06-09 21:05:25 -07:00
Max Brunsfeld
868a09b0b0
Remove infinite loop on certain lex errors
2014-06-01 23:23:24 -07:00
Max Brunsfeld
c7266f791e
Don't use std::tuples in parse regex spec
...
gcc doesn't let me use initializer list syntax for them
2014-06-01 17:34:18 -07:00
Max Brunsfeld
e93e254518
In lexer, prefer tokens to skipped separator characters
...
This was causing newlines in go and javascript to be parsed as
meaningless separator characters instead of statement terminators
2014-05-30 13:29:54 -07:00
Max Brunsfeld
c30055ba18
Fix symbol names for extracted tokens
2014-05-20 08:30:58 -07:00
Max Brunsfeld
649f200831
Expand regex/string rules as part of grammar preparation
...
This makes it possible to report errors in regex parsing
2014-05-19 20:54:59 -07:00
Max Brunsfeld
5245bc01fe
Backfill tests for token extraction in auxiliary rules
2014-05-19 19:05:54 -07:00
Max Brunsfeld
4700e33746
Introduce 'ubiquitous_tokens' concept, for parsing comments and such
2014-05-06 12:54:04 -07:00
Max Brunsfeld
b010e1667e
Fix parse action equality method
2014-05-06 12:51:38 -07:00
Max Brunsfeld
d91bc718a0
Add basic test for parse table builder
2014-05-04 23:28:53 -07:00