Max Brunsfeld
db4b9ebc7c
Implement Rule as a union rather than an abstract base class
2017-03-17 13:29:31 -07:00
Max Brunsfeld
d222dbb9fd
Allow lexer to accept tokens that ended at previous positions
...
* Track lookahead in each tree
* Add 'mark_end' API that external scanners can use
2017-03-13 17:06:52 -07:00
Max Brunsfeld
abf8a4f2c2
🎨
2017-03-01 22:15:26 -08:00
Max Brunsfeld
3c8e6f9987
Restructure parse state merging logic
...
* Remove remnants of templatized remove_duplicate_states function
* Rename recovery_tokens function to get_compatible_tokens and augment it
also compute pairs of tokens which could potentially be incompatible
2017-02-26 12:23:48 -08:00
Max Brunsfeld
fad7294ba4
Store shift states for non-terminals directly in the main parse table
2016-11-14 08:36:06 -08:00
Max Brunsfeld
0e2bbbd7ee
Compress parse table by allowing reductions w/ unexpected lookaheads
2016-07-04 12:20:23 -07:00
Max Brunsfeld
38c144b4a3
Refine logic for deciding when tokens need to be re-lexed
...
* While generating the lex table, note which tokens can match the
same string. A token needs to be relexed when it has possible
homonyms in the current state.
* Also note which tokens can match substrings of each other tokens.
A token needs to be relexed when there are viable tokens that
could match longer strings in the current state and the next
token has been edited.
* Remove the logic for marking tokens as fragile on creation.
* Store the reusability/non-reusability of symbols off of individual
actions and onto the entire entry for the state & symbol.
2016-06-21 07:28:04 -07:00
Max Brunsfeld
a3679fbb1f
Distinguish separators from main tokens via a property on transitions
...
It was incorrect to store it as a property on the lexical states themselves
2016-05-19 16:27:25 -07:00
Max Brunsfeld
6401a065ae
Use different types for advance and accept-token actions
...
Unlike with parse actions, lexical actions of different types never appear
in the same places in the table
2016-01-22 22:24:11 -07:00
Max Brunsfeld
939476c947
When removing duplicate lex states, update the error state too
...
Now, instead of being stored as a separate field on the parse table, the error
state is just the first state in the states vector.
2015-12-29 21:02:24 -08:00
Max Brunsfeld
386b124866
Ensure that there are no duplicate lex states
2015-12-20 15:46:13 -08:00
Max Brunsfeld
d713054d61
Record which tokens are fragile when lexing
2015-12-10 21:05:54 -08:00
Max Brunsfeld
d6ee28abd0
Make precedence more useful within tokens
...
Choose accept-token actions over advance actions if their rule has a higher precedence.
2015-11-01 12:48:27 -08:00
Max Brunsfeld
5455fb977f
Use PrecedenceRange in build_lex_table
2015-10-05 18:02:59 -07:00
Max Brunsfeld
e6f3239bef
Move stream operator definitions to spec helpers
...
This is one less thing for users to worry about when compiling and linking
the library itself
2015-09-10 10:12:11 -07:00
Max Brunsfeld
074a7884aa
Fix uninitialized variable error
2015-02-16 22:09:30 -08:00
Max Brunsfeld
545e575508
Revert "Remove the separator characters construct"
...
This reverts commit 5cd07648fd .
The separators construct is useful as an optimization. It turns out that
constructing a node for every chunk of whitespace in a document causes a
significant performance regression.
Conflicts:
src/compiler/build_tables/build_lex_table.cc
src/compiler/grammar.cc
src/runtime/parser.c
2014-09-02 08:03:51 -07:00
Max Brunsfeld
5cd07648fd
Remove the separator characters construct
...
Now, grammars can handle whitespace by making it another ubiquitous
token, like comments.
For now, this has the side effect of whitespace being included in the
tree that precedes it. This was already an issue for other ubiquitous
tokens though, so it needs to be fixed anyway.
2014-09-01 20:19:43 -07:00
Max Brunsfeld
98cc2f2264
Auto-format all source code with clang-format
2014-07-21 13:20:00 -07:00
Max Brunsfeld
e93e254518
In lexer, prefer tokens to skipped separator characters
...
This was causing newlines in go and javascript to be parsed as
meaningless separator characters instead of statement terminators
2014-05-30 13:29:54 -07:00
Max Brunsfeld
220e081c49
Remove unnecessary methods on LexTable
2014-05-29 18:28:23 -07:00
Max Brunsfeld
93620b3ed1
Add keyword helper for making higher-priority string tokens
2014-05-01 13:25:20 -07:00
Max Brunsfeld
25eda9d889
ISymbol -> Symbol
...
Interned symbols are now the main type of symbol in use
2014-04-28 20:43:27 -07:00
Max Brunsfeld
68d44fd565
Intern symbols during grammar preparation
2014-04-22 23:38:26 -07:00
Max Brunsfeld
1cc7e32e2d
Fix handling of tokens consisting of separator characters
...
The parser is no longer hard-coded to skip whitespace. Tokens
such as newlines, whose characters overlap with the separator
characters, can now be correctly recognized.
2014-04-03 19:10:09 -07:00
Max Brunsfeld
3f770ff3c3
Remove unused consumed_symbols vector from parse items
2014-03-26 21:04:11 -07:00
Max Brunsfeld
aac0786449
Resolve token conflicts by tokens' order in grammar
2014-03-24 19:18:06 -07:00
Max Brunsfeld
39cb420df2
Remove uses of 'short' and 'long'
2014-03-09 23:00:14 -07:00
Max Brunsfeld
31a58bc7e4
Make include guards pass cpplint
2014-03-09 22:05:24 -07:00
Max Brunsfeld
eb30429700
Make paths explicit in #includes
2014-03-09 21:43:14 -07:00
Max Brunsfeld
39aa0ccc91
Add script to trim whitespace
2014-03-09 19:49:35 -07:00
Max Brunsfeld
e58a6d8ba7
Start work on error recovery
...
- In runtime, make parse errors part of the parse tree
- Add error state to lexers in which they can accept any token
2014-02-24 18:42:54 -08:00
Max Brunsfeld
2c56612650
Get makefile working
2014-02-18 09:07:00 -08:00
Max Brunsfeld
66f7dcf28a
Remove some unused imports
2014-02-10 21:09:43 -08:00
Max Brunsfeld
15c9e2d398
Make ordering of cases deterministic in generated parsers
2014-02-10 18:38:01 -08:00
Max Brunsfeld
d3d25f2683
Represent character sets as sets of character ranges
2014-02-05 18:56:04 -08:00
Max Brunsfeld
8cce11a52a
Rename Character -> CharacterSet, CharacterMatch -> CharacterRange
2014-02-03 13:05:51 -08:00
Max Brunsfeld
7f62e752be
Allow Character rules to handle arbitrary character sets
2014-01-30 08:34:20 -08:00
Max Brunsfeld
ca33c3942a
In parse table, store symbols as Symbol objects, not strings
2014-01-27 13:40:10 -08:00
Max Brunsfeld
3ca2e126be
Remove unnecessary public START and END constants
2014-01-25 21:34:46 -08:00
Max Brunsfeld
92cec5758f
Reorganize compiler directory
2014-01-11 15:14:17 -08:00