Commit graph

41 commits

Author SHA1 Message Date
Max Brunsfeld
db4b9ebc7c Implement Rule as a union rather than an abstract base class 2017-03-17 13:29:31 -07:00
Max Brunsfeld
d222dbb9fd Allow lexer to accept tokens that ended at previous positions
* Track lookahead in each tree
* Add 'mark_end' API that external scanners can use
2017-03-13 17:06:52 -07:00
Max Brunsfeld
abf8a4f2c2 🎨 2017-03-01 22:15:26 -08:00
Max Brunsfeld
3c8e6f9987 Restructure parse state merging logic
* Remove remnants of templatized remove_duplicate_states function
* Rename recovery_tokens function to get_compatible_tokens and augment it
  also compute pairs of tokens which could potentially be incompatible
2017-02-26 12:23:48 -08:00
Max Brunsfeld
fad7294ba4 Store shift states for non-terminals directly in the main parse table 2016-11-14 08:36:06 -08:00
Max Brunsfeld
0e2bbbd7ee Compress parse table by allowing reductions w/ unexpected lookaheads 2016-07-04 12:20:23 -07:00
Max Brunsfeld
38c144b4a3 Refine logic for deciding when tokens need to be re-lexed
* While generating the lex table, note which tokens can match the
  same string. A token needs to be relexed when it has possible
  homonyms in the current state.
* Also note which tokens can match substrings of each other tokens.
  A token needs to be relexed when there are viable tokens that
  could match longer strings in the current state and the next
  token has been edited.
* Remove the logic for marking tokens as fragile on creation.
* Store the reusability/non-reusability of symbols off of individual
  actions and onto the entire entry for the state & symbol.
2016-06-21 07:28:04 -07:00
Max Brunsfeld
a3679fbb1f Distinguish separators from main tokens via a property on transitions
It was incorrect to store it as a property on the lexical states themselves
2016-05-19 16:27:25 -07:00
Max Brunsfeld
6401a065ae Use different types for advance and accept-token actions
Unlike with parse actions, lexical actions of different types never appear
in the same places in the table
2016-01-22 22:24:11 -07:00
Max Brunsfeld
939476c947 When removing duplicate lex states, update the error state too
Now, instead of being stored as a separate field on the parse table, the error
state is just the first state in the states vector.
2015-12-29 21:02:24 -08:00
Max Brunsfeld
386b124866 Ensure that there are no duplicate lex states 2015-12-20 15:46:13 -08:00
Max Brunsfeld
d713054d61 Record which tokens are fragile when lexing 2015-12-10 21:05:54 -08:00
Max Brunsfeld
d6ee28abd0 Make precedence more useful within tokens
Choose accept-token actions over advance actions if their rule has a higher precedence.
2015-11-01 12:48:27 -08:00
Max Brunsfeld
5455fb977f Use PrecedenceRange in build_lex_table 2015-10-05 18:02:59 -07:00
Max Brunsfeld
e6f3239bef Move stream operator definitions to spec helpers
This is one less thing for users to worry about when compiling and linking
the library itself
2015-09-10 10:12:11 -07:00
Max Brunsfeld
074a7884aa Fix uninitialized variable error 2015-02-16 22:09:30 -08:00
Max Brunsfeld
545e575508 Revert "Remove the separator characters construct"
This reverts commit 5cd07648fd.

The separators construct is useful as an optimization. It turns out that
constructing a node for every chunk of whitespace in a document causes a
significant performance regression.

Conflicts:
	src/compiler/build_tables/build_lex_table.cc
	src/compiler/grammar.cc
	src/runtime/parser.c
2014-09-02 08:03:51 -07:00
Max Brunsfeld
5cd07648fd Remove the separator characters construct
Now, grammars can handle whitespace by making it another ubiquitous
token, like comments.

For now, this has the side effect of whitespace being included in the
tree that precedes it. This was already an issue for other ubiquitous
tokens though, so it needs to be fixed anyway.
2014-09-01 20:19:43 -07:00
Max Brunsfeld
98cc2f2264 Auto-format all source code with clang-format 2014-07-21 13:20:00 -07:00
Max Brunsfeld
e93e254518 In lexer, prefer tokens to skipped separator characters
This was causing newlines in go and javascript to be parsed as
meaningless separator characters instead of statement terminators
2014-05-30 13:29:54 -07:00
Max Brunsfeld
220e081c49 Remove unnecessary methods on LexTable 2014-05-29 18:28:23 -07:00
Max Brunsfeld
93620b3ed1 Add keyword helper for making higher-priority string tokens 2014-05-01 13:25:20 -07:00
Max Brunsfeld
25eda9d889 ISymbol -> Symbol
Interned symbols are now the main type of symbol in use
2014-04-28 20:43:27 -07:00
Max Brunsfeld
68d44fd565 Intern symbols during grammar preparation 2014-04-22 23:38:26 -07:00
Max Brunsfeld
1cc7e32e2d Fix handling of tokens consisting of separator characters
The parser is no longer hard-coded to skip whitespace. Tokens
such as newlines, whose characters overlap with the separator
characters, can now be correctly recognized.
2014-04-03 19:10:09 -07:00
Max Brunsfeld
3f770ff3c3 Remove unused consumed_symbols vector from parse items 2014-03-26 21:04:11 -07:00
Max Brunsfeld
aac0786449 Resolve token conflicts by tokens' order in grammar 2014-03-24 19:18:06 -07:00
Max Brunsfeld
39cb420df2 Remove uses of 'short' and 'long' 2014-03-09 23:00:14 -07:00
Max Brunsfeld
31a58bc7e4 Make include guards pass cpplint 2014-03-09 22:05:24 -07:00
Max Brunsfeld
eb30429700 Make paths explicit in #includes 2014-03-09 21:43:14 -07:00
Max Brunsfeld
39aa0ccc91 Add script to trim whitespace 2014-03-09 19:49:35 -07:00
Max Brunsfeld
e58a6d8ba7 Start work on error recovery
- In runtime, make parse errors part of the parse tree
- Add error state to lexers in which they can accept any token
2014-02-24 18:42:54 -08:00
Max Brunsfeld
2c56612650 Get makefile working 2014-02-18 09:07:00 -08:00
Max Brunsfeld
66f7dcf28a Remove some unused imports 2014-02-10 21:09:43 -08:00
Max Brunsfeld
15c9e2d398 Make ordering of cases deterministic in generated parsers 2014-02-10 18:38:01 -08:00
Max Brunsfeld
d3d25f2683 Represent character sets as sets of character ranges 2014-02-05 18:56:04 -08:00
Max Brunsfeld
8cce11a52a Rename Character -> CharacterSet, CharacterMatch -> CharacterRange 2014-02-03 13:05:51 -08:00
Max Brunsfeld
7f62e752be Allow Character rules to handle arbitrary character sets 2014-01-30 08:34:20 -08:00
Max Brunsfeld
ca33c3942a In parse table, store symbols as Symbol objects, not strings 2014-01-27 13:40:10 -08:00
Max Brunsfeld
3ca2e126be Remove unnecessary public START and END constants 2014-01-25 21:34:46 -08:00
Max Brunsfeld
92cec5758f Reorganize compiler directory 2014-01-11 15:14:17 -08:00