460 commits

Author SHA1 Message Date
Max Brunsfeld
b76574e01c Handle ambiguities between extra and non-extra tokens using normal GLR splitting 2016-09-06 10:22:16 -07:00
Max Brunsfeld
1c52c30111 Fix unexpected EOF errors getting lost 2016-09-03 22:46:14 -07:00
Max Brunsfeld
88e8cab7f9 Remove all mention of the ERROR rule type 2016-09-01 16:34:44 -07:00
Max Brunsfeld
4182de2975 Include each symbol's numeric value in generated code
Sometimes these are useful for debugging
2016-08-26 17:40:22 -07:00
Max Brunsfeld
0ee1994078 Don't have both shift and shift-extra actions in recovery states 2016-07-17 13:35:58 -07:00
Max Brunsfeld
1c66d90203 Mark repeat symbols as anonymous 2016-07-17 10:44:08 -07:00
Max Brunsfeld
fa8993460e Don't reuse unexpected tokens for now 2016-07-17 07:25:13 -07:00
Max Brunsfeld
0e2bbbd7ee Compress parse table by allowing reductions w/ unexpected lookaheads 2016-07-04 12:20:23 -07:00
Max Brunsfeld
8c26d99353 Store error recovery actions in the normal parse table 2016-06-27 14:07:47 -07:00
Max Brunsfeld
877fe1f682 Fix incorrect extra entry in symbol metadata table 2016-06-26 22:14:31 -07:00
Max Brunsfeld
43ae8235fd Remove the error action; a lack of actions implies an error. 2016-06-21 22:53:48 -07:00
Max Brunsfeld
38c144b4a3 Refine logic for deciding when tokens need to be re-lexed
* While generating the lex table, note which tokens can match the
  same string. A token needs to be relexed when it has possible
  homonyms in the current state.
* Also note which tokens can match substrings of each other.
  A token needs to be relexed when there are viable tokens that
  could match longer strings in the current state and the next
  token has been edited.
* Remove the logic for marking tokens as fragile on creation.
* Move the reusability/non-reusability of symbols off of individual
  actions and onto the entire entry for the state & symbol.
2016-06-21 07:28:04 -07:00
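
The two re-lexing rules in the commit message above can be sketched roughly as follows. This is an illustration only, not the project's code: `TokenInfo`, `needs_relex`, and the field names are hypothetical, and the sketch assumes the lex-table generator has already recorded, for each token, which other tokens can match the same string and which can match a longer string.

```python
from dataclasses import dataclass, field


@dataclass
class TokenInfo:
    # Hypothetical per-token facts noted while generating the lex table.
    homonyms: set = field(default_factory=set)        # tokens that can match the same string
    longer_matches: set = field(default_factory=set)  # tokens that can match a longer string


def needs_relex(token, valid_tokens, info, next_token_edited):
    """Return True if a previously lexed token should not be reused as-is."""
    facts = info.get(token, TokenInfo())
    # Rule 1: a token that is valid in the current state could match the same string.
    if facts.homonyms & valid_tokens:
        return True
    # Rule 2: a valid token could match a longer string, and the next token was edited.
    if next_token_edited and (facts.longer_matches & valid_tokens):
        return True
    return False
```

Under these assumptions, a `<` token followed by an edited token would be re-lexed in a state where `<=` is also valid, but reused when the following token is unchanged.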
Max Brunsfeld
45f7cee0c8 Handle extra tokens properly during error recovery 2016-06-18 20:46:25 -07:00
Max Brunsfeld
94721c7ec0 Rewind and re-tokenize in error mode after detecting an error 2016-06-17 21:26:03 -07:00
Max Brunsfeld
6d40e317df Ensure that reductions are ordered by child count in parse table 2016-06-10 13:11:52 -07:00
Max Brunsfeld
1e353381ff Don't create error node in lexer unless token is completely invalid
Before, any syntax error would cause the lexer to create an error
leaf node. This could happen even with a valid input, if the parse
stack had split and one particular version of the parse stack
failed to parse.

Now, an error leaf node is only created when the lexer cannot understand
part of the input stream at all. When a normal syntax error occurs,
the lexer just returns a token that is outside of the expected token
set, and the parser handles the unexpected token.
2016-05-26 14:15:10 -07:00
Max Brunsfeld
a3679fbb1f Distinguish separators from main tokens via a property on transitions
It was incorrect to store it as a property on the lexical states themselves
2016-05-19 16:27:25 -07:00
Max Brunsfeld
59712ec492 Clean up lex table generation 2016-05-19 13:25:46 -07:00
Max Brunsfeld
507d5ad9f7 Include shift-extra actions alongside other actions in recovery states 2016-05-16 10:33:18 -07:00
Max Brunsfeld
19bd09b81d Don't include accept actions in recovery states 2016-05-11 14:02:26 -07:00
Max Brunsfeld
22c550c9d6 Discard tokens after error detection to find the best repair
* Use GLR stack-splitting to try all numbers of tokens to
  discard until a repair is found.
* Check the validity of repairs by looking at the child trees,
  rather than the statically-computed 'in-progress symbols' list
2016-05-11 13:49:43 -07:00
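
As a rough illustration of the token-discarding idea in the commit above, the sketch below tries each possible number of discarded tokens in turn and returns the first count for which parsing could resume. It is a sequential simplification, not the GLR stack-splitting the commit describes, which explores the alternatives as parallel stack versions. The names `find_repair` and `can_resume` are hypothetical; `can_resume` stands in for the real validity check on child trees.

```python
def find_repair(tokens, error_index, can_resume):
    """Return the smallest number of tokens to discard so parsing can resume.

    `can_resume` is a hypothetical predicate that receives the tokens left
    after discarding and reports whether a valid repair exists from there.
    Returns None when no number of discarded tokens yields a repair.
    """
    remaining = len(tokens) - error_index
    for discard_count in range(remaining + 1):
        rest = tokens[error_index + discard_count:]
        if can_resume(rest):
            return discard_count
    return None
```

Preferring the smallest discard count is an assumption of this sketch; the commit message only says that all counts are tried until a repair is found.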
Max Brunsfeld
9ad1e36238 Rename out_of_context_states -> recovery_states 2016-04-27 14:14:56 -07:00
Max Brunsfeld
5b74813a5c Refine logic for which tokens to use in error recovery 2016-04-27 14:09:19 -07:00
Max Brunsfeld
31f6b2e24a Refactor construction of out-of-context states 2016-04-25 21:59:40 -07:00
Max Brunsfeld
cf19b2e58d Make repeat rules left-recursive instead of right recursive 2016-04-18 12:40:14 -07:00
Max Brunsfeld
cad663b144 Consider multiple error repairs on the same path of the stack
This changes the API to the stack_iterate function so that you can pop
from the stack without stopping iteration
2016-04-15 21:28:00 -07:00
Max Brunsfeld
9657dfcfc3 Compute in-progress symbols for out-of-context states 2016-03-10 11:39:44 -08:00
Max Brunsfeld
b733b0cc81 Remove duplicate parse actions
This has only come up for out-of-context states, but it seems possible now that there
could be duplicate actions for any state, because of the possibility of multiple
actions with different precedence or associativity that are otherwise the same
2016-03-02 20:58:40 -08:00
Max Brunsfeld
b68f7212c8 Do not consider any symbols to be 'in-progress' in out-of-context states 2016-03-02 20:58:39 -08:00
Max Brunsfeld
76d072545d Include out-of-context states starting with non-terminals 2016-03-02 20:58:39 -08:00
Max Brunsfeld
e0c24e3be6 Remove old error recovery code 2016-03-02 20:58:39 -08:00
Max Brunsfeld
ffcd8b5c49 Generate C code for the in-progress symbols in each parse state 2016-03-02 20:56:05 -08:00
Max Brunsfeld
00d953f507 Generate C code for out-of-context states 2016-03-02 20:56:05 -08:00
Max Brunsfeld
8c01b70ce7 Don't skip tokens that are not the start of any non-terminal 2016-03-02 20:56:05 -08:00
Max Brunsfeld
b4f2407a49 Add forward move states for each terminal symbol 2016-03-02 20:56:04 -08:00
Max Brunsfeld
dee1f697c1 Compute the set of variables that can begin with each terminal symbol 2016-02-25 21:51:52 -08:00
Max Brunsfeld
3f08bfb264 Fix build warnings 2016-02-12 14:11:11 -08:00
Max Brunsfeld
b80a330a74 Fix assorted memory leaks in test code 2016-02-05 12:23:54 -08:00
Max Brunsfeld
6401a065ae Use different types for advance and accept-token actions
Unlike with parse actions, lexical actions of different types never appear
in the same places in the table
2016-01-22 22:24:11 -07:00
Max Brunsfeld
f0b1d851ce Fix uninitialized instance variable in ParseAction 2016-01-21 23:52:05 -07:00
Max Brunsfeld
569b9d4099 Allow comments within grammar JSON 2016-01-14 11:28:13 -08:00
Max Brunsfeld
49f393b75e Merge pull request #22 from maxbrunsfeld/c-compiler-api
Simplify the compiler API
2016-01-13 21:08:41 -08:00
Max Brunsfeld
d4632ab9a9 Make the compile function plain C and take a JSON grammar 2016-01-11 12:33:48 -08:00
Max Brunsfeld
b69e19c525 Add plain C API for compiling a JSON grammar 2016-01-10 13:44:22 -08:00
Max Brunsfeld
36870bfced Make Grammar a simple struct 2016-01-08 15:51:30 -08:00
Max Brunsfeld
e59f6294cb Fix bug in lexical state de-duping 2015-12-30 11:15:36 -08:00
Max Brunsfeld
4b04afac5e Control lexer's error-mode via explicit boolean argument
Previously, the lexer would operate in error-mode (ignoring any garbage input
until it found a valid token) if it was invoked in the 'error' state. Now that
the error state is deduped with other lexical states, the lexer might be invoked
in that state even when error-mode is not intended. This adds a third argument
to `ts_lex` that explicitly sets the error-mode.

This bug was unlikely to occur in any real grammars, but it caused the
node-tree-sitter-compiler test suite to fail for some grammars with only one
rule.
2015-12-30 09:43:12 -08:00
Max Brunsfeld
4ad1a666be clang-format 2015-12-29 21:17:31 -08:00
Max Brunsfeld
939476c947 When removing duplicate lex states, update the error state too
Now, instead of being stored as a separate field on the parse table, the error
state is just the first state in the states vector.
2015-12-29 21:02:24 -08:00
Max Brunsfeld
97a281502e Store parse table more compactly 2015-12-29 11:27:41 -08:00