Commit graph

130 commits

Author SHA1 Message Date
Max Brunsfeld
34a65f588d Tweak naming and organization of external-scanner related language fields 2016-12-21 11:24:41 -08:00
Max Brunsfeld
42c41c158c Refactor logic for handling shared internal/external tokens 2016-12-21 10:49:55 -08:00
Max Brunsfeld
2b3da512a4 Add serialize, deserialize and reset callbacks to external scanners
Signed-off-by: Nathan Sobo <nathan@github.com>
2016-12-20 13:12:01 -08:00
Max Brunsfeld
10b51a05a1 Allow external scanners to refer to (and return) internally-defined tokens
Tokens that are defined in the grammar's rules may now be included in the
externals list also, so that external scanners can check if they are valid
lookaheads or not, and if so, can return them to the parser if needed.
2016-12-09 13:32:58 -08:00
Max Brunsfeld
83514293b5 Allow external tokens to be either visible or hidden 2016-12-05 17:26:11 -08:00
Max Brunsfeld
1251ff2e30 Consider externals to be named, not anonymous 2016-12-05 17:09:22 -08:00
Max Brunsfeld
0f8e130687 Call external scanner functions when lexing 2016-12-02 22:03:48 -08:00
Max Brunsfeld
c966af0412 Start work on external tokens 2016-12-02 16:24:19 -08:00
Max Brunsfeld
32387400c6 Rework LR conflict resolution
* Unify precedence/associativity-based resolution with the
  search for a whitelisted conflict
* Improve conflict error messages
2016-11-18 13:50:55 -08:00
Max Brunsfeld
fad7294ba4 Store shift states for non-terminals directly in the main parse table 2016-11-14 08:36:06 -08:00
Max Brunsfeld
e149d94ff5 Remove generated parsers' dependency on runtime.h 2016-10-05 14:02:49 -07:00
Max Brunsfeld
b76574e01c Handle ambiguities between extra and non-extra tokens using normal GLR splitting 2016-09-06 10:22:16 -07:00
Max Brunsfeld
1c52c30111 Fix unexpected EOF errors getting lost 2016-09-03 22:46:14 -07:00
Max Brunsfeld
4182de2975 Include each symbol's numeric value in generated code
Sometimes these are useful for debugging
2016-08-26 17:40:22 -07:00
Max Brunsfeld
1c66d90203 Mark repeat symbols as anonymous 2016-07-17 10:44:08 -07:00
Max Brunsfeld
fa8993460e Don't reuse unexpected tokens for now 2016-07-17 07:25:13 -07:00
Max Brunsfeld
8c26d99353 Store error recovery actions in the normal parse table 2016-06-27 14:07:47 -07:00
Max Brunsfeld
43ae8235fd Remove the error action; a lack of actions implies an error. 2016-06-21 22:53:48 -07:00
Max Brunsfeld
38c144b4a3 Refine logic for deciding when tokens need to be re-lexed
* While generating the lex table, note which tokens can match the
  same string. A token needs to be relexed when it has possible
  homonyms in the current state.
* Also note which tokens can match substrings of each other tokens.
  A token needs to be relexed when there are viable tokens that
  could match longer strings in the current state and the next
  token has been edited.
* Remove the logic for marking tokens as fragile on creation.
* Store the reusability/non-reusability of symbols off of individual
  actions and onto the entire entry for the state & symbol.
2016-06-21 07:28:04 -07:00
Max Brunsfeld
45f7cee0c8 Handle extra tokens properly during error recovery 2016-06-18 20:46:25 -07:00
Max Brunsfeld
94721c7ec0 Rewind and re-tokenize in error mode after detecting an error 2016-06-17 21:26:03 -07:00
Max Brunsfeld
1e353381ff Don't create error node in lexer unless token is completely invalid
Before, any syntax error would cause the lexer to create an error
leaf node. This could happen even with a valid input, if the parse
stack had split and one particular version of the parse stack
failed to parse.

Now, an error leaf node is only created when the lexer cannot understand
part of the input stream at all. When a normal syntax error occurs,
the lexer just returns a token that is outside of the expected token
set, and the parser handles the unexpected token.
2016-05-26 14:15:10 -07:00
Max Brunsfeld
a3679fbb1f Distinguish separators from main tokens via a property on transitions
It was incorrect to store it as a property on the lexical states themselves
2016-05-19 16:27:25 -07:00
Max Brunsfeld
22c550c9d6 Discard tokens after error detection to find the best repair
* Use GLR stack-splitting to try all numbers of tokens to
  discard until a repair is found.
* Check the validity of repairs by looking at the child trees,
  rather than the statically-computed 'in-progress symbols' list
2016-05-11 13:49:43 -07:00
Max Brunsfeld
9ad1e36238 Rename out_of_context_states -> recovery_states 2016-04-27 14:14:56 -07:00
Max Brunsfeld
31f6b2e24a Refactor construction of out-of-context states 2016-04-25 21:59:40 -07:00
Max Brunsfeld
cad663b144 Consider multiple error repairs on the same path of the stack
This changes the API to the stack_iterate function so that you can pop
from the stack without stopping iteration
2016-04-15 21:28:00 -07:00
Max Brunsfeld
b68f7212c8 Do not consider any symbols to be 'in-progress' in out-of-context states 2016-03-02 20:58:39 -08:00
Max Brunsfeld
76d072545d Include out-of-context states starting with non-terminals 2016-03-02 20:58:39 -08:00
Max Brunsfeld
e0c24e3be6 Remove old error recovery code 2016-03-02 20:58:39 -08:00
Max Brunsfeld
ffcd8b5c49 Generate C code for the in-progress symbols in each parse state 2016-03-02 20:56:05 -08:00
Max Brunsfeld
00d953f507 Generate C code for out-of-context states 2016-03-02 20:56:05 -08:00
Max Brunsfeld
6401a065ae Use different types for advance and accept-token actions
Unlike with parse actions, lexical actions of different types never appear
in the same places in the table
2016-01-22 22:24:11 -07:00
Max Brunsfeld
d4632ab9a9 Make the compile function plain C and take a JSON grammar 2016-01-11 12:33:48 -08:00
Max Brunsfeld
4b04afac5e Control lexer's error-mode via explicit boolean argument
Previously, the lexer would operate in error-mode (ignoring any garbage input
until it found a valid token) if it was invoked in the 'error' state. Now that
the error state is deduped with other lexical states, the lexer might be invoked
in that state even when error-mode is not intended. This adds a third argument
to `ts_lex` that explicitly sets the error-mode.

This bug was unlikely to occur in any real grammars, but it caused the
node-tree-sitter-compiler test suite to fail for some grammars with only one
rule.
2015-12-30 09:43:12 -08:00
Max Brunsfeld
4ad1a666be clang-format 2015-12-29 21:17:31 -08:00
Max Brunsfeld
939476c947 When removing duplicate lex states, update the error state too
Now, instead of being stored as a separate field on the parse table, the error
state is just the first state in the states vector.
2015-12-29 21:02:24 -08:00
Max Brunsfeld
97a281502e Store parse table more compactly 2015-12-29 11:27:41 -08:00
Max Brunsfeld
1c6ad5f7e4 Rename ubiquitous_tokens -> extra_tokens in compiler API
They were already called this in the runtime code.
'Extra' is just easier to say.
2015-12-17 15:50:50 -08:00
Max Brunsfeld
c495076adb Record in parse table which actions can hide splits
Suppose a parse state S has multiple actions for a terminal lookahead symbol A.
Then during incremental parsing, while in state S, the parser should not
reuse a non-terminal lookahead B where FIRST(B) contains A, because reusing B
might prematurely discard one of the possible actions that a batch parser
would have attempted in state S, upon seeing A as a lookahead.
2015-12-17 13:11:56 -08:00
Max Brunsfeld
77a94a2929 Use ::count to check if sets and maps contain elements 2015-12-17 10:05:42 -08:00
Max Brunsfeld
66144dc28e Treat tokens that are sometimes extra as fragile 2015-12-16 20:04:45 -08:00
Max Brunsfeld
d713054d61 Record which tokens are fragile when lexing 2015-12-10 21:05:54 -08:00
Max Brunsfeld
75f31a79a3 Treat reduce actions with different production IDs as distinct 2015-12-10 13:00:26 -08:00
Max Brunsfeld
ad619d95f6 Add 'extra' field to symbol metadata
This stores whether a symbol is only ever used as a ubiquitous token. This will
allow ubiquitous nodes to be reused more effectively: if they are always
ubiquitous, then they can be reused immediately, and otherwise, they must be
broken down in case they need to be used structurally.
2015-12-02 15:10:24 -08:00
Max Brunsfeld
f08554e958 Replace NodeType enum with SymbolMetadata bitfield
This will allow storing other metadata about symbols, like if they
only appear as ubiquitous tokens
2015-12-02 15:10:24 -08:00
Max Brunsfeld
e11515fb74 Escape backslashes and quotes in symbol name strings 2015-11-09 09:33:24 -08:00
Max Brunsfeld
dba0726eef clang format 2015-10-28 12:10:58 -07:00
Max Brunsfeld
e543661498 Fix typo: LB -> LF 2015-10-26 16:55:59 -07:00
Max Brunsfeld
bad1bff3bb Sanitize line breaks in symbol names 2015-10-26 13:36:31 -07:00