385 commits

Author SHA1 Message Date
Max Brunsfeld
0ec7e5ce42 Remove ts_stack_force_merge function 2018-04-06 13:26:18 -07:00
Max Brunsfeld
80f856cef5 Maintain a total node count on every tree
This simplifies (and fixes bugs in) the parse stack's tracking of its
total node count since the last error, which is needed for error
recovery.
2018-04-06 13:26:18 -07:00
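As a rough illustration of the idea in the commit above (not the actual C implementation), each tree can cache the size of its own subtree when it is built, so the number of nodes pushed since the last error becomes a simple sum. The `Tree` class, its `node_count` field, and `node_count_since_error` are hypothetical names chosen for this sketch.

```python
from typing import List, Optional


class Tree:
    """Hypothetical tree node that caches the size of its subtree."""

    def __init__(self, symbol: str, children: Optional[List["Tree"]] = None) -> None:
        self.symbol = symbol
        self.children = children or []
        # One for this node plus the already-cached counts of its children,
        # so no subtree ever has to be walked again.
        self.node_count = 1 + sum(child.node_count for child in self.children)


def node_count_since_error(trees_since_error: List[Tree]) -> int:
    """Sum the cached counts of the trees pushed since the last error."""
    return sum(tree.node_count for tree in trees_since_error)


leaf_a = Tree("identifier")
leaf_b = Tree("number")
expr = Tree("binary_expression", [leaf_a, Tree("+"), leaf_b])
```

Here `expr.node_count` covers the parent and its three leaves, and the stack-level total is derived from the per-tree values rather than tracked separately.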
Max Brunsfeld
e59558c83b Allow stack versions to be temporarily paused
This way, when detecting an error, we can defer the decision about
whether to bail or recover until all stack versions are processed.
2018-04-06 13:26:18 -07:00
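A minimal sketch of the deferral described above, under stated assumptions: a version that hits an error is paused instead of being handled immediately, and the decision is made only after every version has been processed. The names (`Status`, `StackVersion`, `process_versions`) and the specific decision rule (bail if any version is still error-free, otherwise resume the paused versions for recovery) are assumptions for illustration; the commit message does not spell out the rule.

```python
from enum import Enum
from typing import Callable, List


class Status(Enum):
    ACTIVE = "active"
    PAUSED = "paused"
    HALTED = "halted"


class StackVersion:
    """Hypothetical stand-in for one version of the parse stack."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.status = Status.ACTIVE


def process_versions(
    versions: List[StackVersion],
    hits_error: Callable[[StackVersion], bool],
) -> List[StackVersion]:
    """Pause erroring versions, then decide once all have been processed.

    Returns the versions that should go on to error recovery.
    """
    # First pass: pause instead of deciding on the spot.
    for version in versions:
        if version.status is Status.ACTIVE and hits_error(version):
            version.status = Status.PAUSED

    # Second pass: the decision now sees the outcome of every version.
    # Assumed rule: if some version is still error-free, bail on the paused
    # ones; otherwise resume them so that they can attempt recovery.
    any_error_free = any(v.status is Status.ACTIVE for v in versions)
    to_recover = []
    for version in versions:
        if version.status is Status.PAUSED:
            if any_error_free:
                version.status = Status.HALTED
            else:
                version.status = Status.ACTIVE
                to_recover.append(version)
    return to_recover


mixed = [StackVersion("a"), StackVersion("b")]
mixed_recover = process_versions(mixed, lambda v: v.name == "a")

all_fail = [StackVersion("c"), StackVersion("d")]
all_fail_recover = process_versions(all_fail, lambda v: True)
```

In the first call, version `a` is halted because `b` parsed cleanly; in the second, both versions are resumed for recovery because no better alternative exists.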
Max Brunsfeld
5520983144 Clean up Stack API
* Remove StackPopResult
* Rename top_state() -> state()
* Rename top_position() -> position()
* Improve docs
2018-03-29 17:37:54 -07:00
Max Brunsfeld
ee995c3d6b Avoid redundant retains/releases by giving ts_stack_push move semantics 2018-03-29 17:18:43 -07:00
Max Brunsfeld
e917756ad1 Remove depends_on_lookahead field from parse table entries
This simplifies the logic for determining whether a token is reusable
and makes it more conservative. It should fix some incremental parsing
bugs that are being caught by the randomized tests on CI.
2018-03-28 10:58:33 -07:00
Max Brunsfeld
e927d02f43 Allow reusing leaf nodes unless the next leaf has changes 2018-03-07 17:44:54 -08:00
Max Brunsfeld
c0cc35ff07 Create separate lexer function for keywords 2018-03-07 12:00:26 -08:00
Max Brunsfeld
f96969738b Don't remove mergeable stack versions so aggressively during condense 2018-03-05 10:40:05 -08:00
Max Brunsfeld
dbc0c208f4 Add missing initialization of parser's in_ambiguity state 2018-03-02 15:25:39 -08:00
Max Brunsfeld
52087de4f0 Remove the concept of fragile reductions
They were a vestige of when Tree-sitter did sentential form-based
incremental parsing (as opposed to simply state matching). This was
elegant but not compatible with GLR as far as I could tell.
2018-03-02 14:51:54 -08:00
Max Brunsfeld
299a146b66 Balance repetition trees after parsing 2018-02-12 11:41:56 -08:00
Max Brunsfeld
8c29841adf Represent repetitions with associative structure 2018-02-12 11:41:56 -08:00
Max Brunsfeld
46dcd53090 Do not insert missing tokens if halt_on_error option is passed 2018-01-24 14:04:55 -08:00
Max Brunsfeld
dafa897021 Bail on error recovery if too many alternative parses have already completed 2018-01-09 17:08:36 -08:00
Max Brunsfeld
21a88b1731 Don't count less-far-along versions in better_version_exists method 2017-12-29 16:10:43 -08:00
Max Brunsfeld
d3c85f288d Start work on repairing errors by inserting missing tokens 2017-12-29 15:11:00 -08:00
Max Brunsfeld
f2dc620610 Extract parser__recover_to_state method 2017-12-29 15:10:59 -08:00
Max Brunsfeld
adf47e2b57 Fix invalid usage of 'extra' field on non-shift parse action 2017-12-29 11:46:41 -08:00
Max Brunsfeld
d9094e8146 Consolidate more logic into do_potential_reductions method 2017-12-28 15:49:48 -08:00
Max Brunsfeld
172cbb2d22 Fix infinite loop due to skipping empty tokens during error recovery 2017-12-27 11:18:06 -08:00
Max Brunsfeld
addeb6c4c1 Allocate and free trees using an object pool 2017-12-27 10:34:29 -08:00
Max Brunsfeld
0e69da37a5 Return a character count from the lexer's get_column method 2017-12-20 16:26:38 -08:00
Max Brunsfeld
fbcefe25f7 Avoid creating external tokens that start after they end 2017-12-07 11:50:27 -08:00
Max Brunsfeld
5d676de051 Remove unnecessary conditional in parser__accept 2017-12-07 11:50:27 -08:00
Max Brunsfeld
48681c3f0e Initialize error start and end positions at their declarations
Fixes #113

Clang doesn't seem to be able to tell that these variables were guaranteed to
be initialized by the time they were read.
2017-10-31 10:06:44 -07:00
Max Brunsfeld
121a6a66ec Take total dynamic precedence into account in stack version sorting
Signed-off-by: Josh Vera <vera@github.com>
2017-10-09 15:51:22 -07:00
Max Brunsfeld
36c2b685b9 Always invalidate old chunk of text when parsing after an edit 2017-10-04 15:09:46 -07:00
Max Brunsfeld
b0fdc33f73 Remove 'extra' and 'structural' booleans from symbol metadata 2017-09-14 12:07:46 -07:00
Max Brunsfeld
91456d7a17 Avoid duplicate error state entries for tokens that are both internal & external 2017-09-14 10:54:13 -07:00
Max Brunsfeld
2721f72c41 Represent MAX_COST_DIFFERENCE as unsigned 2017-09-13 16:49:18 -07:00
Max Brunsfeld
c1cf8e02a7 Merge pull request #101 from tree-sitter/merge-more-lex-states
Reduce the number of states in the generated lexer function
2017-09-13 16:46:58 -07:00
Max Brunsfeld
d291af9a31 Refactor error comparisons
* Deal with mergeability outside of error comparison function
* Make `better_version_exists` function pure (don't halt other versions
as a side effect).
* Tweak error comparison logic

Signed-off-by: Rick Winfrey <rewinfrey@github.com>
2017-09-13 16:38:15 -07:00
Max Brunsfeld
07fb3ab0e6 Abort recoveries before popping if better versions already exist 2017-09-13 09:56:51 -07:00
Max Brunsfeld
47669e6015 Avoid halting the only non-halted entry in recover 2017-09-12 16:20:06 -07:00
Max Brunsfeld
819235bac3 Limit the number of stack nodes that are included in a summary 2017-09-12 12:00:00 -07:00
Max Brunsfeld
99d048e016 Simplify error recovery; eliminate recovery states
The previous approach to error recovery relied on special error-recovery
states in the parse table. For each token T, there was an error recovery
state in which the parser looked for *any* token that could follow T.
Unfortunately, sometimes the set of tokens that could follow T contained
conflicts. For example, in JS, the token '}' can be followed by the
open-ended 'template_chars' token, but also by ordinary tokens like
'identifier'. So with the old algorithm, when recovering from an
unexpected '}' token, the lexer had no way to distinguish identifiers
from template_chars.

This commit drops the error recovery states. Instead, when we encounter
an unexpected token T, we recover from the error by finding a previous
state S in the stack in which T would be valid, popping all of the nodes
after S, and wrapping them in an error.

This way, the lexer is always invoked in a normal parse state, in which
it is looking for a non-conflicting set of tokens. Eliminating the error
recovery states also shrinks the lex state machine significantly.

Signed-off-by: Rick Winfrey <rewinfrey@github.com>
2017-09-11 15:22:52 -07:00
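The recovery strategy described in the commit above can be sketched as follows. This is a simplified model, not the real parser code: the parse table is reduced to a hypothetical mapping from each state to the tokens valid in it, the stack is a list of `(state, node)` pairs, and the sketch assumes the parser stays in state S after the error node is pushed.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical parse table: the set of tokens that are valid in each state.
ParseTable = Dict[int, Set[str]]
# A stack entry pairs a parse state with the node that was pushed to reach it.
StackEntry = Tuple[int, object]


def recover(
    stack: List[StackEntry], table: ParseTable, token: str
) -> Optional[List[StackEntry]]:
    """Pop back to the nearest previous state S in which `token` is valid.

    The nodes above S are wrapped in a single ("ERROR", [...]) node.
    Returns the new stack, or None if no previous state accepts the token.
    """
    # Search downward from the entry just below the top of the stack.
    for depth in range(len(stack) - 2, -1, -1):
        state, _ = stack[depth]
        if token in table.get(state, set()):
            popped = [node for _, node in stack[depth + 1:]]
            error_node = ("ERROR", popped)
            # Assumption: pushing the error node leaves the parser in S.
            return stack[:depth + 1] + [(state, error_node)]
    return None


table = {0: {"let"}, 1: {"identifier"}, 2: {"="}, 3: {"number"}}
stack = [(0, None), (1, "let"), (2, "x"), (3, "=")]

# "let" is unexpected in state 3 but valid in state 0.
recovered = recover(stack, table, "let")
no_recovery = recover(stack, table, "while")
```

Because the search only ever returns to ordinary parse states, any lexing that follows happens against that state's normal, non-conflicting token set, which is the property the commit message highlights.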
Max Brunsfeld
4c9c05806a Merge compatible starting token states before constructing lex table 2017-09-05 13:21:53 -07:00
Max Brunsfeld
ac9d260734 Clean up parser fields 2017-08-31 12:50:10 -07:00
Max Brunsfeld
4a0587061e Consolidate logic for deciding on a lookahead node 2017-08-31 12:19:37 -07:00
Max Brunsfeld
41074cbf2d 🎨 2017-08-30 16:48:15 -07:00
Max Brunsfeld
fdc6ee445b Remove parser__push helper function 2017-08-30 16:41:07 -07:00
Max Brunsfeld
1b1276bdbf Simplify parser__condense_stack function 2017-08-30 16:36:02 -07:00
Max Brunsfeld
96a630e5df Clean up check for leaf node reusability 2017-08-30 16:19:51 -07:00
Max Brunsfeld
8bdab7335e Remove unnecessary reusability check after breaking down lookahead 2017-08-30 16:19:11 -07:00
Max Brunsfeld
bef536a7d0 Discard fragile reusable nodes earlier 2017-08-30 16:17:10 -07:00
Max Brunsfeld
5cbd50c7d7 Remember how far ahead the lexer looked on failed calls
This needs to be included in the 'bytes_scanned' property of the token
that is ultimately produced.
2017-08-29 15:04:22 -07:00
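To illustrate why failed lexing attempts matter here, the sketch below models each attempt as a hypothetical function returning where the token ended (or `None` on failure) and the furthest byte it examined. The token's `bytes_scanned` is computed from the furthest byte seen across all attempts, including the failed ones. The function and field names other than `bytes_scanned` are assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical lex attempt: given a start byte, return
# (token end byte, or None on failure; furthest byte examined).
LexAttempt = Callable[[int], Tuple[Optional[int], int]]


def lex(attempts: List[LexAttempt], start: int) -> Optional[Dict[str, int]]:
    """Try each attempt in order, remembering the furthest lookahead."""
    furthest = start
    for attempt in attempts:
        token_end, looked_until = attempt(start)
        # Failed attempts still count: the bytes they read influenced
        # the outcome of this lex call.
        furthest = max(furthest, looked_until)
        if token_end is not None:
            return {
                "start": start,
                "end": token_end,
                "bytes_scanned": furthest - start,
            }
    return None


def failing_attempt(start: int) -> Tuple[Optional[int], int]:
    # Reads 7 bytes ahead before giving up.
    return None, start + 7


def succeeding_attempt(start: int) -> Tuple[Optional[int], int]:
    # Produces a 2-byte token after reading 3 bytes.
    return start + 2, start + 3


token = lex([failing_attempt, succeeding_attempt], start=5)
```

If only the successful attempt were counted, `bytes_scanned` would be 3, and an edit to a byte that only the failed attempt had read would not be recognized as affecting this token.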
Max Brunsfeld
f3977ec213 Always call deserialize on external scanner before scanning
Remembering the last token that the external scanner produced is
not worth the complexity.
2017-08-29 14:41:55 -07:00
Max Brunsfeld
4d63e26e9e Clean up logic for falling back to error mode after lexing fails 2017-08-25 16:57:09 -07:00
Max Brunsfeld
86d5737fc2 Escape quotes when printing symbols to dot graphs 2017-08-25 16:26:40 -07:00