76 commits

Author SHA1 Message Date
Max Brunsfeld
e59558c83b Allow stack versions to be temporarily paused
This way, when detecting an error, we can defer the decision about
whether to bail or recover until all stack versions are processed.
2018-04-06 13:26:18 -07:00
Max Brunsfeld
dbe77e7199 Simplify testing-only ts_stack_iterate function 2018-03-29 17:50:07 -07:00
Max Brunsfeld
5520983144 Clean up Stack API
* Remove StackPopResult
* Rename top_state() -> state()
* Rename top_position() -> position()
* Improve docs
2018-03-29 17:37:54 -07:00
Max Brunsfeld
addeb6c4c1 Allocate and free trees using an object pool 2017-12-27 10:34:29 -08:00
Max Brunsfeld
121a6a66ec Take total dynamic precedence into account in stack version sorting
Signed-off-by: Josh Vera <vera@github.com>
2017-10-09 15:51:22 -07:00
Max Brunsfeld
d291af9a31 Refactor error comparisons
* Deal with mergeability outside of error comparison function
* Make `better_version_exists` function pure (don't halt other versions
as a side effect).
* Tweak error comparison logic

Signed-off-by: Rick Winfrey <rewinfrey@github.com>
2017-09-13 16:38:15 -07:00
Max Brunsfeld
07fb3ab0e6 Abort recoveries before popping if better versions already exist 2017-09-13 09:56:51 -07:00
Max Brunsfeld
819235bac3 Limit the number of stack nodes that are included in a summary 2017-09-12 12:00:00 -07:00
Max Brunsfeld
99d048e016 Simplify error recovery; eliminate recovery states
The previous approach to error recovery relied on special error-recovery
states in the parse table. For each token T, there was an error recovery
state in which the parser looked for *any* token that could follow T.
Unfortunately, sometimes the set of tokens that could follow T contained
conflicts. For example, in JS, the token '}' can be followed by the
open-ended 'template_chars' token, but also by ordinary tokens like
'identifier'. So with the old algorithm, when recovering from an
unexpected '}' token, the lexer had no way to distinguish identifiers
from template_chars.

This commit drops the error recovery states. Instead, when we encounter
an unexpected token T, we recover from the error by finding a previous
state S in the stack in which T would be valid, popping all of the nodes
after S, and wrapping them in an error.

This way, the lexer is always invoked in a normal parse state, in which
it is looking for a non-conflicting set of tokens. Eliminating the error
recovery states also shrinks the lex state machine significantly.

Signed-off-by: Rick Winfrey <rewinfrey@github.com>
2017-09-11 15:22:52 -07:00
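
The recovery strategy described in commit 99d048e016 can be sketched roughly as follows. This is a minimal illustration, not tree-sitter's actual code: it assumes a single linear stack of state ids rather than the real graph-structured stack, and the names `recover_by_popping`, `TOKEN_COUNT`, and the boolean `table` are hypothetical stand-ins for the parse table lookup.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token count for the illustrative table below. */
#define TOKEN_COUNT 4

/*
 * Search downward from the entry just below the top of the stack for a
 * state in which `token` is valid. If one is found, truncate the stack so
 * that state becomes the new top and return the number of popped entries
 * (the entries a parser would then wrap in an error node). Return -1 and
 * leave the stack untouched if no earlier state accepts the token.
 *
 * table[state][token] is an assumed stand-in for a parse-table lookup.
 */
int recover_by_popping(const int *states, size_t *size,
                       const bool table[][TOKEN_COUNT], int token) {
  for (size_t depth = 1; depth < *size; depth++) {
    size_t index = *size - 1 - depth;
    if (table[states[index]][token]) {
      *size = index + 1;
      return (int)depth;
    }
  }
  return -1;
}
```

Because the search only ever lands on ordinary parse states, the lexer in this sketch would always be invoked with a normal, non-conflicting set of expected tokens, which is the property the commit message emphasizes.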
Max Brunsfeld
a89322c5f1 Remove unneeded parameters from public interface of stack_iterate callback 2017-06-29 16:43:56 -07:00
Max Brunsfeld
009d6d1534 Improve heuristics for pruning parse versions based on errors
* Rewrite the error cost comparison in terms of explicit, discrete
conditions.
* Allow merging versions that have different error costs.
* Store the depth of each stack version since the last error. Use this
state to prevent incorrect merging.
* Sort the stack versions in order of preference and put a hard limit on
the version count.
2017-06-29 15:00:20 -07:00
Max Brunsfeld
445be0736a Clean up ts_stack_push function 2017-06-29 15:00:20 -07:00
Max Brunsfeld
0143bfdad4 Avoid use-after-free of external token states
Previously, it was possible for references to external token states to
outlive the trees to which those states belonged.

Now, instead of storing references to external token states in the Stack
and in the Lexer, we store references to the external token trees
themselves, and we retain the trees to prevent use-after-free.
2017-06-27 14:54:27 -07:00
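
The fix described in commit 0143bfdad4 amounts to holding a counted reference to the token tree itself instead of a bare pointer into its state. A minimal sketch of that idea, assuming a simple reference-counted tree; the types and function names here (`Tree`, `StackVersion`, `tree_retain`, `tree_release`, `version_set_last_external_token`) are hypothetical and are not taken from the tree-sitter source:

```c
#include <stdlib.h>

/* Hypothetical reference-counted tree that owns its external token state. */
typedef struct {
  unsigned ref_count;
  int external_state; /* stand-in for the external scanner's state */
} Tree;

/* Hypothetical stack version that remembers its last external token. */
typedef struct {
  Tree *last_external_token;
} StackVersion;

Tree *tree_new(int external_state) {
  Tree *tree = malloc(sizeof *tree);
  if (!tree) return NULL;
  tree->ref_count = 1;
  tree->external_state = external_state;
  return tree;
}

void tree_retain(Tree *tree) { tree->ref_count++; }

/* Drop one reference; returns 1 if the tree was freed, 0 otherwise. */
int tree_release(Tree *tree) {
  if (--tree->ref_count == 0) {
    free(tree);
    return 1;
  }
  return 0;
}

/* Store the tree (not a pointer into it) and retain it, so the external
 * state stays valid for as long as this version refers to it. */
void version_set_last_external_token(StackVersion *version, Tree *token) {
  if (token) tree_retain(token);
  if (version->last_external_token) tree_release(version->last_external_token);
  version->last_external_token = token;
}
```

Retaining the new token before releasing the old one keeps the function safe when both are the same tree.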
Max Brunsfeld
d57043b665 Add ability to store external token state per stack version 2017-01-04 21:22:23 -08:00
Max Brunsfeld
e7217f1bac Clean up some methods in parser.c 2016-11-14 17:25:55 -08:00
Max Brunsfeld
535879a2bd Represent byte, char and tree counts as 32 bit numbers
The parser spends the majority of its time allocating and freeing trees and stack nodes.
Also, the memory footprint of the AST is a significant concern when using tree-sitter
with large files. This library is already unlikely to work very well with source files
larger than 4GB, so representing rows, columns, byte lengths and child indices as
unsigned 32 bit integers seems like the right choice.
2016-11-14 12:19:13 -08:00
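
To illustrate the memory argument in commit 535879a2bd, the sketch below compares a length record built from `size_t` fields with one built from `uint32_t` fields. The struct names and field layout are assumptions made for illustration only; they are not the library's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: each count is pointer-sized. */
typedef struct {
  size_t bytes;
  size_t chars;
  size_t rows;
  size_t columns;
} WideLength;

/* Hypothetical "after" layout: each count is an unsigned 32-bit integer,
 * which caps representable sizes at 4 GiB per field. */
typedef struct {
  uint32_t bytes;
  uint32_t chars;
  uint32_t rows;
  uint32_t columns;
} NarrowLength;

/* Bytes saved per record by switching layouts on the current platform. */
size_t length_savings(void) {
  return sizeof(WideLength) - sizeof(NarrowLength);
}
```

On a typical 64-bit platform the narrow record is half the size of the wide one, and that saving applies to every tree and stack node that stores such a record.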
Max Brunsfeld
c9dcb29c6f Remove the TS prefix from some internal type/function names 2016-11-09 20:59:05 -08:00
Max Brunsfeld
4106ecda43 Remove logic for recovering from OOM 2016-11-04 09:18:38 -07:00
Max Brunsfeld
e149d94ff5 Remove generated parsers' dependency on runtime.h 2016-10-05 14:02:49 -07:00
Max Brunsfeld
e0b0e29a2b Update parse count correctly when repairing errors & undoing reductions 2016-09-01 10:04:20 -07:00
Max Brunsfeld
7483da4184 Add push_count to stack, use it in error comparisons 2016-08-31 17:29:14 -07:00
Max Brunsfeld
0faae52132 Fix some inconsistencies in error cost calculation
Signed-off-by: Nathan Sobo <nathan@github.com>
2016-08-31 10:51:59 -07:00
Max Brunsfeld
52ccebbf80 Rename error_depth -> error_count 2016-08-30 09:44:40 -07:00
Max Brunsfeld
00a0939504 Abort erroneous parse versions more eagerly 2016-06-02 14:04:48 -07:00
Max Brunsfeld
ea47fdc0fe Rework logic for when to abandon parses with errors 2016-05-29 22:36:47 -07:00
Max Brunsfeld
6535704870 Replace stack_merge_new function with two simpler functions
- merge(version1, version2)
- split(version)
2016-05-28 21:22:10 -07:00
Max Brunsfeld
e686478ad2 Rename stack_merge function to stack_merge_all 2016-05-28 20:24:08 -07:00
Max Brunsfeld
1e353381ff Don't create error node in lexer unless token is completely invalid
Before, any syntax error would cause the lexer to create an error
leaf node. This could happen even with a valid input, if the parse
stack had split and one particular version of the parse stack
failed to parse.

Now, an error leaf node is only created when the lexer cannot understand
part of the input stream at all. When a normal syntax error occurs,
the lexer just returns a token that is outside of the expected token
set, and the parser handles the unexpected token.
2016-05-26 14:15:10 -07:00
Max Brunsfeld
88053cf723 In tests, don’t record allocations while printing debug graphs 2016-05-16 10:44:19 -07:00
Max Brunsfeld
d50f6a58cc Abort parse versions w/ worse errors when repairing an error 2016-05-16 10:33:19 -07:00
Max Brunsfeld
22c550c9d6 Discard tokens after error detection to find the best repair
* Use GLR stack-splitting to try all numbers of tokens to
  discard until a repair is found.
* Check the validity of repairs by looking at the child trees,
  rather than the statically-computed 'in-progress symbols' list
2016-05-11 13:49:43 -07:00
Max Brunsfeld
e99a3925e0 Merge all versions created in a given reduce operation 2016-04-24 00:55:19 -07:00
Max Brunsfeld
fd4c33209e Select ambiguous alternatives by minimizing error size 2016-04-24 00:54:20 -07:00
Max Brunsfeld
cad663b144 Consider multiple error repairs on the same path of the stack
This changes the API to the stack_iterate function so that you can pop
from the stack without stopping iteration
2016-04-15 21:28:00 -07:00
Max Brunsfeld
695be5bc79 Merge equivalent stacks in a separate stage of parsing
* No more automatic merging every time a state is pushed to the stack
* When popping from the stack, the current version is always preserved
2016-04-10 14:12:24 -07:00
Max Brunsfeld
5ba40f15ad Rename stack heads to versions 2016-04-04 12:25:57 -07:00
Max Brunsfeld
b1a696085a Clean up stack pop functions 2016-04-04 11:59:10 -07:00
Max Brunsfeld
2f3e92c9be Add function for popping all nodes from the stack 2016-04-04 11:44:45 -07:00
Max Brunsfeld
91e3609fbf Write to file directly from stack debugging function 2016-04-02 22:18:44 -07:00
Max Brunsfeld
6bce6da1e6 Store verifying flag within parse stack 2016-03-31 12:03:21 -07:00
Max Brunsfeld
e7d3d40a59 Explicitly inform stack pop callback when the stack is exhausted
Also, pass non-extra tree count as a single value, rather than keeping
track of the extra count and the total separately.
2016-03-10 11:51:55 -08:00
Max Brunsfeld
4348eb89d4 Expose lower stack nodes via pop_until() function
This callback-based API allows the parser to easily visit each interior node
of the stack when searching for an error repair. It also is a better abstraction
over the stack's DAG implementation than having the public functions for
accessing entries and their successor entries.
2016-03-07 16:09:34 -08:00
Max Brunsfeld
c0595c21c5 Halt stack pops at all error states, not just error trees 2016-03-03 11:05:37 -08:00
Max Brunsfeld
3d516aeeec Give StackPushResult enumerators shorter names 2016-03-03 10:20:05 -08:00
Max Brunsfeld
8a13b5d120 Rename StackPopResult -> StackSlice 2016-03-03 10:16:10 -08:00
Max Brunsfeld
5a34d74702 Clean up stack 2016-02-25 21:51:39 -08:00
Max Brunsfeld
da2ef7ad35 Store trees in the links between stack nodes, not in the nodes themselves 2016-02-23 17:35:50 -08:00
Max Brunsfeld
6dd92c3abe Add function for rendering the stack as a DOT graph 2016-02-23 00:08:55 -08:00
Max Brunsfeld
f444a715fd Clean up tree array assertions in stack spec 2016-02-22 09:23:25 -08:00
Max Brunsfeld
b113dc8b0f Return a TreeArray from ts_stack_pop
Since the capacity is now included in the return value, the buffer
can be reused in the ts_parser__accept function. Also, it's just
cleaner to use Array consistently, rather than a separate buffer
and size.
2016-02-21 22:31:13 -08:00