Commit graph

109 commits

Author SHA1 Message Date
Max Brunsfeld
f00d2ade46 Remove unused function 2018-04-09 19:37:53 -07:00
Max Brunsfeld
3672a8ad87 Remove unused tree method 2018-04-09 12:29:22 -07:00
Max Brunsfeld
09be0b6ef5 Store trees' children in TreeArrays, not w/ separate pointer and length 2018-04-06 13:26:18 -07:00
Max Brunsfeld
80f856cef5 Maintain a total node count on every tree
This simplifies (and fixes bugs in) the parse stack's tracking of its
total node count since the last error, which is needed for error
recovery.
2018-04-06 13:26:18 -07:00
Max Brunsfeld
299a146b66 Balance repetition trees after parsing 2018-02-12 11:41:56 -08:00
Max Brunsfeld
d3c85f288d Start work on repairing errors by inserting missing tokens 2017-12-29 15:11:00 -08:00
Max Brunsfeld
addeb6c4c1 Allocate and free trees using an object pool 2017-12-27 10:34:29 -08:00
Max Brunsfeld
0e69da37a5 Return a character count from the lexer's get_column method 2017-12-20 16:26:38 -08:00
Max Brunsfeld
b98669c7e6 Replace general array_reverse with ts_tree_array_reverse 2017-08-07 12:44:33 -07:00
Max Brunsfeld
9260d8163c Refactor and fix bugs in tree comparison algorithm 2017-08-04 14:03:41 -07:00
Max Brunsfeld
cb5fe80348 Rename RENAME rule to ALIAS, allow it to create anonymous nodes 2017-07-31 16:41:11 -07:00
Max Brunsfeld
9a04231ab1 Remove length restriction in external scanner serialization API 2017-07-17 17:12:36 -07:00
Max Brunsfeld
4b40a1ed6c Support anonymous tokens inside of RENAME rules 2017-07-14 10:19:58 -07:00
Max Brunsfeld
b3a72954ff Introduce RENAME rule type 2017-07-13 17:17:22 -07:00
Max Brunsfeld
d8e9d04fe7 Add PREC_DYNAMIC rule for resolving runtime ambiguities 2017-07-06 15:24:45 -07:00
Max Brunsfeld
009d6d1534 Improve heuristics for pruning parse versions based on errors
* Rewrite the error cost comparison in terms of explicit, discrete
conditions.
* Allow merging versions have different error costs.
* Store the depth of each stack version since the last error. Use this
state to prevent incorrect merging.
* Sort the stack versions in order of preference and put a hard limit on
the version count.
2017-06-29 15:00:20 -07:00
Phil Turnbull
5ef9c4d6aa Increase size of ref_counts and add assertions
This bumps the size of the reference counts from 16- to 32-bit counters to make
it less likely to overflow. Also assert in the retain function that the
reference count didn't overflow.

32-bits seems big enough for non-pathological examples but a more fool-proof
fix may be to bump it to 64-bits.
2017-06-29 06:39:01 -07:00
Max Brunsfeld
0143bfdad4 Avoid use-after-free of external token states
Previously, it was possible for references to external token states to
outlive the trees to which those states belonged.

Now, instead of storing references to external token states in the Stack
and in the Lexer, we store references to the external token trees
themselves, and we retain the trees to prevent use-after-free.
2017-06-27 14:54:27 -07:00
Max Brunsfeld
704c2d5907 Fix lookahead_char type in ts_tree_make_error function 2017-04-27 14:49:04 -07:00
Max Brunsfeld
d222dbb9fd Allow lexer to accept tokens that ended at previous positions
* Track lookahead in each tree
* Add 'mark_end' API that external scanners can use
2017-03-13 17:06:52 -07:00
Max Brunsfeld
df520635c6 Prevent crash due to huge number of possible paths through parse stack 2017-02-20 14:34:10 -08:00
Max Brunsfeld
c14a776a3d Avoid including trailing extra tokens within error nodes unnecessarily 2017-02-19 21:21:54 -08:00
Max Brunsfeld
343887c1dd Fix miscounting of extra tokens when repairing errors 2017-02-06 17:43:07 -08:00
Max Brunsfeld
2fa7b453c8 Restore external scanner's state only after repositioning lexer
Also, properly identify the leaf node with the external token state
2016-12-21 13:59:56 -08:00
Max Brunsfeld
e6c82ead2c Start work toward maintaining external scanner's state during incremental parses 2016-12-20 17:06:20 -08:00
Max Brunsfeld
2b3da512a4 Add serialize, deserialize and reset callbacks to external scanners
Signed-off-by: Nathan Sobo <nathan@github.com>
2016-12-20 13:12:01 -08:00
Max Brunsfeld
535879a2bd Represent byte, char and tree counts as 32 bit numbers
The parser spends the majority of its time allocating and freeing trees and stack nodes.
Also, the memory footprint of the AST is a significant concern when using tree-sitter
with large files. This library is already unlikely to work very well with source files
larger than 4GB, so representing rows, columns, byte lengths and child indices as
unsigned 32 bit integers seems like the right choice.
2016-11-14 12:19:13 -08:00
Max Brunsfeld
c9dcb29c6f Remove the TS prefix from some internal type/function names 2016-11-09 20:59:05 -08:00
Max Brunsfeld
4106ecda43 Remove logic for recovering from OOM 2016-11-04 09:18:38 -07:00
Max Brunsfeld
eed54d95e1 Merge branch 'master' into changed-ranges 2016-10-16 21:10:25 -07:00
Max Brunsfeld
b3140b2689 Implement ts_document_parse_and_get_changed_ranges 2016-10-15 22:31:21 -07:00
Max Brunsfeld
e149d94ff5 Remove generated parsers' dependency on runtime.h 2016-10-05 14:02:49 -07:00
Max Brunsfeld
fcf9293d35 Use explicit stack for assigning trees' parent pointers 2016-09-19 12:40:45 -07:00
Max Brunsfeld
00528e50ce Change edit API to be byte-based 2016-09-13 13:08:52 -07:00
Max Brunsfeld
820cbece20 Pretty-print unexpected EOF errors properly 2016-09-03 22:45:18 -07:00
Max Brunsfeld
0faae52132 Fix some inconsistencies in error cost calculation
Signed-off-by: Nathan Sobo <nathan@github.com>
2016-08-31 10:51:59 -07:00
Max Brunsfeld
1d0f6c3cc0 Rename error_size -> error_cost 2016-08-30 11:09:12 -07:00
Max Brunsfeld
d554fab5b5 Remove unused tree state constant 2016-06-27 14:39:12 -07:00
Max Brunsfeld
b40c0326dc Include parse tree rendering at end of debug output 2016-06-22 21:04:35 -07:00
Max Brunsfeld
38c144b4a3 Refine logic for deciding when tokens need to be re-lexed
* While generating the lex table, note which tokens can match the
  same string. A token needs to be relexed when it has possible
  homonyms in the current state.
* Also note which tokens can match substrings of each other tokens.
  A token needs to be relexed when there are viable tokens that
  could match longer strings in the current state and the next
  token has been edited.
* Remove the logic for marking tokens as fragile on creation.
* Store the reusability/non-reusability of symbols off of individual
  actions and onto the entire entry for the state & symbol.
2016-06-21 07:28:04 -07:00
Max Brunsfeld
f69d709650 Remove unused functions 2016-06-15 10:17:54 -07:00
Max Brunsfeld
2109f0ed74 Handle allocation failures when copying tree arrays 2016-06-14 14:46:49 -07:00
Max Brunsfeld
1e353381ff Don't create error node in lexer unless token is completely invalid
Before, any syntax error would cause the lexer to create an error
leaf node. This could happen even with a valid input, if the parse
stack had split and one particular version of the parse stack
failed to parse.

Now, an error leaf node is only created when the lexer cannot understand
part of the input stream at all. When a normal syntax error occurs,
the lexer just returns a token that is outside of the expected token
set, and the parser handles the unexpected token.
2016-05-26 14:15:10 -07:00
Max Brunsfeld
22c550c9d6 Discard tokens after error detection to find the best repair
* Use GLR stack-splitting to try all numbers of tokens to
  discard until a repair is found.
* Check the validity of repairs by looking at the child trees,
  rather than the statically-computed 'in-progress symbols' list
2016-05-11 13:49:43 -07:00
Max Brunsfeld
fd4c33209e Select ambiguous alternatives by minimizing error size 2016-04-24 00:54:20 -07:00
Max Brunsfeld
1fb6065f02 Move tree sexp function back to tree, for easier use in debugger 2016-04-24 00:09:32 -07:00
Max Brunsfeld
695be5bc79 Merge equivalent stacks in a separate stage of parsing
* No more automatic merging every time a state is pushed to the stack
* When popping from the stack, the current version is always preserved
2016-04-10 14:12:24 -07:00
Max Brunsfeld
267092940d Collapse redundant interior error nodes 2016-04-03 23:46:43 -07:00
Max Brunsfeld
4348eb89d4 Expose lower stack nodes via pop_until() function
This callback-based API allows the parser to easily visit each interior node
of the stack when searching for an error repair. It also is a better abstraction
over the stack's DAG implementation than having the public functions for
accessing entries and their successor entries.
2016-03-07 16:09:34 -08:00
Max Brunsfeld
aef7582a2a Start using the forward move to recover from errors
Some unit tests passing. Corpus tests still failing
2016-03-02 21:08:42 -08:00