tree-sitter

Author	SHA1	Message	Date
Max Brunsfeld	e59558c83b	Allow stack versions to be temporarily paused. This way, when detecting an error, we can defer the decision about whether to bail or recover until all stack versions are processed.	2018-04-06 13:26:18 -07:00
Max Brunsfeld	dbe77e7199	Simplify testing-only ts_stack_iterate function	2018-03-29 17:50:07 -07:00
Max Brunsfeld	5520983144	Clean up Stack API: remove StackPopResult; rename top_state() -> state(); rename top_position() -> position(); improve docs.	2018-03-29 17:37:54 -07:00
Max Brunsfeld	addeb6c4c1	Allocate and free trees using an object pool	2017-12-27 10:34:29 -08:00
Max Brunsfeld	121a6a66ec	Take total dynamic precedence into account in stack version sorting. Signed-off-by: Josh Vera <vera@github.com>	2017-10-09 15:51:22 -07:00
Max Brunsfeld	d291af9a31	Refactor error comparisons: deal with mergeability outside of error comparison function; make `better_version_exists` function pure (don't halt other versions as a side effect); tweak error comparison logic. Signed-off-by: Rick Winfrey <rewinfrey@github.com>	2017-09-13 16:38:15 -07:00
Max Brunsfeld	07fb3ab0e6	Abort recoveries before popping if better versions already exist	2017-09-13 09:56:51 -07:00
Max Brunsfeld	819235bac3	Limit the number of stack nodes that are included in a summary	2017-09-12 12:00:00 -07:00
Max Brunsfeld	99d048e016	Simplify error recovery; eliminate recovery states. The previous approach to error recovery relied on special error-recovery states in the parse table. For each token T, there was an error recovery state in which the parser looked for any token that could follow T. Unfortunately, sometimes the set of tokens that could follow T contained conflicts. For example, in JS, the token '}' can be followed by the open-ended 'template_chars' token, but also by ordinary tokens like 'identifier'. So with the old algorithm, when recovering from an unexpected '}' token, the lexer had no way to distinguish identifiers from template_chars. This commit drops the error recovery states. Instead, when we encounter an unexpected token T, we recover from the error by finding a previous state S in the stack in which T would be valid, popping all of the nodes after S, and wrapping them in an error. This way, the lexer is always invoked in a normal parse state, in which it is looking for a non-conflicting set of tokens. Eliminating the error recovery states also shrinks the lex state machine significantly. Signed-off-by: Rick Winfrey <rewinfrey@github.com>	2017-09-11 15:22:52 -07:00
Max Brunsfeld	a89322c5f1	Remove unneeded parameters from public interface of stack_iterate callback	2017-06-29 16:43:56 -07:00
Max Brunsfeld	009d6d1534	Improve heuristics for pruning parse versions based on errors: rewrite the error cost comparison in terms of explicit, discrete conditions; allow merging versions that have different error costs; store the depth of each stack version since the last error (use this state to prevent incorrect merging); sort the stack versions in order of preference and put a hard limit on the version count.	2017-06-29 15:00:20 -07:00
Max Brunsfeld	445be0736a	Clean up ts_stack_push function	2017-06-29 15:00:20 -07:00
Max Brunsfeld	0143bfdad4	Avoid use-after-free of external token states. Previously, it was possible for references to external token states to outlive the trees to which those states belonged. Now, instead of storing references to external token states in the Stack and in the Lexer, we store references to the external token trees themselves, and we retain the trees to prevent use-after-free.	2017-06-27 14:54:27 -07:00
Max Brunsfeld	d57043b665	Add ability to store external token state per stack version	2017-01-04 21:22:23 -08:00
Max Brunsfeld	e7217f1bac	Clean up some methods in parser.c	2016-11-14 17:25:55 -08:00
Max Brunsfeld	535879a2bd	Represent byte, char and tree counts as 32 bit numbers. The parser spends the majority of its time allocating and freeing trees and stack nodes. Also, the memory footprint of the AST is a significant concern when using tree-sitter with large files. This library is already unlikely to work very well with source files larger than 4GB, so representing rows, columns, byte lengths and child indices as unsigned 32 bit integers seems like the right choice.	2016-11-14 12:19:13 -08:00
Max Brunsfeld	c9dcb29c6f	Remove the TS prefix from some internal type/function names	2016-11-09 20:59:05 -08:00
Max Brunsfeld	4106ecda43	Remove logic for recovering from OOM	2016-11-04 09:18:38 -07:00
Max Brunsfeld	e149d94ff5	Remove generated parsers' dependency on runtime.h	2016-10-05 14:02:49 -07:00
Max Brunsfeld	e0b0e29a2b	Update parse count correctly when repairing errors & undoing reductions	2016-09-01 10:04:20 -07:00
Max Brunsfeld	7483da4184	Add push_count to stack, use it in error comparisons	2016-08-31 17:29:14 -07:00
Max Brunsfeld	0faae52132	Fix some inconsistencies in error cost calculation. Signed-off-by: Nathan Sobo <nathan@github.com>	2016-08-31 10:51:59 -07:00
Max Brunsfeld	52ccebbf80	Rename error_depth -> error_count	2016-08-30 09:44:40 -07:00
Max Brunsfeld	00a0939504	Abort erroneous parse versions more eagerly	2016-06-02 14:04:48 -07:00
Max Brunsfeld	ea47fdc0fe	Rework logic for when to abandon parses with errors	2016-05-29 22:36:47 -07:00
Max Brunsfeld	6535704870	Replace stack_merge_new function with two simpler functions: merge(version1, version2) and split(version)	2016-05-28 21:22:10 -07:00
Max Brunsfeld	e686478ad2	Rename stack_merge function to stack_merge_all	2016-05-28 20:24:08 -07:00
Max Brunsfeld	1e353381ff	Don't create error node in lexer unless token is completely invalid. Before, any syntax error would cause the lexer to create an error leaf node. This could happen even with a valid input, if the parse stack had split and one particular version of the parse stack failed to parse. Now, an error leaf node is only created when the lexer cannot understand part of the input stream at all. When a normal syntax error occurs, the lexer just returns a token that is outside of the expected token set, and the parser handles the unexpected token.	2016-05-26 14:15:10 -07:00
Max Brunsfeld	88053cf723	In tests, don’t record allocations while printing debug graphs	2016-05-16 10:44:19 -07:00
Max Brunsfeld	d50f6a58cc	Abort parse versions w/ worse errors when repairing an error	2016-05-16 10:33:19 -07:00
Max Brunsfeld	22c550c9d6	Discard tokens after error detection to find the best repair: use GLR stack-splitting to try all numbers of tokens to discard until a repair is found; check the validity of repairs by looking at the child trees, rather than the statically-computed 'in-progress symbols' list.	2016-05-11 13:49:43 -07:00
Max Brunsfeld	e99a3925e0	Merge all versions created in a given reduce operation	2016-04-24 00:55:19 -07:00
Max Brunsfeld	fd4c33209e	Select ambiguous alternatives by minimizing error size	2016-04-24 00:54:20 -07:00
Max Brunsfeld	cad663b144	Consider multiple error repairs on the same path of the stack. This changes the API to the stack_iterate function so that you can pop from the stack without stopping iteration.	2016-04-15 21:28:00 -07:00
Max Brunsfeld	695be5bc79	Merge equivalent stacks in a separate stage of parsing: no more automatic merging every time a state is pushed to the stack; when popping from the stack, the current version is always preserved.	2016-04-10 14:12:24 -07:00
Max Brunsfeld	5ba40f15ad	Rename stack heads to versions	2016-04-04 12:25:57 -07:00
Max Brunsfeld	b1a696085a	Clean up stack pop functions	2016-04-04 11:59:10 -07:00
Max Brunsfeld	2f3e92c9be	Add function for popping all nodes from the stack	2016-04-04 11:44:45 -07:00
Max Brunsfeld	91e3609fbf	Write to file directly from stack debugging function	2016-04-02 22:18:44 -07:00
Max Brunsfeld	6bce6da1e6	Store `verifying` flag within parse stack	2016-03-31 12:03:21 -07:00
Max Brunsfeld	e7d3d40a59	Explicitly inform stack pop callback when the stack is exhausted. Also, pass non-extra tree count as a single value, rather than keeping track of the extra count and the total separately.	2016-03-10 11:51:55 -08:00
Max Brunsfeld	4348eb89d4	Expose lower stack nodes via pop_until() function. This callback-based API allows the parser to easily visit each interior node of the stack when searching for an error repair. It also is a better abstraction over the stack's DAG implementation than having the public functions for accessing entries and their successor entries.	2016-03-07 16:09:34 -08:00
Max Brunsfeld	c0595c21c5	Halt stack pops at all error states, not just error trees	2016-03-03 11:05:37 -08:00
Max Brunsfeld	3d516aeeec	Give StackPushResult enumerators shorter names	2016-03-03 10:20:05 -08:00
Max Brunsfeld	8a13b5d120	Rename StackPopResult -> StackSlice	2016-03-03 10:16:10 -08:00
Max Brunsfeld	5a34d74702	Clean up stack	2016-02-25 21:51:39 -08:00
Max Brunsfeld	da2ef7ad35	Store trees in the links between stack nodes, not in the nodes themselves	2016-02-23 17:35:50 -08:00
Max Brunsfeld	6dd92c3abe	Add function for rendering the stack as a DOT graph	2016-02-23 00:08:55 -08:00
Max Brunsfeld	f444a715fd	Clean up tree array assertions in stack spec	2016-02-22 09:23:25 -08:00
Max Brunsfeld	b113dc8b0f	Return a TreeArray from ts_stack_pop. Since the capacity is now included in the return value, the buffer can be reused in the ts_parser__accept function. Also, it's just cleaner to use Array consistently, rather than a separate buffer and size.	2016-02-21 22:31:13 -08:00

76 commits