tree-sitter

Author	SHA1	Message	Date
Max Brunsfeld	52087de4f0	Remove the concept of fragile reductions. They were a vestige of when Tree-sitter did sentential form-based incremental parsing (as opposed to simply state matching). This was elegant but not compatible with GLR as far as I could tell.	2018-03-02 14:51:54 -08:00
Max Brunsfeld	8c29841adf	Represent repetitions with associative structure	2018-02-12 11:41:56 -08:00
Max Brunsfeld	b0fdc33f73	Remove 'extra' and 'structural' booleans from symbol metadata	2017-09-14 12:07:46 -07:00
Max Brunsfeld	99d048e016	Simplify error recovery; eliminate recovery states. The previous approach to error recovery relied on special error-recovery states in the parse table. For each token T, there was an error recovery state in which the parser looked for any token that could follow T. Unfortunately, sometimes the set of tokens that could follow T contained conflicts. For example, in JS, the token '}' can be followed by the open-ended 'template_chars' token, but also by ordinary tokens like 'identifier'. So with the old algorithm, when recovering from an unexpected '}' token, the lexer had no way to distinguish identifiers from template_chars. This commit drops the error recovery states. Instead, when we encounter an unexpected token T, we recover from the error by finding a previous state S in the stack in which T would be valid, popping all of the nodes after S, and wrapping them in an error. This way, the lexer is always invoked in a normal parse state, in which it is looking for a non-conflicting set of tokens. Eliminating the error recovery states also shrinks the lex state machine significantly. Signed-off-by: Rick Winfrey <rewinfrey@github.com>	2017-09-11 15:22:52 -07:00
Max Brunsfeld	eace426129	Suppress unknown pragma warnings in MSVC	2017-08-09 10:14:05 -07:00
Max Brunsfeld	e6b43700b9	Get generated parsers compiling and loading properly on Windows	2017-08-08 16:47:51 -07:00
Max Brunsfeld	cb5fe80348	Rename RENAME rule to ALIAS, allow it to create anonymous nodes	2017-07-31 16:41:11 -07:00
Max Brunsfeld	4649c3a37f	Avoid creating redundant rename sequences	2017-07-18 15:29:06 -07:00
Max Brunsfeld	afb499bf2e	Handle rename symbols in ts_language APIs	2017-07-18 12:01:52 -07:00
Max Brunsfeld	9a04231ab1	Remove length restriction in external scanner serialization API	2017-07-17 17:12:36 -07:00
Max Brunsfeld	66dc12587a	Call the external scanner whenever an external token is valid. For some reason, there was previously some extra logic that prevented the external scanner from being invoked if the only valid external token also had an internal definition. It's surprising to not call the external scanner if an external token is valid.	2017-07-17 10:28:59 -07:00
Max Brunsfeld	b3a72954ff	Introduce RENAME rule type	2017-07-13 17:17:22 -07:00
Max Brunsfeld	59236d2ed1	Avoid redundant character comparisons in generated lex function	2017-07-10 14:09:31 -07:00
Max Brunsfeld	d8e9d04fe7	Add PREC_DYNAMIC rule for resolving runtime ambiguities	2017-07-06 15:24:45 -07:00
joshvera	f76935cc7e	just make it static	2017-03-24 18:38:21 -04:00
joshvera	6938b288a5	Make external scanner symbol map unique	2017-03-24 14:51:37 -04:00
Max Brunsfeld	ed8fbff175	Allow anonymous tokens to be used in grammars' external token lists	2017-03-17 16:31:29 -07:00
Max Brunsfeld	db4b9ebc7c	Implement Rule as a union rather than an abstract base class	2017-03-17 13:29:31 -07:00
Max Brunsfeld	d222dbb9fd	Allow lexer to accept tokens that ended at previous positions: track lookahead in each tree; add 'mark_end' API that external scanners can use	2017-03-13 17:06:52 -08:00
Max Brunsfeld	f04d7c5860	Handle unused tokens	2017-03-09 21:16:37 -08:00
Max Brunsfeld	abf8a4f2c2	🎨	2017-03-01 22:15:26 -08:00
Max Brunsfeld	686dc0997c	Avoid introducing certain lexical conflicts during parse state merging. The current, fairly conservative approach is to avoid merging parse states which would cause a pair of tokens to co-exist for the first time in any parse state, where the two tokens can start with the same character and at least one of the tokens can contain a character which is part of the grammar's separators.	2017-02-27 22:54:38 -08:00
Max Brunsfeld	0a6e5f9ee6	Fix some build warnings on gcc	2017-01-31 11:46:28 -08:00
Max Brunsfeld	60f6998485	Rename generated language functions to e.g. `tree_sitter_python`. They used to be called e.g. `ts_language_python`. Now that there are APIs that deal with the `TSLanguage` objects themselves, such as `ts_language_symbol_count`, the old names were a little confusing.	2017-01-31 10:29:31 -08:00
Max Brunsfeld	d853b6504d	Add version number to TSLanguage structs	2017-01-31 10:21:47 -08:00
Max Brunsfeld	3706678b89	Pass const TSExternalTokenState to external scanner deserialize hook	2016-12-21 13:58:18 -08:00
Max Brunsfeld	34a65f588d	Tweak naming and organization of external-scanner related language fields	2016-12-21 11:24:41 -08:00
Max Brunsfeld	42c41c158c	Refactor logic for handling shared internal/external tokens	2016-12-21 10:49:55 -08:00
Max Brunsfeld	2b3da512a4	Add serialize, deserialize and reset callbacks to external scanners. Signed-off-by: Nathan Sobo <nathan@github.com>	2016-12-20 13:12:01 -08:00
Max Brunsfeld	10b51a05a1	Allow external scanners to refer to (and return) internally-defined tokens. Tokens that are defined in the grammar's rules may now be included in the externals list also, so that external scanners can check if they are valid lookaheads or not, and if so, can return them to the parser if needed.	2016-12-09 13:32:58 -08:00
Max Brunsfeld	83514293b5	Allow external tokens to be either visible or hidden	2016-12-05 17:26:11 -08:00
Max Brunsfeld	1251ff2e30	Consider externals to be named, not anonymous	2016-12-05 17:09:22 -08:00
Max Brunsfeld	0f8e130687	Call external scanner functions when lexing	2016-12-02 22:03:48 -08:00
Max Brunsfeld	c966af0412	Start work on external tokens	2016-12-02 16:24:19 -08:00
Max Brunsfeld	32387400c6	Rework LR conflict resolution: unify precedence/associativity-based resolution with the search for a whitelisted conflict; improve conflict error messages	2016-11-18 13:50:55 -08:00
Max Brunsfeld	fad7294ba4	Store shift states for non-terminals directly in the main parse table	2016-11-14 08:36:06 -08:00
Max Brunsfeld	e149d94ff5	Remove generated parsers' dependency on runtime.h	2016-10-05 14:02:49 -07:00
Max Brunsfeld	b76574e01c	Handle ambiguities between extra and non-extra tokens using normal GLR splitting	2016-09-06 10:22:16 -07:00
Max Brunsfeld	1c52c30111	Fix unexpected EOF errors getting lost	2016-09-03 22:46:14 -07:00
Max Brunsfeld	4182de2975	Include each symbol's numeric value in generated code. Sometimes these are useful for debugging.	2016-08-26 17:40:22 -07:00
Max Brunsfeld	1c66d90203	Mark repeat symbols as anonymous	2016-07-17 10:44:08 -07:00
Max Brunsfeld	fa8993460e	Don't reuse unexpected tokens for now	2016-07-17 07:25:13 -07:00
Max Brunsfeld	8c26d99353	Store error recovery actions in the normal parse table	2016-06-27 14:07:47 -07:00
Max Brunsfeld	43ae8235fd	Remove the error action; a lack of actions implies an error.	2016-06-21 22:53:48 -07:00
Max Brunsfeld	38c144b4a3	Refine logic for deciding when tokens need to be re-lexed. While generating the lex table, note which tokens can match the same string. A token needs to be re-lexed when it has possible homonyms in the current state. Also note which tokens can match substrings of other tokens. A token needs to be re-lexed when there are viable tokens that could match longer strings in the current state and the next token has been edited. Remove the logic for marking tokens as fragile on creation. Move the reusability/non-reusability of symbols off of individual actions and onto the entire entry for the state & symbol.	2016-06-21 07:28:04 -07:00
Max Brunsfeld	45f7cee0c8	Handle extra tokens properly during error recovery	2016-06-18 20:46:25 -07:00
Max Brunsfeld	94721c7ec0	Rewind and re-tokenize in error mode after detecting an error	2016-06-17 21:26:03 -07:00
Max Brunsfeld	1e353381ff	Don't create error node in lexer unless token is completely invalid. Before, any syntax error would cause the lexer to create an error leaf node. This could happen even with a valid input, if the parse stack had split and one particular version of the parse stack failed to parse. Now, an error leaf node is only created when the lexer cannot understand part of the input stream at all. When a normal syntax error occurs, the lexer just returns a token that is outside of the expected token set, and the parser handles the unexpected token.	2016-05-26 14:15:10 -07:00
Max Brunsfeld	a3679fbb1f	Distinguish separators from main tokens via a property on transitions. It was incorrect to store it as a property on the lexical states themselves.	2016-05-19 16:27:25 -07:00
Max Brunsfeld	22c550c9d6	Discard tokens after error detection to find the best repair. Use GLR stack-splitting to try all numbers of tokens to discard until a repair is found. Check the validity of repairs by looking at the child trees, rather than the statically-computed 'in-progress symbols' list.	2016-05-11 13:49:43 -07:00

156 commits