tree-sitter

Author	SHA1	Message	Date
Max Brunsfeld	128edbebd6	Eliminate non-user-visible unit reductions from parse tables	2018-03-08 12:53:32 -08:00
Max Brunsfeld	c0cc35ff07	Create separate lexer function for keywords	2018-03-07 12:00:26 -08:00
Max Brunsfeld	52087de4f0	Remove the concept of fragile reductions. They were a vestige of when Tree-sitter did sentential form-based incremental parsing (as opposed to simply state matching). This was elegant but not compatible with GLR as far as I could tell.	2018-03-02 14:51:54 -08:00
Max Brunsfeld	32ef3e001a	Account for epsilon external tokens when merging parse states. Do not merge a token T into a parse state S if S contains external tokens that can be followed by tokens that could be shadowed by T. At this point, the only automated test for this logic is via the bash grammar, in which the `]` token should not be merged into states in which `_concat` is valid, because `_concat` can be followed by a `_special_characters` token, and `]` would shadow `_special_characters`.	2018-02-28 14:47:04 -08:00
Max Brunsfeld	2daae48fe0	Handle conflicts in repeat rules after external tokens. Signed-off-by: Rick Winfrey <rewinfrey@github.com>	2018-02-14 11:24:51 -08:00
Max Brunsfeld	8c29841adf	Represent repetitions with associative structure	2018-02-12 11:41:56 -08:00
Max Brunsfeld	493db39363	Never move the start rule of a grammar into the lexical grammar. This preserves a useful invariant that the root node of the AST is never a token.	2017-12-07 11:50:27 -08:00
Max Brunsfeld	91456d7a17	Avoid duplicate error state entries for tokens that are both internal & external	2017-09-14 10:54:13 -07:00
Max Brunsfeld	99d048e016	Simplify error recovery; eliminate recovery states. The previous approach to error recovery relied on special error-recovery states in the parse table. For each token T, there was an error recovery state in which the parser looked for any token that could follow T. Unfortunately, sometimes the set of tokens that could follow T contained conflicts. For example, in JS, the token '}' can be followed by the open-ended 'template_chars' token, but also by ordinary tokens like 'identifier'. So with the old algorithm, when recovering from an unexpected '}' token, the lexer had no way to distinguish identifiers from template_chars. This commit drops the error recovery states. Instead, when we encounter an unexpected token T, we recover from the error by finding a previous state S in the stack in which T would be valid, popping all of the nodes after S, and wrapping them in an error. This way, the lexer is always invoked in a normal parse state, in which it is looking for a non-conflicting set of tokens. Eliminating the error recovery states also shrinks the lex state machine significantly. Signed-off-by: Rick Winfrey <rewinfrey@github.com>	2017-09-11 15:22:52 -07:00
Max Brunsfeld	4c9c05806a	Merge compatible starting token states before constructing lex table	2017-09-05 13:21:53 -07:00
Max Brunsfeld	9d668c5004	Move incompatible token map into LexTableBuilder	2017-08-31 15:46:37 -07:00
Max Brunsfeld	573b5f3671	Pass LexTableBuilder to ParseTableBuilder	2017-08-25 15:57:50 -07:00

12 commits
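Commit c0cc35ff07 says only that keywords get their own lexer function. Here is one plausible way such a split can work, written as a Python sketch rather than Tree-sitter's actual generated C code. The main lexer first matches a generic word token. A separate keyword function then checks whether that entire word is a keyword. The regex, the `KEYWORDS` set, and the functions `lex_keyword` and `lex_word` are all hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical word pattern and keyword set; not taken from any real grammar.
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")
KEYWORDS = {"if", "else", "while", "return"}


def lex_keyword(word: str) -> Optional[str]:
    """Hypothetical separate keyword lexer: it only ever sees a complete word,
    so it accepts a keyword only if the keyword spans that whole word."""
    return word if word in KEYWORDS else None


def lex_word(source: str, pos: int) -> Optional[Tuple[str, str]]:
    """Match a word at `pos`, then re-classify it with the keyword lexer.
    Returns (token_kind, text), or None if no word starts at `pos`."""
    match = IDENTIFIER.match(source, pos)
    if match is None:
        return None
    text = match.group()
    keyword = lex_keyword(text)
    return (keyword, text) if keyword else ("identifier", text)


if __name__ == "__main__":
    print(lex_word("if x", 0))  # ('if', 'if')
    print(lex_word("iffy", 0))  # ('identifier', 'iffy')
```

The keyword check runs against the word that has already been matched. That means a keyword prefix such as `if` inside `iffy` is never mistaken for a keyword, and the main lexer needs no separate states for each keyword.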
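The merge rule stated in commit 32ef3e001a can be written as a small predicate. This is a hypothetical Python sketch. `can_merge_token` and its three table arguments are illustrative names, not Tree-sitter's data structures, and the sets stand in for whatever the generator actually computes. The usage line reuses the bash example from the commit message.

```python
from typing import Dict, Set


def can_merge_token(
    token: str,
    state_external_tokens: Set[str],
    following_tokens: Dict[str, Set[str]],
    shadows: Dict[str, Set[str]],
) -> bool:
    """Return False if any external token valid in the state can be followed
    by a token that `token` would shadow (all tables are hypothetical)."""
    shadowed_by_token = shadows.get(token, set())
    for external in state_external_tokens:
        # Tokens that may come right after this external token.
        followers = following_tokens.get(external, set())
        if followers & shadowed_by_token:
            return False
    return True


if __name__ == "__main__":
    # Bash example: `_concat` can be followed by `_special_characters`,
    # which `]` would shadow, so `]` must not be merged into that state.
    print(can_merge_token(
        "]",
        {"_concat"},
        {"_concat": {"_special_characters"}},
        {"]": {"_special_characters"}},
    ))  # False
```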
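Commit 8c29841adf says only that repetitions get an "associative structure". One reading, assumed here rather than confirmed by the log, is as follows. A repetition is expanded into a rule `R -> R R | item` instead of a left-recursive chain `R -> R item | item`. Under the associative rule, any binary grouping of the same items is a valid derivation, including a balanced one. The Python sketch below shows this. `expand_repeat`, `left_chain`, `balanced`, `depth`, and `leaves` are hypothetical helpers.

```python
from typing import Dict, List, Tuple, Union

# A derivation tree: either one repeated item (a leaf) or two subtrees
# joined by the associative rule R -> R R.
Tree = Union[str, Tuple["Tree", "Tree"]]


def expand_repeat(name: str, item: str) -> Dict[str, List[List[str]]]:
    """Hypothetical expansion of repeat(item) into name -> name name | item."""
    return {name: [[name, name], [item]]}


def left_chain(items: List[str]) -> Tree:
    """The only shape a left-recursive rule R -> R item | item allows."""
    tree: Tree = items[0]
    for item in items[1:]:
        tree = (tree, item)
    return tree


def balanced(items: List[str]) -> Tree:
    """A balanced grouping, also valid under the associative rule."""
    if len(items) == 1:
        return items[0]
    mid = len(items) // 2
    return (balanced(items[:mid]), balanced(items[mid:]))


def depth(tree: Tree) -> int:
    if isinstance(tree, str):
        return 0
    return 1 + max(depth(tree[0]), depth(tree[1]))


def leaves(tree: Tree) -> List[str]:
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


if __name__ == "__main__":
    items = [f"stmt{i}" for i in range(8)]
    print(depth(left_chain(items)), depth(balanced(items)))  # 7 3
```

Both trees cover the same items in the same order. With the associative rule, a node covering any contiguous run of items is itself a valid `R`, so tree depth is not forced to grow linearly with the number of repetitions.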
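The error-recovery strategy in commit 99d048e016 can be sketched as follows. On an unexpected token T, find the nearest state S on the stack in which T is valid, pop everything above S, and wrap the popped nodes in an error. This is a minimal Python illustration, not Tree-sitter's C runtime. The stack layout, the `is_valid` callback (standing in for a parse-table lookup), `Node`, and `recover` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    symbol: str
    children: List["Node"] = field(default_factory=list)


# Hypothetical stack entry: (parse state, node pushed in that state).
# The bottom entry carries no node.
StackEntry = Tuple[int, Optional[Node]]


def recover(stack: List[StackEntry], token: str,
            is_valid: Callable[[int, str], bool]) -> bool:
    """Pop back to the nearest state S in which `token` is valid and wrap the
    popped nodes in an ERROR node. Returns False if no such state exists."""
    for index in range(len(stack) - 1, -1, -1):
        state, _ = stack[index]
        if is_valid(state, token):
            popped = stack[index + 1:]
            del stack[index + 1:]
            if popped:
                children = [node for _, node in popped if node is not None]
                # The parser stays in state S, so the next lex happens in an
                # ordinary parse state with a non-conflicting token set.
                stack.append((state, Node("ERROR", children)))
            return True
    return False


if __name__ == "__main__":
    valid = {0: {"identifier"}, 1: {"("}, 2: {")"}}
    stack = [(0, None), (1, Node("identifier")), (2, Node("("))]
    recover(stack, "identifier", lambda s, t: t in valid.get(s, set()))
    print([entry[1].symbol for entry in stack[1:]])  # ['ERROR']
```

No special recovery states appear anywhere in this sketch. Recovery only reuses states that already exist on the stack, which matches the commit's point that the lexer is always invoked in a normal parse state.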