tree-sitter

Author	SHA1	Message	Date
Max Brunsfeld	d4264d6191	Fix parsing of quantifiers with no upper bound	2018-08-06 13:47:26 -07:00
Max Brunsfeld	126f84aa73	Avoid unnecessary suffixes on external symbol identifiers	2018-08-01 16:11:21 -07:00
Max Brunsfeld	cb784975a4	Add IMMEDIATE_TOKEN rule type, for enforcing no preceding extras	2018-08-01 14:00:57 -07:00
Max Brunsfeld	e88dd223b2	Support {} quantifier syntax in regexes	2018-07-25 11:29:41 -07:00
Max Brunsfeld	e130c4ddb5	Add `word` property to grammar JSON schema	2018-06-15 13:32:41 -07:00
Max Brunsfeld	0dd41f0d74	Restore logic for restricting keyword tokens. Removing this restriction created problems for the Rust grammar, and possibly others. The proper fix would be to ensure that the 'word token' matches every possible string that a 'keyword token' matches, as opposed to just matching some of the same strings. This would require us to gather a little more information about how tokens conflict. For now, I'm just going to put back the hard-coded logic that we had.	2018-06-15 13:15:02 -07:00
Max Brunsfeld	c39f0e9ef9	Rename word_rule -> word_token	2018-06-15 09:15:12 -07:00
Max Brunsfeld	91e3bc3e55	Update parse state merging logic for explicit word tokens Co-Authored-By: Ashi Krishnan <queerviolet@github.com>	2018-06-14 12:32:27 -07:00
Max Brunsfeld	190456d7ec	Fix logging during lex table construction Co-Authored-By: Ashi Krishnan <queerviolet@github.com>	2018-06-14 12:03:40 -07:00
Max Brunsfeld	6e72c2943d	Avoid missing field initializer warnings w/o default field syntax. The default field syntax aint working on windows	2018-06-14 11:12:04 -07:00
Max Brunsfeld	e17cd42e47	Perform keyword optimization using explicitly selected word token rather than trying to infer the word token automatically. Co-Authored-By: Ashi Krishnan <queerviolet@github.com>	2018-06-14 09:35:54 -07:00
Max Brunsfeld	8120e61d8d	Remove blank lines from log messages	2018-05-25 21:37:25 -07:00
Max Brunsfeld	45c52f9459	Allow keywords to contain numbers, as long as they start w/ a letter	2018-05-25 21:28:47 -07:00
Max Brunsfeld	356d5e0221	Generalize logic for finding a keyword capture token	2018-05-25 15:29:15 -07:00
Max Brunsfeld	406a85a166	Log when lexical conflicts prevents parse state merging	2018-05-24 16:46:43 -07:00
Max Brunsfeld	915978aa9d	Avoid redundant logging of conflicting tokens	2018-05-24 16:22:16 -07:00
Max Brunsfeld	6fca8f2f4d	Make ts_compile_grammar take an optional log file, start logging to it	2018-05-24 16:01:14 -07:00
Max Brunsfeld	6bb63f549f	Fix unused lambda captures	2018-05-11 14:59:59 -07:00
Max Brunsfeld	e8cfb9ced0	Remove incorrect return statement. This prevented conflicts between some tokens from being recorded properly. In the case of JavaScript, it prevented tree-sitter from recognizing the conflict between the forward slash operator and the regex token, allowing regexes to be merged into parse states containing '/' incorrectly. Refs tree-sitter/tree-sitter-javascript#71	2018-04-17 17:14:36 -07:00
Max Brunsfeld	379a2fd121	Incrementally build a tree of skipped tokens, rather than pushing them to the stack individually	2018-04-09 12:29:22 -07:00
Max Brunsfeld	1109a565fc	Merge pull request #159 from Pike/test-escape-brackets Tests for issue 158	2018-04-06 13:19:36 -07:00
Max Brunsfeld	1ca261c79b	Fix some regex parsing bugs: allow escape sequences to be used in ranges; don't give special meaning to dashes outside of character classes	2018-04-06 12:46:06 -07:00
Max Brunsfeld	65e654ea9b	Remove overly conservative check for the validity of keyword capture tokens	2018-04-05 13:25:16 -07:00
Pieter Goetschalckx	3d0ca31cf1	Add error message for TSCompileErrorTypeInvalidTokenContents	2018-03-30 21:12:09 +02:00
Max Brunsfeld	fb348c0f1e	Fix signed/unsigned comparison warning	2018-03-28 11:04:49 -07:00
Max Brunsfeld	e917756ad1	Remove depends_on_lookahead field from parse table entries. This simplifies the logic for determining whether a token is reusable and makes it more conservative. It should fix some incremental parsing bugs that are being caught by the randomized tests on CI.	2018-03-28 10:58:33 -07:00
Max Brunsfeld	186f70649c	Consolidate the unify for detecting conflicting tokens	2018-03-28 10:03:09 -07:00
Max Brunsfeld	a8bc67ac42	Allow LookaheadSet::for_each to terminate early	2018-03-28 10:03:09 -07:00
Max Brunsfeld	43e14332ed	Avoid creating duplicate metadata rules	2018-03-28 10:03:09 -07:00
Max Brunsfeld	b7d0606fbd	Be less conservative in merging parse states with external tokens. Also, clean up the internal representation of external tokens	2018-03-16 16:00:40 -07:00
Max Brunsfeld	7183f8d3e7	Fix unit reduction elimination bugs: handle 'chains' of unit reductions starting in a single state; avoid eliminating rules which will later receive aliases	2018-03-12 07:54:18 -07:00
Max Brunsfeld	72849787b1	Fix logic for identifying keyword capture token	2018-03-12 07:52:57 -07:00
Max Brunsfeld	128edbebd6	Eliminate non-user-visible unit reductions from parse tables	2018-03-08 12:53:32 -08:00
Max Brunsfeld	53cd89c614	Ensure keyword capture tokens aren't too loosely defined	2018-03-07 14:46:11 -08:00
Max Brunsfeld	c0cc35ff07	Create separate lexer function for keywords	2018-03-07 12:00:26 -08:00
Max Brunsfeld	52087de4f0	Remove the concept of fragile reductions. They were a vestige of when Tree-sitter did sentential form-based incremental parsing (as opposed to simply state matching). This was elegant but not compatible with GLR as far as I could tell.	2018-03-02 14:51:54 -08:00
Max Brunsfeld	32ef3e001a	Account for epsilon external tokens when merging parse states. Do not merge a token T into a parse state S if S contains external tokens that can be followed by tokens that could be shadowed by T. At this point, the only automated test for this logic is via the bash grammar, in which the `]` token should not be merged into states in which `_concat` is valid, because `_concat` can be followed by a `_special_characters` token, and `]` would shadow `_special_characters`.	2018-02-28 14:47:04 -08:00
Max Brunsfeld	10a3cbd814	Move grammar schema to src folder, now that there's a docs folder that contains actual docs.	2018-02-26 00:40:20 -08:00
Max Brunsfeld	2daae48fe0	Handle conflicts in repeat rules after external tokens Signed-off-by: Rick Winfrey <rewinfrey@github.com>	2018-02-14 11:24:51 -08:00
Max Brunsfeld	8c29841adf	Represent repetitions with associative structure	2018-02-12 11:41:56 -08:00
Max Brunsfeld	2e4f76c164	Don't allow an epsilon start rule if it is used in other rules	2018-01-23 17:05:28 -08:00
Max Brunsfeld	532bbeca0d	Remove wrong handling of \a in a regex	2017-12-12 16:50:53 -08:00
Max Brunsfeld	493db39363	Never move the start rule of a grammar into the lexical grammar. This preserves a useful invariant that the root node of the AST is never a token.	2017-12-07 11:50:27 -08:00
Max Brunsfeld	ba607a1f84	Optimize lex state merging	2017-09-18 13:40:37 -07:00
Max Brunsfeld	b0fdc33f73	Remove 'extra' and 'structural' booleans from symbol metadata	2017-09-14 12:07:46 -07:00
Max Brunsfeld	91456d7a17	Avoid duplicate error state entries for tokens that are both internal & external	2017-09-14 10:54:13 -07:00
Max Brunsfeld	99d048e016	Simplify error recovery; eliminate recovery states. The previous approach to error recovery relied on special error-recovery states in the parse table. For each token T, there was an error recovery state in which the parser looked for any token that could follow T. Unfortunately, sometimes the set of tokens that could follow T contained conflicts. For example, in JS, the token '}' can be followed by the open-ended 'template_chars' token, but also by ordinary tokens like 'identifier'. So with the old algorithm, when recovering from an unexpected '}' token, the lexer had no way to distinguish identifiers from template_chars. This commit drops the error recovery states. Instead, when we encounter an unexpected token T, we recover from the error by finding a previous state S in the stack in which T would be valid, popping all of the nodes after S, and wrapping them in an error. This way, the lexer is always invoked in a normal parse state, in which it is looking for a non-conflicting set of tokens. Eliminating the error recovery states also shrinks the lex state machine significantly. Signed-off-by: Rick Winfrey <rewinfrey@github.com>	2017-09-11 15:22:52 -07:00
Max Brunsfeld	4c9c05806a	Merge compatible starting token states before constructing lex table	2017-09-05 13:21:53 -07:00
Max Brunsfeld	9d668c5004	Move incompatible token map into LexTableBuilder	2017-08-31 15:46:37 -07:00
Max Brunsfeld	f8649824fa	Remove unused function	2017-08-31 15:30:44 -07:00

619 commits