612 commits

Author SHA1 Message Date
Max Brunsfeld
91e3bc3e55 Update parse state merging logic for explicit word tokens
Co-Authored-By: Ashi Krishnan <queerviolet@github.com>
2018-06-14 12:32:27 -07:00
Max Brunsfeld
190456d7ec Fix logging during lex table construction
Co-Authored-By: Ashi Krishnan <queerviolet@github.com>
2018-06-14 12:03:40 -07:00
Max Brunsfeld
6e72c2943d Avoid missing field initializer warnings w/o default field syntax
The default field syntax isn't working on Windows
2018-06-14 11:12:04 -07:00
Max Brunsfeld
e17cd42e47 Perform keyword optimization using explicitly selected word token
rather than trying to infer the word token automatically.

Co-Authored-By: Ashi Krishnan <queerviolet@github.com>
2018-06-14 09:35:54 -07:00
Max Brunsfeld
8120e61d8d Remove blank lines from log messages 2018-05-25 21:37:25 -07:00
Max Brunsfeld
45c52f9459 Allow keywords to contain numbers, as long as they start w/ a letter 2018-05-25 21:28:47 -07:00
Max Brunsfeld
356d5e0221 Generalize logic for finding a keyword capture token 2018-05-25 15:29:15 -07:00
Max Brunsfeld
406a85a166 Log when lexical conflicts prevent parse state merging 2018-05-24 16:46:43 -07:00
Max Brunsfeld
915978aa9d Avoid redundant logging of conflicting tokens 2018-05-24 16:22:16 -07:00
Max Brunsfeld
6fca8f2f4d Make ts_compile_grammar take an optional log file, start logging to it 2018-05-24 16:01:14 -07:00
Max Brunsfeld
6bb63f549f Fix unused lambda captures 2018-05-11 14:59:59 -07:00
Max Brunsfeld
e8cfb9ced0 Remove incorrect return statement
This prevented conflicts between some tokens from being recorded
properly. In the case of JavaScript, it prevented tree-sitter from
recognizing the conflict between the forward slash operator and the
regex token, allowing regexes to be merged into parse states containing
'/' incorrectly.

Refs tree-sitter/tree-sitter-javascript#71
2018-04-17 17:14:36 -07:00
Max Brunsfeld
379a2fd121 Incrementally build a tree of skipped tokens
Rather than pushing them to the stack individually
2018-04-09 12:29:22 -07:00
Max Brunsfeld
1109a565fc Merge pull request #159 from Pike/test-escape-brackets
Tests for issue 158
2018-04-06 13:19:36 -07:00
Max Brunsfeld
1ca261c79b Fix some regex parsing bugs
* Allow escape sequences to be used in ranges
* Don't give special meaning to dashes outside of character classes
2018-04-06 12:46:06 -07:00
Max Brunsfeld
65e654ea9b Remove overly conservative check for the validity of keyword capture tokens 2018-04-05 13:25:16 -07:00
Pieter Goetschalckx
3d0ca31cf1 Add error message for TSCompileErrorTypeInvalidTokenContents 2018-03-30 21:12:09 +02:00
Max Brunsfeld
fb348c0f1e Fix signed/unsigned comparison warning 2018-03-28 11:04:49 -07:00
Max Brunsfeld
e917756ad1 Remove depends_on_lookahead field from parse table entries
This simplifies the logic for determining whether a token is reusable
and makes it more conservative. It should fix some incremental parsing
bugs that are being caught by the randomized tests on CI.
2018-03-28 10:58:33 -07:00
Max Brunsfeld
186f70649c Consolidate the logic for detecting conflicting tokens 2018-03-28 10:03:09 -07:00
Max Brunsfeld
a8bc67ac42 Allow LookaheadSet::for_each to terminate early 2018-03-28 10:03:09 -07:00
Max Brunsfeld
43e14332ed Avoid creating duplicate metadata rules 2018-03-28 10:03:09 -07:00
Max Brunsfeld
b7d0606fbd Be less conservative in merging parse states with external tokens
Also, clean up the internal representation of external tokens
2018-03-16 16:00:40 -07:00
Max Brunsfeld
7183f8d3e7 Fix unit reduction elimination bugs
* Handle 'chains' of unit reductions starting in a single state
* Avoid eliminating rules which will later receive aliases
2018-03-12 07:54:18 -07:00
Max Brunsfeld
72849787b1 Fix logic for identifying keyword capture token 2018-03-12 07:52:57 -07:00
Max Brunsfeld
128edbebd6 Eliminate non-user-visible unit reductions from parse tables 2018-03-08 12:53:32 -08:00
Max Brunsfeld
53cd89c614 Ensure keyword capture tokens aren't too loosely defined 2018-03-07 14:46:11 -08:00
Max Brunsfeld
c0cc35ff07 Create separate lexer function for keywords 2018-03-07 12:00:26 -08:00
Max Brunsfeld
52087de4f0 Remove the concept of fragile reductions
They were a vestige of when Tree-sitter did sentential form-based
incremental parsing (as opposed to simply state matching). This was
elegant but not compatible with GLR as far as I could tell.
2018-03-02 14:51:54 -08:00
Max Brunsfeld
32ef3e001a Account for epsilon external tokens when merging parse states
Do not merge a token T into a parse state S if S contains
external tokens that can be *followed* by tokens that could
be shadowed by T.

At this point, the only automated test for this logic is via
the bash grammar, in which the `]` token should not be merged
into states in which `_concat` is valid, because `_concat`
can be followed by a `_special_characters` token, and `]`
would shadow `_special_characters`.
2018-02-28 14:47:04 -08:00
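
The merging rule described in 32ef3e001a can be illustrated with a small sketch. This is not Tree-sitter's implementation; the function name `can_merge_token` and the `following` and `shadows` tables are hypothetical stand-ins for whatever the parse table builder actually computes, and the data simply restates the bash example from the commit message.

```python
from typing import Dict, Iterable, Set


def can_merge_token(
    token: str,
    state_external_tokens: Iterable[str],
    following: Dict[str, Set[str]],
    shadows: Dict[str, Set[str]],
) -> bool:
    """Return False if merging `token` into the state could shadow a token
    that may follow one of the state's external tokens.

    following[e] -- tokens that can come right after external token e (assumed input)
    shadows[t]   -- tokens that t would shadow in the lexer (assumed input)
    """
    shadowed = shadows.get(token, set())
    for external in state_external_tokens:
        # Any overlap means a follower of the external token would be hidden.
        if following.get(external, set()) & shadowed:
            return False
    return True


# The bash case from the commit message: `]` must not be merged into
# states where `_concat` is valid.
following = {"_concat": {"_special_characters"}}
shadows = {"]": {"_special_characters"}}

merge_into_concat_state = can_merge_token("]", ["_concat"], following, shadows)
merge_into_plain_state = can_merge_token("]", [], following, shadows)
```

Here `merge_into_concat_state` is `False` and `merge_into_plain_state` is `True`, which matches the behaviour the commit message expects of the bash grammar.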
Max Brunsfeld
10a3cbd814 Move grammar schema to src folder
Now that there's a docs folder that contains actual docs.
2018-02-26 00:40:20 -08:00
Max Brunsfeld
2daae48fe0 Handle conflicts in repeat rules after external tokens
Signed-off-by: Rick Winfrey <rewinfrey@github.com>
2018-02-14 11:24:51 -08:00
Max Brunsfeld
8c29841adf Represent repetitions with associative structure 2018-02-12 11:41:56 -08:00
Max Brunsfeld
2e4f76c164 Don't allow an epsilon start rule if it is used in other rules 2018-01-23 17:05:28 -08:00
Max Brunsfeld
532bbeca0d Remove wrong handling of \a in a regex 2017-12-12 16:50:53 -08:00
Max Brunsfeld
493db39363 Never move the start rule of a grammar into the lexical grammar
This preserves a useful invariant that the root node of the AST is never
a token.
2017-12-07 11:50:27 -08:00
Max Brunsfeld
ba607a1f84 Optimize lex state merging 2017-09-18 13:40:37 -07:00
Max Brunsfeld
b0fdc33f73 Remove 'extra' and 'structural' booleans from symbol metadata 2017-09-14 12:07:46 -07:00
Max Brunsfeld
91456d7a17 Avoid duplicate error state entries for tokens that are both internal & external 2017-09-14 10:54:13 -07:00
Max Brunsfeld
99d048e016 Simplify error recovery; eliminate recovery states
The previous approach to error recovery relied on special error-recovery
states in the parse table. For each token T, there was an error recovery
state in which the parser looked for *any* token that could follow T.
Unfortunately, sometimes the set of tokens that could follow T contained
conflicts. For example, in JS, the token '}' can be followed by the
open-ended 'template_chars' token, but also by ordinary tokens like
'identifier'. So with the old algorithm, when recovering from an
unexpected '}' token, the lexer had no way to distinguish identifiers
from template_chars.

This commit drops the error recovery states. Instead, when we encounter
an unexpected token T, we recover from the error by finding a previous
state S in the stack in which T would be valid, popping all of the nodes
after S, and wrapping them in an error.

This way, the lexer is always invoked in a normal parse state, in which
it is looking for a non-conflicting set of tokens. Eliminating the error
recovery states also shrinks the lex state machine significantly.

Signed-off-by: Rick Winfrey <rewinfrey@github.com>
2017-09-11 15:22:52 -07:00
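
The recovery strategy described in 99d048e016 can be sketched roughly as follows. This is a toy model, not the parser's real code: the stack is assumed to be a list of `(state, node)` pairs, `valid` is a hypothetical table mapping each state to the tokens it accepts, and the function name `recover` is made up for illustration. How the real parser re-attaches the error node and resumes is outside the scope of the sketch.

```python
from typing import Dict, List, Optional, Set, Tuple

# (state id, node pushed when entering that state); hypothetical representation
StackEntry = Tuple[int, Optional[str]]


def recover(
    stack: List[StackEntry],
    token: str,
    valid: Dict[int, Set[str]],
) -> Optional[Tuple[str, List[str]]]:
    """Find the nearest earlier state S in which `token` is valid, pop every
    entry after S, and return the popped nodes wrapped in an error node.

    The stack is truncated in place. Returns None (stack untouched) if no
    earlier state accepts the token.
    """
    # Start below the top: the top state is the one that rejected the token.
    for depth in range(len(stack) - 2, -1, -1):
        state, _ = stack[depth]
        if token in valid.get(state, set()):
            skipped = [node for _, node in stack[depth + 1:] if node is not None]
            del stack[depth + 1:]
            return ("ERROR", skipped)
    return None


# An unexpected '}' arrives while the parser sits in state 3.
valid = {0: {"{"}, 1: {"identifier", "}"}, 2: {"="}, 3: {"number"}}
stack = [(0, None), (1, "{"), (2, "identifier"), (3, "=")]
error = recover(stack, "}", valid)
```

After the call, `stack` has been cut back to the state that accepts `}` and `error` holds the skipped `identifier` and `=` nodes. Because the search only ever lands on ordinary parse states, the lexer is never asked to scan in a dedicated recovery state, which is the point the commit message makes.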
Max Brunsfeld
4c9c05806a Merge compatible starting token states before constructing lex table 2017-09-05 13:21:53 -07:00
Max Brunsfeld
9d668c5004 Move incompatible token map into LexTableBuilder 2017-08-31 15:46:37 -07:00
Max Brunsfeld
f8649824fa Remove unused function 2017-08-31 15:30:44 -07:00
Max Brunsfeld
c285fbef38 Clear LexTableBuilder's state after detecting conflicts 2017-08-25 17:11:39 -07:00
Max Brunsfeld
573b5f3671 Pass LexTableBuilder to ParseTableBuilder 2017-08-25 15:57:50 -07:00
Max Brunsfeld
eace426129 Suppress unknown pragma warnings in MSVC 2017-08-09 10:14:05 -07:00
Max Brunsfeld
964dd16812 Avoid unicode escape sequences when generating conflict messages 2017-08-09 09:32:58 -07:00
Max Brunsfeld
5f40adb70c Recur to sub-rules in a deterministic order in expand_repeats 2017-08-08 17:20:04 -07:00
Max Brunsfeld
e6b43700b9 Get generated parsers compiling and loading properly on windows 2017-08-08 16:47:51 -07:00
Max Brunsfeld
9d616b3bf8 Replace size_t -> LexStateId in LexTableBuilder::remove_duplicate_states 2017-08-08 12:55:35 -07:00