34 commits

Author SHA1 Message Date
Max Brunsfeld
cb784975a4 Add IMMEDIATE_TOKEN rule type, for enforcing no preceding extras 2018-08-01 14:00:57 -07:00
Max Brunsfeld
0dd41f0d74 Restore logic for restricting keyword tokens
Removing this restriction created problems for the Rust grammar, and
possibly others. The proper fix would be to ensure that the 'word
token' matches *every* possible string that a 'keyword token'
matches, as opposed to just matching *some* of the same strings.
This would require us to gather a little more information
about how tokens conflict. For now, I'm just going to put back the
hard-coded logic that we had.
2018-06-15 13:15:02 -07:00
Max Brunsfeld
c39f0e9ef9 Rename word_rule -> word_token 2018-06-15 09:15:12 -07:00
Max Brunsfeld
91e3bc3e55 Update parse state merging logic for explicit word tokens
Co-Authored-By: Ashi Krishnan <queerviolet@github.com>
2018-06-14 12:32:27 -07:00
Max Brunsfeld
190456d7ec Fix logging during lex table construction
Co-Authored-By: Ashi Krishnan <queerviolet@github.com>
2018-06-14 12:03:40 -07:00
Max Brunsfeld
e17cd42e47 Perform keyword optimization using explicitly selected word token
rather than trying to infer the word token automatically.

Co-Authored-By: Ashi Krishnan <queerviolet@github.com>
2018-06-14 09:35:54 -07:00
Max Brunsfeld
45c52f9459 Allow keywords to contain numbers, as long as they start w/ a letter 2018-05-25 21:28:47 -07:00
Max Brunsfeld
356d5e0221 Generalize logic for finding a keyword capture token 2018-05-25 15:29:15 -07:00
Max Brunsfeld
915978aa9d Avoid redundant logging of conflicting tokens 2018-05-24 16:22:16 -07:00
Max Brunsfeld
6fca8f2f4d Make ts_compile_grammar take an optional log file, start logging to it 2018-05-24 16:01:14 -07:00
Max Brunsfeld
e8cfb9ced0 Remove incorrect return statement
This prevented conflicts between some tokens from being recorded
properly. In the case of JavaScript, it prevented tree-sitter from
recognizing the conflict between the forward slash operator and the
regex token, allowing regexes to be merged into parse states containing
'/' incorrectly.

Refs tree-sitter/tree-sitter-javascript#71
2018-04-17 17:14:36 -07:00
Max Brunsfeld
65e654ea9b Remove overly conservative check for the validity of keyword capture tokens 2018-04-05 13:25:16 -07:00
Max Brunsfeld
fb348c0f1e Fix signed/unsigned comparison warning 2018-03-28 11:04:49 -07:00
Max Brunsfeld
e917756ad1 Remove depends_on_lookahead field from parse table entries
This simplifies the logic for determining whether a token is reusable
and makes it more conservative. It should fix some incremental parsing
bugs that are being caught by the randomized tests on CI.
2018-03-28 10:58:33 -07:00
Max Brunsfeld
186f70649c Consolidate the logic for detecting conflicting tokens 2018-03-28 10:03:09 -07:00
Max Brunsfeld
a8bc67ac42 Allow LookaheadSet::for_each to terminate early 2018-03-28 10:03:09 -07:00
Max Brunsfeld
43e14332ed Avoid creating duplicate metadata rules 2018-03-28 10:03:09 -07:00
Max Brunsfeld
72849787b1 Fix logic for identifying keyword capture token 2018-03-12 07:52:57 -07:00
Max Brunsfeld
53cd89c614 Ensure keyword capture tokens aren't too loosely defined 2018-03-07 14:46:11 -08:00
Max Brunsfeld
c0cc35ff07 Create separate lexer function for keywords 2018-03-07 12:00:26 -08:00
Max Brunsfeld
32ef3e001a Account for epsilon external tokens when merging parse states
Do not merge a token T into a parse state S if S contains
external tokens that can be *followed* by tokens that could
be shadowed by T.

At this point, the only automated test for this logic is via
the bash grammar, in which the `]` token should not be merged
into states in which `_concat` is valid, because `_concat`
can be followed by a `_special_characters` token, and `]`
would shadow `_special_characters`.
2018-02-28 14:47:04 -08:00
Max Brunsfeld
ba607a1f84 Optimize lex state merging 2017-09-18 13:40:37 -07:00
Max Brunsfeld
4c9c05806a Merge compatible starting token states before constructing lex table 2017-09-05 13:21:53 -07:00
Max Brunsfeld
9d668c5004 Move incompatible token map into LexTableBuilder 2017-08-31 15:46:37 -07:00
Max Brunsfeld
c285fbef38 Clear LexTableBuilder's state after detecting conflicts 2017-08-25 17:11:39 -07:00
Max Brunsfeld
9d616b3bf8 Replace size_t -> LexStateId in LexTableBuilder::remove_duplicate_states 2017-08-08 12:55:35 -07:00
Max Brunsfeld
5bd5b4bb05 Replace <cctype> -> <cwctype> 2017-07-10 14:35:14 -07:00
Max Brunsfeld
1586d70cbe Compute conflicting tokens more precisely
While generating the parse table, keep track of which tokens can follow one another.
Then use this information to evaluate token conflicts more precisely. This will
result in a smaller parse table than the previous, overly-conservative approach.
2017-07-07 17:54:24 -07:00
Max Brunsfeld
8517313a45 🎨 2017-06-22 15:33:07 -07:00
Max Brunsfeld
8157b81b68 Improve logic for short-circuiting trivial lexing conflict detection 2017-06-22 15:33:01 -07:00
Max Brunsfeld
2c043803f1 Be more conservative about avoiding lexing conflicts when merging states
This fixes a bug in the C++ grammar where the `>>` token was merged into
a state where it was previously not valid, but the `>` token *was*
valid. This caused nested templates like -

std::vector<std::pair<int, int>>

to not parse correctly.
2017-06-22 15:32:13 -07:00
Max Brunsfeld
b3edd8f749 Remove use of shared_ptr in choice, repeat, and seq factories 2017-03-17 14:28:13 -07:00
Max Brunsfeld
db4b9ebc7c Implement Rule as a union rather than an abstract base class 2017-03-17 13:29:31 -07:00
Max Brunsfeld
64e9230071 Use LexTableBuilder to detect conflicts between tokens more correctly 2017-03-08 12:47:38 -08:00
Renamed from src/compiler/build_tables/build_lex_table.cc