Removing this restriction created problems for the Rust grammar, and
possibly others. The proper fix would be to ensure that the 'word
token' matches *every* possible string that a 'keyword token'
matches, as opposed to just matching *some* of the same strings.
This would require us to gather a little more information
about how tokens conflict. For now, I'm just going to put back the
hard-coded logic that we had.
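Roughly, the proper check would look like this (a sketch only; `word_pattern` and `keywords` are illustrative names, keywords are assumed to be plain literal strings, and keyword tokens defined by regexes would need a true language-inclusion check instead):

    use regex::Regex;

    /// Returns true if `word_pattern` matches every keyword in its
    /// entirety, i.e. the word token covers *every* string the keyword
    /// tokens match, not just some of them.
    fn word_token_covers_keywords(word_pattern: &str, keywords: &[&str]) -> bool {
        // Anchor the pattern so a partial match doesn't count as coverage.
        let anchored = format!("^(?:{})$", word_pattern);
        let word_regex = Regex::new(&anchored).expect("invalid word token pattern");
        // For example:
        //   word_token_covers_keywords("[a-z_]+", &["if", "else"]) == true
        //   word_token_covers_keywords("[a-z_]+", &["#if"]) == false
        keywords.iter().all(|kw| word_regex.is_match(kw))
    }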
This prevented conflicts between some tokens from being recorded
properly. In the case of JavaScript, it prevented tree-sitter from
recognizing the conflict between the forward slash operator and the
regex token, so regexes were incorrectly merged into parse states
where '/' was already valid.
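As a toy illustration of the conflict that went unrecorded (tokens here are reduced to hypothetical first-character sets; the real generator compares the tokens' lexing behavior, not just their first characters):

    use std::collections::HashSet;

    /// A token reduced to the set of characters that can begin a match.
    /// This is a toy stand-in for the real analysis.
    struct Token {
        name: &'static str,
        first_chars: HashSet<char>,
    }

    /// Conservatively report a conflict whenever two tokens can begin
    /// with the same character.
    fn conflicts(a: &Token, b: &Token) -> bool {
        !a.first_chars.is_disjoint(&b.first_chars)
    }

    fn main() {
        let slash = Token { name: "/", first_chars: HashSet::from(['/']) };
        let regex = Token { name: "regex", first_chars: HashSet::from(['/']) };
        // Both tokens can begin with '/', so regexes must not be merged
        // into parse states where the '/' operator is already valid.
        assert!(conflicts(&slash, &regex));
    }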
Refs tree-sitter/tree-sitter-javascript#71
This simplifies the logic for determining whether a token is reusable
and makes it more conservative. It should fix some incremental parsing
bugs that are being caught by the randomized tests on CI.
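The conservative shape of such a check, sketched with heavily simplified, hypothetical types (a token here is just a byte range plus the lexical state it was scanned in, far less than the real tree carries):

    /// An edit, as the byte range of the old text that was replaced
    /// (hypothetical representation).
    struct Edit {
        start: usize,
        old_end: usize,
    }

    /// A token from the previous parse (hypothetical representation).
    struct ScannedToken {
        start: usize,
        end: usize,
        lex_state: usize,
    }

    /// Conservative reuse test: keep the old token only if the edit did
    /// not touch its bytes (the byte at `end` counts, since the lexer
    /// may have read it as lookahead) and we are re-lexing in the same
    /// lexical state as before. When in doubt, re-lex.
    fn token_is_reusable(token: &ScannedToken, edit: &Edit, current_lex_state: usize) -> bool {
        let untouched = token.end < edit.start || token.start > edit.old_end;
        untouched && token.lex_state == current_lex_state
    }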
Do not merge a token T into a parse state S if S contains
external tokens that can be *followed* by tokens that could
be shadowed by T.
At this point, the only automated test for this logic is via
the bash grammar, in which the `]` token should not be merged
into states in which `_concat` is valid, because `_concat`
can be followed by a `_special_characters` token, and `]`
would shadow `_special_characters`.
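Schematically, with illustrative types (`follow` and `shadows` stand in for information the generator would have to compute):

    use std::collections::{HashMap, HashSet};

    type TokenId = usize;

    /// May token `t` be merged into a state whose valid external tokens
    /// are `externals`? `follow[e]` is the set of tokens that can come
    /// immediately after external token `e`, and `shadows(t, u)` says
    /// whether `t` could lex where `u` is expected.
    fn can_merge(
        t: TokenId,
        externals: &HashSet<TokenId>,
        follow: &HashMap<TokenId, HashSet<TokenId>>,
        shadows: impl Fn(TokenId, TokenId) -> bool,
    ) -> bool {
        externals.iter().all(|&e| {
            follow
                .get(&e)
                .map_or(true, |next| next.iter().all(|&u| !shadows(t, u)))
        })
    }

In the bash example, `t` is `]`, `e` is `_concat`, `_special_characters` is in `follow[_concat]`, and `shadows(']', _special_characters)` holds, so the merge is rejected.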
While generating the parse table, keep track of which tokens can follow
one another. Then use this information to evaluate token conflicts more
precisely. This results in a smaller parse table than the previous,
overly conservative approach produced.
This fixes a bug in the C++ grammar where the `>>` token was merged into
a state where it was previously not valid, but the `>` token *was*
valid. This caused nested templates like

    std::vector<std::pair<int, int>>

to parse incorrectly.
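The bookkeeping, sketched with hypothetical types (the real table is built from the grammar's states rather than being fed pairs directly):

    use std::collections::{HashMap, HashSet};

    type TokenId = usize;

    /// Which tokens have been observed to follow which other tokens,
    /// gathered while building the parse table (illustrative structure).
    #[derive(Default)]
    struct FollowTable {
        follows: HashMap<TokenId, HashSet<TokenId>>,
    }

    impl FollowTable {
        /// Record one adjacent token pair encountered during generation.
        fn record(&mut self, a: TokenId, b: TokenId) {
            self.follows.entry(a).or_default().insert(b);
        }

        /// A conflict between two tokens only matters in states where
        /// they can actually be adjacent; everywhere else they can
        /// safely share a state, keeping the table small.
        fn can_follow(&self, a: TokenId, b: TokenId) -> bool {
            self.follows.get(&a).map_or(false, |set| set.contains(&b))
        }
    }

In the C++ example, `>` can be followed by another `>` at the end of a nested template argument list, so the conflict between `>` and `>>` is real in those states and `>>` stays out of them.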