Max Brunsfeld
e947d7e2ad
Adjust test assertions for subtly different recoveries
2016-08-29 11:23:52 -07:00
Max Brunsfeld
1b8843dd41
Perform all possible reductions recursively upon detecting an error
2016-08-29 11:23:35 -07:00
Max Brunsfeld
9538b5b879
Don't count extra trees toward stack versions' error costs
2016-06-26 22:46:40 -07:00
Max Brunsfeld
9972709e43
Allow error recovery to skip non-terminal nodes after error detection
2016-06-24 10:28:05 -07:00
Max Brunsfeld
94721c7ec0
Rewind and re-tokenize in error mode after detecting an error
2016-06-17 21:26:03 -07:00
Max Brunsfeld
e70547cd11
Allow recoveries that skip leading children of invisible trees
...
Before this, errors could only be recovered by skipping internal children.
2016-06-14 14:48:35 -07:00
Max Brunsfeld
9b67b21dcd
Fix an outdated error corpus entry
2016-06-02 14:04:10 -07:00
Max Brunsfeld
e1a3a1daeb
Import error corpus entries from grammar repos
...
Now that error recovery requires no input for the grammar author, it shouldn't
be tested in the individual grammar repos.
2016-05-28 20:12:02 -07:00
Max Brunsfeld
0f7dbea9a3
Unify test targets, use externally defined languages as fixtures
2016-01-15 11:19:24 -08:00
Max Brunsfeld
ad4089a4bf
Move anonymous tokens grammar into integration spec
2016-01-14 10:35:03 -08:00
Max Brunsfeld
49f393b75e
Merge pull request #22 from maxbrunsfeld/c-compiler-api
...
Simplify the compiler API
2016-01-13 21:08:41 -08:00
Max Brunsfeld
d4632ab9a9
Make the compile function plain C and take a JSON grammar
2016-01-11 12:33:48 -08:00
Max Brunsfeld
36870bfced
Make Grammar a simple struct
2016-01-08 15:51:30 -08:00
Max Brunsfeld
e59f6294cb
Fix bug in lexical state de-duping
2015-12-30 11:15:36 -08:00
Max Brunsfeld
4b04afac5e
Control lexer's error-mode via explicit boolean argument
...
Previously, the lexer would operate in error-mode (ignoring any garbage input
until it found a valid token) if it was invoked in the 'error' state. Now that
the error state is deduped with other lexical states, the lexer might be invoked
in that state even when error-mode is not intended. This adds a third argument
to `ts_lex` that explicitly sets the error-mode.
This bug was unlikely to occur in any real grammars, but it caused the
node-tree-sitter-compiler test suite to fail for some grammars with only one
rule.
2015-12-30 09:43:12 -08:00
Max Brunsfeld
939476c947
When removing duplicate lex states, update the error state too
...
Now, instead of being stored as a separate field on the parse table, the error
state is just the first state in the states vector.
2015-12-29 21:02:24 -08:00
Max Brunsfeld
97a281502e
Store parse table more compactly
2015-12-29 11:27:41 -08:00
Max Brunsfeld
386b124866
Ensure that there are no duplicate lex states
2015-12-20 15:46:13 -08:00
Max Brunsfeld
c9db5499e9
Remove uninteresting corpus entries
2015-12-18 13:46:24 -08:00
Max Brunsfeld
66460b24fd
Use more greek letters in arithmetic corpus
2015-12-18 13:46:10 -08:00
Max Brunsfeld
1c6ad5f7e4
Rename ubiquitous_tokens -> extra_tokens in compiler API
...
They were already called this in the runtime code.
'Extra' is just easier to say.
2015-12-17 15:50:50 -08:00
Max Brunsfeld
c495076adb
Record in parse table which actions can hide splits
...
Suppose a parse state S has multiple actions for a terminal lookahead symbol A.
Then during incremental parsing, while in state S, the parser should not
reuse a non-terminal lookahead B where FIRST(B) contains A, because reusing B
might prematurely discard one of the possible actions that a batch parser
would have attempted in state S, upon seeing A as a lookahead.
2015-12-17 13:11:56 -08:00
Max Brunsfeld
66144dc28e
Treat tokens that are sometimes extra as fragile
2015-12-16 20:04:45 -08:00
Max Brunsfeld
9bff4d0b06
Add concise method syntax to javascript fixture grammar
...
This exposes an ambiguity handling bug that I discovered while adding ES6 support to
tree-sitter-javascript
2015-12-15 22:25:48 -08:00
Max Brunsfeld
d713054d61
Record which tokens are fragile when lexing
2015-12-10 21:05:54 -08:00
Max Brunsfeld
75f31a79a3
Treat reduce actions with different production IDs as distinct
2015-12-10 13:00:26 -08:00
Max Brunsfeld
76e4599d5e
For now, allow any expression as an assignment LHS
2015-12-06 14:14:17 -08:00
Max Brunsfeld
863cabc827
Don't include trailing ubiquitous tokens as children when reducing
2015-12-02 15:31:15 -08:00
Max Brunsfeld
64e56f5acc
Add assignments to C grammar
...
This creates another source of ambiguity: assignments vs initializations
for declarations. This is good for testing ambiguity handling
2015-12-02 15:10:24 -08:00
Max Brunsfeld
ad619d95f6
Add 'extra' field to symbol metadata
...
This stores whether a symbol is only ever used as a ubiquitous token. This will
allow ubiquitous nodes to be reused more effectively: if they are always
ubiquitous, then they can be reused immediately, and otherwise, they must be
broken down in case they need to be used structurally.
2015-12-02 15:10:24 -08:00
Max Brunsfeld
f08554e958
Replace NodeType enum with SymbolMetadata bitfield
...
This will allow storing other metadata about symbols, like if they
only appear as ubiquitous tokens
2015-12-02 15:10:24 -08:00
Max Brunsfeld
40a90b551a
Allow error recovery to look all the way to the bottom of the stack
...
Previously, there was a bug where the first node on the stack
would never be popped
2015-11-11 16:59:41 -08:00
Max Brunsfeld
1a5d5b3156
Make ambiguities resolve deterministically
...
In the future, they should resolve according to some kind of dynamic
precedence annotations provided in the grammars. For now, this at least makes
them fully deterministic, so that tests won't fail due to ambiguities resolving
differently after undone edits.
2015-11-11 16:54:03 -08:00
Max Brunsfeld
e11515fb74
Escape backslashes and quotes in symbol name strings
2015-11-09 09:33:24 -08:00
Max Brunsfeld
d5ce268074
Fix handling of changing precedence within lexical rules.
...
A precedence annotation wrapping a sequence of characters now only affects how
tightly those characters bind to *each other*, not how tightly they bind to the
preceding character.
This bug surfaced because a generated lexer was failing to recognize a '\n' character
as a token, instead treating it as ubiquitous whitespace. It made this error
because, even though anonymous ubiquitous tokens have the lowest precedence, the
character immediately *after* the '\n' was part of a normal token, which had
*normal* precedence (0). Advancing into that following token was incorrectly
prioritized above accepting the line-break token.
2015-11-08 13:36:15 -08:00
Max Brunsfeld
30b6530fd1
Account for parse stack merges when shifting
...
Previously, when the parse stack was split into 3 or more heads, it was
possible for head 3 to be accidentally skipped if head 2 merged with head 1.
2015-11-05 21:21:18 -08:00
Max Brunsfeld
a0eca388e8
Make fixture C grammar a subset of tree-sitter-c
2015-11-05 21:19:22 -08:00
Max Brunsfeld
d7cb48aae7
Fix handling of precedence for repeat rules
2015-11-01 21:00:44 -08:00
Max Brunsfeld
d6ee28abd0
Make precedence more useful within tokens
...
Choose accept-token actions over advance actions if their rule has a higher precedence.
2015-11-01 12:48:27 -08:00
Max Brunsfeld
998ae533da
Make completion_status() a method on LexItem
2015-10-30 16:48:37 -07:00
Max Brunsfeld
a8ead10d6f
In lex error state, don't look for tokens that would match *any* line
2015-10-28 17:45:17 -07:00
Max Brunsfeld
500533476b
Fix bugs in handling multiple simultaneous ambiguities
2015-10-22 11:42:38 -07:00
Max Brunsfeld
1983bcfb60
Fix conflation of finished items w/ different precedence
2015-10-18 12:51:32 -07:00
Max Brunsfeld
8725e96a65
Fix item-set-closure bug w/ empty productions
2015-10-15 23:59:47 -07:00
Max Brunsfeld
3d0253f9b8
Start work on c++ fixture grammar
2015-10-13 11:28:53 -07:00
Max Brunsfeld
9959fe35b0
Allow associativity to be specified in rules w/o precedence
2015-10-13 11:25:28 -07:00
Max Brunsfeld
82726ad53b
Define repeat rule in terms of repeat1 rule
2015-10-12 19:22:05 -07:00
Max Brunsfeld
8ef25ffef3
Try lexing using each parse stack head's state
...
This fixes the case where the parse stack is split and the top states
have different valid lookahead symbols
2015-10-06 16:22:58 -07:00
Max Brunsfeld
624e4651d2
Use GLR for for-in loop conlfict in javascript grammar
2015-10-06 10:50:56 -07:00
Max Brunsfeld
ebc52f109d
Merge branch 'flatten-rules-into-productions'
...
This branch had diverged considerably, so merging it required changing a lot
of code.
Conflicts:
project.gyp
spec/compiler/build_tables/action_takes_precedence_spec.cc
spec/compiler/build_tables/build_conflict_spec.cc
spec/compiler/build_tables/build_parse_table_spec.cc
spec/compiler/build_tables/first_symbols_spec.cc
spec/compiler/build_tables/item_set_closure_spec.cc
spec/compiler/build_tables/item_set_transitions_spec.cc
spec/compiler/build_tables/rule_can_be_blank_spec.cc
spec/compiler/helpers/containers.h
spec/compiler/prepare_grammar/expand_repeats_spec.cc
spec/compiler/prepare_grammar/extract_tokens_spec.cc
src/compiler/build_tables/action_takes_precedence.h
src/compiler/build_tables/build_parse_table.cc
src/compiler/build_tables/first_symbols.cc
src/compiler/build_tables/first_symbols.h
src/compiler/build_tables/item_set_closure.cc
src/compiler/build_tables/item_set_transitions.cc
src/compiler/build_tables/parse_item.cc
src/compiler/build_tables/parse_item.h
src/compiler/build_tables/rule_can_be_blank.cc
src/compiler/build_tables/rule_can_be_blank.h
src/compiler/prepare_grammar/expand_repeats.cc
src/compiler/prepare_grammar/extract_tokens.cc
src/compiler/prepare_grammar/extract_tokens.h
src/compiler/prepare_grammar/prepare_grammar.cc
src/compiler/rules/built_in_symbols.cc
src/compiler/rules/built_in_symbols.h
src/compiler/syntax_grammar.cc
src/compiler/syntax_grammar.h
2015-10-02 23:46:39 -07:00