Max Brunsfeld
76d072545d
Include out-of-context states starting with non-terminals
2016-03-02 20:58:39 -08:00
Max Brunsfeld
e0c24e3be6
Remove old error recovery code
2016-03-02 20:58:39 -08:00
Max Brunsfeld
ffcd8b5c49
Generate C code for the in-progress symbols in each parse state
2016-03-02 20:56:05 -08:00
Max Brunsfeld
00d953f507
Generate C code for out-of-context states
2016-03-02 20:56:05 -08:00
Max Brunsfeld
6401a065ae
Use different types for advance and accept-token actions
...
Unlike with parse actions, lexical actions of different types never appear
in the same places in the table
2016-01-22 22:24:11 -07:00
Max Brunsfeld
d4632ab9a9
Make the compile function plain C and take a JSON grammar
2016-01-11 12:33:48 -08:00
Max Brunsfeld
4b04afac5e
Control lexer's error-mode via explicit boolean argument
...
Previously, the lexer would operate in error-mode (ignoring any garbage input
until it found a valid token) if it was invoked in the 'error' state. Now that
the error state is deduped with other lexical states, the lexer might be invoked
in that state even when error-mode is not intended. This adds a third argument
to `ts_lex` that explicitly sets the error-mode.
This bug was unlikely to occur in any real grammars, but it caused the
node-tree-sitter-compiler test suite to fail for some grammars with only one
rule.
2015-12-30 09:43:12 -08:00
Max Brunsfeld
4ad1a666be
clang-format
2015-12-29 21:17:31 -08:00
Max Brunsfeld
939476c947
When removing duplicate lex states, update the error state too
...
Now, instead of being stored as a separate field on the parse table, the error
state is just the first state in the states vector.
2015-12-29 21:02:24 -08:00
Max Brunsfeld
97a281502e
Store parse table more compactly
2015-12-29 11:27:41 -08:00
Max Brunsfeld
1c6ad5f7e4
Rename ubiquitous_tokens -> extra_tokens in compiler API
...
They were already called this in the runtime code.
'Extra' is just easier to say.
2015-12-17 15:50:50 -08:00
Max Brunsfeld
c495076adb
Record in parse table which actions can hide splits
...
Suppose a parse state S has multiple actions for a terminal lookahead symbol A.
Then during incremental parsing, while in state S, the parser should not
reuse a non-terminal lookahead B where FIRST(B) contains A, because reusing B
might prematurely discard one of the possible actions that a batch parser
would have attempted in state S, upon seeing A as a lookahead.
2015-12-17 13:11:56 -08:00
Max Brunsfeld
77a94a2929
Use ::count to check if sets and maps contain elements
2015-12-17 10:05:42 -08:00
Max Brunsfeld
66144dc28e
Treat tokens that are sometimes extra as fragile
2015-12-16 20:04:45 -08:00
Max Brunsfeld
d713054d61
Record which tokens are fragile when lexing
2015-12-10 21:05:54 -08:00
Max Brunsfeld
75f31a79a3
Treat reduce actions with different production IDs as distinct
2015-12-10 13:00:26 -08:00
Max Brunsfeld
ad619d95f6
Add 'extra' field to symbol metadata
...
This stores whether a symbol is only ever used as a ubiquitous token. This will
allow ubiquitous nodes to be reused more effectively: if they are always
ubiquitous, then they can be reused immediately, and otherwise, they must be
broken down in case they need to be used structurally.
2015-12-02 15:10:24 -08:00
Max Brunsfeld
f08554e958
Replace NodeType enum with SymbolMetadata bitfield
...
This will allow storing other metadata about symbols, like if they
only appear as ubiquitous tokens
2015-12-02 15:10:24 -08:00
Max Brunsfeld
e11515fb74
Escape backslashes and quotes in symbol name strings
2015-11-09 09:33:24 -08:00
Max Brunsfeld
dba0726eef
clang format
2015-10-28 12:10:58 -07:00
Max Brunsfeld
e543661498
Fix typo: LB -> LF
2015-10-26 16:55:59 -07:00
Max Brunsfeld
bad1bff3bb
Sanitize line breaks in symbol names
2015-10-26 13:36:31 -07:00
Max Brunsfeld
1983bcfb60
Fix conflation of finished items w/ different precedence
2015-10-18 12:51:32 -07:00
Max Brunsfeld
ebc52f109d
Merge branch 'flatten-rules-into-productions'
...
This branch had diverged considerably, so merging it required changing a lot
of code.
Conflicts:
project.gyp
spec/compiler/build_tables/action_takes_precedence_spec.cc
spec/compiler/build_tables/build_conflict_spec.cc
spec/compiler/build_tables/build_parse_table_spec.cc
spec/compiler/build_tables/first_symbols_spec.cc
spec/compiler/build_tables/item_set_closure_spec.cc
spec/compiler/build_tables/item_set_transitions_spec.cc
spec/compiler/build_tables/rule_can_be_blank_spec.cc
spec/compiler/helpers/containers.h
spec/compiler/prepare_grammar/expand_repeats_spec.cc
spec/compiler/prepare_grammar/extract_tokens_spec.cc
src/compiler/build_tables/action_takes_precedence.h
src/compiler/build_tables/build_parse_table.cc
src/compiler/build_tables/first_symbols.cc
src/compiler/build_tables/first_symbols.h
src/compiler/build_tables/item_set_closure.cc
src/compiler/build_tables/item_set_transitions.cc
src/compiler/build_tables/parse_item.cc
src/compiler/build_tables/parse_item.h
src/compiler/build_tables/rule_can_be_blank.cc
src/compiler/build_tables/rule_can_be_blank.h
src/compiler/prepare_grammar/expand_repeats.cc
src/compiler/prepare_grammar/extract_tokens.cc
src/compiler/prepare_grammar/extract_tokens.h
src/compiler/prepare_grammar/prepare_grammar.cc
src/compiler/rules/built_in_symbols.cc
src/compiler/rules/built_in_symbols.h
src/compiler/syntax_grammar.cc
src/compiler/syntax_grammar.h
2015-10-02 23:46:39 -07:00
Max Brunsfeld
7ee5eaa16a
Rename node accessor methods
...
Instead of child() vs concrete_child(), next_sibling() vs next_concrete_sibling(), etc,
the default is switched: child() refers to the concrete syntax tree, and named_child()
refers to the AST. Because the AST is abstract through exclusion of some nodes, the
names are clearer if the qualifier goes on the AST operations
2015-09-08 23:16:24 -07:00
Max Brunsfeld
f9316933ad
Refactor logic for marking '_'-prefixed rules as hidden
2015-09-06 16:53:13 -07:00
Max Brunsfeld
9591c88f39
In runtime, distinguish between anonymous and hidden nodes
2015-09-06 00:12:37 -07:00
Max Brunsfeld
5982b77c97
In compiler, distinguish between anonymous tokens and hidden rules
2015-09-05 22:28:55 -07:00
Max Brunsfeld
21258e6a9e
Remove 'document' wrapper node
2015-08-22 10:48:34 -07:00
Max Brunsfeld
c18351772a
Auto-format: no single-line functions
2015-07-31 16:32:24 -07:00
Max Brunsfeld
f9b057f3a9
clang-format everything
2015-07-27 18:29:48 -07:00
Max Brunsfeld
0b1d70db34
Always resolve ambiguities immediately
...
No more ambiguity nodes.
Also, when merging parse stacks, merge their successors if needed.
2015-07-15 13:15:11 -07:00
Max Brunsfeld
f26ddf5187
Fix symbol name for ambiguity nodes
2015-07-08 17:31:21 -07:00
Max Brunsfeld
aabcb10cfb
Respect expected_conflicts field when building parse table
2015-06-28 16:22:31 -05:00
Max Brunsfeld
381f89f8ba
Create ambiguity nodes when joining stack heads
2015-06-18 17:03:16 -07:00
Max Brunsfeld
755894b44d
Allow multiple parse actions in parse table
2015-06-18 17:03:16 -07:00
Max Brunsfeld
44774119bf
Use all caps for built-in symbol names
2015-06-15 15:26:05 -07:00
Max Brunsfeld
2d436cf141
Identify fragile reductions at compile time
2015-02-21 15:11:03 -08:00
Max Brunsfeld
34d96909d1
Move {Syntax,Lexical}Grammar into separate files
2015-01-14 21:10:41 -08:00
Max Brunsfeld
5dc08ccce9
Include names of in-progress rules in shift/reduce conflicts
2014-11-05 18:39:50 -08:00
Max Brunsfeld
fc83322832
Handle more special characters in c symbol names
2014-10-31 18:37:56 -07:00
Max Brunsfeld
4fc960e4ec
Give extracted string/regex tokens more descriptive names
2014-10-31 08:56:58 -07:00
Max Brunsfeld
0daada6921
Tidy up c-code generator
2014-10-12 12:33:45 -07:00
Max Brunsfeld
5b624d37f6
Use consistent iteration in c code generator
2014-10-12 11:57:52 -07:00
Max Brunsfeld
2e7ffb4d14
Tweak auto-format settings
...
Prefer lines that exceed 80 characters by a small margin to
line breaks in argument lists
2014-09-09 13:15:40 -07:00
Max Brunsfeld
1ff7cedf40
Unify ubiquitous tokens and lexical separators in API
2014-09-07 22:16:45 -07:00
Max Brunsfeld
c0a3f8d39c
Remove some macros from public parser header
2014-09-05 23:47:38 -07:00
Max Brunsfeld
545e575508
Revert "Remove the separator characters construct"
...
This reverts commit 5cd07648fd .
The separators construct is useful as an optimization. It turns out that
constructing a node for every chunk of whitespace in a document causes a
significant performance regression.
Conflicts:
src/compiler/build_tables/build_lex_table.cc
src/compiler/grammar.cc
src/runtime/parser.c
2014-09-02 08:03:51 -07:00
Max Brunsfeld
5cd07648fd
Remove the separator characters construct
...
Now, grammars can handle whitespace by making it another ubiquitous
token, like comments.
For now, this has the side effect of whitespace being included in the
tree that precedes it. This was already an issue for other ubiquitous
tokens though, so it needs to be fixed anyway.
2014-09-01 20:19:43 -07:00
Max Brunsfeld
346cf4fe5d
Remove LEX_PANIC macro
2014-08-26 13:12:12 -07:00