Commit graph

54 commits

Author SHA1 Message Date
Max Brunsfeld
09b019c530 Fix test for invalid blank input 2016-06-23 09:24:26 -07:00
Max Brunsfeld
c6e9b32d3f Print all the same parse log messages for both debugging methods 2016-06-22 22:36:11 -07:00
Max Brunsfeld
1e353381ff Don't create error node in lexer unless token is completely invalid
Before, any syntax error would cause the lexer to create an error
leaf node. This could happen even with a valid input, if the parse
stack had split and one particular version of the parse stack
failed to parse.

Now, an error leaf node is only created when the lexer cannot understand
part of the input stream at all. When a normal syntax error occurs,
the lexer just returns a token that is outside of the expected token
set, and the parser handles the unexpected token.
2016-05-26 14:15:10 -07:00
Max Brunsfeld
2e35587161 Use new stack_pop_until function for repairing errors 2016-03-07 20:06:46 -08:00
Max Brunsfeld
bc8df9f5c5 Avoid recompiling test languages when possible 2016-03-03 12:05:04 -08:00
Max Brunsfeld
aef7582a2a Start using the forward move to recover from errors
Some unit tests passing. Corpus tests still failing
2016-03-02 21:08:42 -08:00
Max Brunsfeld
e7abfdd373 Prevent string assertion failures from creating later memory leak errors 2016-03-02 20:58:39 -08:00
Max Brunsfeld
b80a330a74 Fix assorted memory leaks in test code 2016-02-05 12:23:54 -08:00
Max Brunsfeld
7c44b0e387 Fix leaked lookahead trees in normal parsing 2016-01-29 17:31:43 -08:00
Max Brunsfeld
0f7dbea9a3 Unify test targets, use externally defined languages as fixtures 2016-01-15 11:19:24 -08:00
Max Brunsfeld
f2e7058ad9 Support UTF16 directly
This makes the API easier to use from javascript
2015-12-28 13:53:22 -08:00
Max Brunsfeld
c495076adb Record in parse table which actions can hide splits
Suppose a parse state S has multiple actions for a terminal lookahead symbol A.
Then during incremental parsing, while in state S, the parser should not
reuse a non-terminal lookahead B where FIRST(B) contains A, because reusing B
might prematurely discard one of the possible actions that a batch parser
would have attempted in state S, upon seeing A as a lookahead.
2015-12-17 13:11:56 -08:00
Max Brunsfeld
8a146a9bef Reset lexer correctly when old input was blank 2015-12-03 10:00:39 -08:00
Max Brunsfeld
aba8af9e5b Cleanup debug logging in parser 2015-09-22 19:35:13 -07:00
Max Brunsfeld
f37f73f92f Add ability to edit multiple times between parses 2015-09-18 23:20:06 -07:00
Max Brunsfeld
252fa7b631 Add Document getter methods for input, language 2015-09-08 23:33:43 -07:00
Max Brunsfeld
ebd60213d9 Drop release functions from callback structs
The caller can just as easily take care of the cleanup explicitly
2015-09-08 23:24:33 -07:00
Max Brunsfeld
53926c467e Don't automatically hide singleton nodes 2015-09-02 16:36:29 -07:00
Max Brunsfeld
21258e6a9e Remove 'document' wrapper node 2015-08-22 10:48:34 -07:00
Max Brunsfeld
54e40b8146 Rework AST access API: reduce heap allocation 2015-07-31 15:47:48 -07:00
Max Brunsfeld
766e3bab2c Use 2-space continuation indent consistently in specs 2015-07-27 18:18:58 -07:00
Max Brunsfeld
f7e4445358 Handle goto actions after reductions in a more standard way
Rather than letting the reduced tree become the new lookahead symbol,
and re-adding it to the stack via a subsequent shift action, just
add it to the stack as part of the reduce action. This is more in
line with the way LR is described traditionally.
2015-05-27 10:53:02 -07:00
Max Brunsfeld
8bd11e1b58 Fix parser debug messages in spec 2015-03-07 16:40:39 -08:00
Max Brunsfeld
ec31249fe6 Add commas in expected message in document debugging spec 2014-11-01 17:03:32 -07:00
Max Brunsfeld
de9a48d11f Tweak debugging in parser and lexer 2014-10-22 20:10:08 -07:00
Max Brunsfeld
41a067fef9 Fix build warnings in document spec 2014-10-17 21:27:49 -07:00
Max Brunsfeld
8cf800ef5d Unify debugging API for parsing and lexing 2014-10-17 17:52:54 -07:00
Max Brunsfeld
b5d022a70c Fix missing field warnings for debugger structs 2014-10-14 22:50:24 -07:00
Max Brunsfeld
c594208ab8 Allow callbacks to be specified for debug output 2014-10-13 01:02:18 -07:00
Max Brunsfeld
f05762b4a0 Move parser tests into their own file 2014-09-10 18:49:53 -07:00
Max Brunsfeld
1ff7cedf40 Unify ubiquitous tokens and lexical separators in API 2014-09-07 22:16:45 -07:00
Max Brunsfeld
ed11ef557a Fix expansion of repeat rules into recursive rules
Previously, the way repeat rules were expanded, the auxiliary
rule always needed to be reduced, even if the repeating content
was empty. This caused problems in parse states where some items
contained the repeat rule and some did not. To make those cases
work, the repeat rule had to explicitly be marked as optional.
With this change, that is no longer necessary.
2014-09-07 09:39:14 -07:00
Max Brunsfeld
43ecac2a1d Expose debug flag on document 2014-09-06 17:56:00 -07:00
Max Brunsfeld
3dea1261a6 Clean up document specs for incremental parsing 2014-09-03 18:48:10 -07:00
Max Brunsfeld
c72445d808 Fix inc parsing for nodes containing ubiq tokens 2014-09-03 13:17:06 -07:00
Max Brunsfeld
ad52bdc448 Fix inc parsing when appending to end of a token 2014-09-03 07:09:15 -07:00
Max Brunsfeld
77529ace3d Fix infinite loop in certain cases w/ unterminated tokens 2014-09-03 00:38:44 -07:00
Max Brunsfeld
545e575508 Revert "Remove the separator characters construct"
This reverts commit 5cd07648fd.

The separators construct is useful as an optimization. It turns out that
constructing a node for every chunk of whitespace in a document causes a
significant performance regression.

Conflicts:
	src/compiler/build_tables/build_lex_table.cc
	src/compiler/grammar.cc
	src/runtime/parser.c
2014-09-02 08:03:51 -07:00
Max Brunsfeld
e941f8c175 Fix error in document editing
When breaking down the stack in parser.c, the previous code
would not account for ubiquitous tokens. This was a problem
for a long time, but wasn't noticed until ubiquitous tokens
started being used to represent separator characters
2014-09-01 21:32:29 -07:00
Max Brunsfeld
5cd07648fd Remove the separator characters construct
Now, grammars can handle whitespace by making it another ubiquitous
token, like comments.

For now, this has the side effect of whitespace being included in the
tree that precedes it. This was already an issue for other ubiquitous
tokens though, so it needs to be fixed anyway.
2014-09-01 20:19:43 -07:00
Max Brunsfeld
2985a98150 Build error nodes in lexer again, not in parser 2014-08-31 16:59:01 -07:00
Max Brunsfeld
85d8c9df5c Handle multiple ubiquitous in a row 2014-08-31 12:11:16 -07:00
Max Brunsfeld
a75686b017 Fix double release calls in document spec 2014-08-31 00:46:09 -07:00
Max Brunsfeld
c5ac02c571 Fix size calculation for error nodes 2014-08-29 13:22:03 -07:00
Max Brunsfeld
1e79ed794b Allow multiple top-level nodes
Now, the root node of a document is always a document node.
It will often have only one child node which corresponds to the grammar's
start symbol, but not always. Currently, it may have more than one child
if there are ubiquitous tokens such as comments at the beginning of the
document. In the future, it will also be possible be possible to have multiple
for the document to have multiple children if the document is partially parsed.
2014-08-09 00:00:20 -07:00
Max Brunsfeld
8da9219c3a Remove redundant functions for Documents
There's no need for a `string` function since one already
exists for Nodes.

Now the root node is always stored on the document. This
means callers of `ts_document_root_node` don't need to release
its return value.
2014-08-08 12:58:51 -07:00
Max Brunsfeld
41d26aaceb Fix segfault when document's input is set before its language 2014-08-01 12:34:49 -07:00
Max Brunsfeld
0d6d09cbd9 In generated parsers, export language as a function 2014-07-31 13:04:46 -07:00
Max Brunsfeld
eecbcccee0 Remove generated parsers' dependency on the runtime library
Generated parsers no longer export a parser constructor function.
They now export an opaque Language object which can be set on
Documents directly. This way, the logic for constructing parsers
lives entirely in the runtime. The Languages are just structs which
have no load-time dependency on the runtime
2014-07-30 23:40:02 -07:00
Max Brunsfeld
1ecafb874e Add functions to retrieve nodes' siblings and parents 2014-07-18 13:24:03 -07:00