Commit graph

242 commits

Author SHA1 Message Date
Max Brunsfeld
94721c7ec0 Rewind and re-tokenize in error mode after detecting an error 2016-06-17 21:26:03 -07:00
Max Brunsfeld
1e353381ff Don't create error node in lexer unless token is completely invalid
Before, any syntax error would cause the lexer to create an error
leaf node. This could happen even with a valid input, if the parse
stack had split and one particular version of the parse stack
failed to parse.

Now, an error leaf node is only created when the lexer cannot understand
part of the input stream at all. When a normal syntax error occurs,
the lexer just returns a token that is outside of the expected token
set, and the parser handles the unexpected token.
2016-05-26 14:15:10 -07:00
Max Brunsfeld
a3679fbb1f Distinguish separators from main tokens via a property on transitions
It was incorrect to store it as a property on the lexical states themselves
2016-05-19 16:27:25 -07:00
Max Brunsfeld
31cc6e6f9c Remove unused InProgressSymbolEntry typedef 2016-05-16 12:46:29 -07:00
Max Brunsfeld
22c550c9d6 Discard tokens after error detection to find the best repair
* Use GLR stack-splitting to try all numbers of tokens to
  discard until a repair is found.
* Check the validity of repairs by looking at the child trees,
  rather than the statically-computed 'in-progress symbols' list
2016-05-11 13:49:43 -07:00
Max Brunsfeld
9d247e45b2 Deemphasize extra trees in stack debugging graphs 2016-05-01 15:24:50 -07:00
Max Brunsfeld
9ad1e36238 Rename out_of_context_states -> recovery_states 2016-04-27 14:14:56 -07:00
Max Brunsfeld
f63fcffe95 Fix incorrect cast in ts_language_symbol_is_in_progress 2016-04-18 11:17:07 -07:00
Max Brunsfeld
e0c24e3be6 Remove old error recovery code 2016-03-02 20:58:39 -08:00
Max Brunsfeld
c8d7c16f87 Use out-of-context states when in error parse state 2016-03-02 20:56:05 -08:00
Max Brunsfeld
9b2e775b79 Store out-of-context states in the language struct 2016-03-02 20:56:05 -08:00
Max Brunsfeld
ffcd8b5c49 Generate C code for the in-progress symbols in each parse state 2016-03-02 20:56:05 -08:00
Max Brunsfeld
abbc282950 Add a public function for printing debugging graphs 2016-02-23 11:16:50 -08:00
Max Brunsfeld
2b35890bbb Add ts_node_symbols() function 2016-02-19 15:41:30 -08:00
Max Brunsfeld
b80a330a74 Fix assorted memory leaks in test code 2016-02-05 12:23:54 -08:00
Max Brunsfeld
3dde0a6f39 Handle allocation failures during parsing 2016-01-19 18:08:01 -08:00
Max Brunsfeld
9d0835edbf Return non-const string from ts_node_string
The caller should free the string.
2016-01-18 10:27:23 -08:00
Max Brunsfeld
d4632ab9a9 Make the compile function plain C and take a JSON grammar 2016-01-11 12:33:48 -08:00
Max Brunsfeld
b69e19c525 Add plain C API for compiling a JSON grammar 2016-01-10 13:44:22 -08:00
Max Brunsfeld
36870bfced Make Grammar a simple struct 2016-01-08 15:51:30 -08:00
Max Brunsfeld
4b04afac5e Control lexer's error-mode via explicit boolean argument
Previously, the lexer would operate in error-mode (ignoring any garbage input
until it found a valid token) if it was invoked in the 'error' state. Now that
the error state is deduped with other lexical states, the lexer might be invoked
in that state even when error-mode is not intended. This adds a third argument
to `ts_lex` that explicitly sets the error-mode.

This bug was unlikely to occur in any real grammars, but it caused the
node-tree-sitter-compiler test suite to fail for some grammars with only one
rule.
2015-12-30 09:43:12 -08:00
Max Brunsfeld
4ad1a666be clang-format 2015-12-29 21:17:31 -08:00
Max Brunsfeld
97a281502e Store parse table more compactly 2015-12-29 11:27:41 -08:00
Max Brunsfeld
f2e7058ad9 Support UTF16 directly
This makes the API easier to use from javascript
2015-12-28 13:53:22 -08:00
Max Brunsfeld
2bcd2e4d00 Reuse fragile tokens that came from the current lex state 2015-12-21 16:04:11 -08:00
Max Brunsfeld
1c6ad5f7e4 Rename ubiquitous_tokens -> extra_tokens in compiler API
They were already called this in the runtime code.
'Extra' is just easier to say.
2015-12-17 15:50:50 -08:00
Max Brunsfeld
c495076adb Record in parse table which actions can hide splits
Suppose a parse state S has multiple actions for a terminal lookahead symbol A.
Then during incremental parsing, while in state S, the parser should not
reuse a non-terminal lookahead B where FIRST(B) contains A, because reusing B
might prematurely discard one of the possible actions that a batch parser
would have attempted in state S, upon seeing A as a lookahead.
2015-12-17 13:11:56 -08:00
Max Brunsfeld
66144dc28e Treat tokens that are sometimes extra as fragile 2015-12-16 20:04:45 -08:00
Max Brunsfeld
d713054d61 Record which tokens are fragile when lexing 2015-12-10 21:05:54 -08:00
Max Brunsfeld
d2bf88d5fe Include rows and columns in TSLength
This way, we don't have to have separate 1D and 2D versions for so many values
2015-12-04 20:20:29 -08:00
Max Brunsfeld
22c76fc71b Remove TSLength from runtime header
Refactor node functions now that character offset and byte offset are stored separately
2015-12-04 10:45:30 -08:00
Max Brunsfeld
8e217f758c Use individual args instead of TSLength in input seek function 2015-12-03 23:06:01 -08:00
Max Brunsfeld
b3a6de6dad Replace node pos/size functions with start/end char/byte functions 2015-12-03 22:59:27 -08:00
Max Brunsfeld
ad619d95f6 Add 'extra' field to symbol metadata
This stores whether a symbol is only ever used as a ubiquitous token. This will
allow ubiquitous nodes to be reused more effectively: if they are always
ubiquitous, then they can be reused immediately, and otherwise, they must be
broken down in case they need to be used structurally.
2015-12-02 15:10:24 -08:00
Max Brunsfeld
f08554e958 Replace NodeType enum with SymbolMetadata bitfield
This will allow storing other metadata about symbols, like if they
only appear as ubiquitous tokens
2015-12-02 15:10:24 -08:00
joshvera
7633cbb836 indentation 2015-11-30 12:59:23 -05:00
joshvera
3d9a44d880 Calculate the column and offset separately in TSNode 2015-11-25 13:36:19 -05:00
joshvera
1687d66776 expose ts_node start and end point functions 2015-11-25 11:57:39 -05:00
joshvera
8446b657f0 Rename position to offset and point to offset_point 2015-11-18 17:53:38 -08:00
joshvera
cf72f2f0ae ts_node_size_point 2015-11-18 16:47:27 -08:00
joshvera
6485e27d70 add ts_node_end_point 2015-11-18 16:45:10 -08:00
joshvera
b0f6bac3ab replace start and end with padding and size 2015-11-18 16:34:50 -08:00
joshvera
e720922662 Add source info to TSLexer 2015-11-12 12:24:05 -05:00
Rob Rix
3da510d53b Add a prototype for the symbol count. 2015-10-29 12:44:28 -04:00
Rob Rix
a176baa26f Pass a language, rather than a document. 2015-10-29 12:41:21 -04:00
Rob Rix
09162d1981 Export the symbol over ts_language_…. 2015-10-29 12:40:01 -04:00
Rob Rix
63f1a618b4 Add a prototype for getting the name for a symbol from a document. 2015-10-28 17:12:43 -04:00
Max Brunsfeld
0955f660d0 Add ts_document_invalidate, for forcing a full re-parse 2015-10-18 13:05:40 -07:00
Max Brunsfeld
9959fe35b0 Allow associativity to be specified in rules w/o precedence 2015-10-13 11:25:28 -07:00
Max Brunsfeld
82726ad53b Define repeat rule in terms of repeat1 rule 2015-10-12 19:22:05 -07:00