tree-sitter

Author	SHA1	Message	Date
Max Brunsfeld	2b3da512a4	Add serialize, deserialize and reset callbacks to external scanners. Signed-off-by: Nathan Sobo <nathan@github.com>	2016-12-20 13:12:01 -08:00
Max Brunsfeld	a1770ce844	Allow external tokens to be used as extras	2016-12-12 22:06:01 -08:00
Max Brunsfeld	10b51a05a1	Allow external scanners to refer to (and return) internally-defined tokens. Tokens that are defined in the grammar's rules may now be included in the externals list also, so that external scanners can check if they are valid lookaheads or not, and if so, can return them to the parser if needed.	2016-12-09 13:32:58 -08:00
Max Brunsfeld	c4fe8ded95	Remove state argument to Lexer advance method	2016-12-05 16:36:34 -08:00
Max Brunsfeld	0f8e130687	Call external scanner functions when lexing	2016-12-02 22:03:48 -08:00
Max Brunsfeld	c966af0412	Start work on external tokens	2016-12-02 16:24:19 -08:00
Max Brunsfeld	d627042fa6	Fix javascript error test. A single line with two function declarations now parses successfully, so to create the desired error recovery scenario, wrap the two functions in an assignment.	2016-11-30 23:19:34 -08:00
Max Brunsfeld	fad7294ba4	Store shift states for non-terminals directly in the main parse table	2016-11-14 08:36:06 -08:00
Max Brunsfeld	b76574e01c	Handle ambiguities between extra and non-extra tokens using normal GLR splitting	2016-09-06 10:22:16 -07:00
Max Brunsfeld	c1b6d9f5be	Improve error comparison criteria. Signed-off-by: Nathan Sobo <nathan@github.com>	2016-09-01 11:39:23 -07:00
Max Brunsfeld	0faae52132	Fix some inconsistencies in error cost calculation. Signed-off-by: Nathan Sobo <nathan@github.com>	2016-08-31 10:51:59 -07:00
Max Brunsfeld	1d617ab5e0	Allow reductions based on error token, skipping some preceding content	2016-08-29 17:34:51 -07:00
Max Brunsfeld	31d1160e21	Base error costs on top-level trees skipped and lines of text skipped, rather than on the total number of tokens skipped	2016-08-29 17:06:23 -07:00
Max Brunsfeld	e947d7e2ad	Adjust test assertions for subtly different recoveries	2016-08-29 11:23:52 -07:00
Max Brunsfeld	1b8843dd41	Perform all possible reductions recursively upon detecting an error	2016-08-29 11:23:35 -07:00
Max Brunsfeld	9538b5b879	Don't count extra trees toward stack versions' error costs	2016-06-26 22:46:40 -07:00
Max Brunsfeld	9972709e43	Allow error recovery to skip non-terminal nodes after error detection	2016-06-24 10:28:05 -07:00
Max Brunsfeld	94721c7ec0	Rewind and re-tokenize in error mode after detecting an error	2016-06-17 21:26:03 -07:00
Max Brunsfeld	e70547cd11	Allow recoveries that skip leading children of invisible trees. Before this, errors could only be recovered by skipping internal children.	2016-06-14 14:48:35 -07:00
Max Brunsfeld	9b67b21dcd	Fix an outdated error corpus entry	2016-06-02 14:04:10 -07:00
Max Brunsfeld	e1a3a1daeb	Import error corpus entries from grammar repos. Now that error recovery requires no input for the grammar author, it shouldn't be tested in the individual grammar repos.	2016-05-28 20:12:02 -07:00
Max Brunsfeld	0f7dbea9a3	Unify test targets, use externally defined languages as fixtures	2016-01-15 11:19:24 -08:00
Max Brunsfeld	ad4089a4bf	Move anonymous tokens grammar into integration spec	2016-01-14 10:35:03 -08:00
Max Brunsfeld	49f393b75e	Merge pull request #22 from maxbrunsfeld/c-compiler-api: Simplify the compiler API	2016-01-13 21:08:41 -08:00
Max Brunsfeld	d4632ab9a9	Make the compile function plain C and take a JSON grammar	2016-01-11 12:33:48 -08:00
Max Brunsfeld	36870bfced	Make Grammar a simple struct	2016-01-08 15:51:30 -08:00
Max Brunsfeld	e59f6294cb	Fix bug in lexical state de-duping	2015-12-30 11:15:36 -08:00
Max Brunsfeld	4b04afac5e	Control lexer's error-mode via explicit boolean argument. Previously, the lexer would operate in error-mode (ignoring any garbage input until it found a valid token) if it was invoked in the 'error' state. Now that the error state is deduped with other lexical states, the lexer might be invoked in that state even when error-mode is not intended. This adds a third argument to `ts_lex` that explicitly sets the error-mode. This bug was unlikely to occur in any real grammars, but it caused the node-tree-sitter-compiler test suite to fail for some grammars with only one rule.	2015-12-30 09:43:12 -08:00
Max Brunsfeld	939476c947	When removing duplicate lex states, update the error state too. Now, instead of being stored as a separate field on the parse table, the error state is just the first state in the states vector.	2015-12-29 21:02:24 -08:00
Max Brunsfeld	97a281502e	Store parse table more compactly	2015-12-29 11:27:41 -08:00
Max Brunsfeld	386b124866	Ensure that there are no duplicate lex states	2015-12-20 15:46:13 -08:00
Max Brunsfeld	c9db5499e9	Remove uninteresting corpus entries	2015-12-18 13:46:24 -08:00
Max Brunsfeld	66460b24fd	Use more greek letters in arithmetic corpus	2015-12-18 13:46:10 -08:00
Max Brunsfeld	1c6ad5f7e4	Rename ubiquitous_tokens -> extra_tokens in compiler API. They were already called this in the runtime code. 'Extra' is just easier to say.	2015-12-17 15:50:50 -08:00
Max Brunsfeld	c495076adb	Record in parse table which actions can hide splits. Suppose a parse state S has multiple actions for a terminal lookahead symbol A. Then during incremental parsing, while in state S, the parser should not reuse a non-terminal lookahead B where FIRST(B) contains A, because reusing B might prematurely discard one of the possible actions that a batch parser would have attempted in state S, upon seeing A as a lookahead.	2015-12-17 13:11:56 -08:00
Max Brunsfeld	66144dc28e	Treat tokens that are sometimes extra as fragile	2015-12-16 20:04:45 -08:00
Max Brunsfeld	9bff4d0b06	Add concise method syntax to javascript fixture grammar. This exposes an ambiguity-handling bug that I discovered while adding ES6 support to tree-sitter-javascript.	2015-12-15 22:25:48 -08:00
Max Brunsfeld	d713054d61	Record which tokens are fragile when lexing	2015-12-10 21:05:54 -08:00
Max Brunsfeld	75f31a79a3	Treat reduce actions with different production IDs as distinct	2015-12-10 13:00:26 -08:00
Max Brunsfeld	76e4599d5e	For now, allow any expression as an assignment LHS	2015-12-06 14:14:17 -08:00
Max Brunsfeld	863cabc827	Don't include trailing ubiquitous tokens as children when reducing	2015-12-02 15:31:15 -08:00
Max Brunsfeld	64e56f5acc	Add assignments to C grammar. This creates another source of ambiguity: assignments vs initializations for declarations. This is good for testing ambiguity handling.	2015-12-02 15:10:24 -08:00
Max Brunsfeld	ad619d95f6	Add 'extra' field to symbol metadata. This stores whether a symbol is only ever used as a ubiquitous token. This will allow ubiquitous nodes to be reused more effectively: if they are always ubiquitous, then they can be reused immediately, and otherwise, they must be broken down in case they need to be used structurally.	2015-12-02 15:10:24 -08:00
Max Brunsfeld	f08554e958	Replace NodeType enum with SymbolMetadata bitfield. This will allow storing other metadata about symbols, like if they only appear as ubiquitous tokens.	2015-12-02 15:10:24 -08:00
Max Brunsfeld	40a90b551a	Allow error recovery to look all the way to the bottom of the stack. Previously, there was a bug where the first node on the stack would never be popped.	2015-11-11 16:59:41 -08:00
Max Brunsfeld	1a5d5b3156	Make ambiguities resolve deterministically. In the future, they should resolve according to some kind of dynamic precedence annotations provided in the grammars. For now, this at least makes them fully deterministic, so that tests won't fail due to ambiguities resolving differently after undone edits.	2015-11-11 16:54:03 -08:00
Max Brunsfeld	e11515fb74	Escape backslashes and quotes in symbol name strings	2015-11-09 09:33:24 -08:00
Max Brunsfeld	d5ce268074	Fix handling of changing precedence within lexical rules. A precedence annotation wrapping a sequence of characters now only affects how tightly those characters bind to each other, not how tightly they bind to the preceding character. This bug surfaced because a generated lexer was failing to recognize a '\n' character as a token, instead treating it as ubiquitous whitespace. It made this error because, even though anonymous ubiquitous tokens have the lowest precedence, the character immediately after the '\n' was part of a normal token, which had normal precedence (0). Advancing into that following token was incorrectly prioritized above accepting the line-break token.	2015-11-08 13:36:15 -08:00
Max Brunsfeld	30b6530fd1	Account for parse stack merges when shifting. Previously, when the parse stack was split into 3 or more heads, it was possible for head 3 to be accidentally skipped if head 2 merged with head 1.	2015-11-05 21:21:18 -08:00
Max Brunsfeld	a0eca388e8	Make fixture C grammar a subset of tree-sitter-c	2015-11-05 21:19:22 -08:00

149 commits