tree-sitter

Author	SHA1	Message	Date
Max Brunsfeld	c0cc35ff07	Create separate lexer function for keywords	2018-03-07 12:00:26 -08:00
Max Brunsfeld	52087de4f0	Remove the concept of fragile reductions They were a vestige of when Tree-sitter did sentential form-based incremental parsing (as opposed to simply state matching). This was elegant but not compatible with GLR as far as I could tell.	2018-03-02 14:51:54 -08:00
Max Brunsfeld	8c29841adf	Represent repetitions with associative structure	2018-02-12 11:41:56 -08:00
Max Brunsfeld	fcff16cb86	Add get_column method to lexer	2017-12-19 17:54:15 -08:00
Max Brunsfeld	b0fdc33f73	Remove 'extra' and 'structural' booleans from symbol metadata	2017-09-14 12:07:46 -07:00
Max Brunsfeld	99d048e016	Simplify error recovery; eliminate recovery states The previous approach to error recovery relied on special error-recovery states in the parse table. For each token T, there was an error recovery state in which the parser looked for any token that could follow T. Unfortunately, sometimes the set of tokens that could follow T contained conflicts. For example, in JS, the token '}' can be followed by the open-ended 'template_chars' token, but also by ordinary tokens like 'identifier'. So with the old algorithm, when recovering from an unexpected '}' token, the lexer had no way to distinguish identifiers from template_chars. This commit drops the error recovery states. Instead, when we encounter an unexpected token T, we recover from the error by finding a previous state S in the stack in which T would be valid, popping all of the nodes after S, and wrapping them in an error. This way, the lexer is always invoked in a normal parse state, in which it is looking for a non-conflicting set of tokens. Eliminating the error recovery states also shrinks the lex state machine significantly. Signed-off-by: Rick Winfrey <rewinfrey@github.com>	2017-09-11 15:22:52 -07:00
Max Brunsfeld	e6b43700b9	Get generated parsers compiling and loading properly on windows	2017-08-08 16:47:51 -07:00
Max Brunsfeld	cb5fe80348	Rename RENAME rule to ALIAS, allow it to create anonymous nodes	2017-07-31 16:41:11 -07:00
Max Brunsfeld	1df41a9107	Avoid anonymous struct to silence gcc's override-init warning (again)	2017-07-21 10:17:54 -07:00
Max Brunsfeld	afb499bf2e	Handle rename symbols in ts_language APIs	2017-07-18 12:01:52 -07:00
Max Brunsfeld	9a04231ab1	Remove length restriction in external scanner serialization API	2017-07-17 17:12:36 -07:00
Max Brunsfeld	1a195d44bb	Whoops, dynamic precedence needs a sign	2017-07-14 11:06:16 -07:00
Max Brunsfeld	b3a72954ff	Introduce RENAME rule type	2017-07-13 17:17:22 -07:00
Max Brunsfeld	d8e9d04fe7	Add PREC_DYNAMIC rule for resolving runtime ambiguities	2017-07-06 15:24:45 -07:00
Max Brunsfeld	d222dbb9fd	Allow lexer to accept tokens that ended at previous positions * Track lookahead in each tree * Add 'mark_end' API that external scanners can use	2017-03-13 17:06:52 -07:00
Max Brunsfeld	d853b6504d	Add version number to TSLanguage structs	2017-01-31 10:21:47 -08:00
Max Brunsfeld	3706678b89	Pass const TSExternalTokenState to external scanner deserialize hook	2016-12-21 13:58:18 -08:00
Max Brunsfeld	34a65f588d	Tweak naming and organization of external-scanner related language fields	2016-12-21 11:24:41 -08:00
Max Brunsfeld	2b3da512a4	Add serialize, deserialize and reset callbacks to external scanners Signed-off-by: Nathan Sobo <nathan@github.com>	2016-12-20 13:12:01 -08:00
Max Brunsfeld	c4fe8ded95	Remove state argument to Lexer advance method	2016-12-05 16:36:34 -08:00
Max Brunsfeld	0f8e130687	Call external scanner functions when lexing	2016-12-02 22:03:48 -08:00
Max Brunsfeld	c966af0412	Start work on external tokens	2016-12-02 16:24:19 -08:00
Max Brunsfeld	535879a2bd	Represent byte, char and tree counts as 32 bit numbers The parser spends the majority of its time allocating and freeing trees and stack nodes. Also, the memory footprint of the AST is a significant concern when using tree-sitter with large files. This library is already unlikely to work very well with source files larger than 4GB, so representing rows, columns, byte lengths and child indices as unsigned 32 bit integers seems like the right choice.	2016-11-14 12:19:13 -08:00
Max Brunsfeld	fad7294ba4	Store shift states for non-terminals directly in the main parse table	2016-11-14 08:36:06 -08:00
Max Brunsfeld	e53beb66c9	Avoid anonymous nested struct to silence override-init warnings	2016-10-26 11:10:56 -07:00
Max Brunsfeld	e149d94ff5	Remove generated parsers' dependency on runtime.h	2016-10-05 14:02:49 -07:00
Max Brunsfeld	096ac2d4b6	Rename ts_document_set_debugger -> ts_document_set_logger	2016-09-06 17:40:26 -07:00
Max Brunsfeld	b76574e01c	Handle ambiguities between extra and non-extra tokens using normal GLR splitting	2016-09-06 10:22:16 -07:00
Max Brunsfeld	4f0c83ba01	Move logic for lexical error handling outside of lexer functions This way, less logic needs to be exposed in parser.h	2016-09-03 23:40:57 -07:00
Max Brunsfeld	1c52c30111	Fix unexpected EOF errors getting lost	2016-09-03 22:46:14 -07:00
Max Brunsfeld	8c26d99353	Store error recovery actions in the normal parse table	2016-06-27 14:07:47 -07:00
Max Brunsfeld	43ae8235fd	Remove the error action; a lack of actions implies an error.	2016-06-21 22:53:48 -07:00
Max Brunsfeld	6a7a5cfc3f	Remove nesting in parse action struct	2016-06-21 21:36:33 -07:00
Max Brunsfeld	38c144b4a3	Refine logic for deciding when tokens need to be re-lexed * While generating the lex table, note which tokens can match the same string. A token needs to be relexed when it has possible homonyms in the current state. * Also note which tokens can match substrings of each other tokens. A token needs to be relexed when there are viable tokens that could match longer strings in the current state and the next token has been edited. * Remove the logic for marking tokens as fragile on creation. * Store the reusability/non-reusability of symbols off of individual actions and onto the entire entry for the state & symbol.	2016-06-21 07:28:04 -07:00
Max Brunsfeld	45f7cee0c8	Handle extra tokens properly during error recovery	2016-06-18 20:46:25 -07:00
Max Brunsfeld	94721c7ec0	Rewind and re-tokenize in error mode after detecting an error	2016-06-17 21:26:03 -07:00
Max Brunsfeld	1e353381ff	Don't create error node in lexer unless token is completely invalid Before, any syntax error would cause the lexer to create an error leaf node. This could happen even with a valid input, if the parse stack had split and one particular version of the parse stack failed to parse. Now, an error leaf node is only created when the lexer cannot understand part of the input stream at all. When a normal syntax error occurs, the lexer just returns a token that is outside of the expected token set, and the parser handles the unexpected token.	2016-05-26 14:15:10 -07:00
Max Brunsfeld	a3679fbb1f	Distinguish separators from main tokens via a property on transitions It was incorrect to store it as a property on the lexical states themselves	2016-05-19 16:27:25 -07:00
Max Brunsfeld	31cc6e6f9c	Remove unused InProgressSymbolEntry typedef	2016-05-16 12:46:29 -07:00
Max Brunsfeld	22c550c9d6	Discard tokens after error detection to find the best repair * Use GLR stack-splitting to try all numbers of tokens to discard until a repair is found. * Check the validity of repairs by looking at the child trees, rather than the statically-computed 'in-progress symbols' list	2016-05-11 13:49:43 -07:00
Max Brunsfeld	9d247e45b2	Deemphasize extra trees in stack debugging graphs	2016-05-01 15:24:50 -07:00
Max Brunsfeld	9ad1e36238	Rename out_of_context_states -> recovery_states	2016-04-27 14:14:56 -07:00
Max Brunsfeld	f63fcffe95	Fix incorrect cast in ts_language_symbol_is_in_progress	2016-04-18 11:17:07 -07:00
Max Brunsfeld	e0c24e3be6	Remove old error recovery code	2016-03-02 20:58:39 -08:00
Max Brunsfeld	c8d7c16f87	Use out-of-context states when in error parse state	2016-03-02 20:56:05 -08:00
Max Brunsfeld	9b2e775b79	Store out-of-context states in the language struct	2016-03-02 20:56:05 -08:00
Max Brunsfeld	ffcd8b5c49	Generate C code for the in-progress symbols in each parse state	2016-03-02 20:56:05 -08:00
Max Brunsfeld	d4632ab9a9	Make the compile function plain C and take a JSON grammar	2016-01-11 12:33:48 -08:00
Max Brunsfeld	4b04afac5e	Control lexer's error-mode via explicit boolean argument Previously, the lexer would operate in error-mode (ignoring any garbage input until it found a valid token) if it was invoked in the 'error' state. Now that the error state is deduped with other lexical states, the lexer might be invoked in that state even when error-mode is not intended. This adds a third argument to `ts_lex` that explicitly sets the error-mode. This bug was unlikely to occur in any real grammars, but it caused the node-tree-sitter-compiler test suite to fail for some grammars with only one rule.	2015-12-30 09:43:12 -08:00
Max Brunsfeld	4ad1a666be	clang-format	2015-12-29 21:17:31 -08:00

1 2 3 4

171 commits