tree-sitter

Author	SHA1	Message	Date
Max Brunsfeld	94721c7ec0	Rewind and re-tokenize in error mode after detecting an error	2016-06-17 21:26:03 -07:00
Max Brunsfeld	1e353381ff	Don't create error node in lexer unless token is completely invalid. Before, any syntax error would cause the lexer to create an error leaf node. This could happen even with a valid input, if the parse stack had split and one particular version of the parse stack failed to parse. Now, an error leaf node is only created when the lexer cannot understand part of the input stream at all. When a normal syntax error occurs, the lexer just returns a token that is outside of the expected token set, and the parser handles the unexpected token.	2016-05-26 14:15:10 -07:00
Max Brunsfeld	a3679fbb1f	Distinguish separators from main tokens via a property on transitions. It was incorrect to store it as a property on the lexical states themselves.	2016-05-19 16:27:25 -07:00
Max Brunsfeld	31cc6e6f9c	Remove unused InProgressSymbolEntry typedef	2016-05-16 12:46:29 -07:00
Max Brunsfeld	22c550c9d6	Discard tokens after error detection to find the best repair. * Use GLR stack-splitting to try all numbers of tokens to discard until a repair is found. * Check the validity of repairs by looking at the child trees, rather than the statically-computed 'in-progress symbols' list.	2016-05-11 13:49:43 -07:00
Max Brunsfeld	9d247e45b2	Deemphasize extra trees in stack debugging graphs	2016-05-01 15:24:50 -07:00
Max Brunsfeld	9ad1e36238	Rename out_of_context_states -> recovery_states	2016-04-27 14:14:56 -07:00
Max Brunsfeld	f63fcffe95	Fix incorrect cast in ts_language_symbol_is_in_progress	2016-04-18 11:17:07 -07:00
Max Brunsfeld	e0c24e3be6	Remove old error recovery code	2016-03-02 20:58:39 -08:00
Max Brunsfeld	c8d7c16f87	Use out-of-context states when in error parse state	2016-03-02 20:56:05 -08:00
Max Brunsfeld	9b2e775b79	Store out-of-context states in the language struct	2016-03-02 20:56:05 -08:00
Max Brunsfeld	ffcd8b5c49	Generate C code for the in-progress symbols in each parse state	2016-03-02 20:56:05 -08:00
Max Brunsfeld	d4632ab9a9	Make the compile function plain C and take a JSON grammar	2016-01-11 12:33:48 -08:00
Max Brunsfeld	4b04afac5e	Control lexer's error-mode via explicit boolean argument. Previously, the lexer would operate in error-mode (ignoring any garbage input until it found a valid token) if it was invoked in the 'error' state. Now that the error state is deduped with other lexical states, the lexer might be invoked in that state even when error-mode is not intended. This adds a third argument to `ts_lex` that explicitly sets the error-mode. This bug was unlikely to occur in any real grammars, but it caused the node-tree-sitter-compiler test suite to fail for some grammars with only one rule.	2015-12-30 09:43:12 -08:00
Max Brunsfeld	4ad1a666be	clang-format	2015-12-29 21:17:31 -08:00
Max Brunsfeld	97a281502e	Store parse table more compactly	2015-12-29 11:27:41 -08:00
Max Brunsfeld	2bcd2e4d00	Reuse fragile tokens that came from the current lex state	2015-12-21 16:04:11 -08:00
Max Brunsfeld	c495076adb	Record in parse table which actions can hide splits. Suppose a parse state S has multiple actions for a terminal lookahead symbol A. Then during incremental parsing, while in state S, the parser should not reuse a non-terminal lookahead B where FIRST(B) contains A, because reusing B might prematurely discard one of the possible actions that a batch parser would have attempted in state S, upon seeing A as a lookahead.	2015-12-17 13:11:56 -08:00
Max Brunsfeld	66144dc28e	Treat tokens that are sometimes extra as fragile	2015-12-16 20:04:45 -08:00
Max Brunsfeld	d713054d61	Record which tokens are fragile when lexing	2015-12-10 21:05:54 -08:00
Max Brunsfeld	d2bf88d5fe	Include rows and columns in TSLength. This way, we don't have to have separate 1D and 2D versions for so many values.	2015-12-04 20:20:29 -08:00
Max Brunsfeld	22c76fc71b	Remove TSLength from runtime header. Refactor node functions now that character offset and byte offset are stored separately.	2015-12-04 10:45:30 -08:00
Max Brunsfeld	ad619d95f6	Add 'extra' field to symbol metadata. This stores whether a symbol is only ever used as a ubiquitous token. This will allow ubiquitous nodes to be reused more effectively: if they are always ubiquitous, then they can be reused immediately, and otherwise, they must be broken down in case they need to be used structurally.	2015-12-02 15:10:24 -08:00
Max Brunsfeld	f08554e958	Replace NodeType enum with SymbolMetadata bitfield. This will allow storing other metadata about symbols, like if they only appear as ubiquitous tokens.	2015-12-02 15:10:24 -08:00
joshvera	b0f6bac3ab	replace start and end with padding and size	2015-11-18 16:34:50 -08:00
joshvera	e720922662	Add source info to TSLexer	2015-11-12 12:24:05 -05:00
Max Brunsfeld	7ee5eaa16a	Rename node accessor methods. Instead of child() vs concrete_child(), next_sibling() vs next_concrete_sibling(), etc, the default is switched: child() refers to the concrete syntax tree, and named_child() refers to the AST. Because the AST is abstract through exclusion of some nodes, the names are clearer if the qualifier goes on the AST operations.	2015-09-08 23:16:24 -07:00
Max Brunsfeld	c3f3f19ea8	Add concrete_child and concrete_child_count Node methods	2015-09-08 09:53:26 -07:00
Max Brunsfeld	9591c88f39	In runtime, distinguish between anonymous and hidden nodes	2015-09-06 00:12:37 -07:00
Max Brunsfeld	54e40b8146	Rework AST access API: reduce heap allocation	2015-07-31 15:47:48 -07:00
Max Brunsfeld	f9b057f3a9	clang-format everything	2015-07-27 18:29:48 -07:00
Max Brunsfeld	aff8bc3266	Split parse stack when there are multiple parse actions	2015-07-09 23:09:33 -07:00
Max Brunsfeld	755894b44d	Allow multiple parse actions in parse table	2015-06-18 17:03:16 -07:00
Max Brunsfeld	d5ce3a9b5a	lexer: in error mode, continue until token is found	2015-06-15 15:26:05 -07:00
Max Brunsfeld	2d436cf141	Identify fragile reductions at compile time	2015-02-21 15:11:03 -08:00
Max Brunsfeld	8cf800ef5d	Unify debugging API for parsing and lexing	2014-10-17 17:52:54 -07:00
Max Brunsfeld	7498725d7f	Move lexer debugging logic out of public header	2014-10-17 16:20:01 -07:00
Max Brunsfeld	5c600942df	Inline some helper functions for lexer	2014-10-17 15:22:01 -07:00
Max Brunsfeld	c594208ab8	Allow callbacks to be specified for debug output	2014-10-13 01:02:18 -07:00
Max Brunsfeld	6d37877e49	Tweak debugging output	2014-10-05 16:56:29 -07:00
Max Brunsfeld	e5ea4efb0b	Use stdbool.h	2014-10-03 16:06:08 -07:00
Max Brunsfeld	78c5fe8e02	clang-format	2014-10-03 15:44:21 -07:00
Max Brunsfeld	444188cb5f	Display characters > 255 as numbers in debug output	2014-09-27 16:00:27 -07:00
Max Brunsfeld	c1565c1aae	Track AST nodes' sizes in characters as well as bytes. The `pos` and `size` functions for Nodes now return TSLength structs, which contain lengths in both characters and bytes. This is important for knowing the number of unicode characters in a Node.	2014-09-26 16:15:07 -07:00
Max Brunsfeld	f2e2102a25	Add missing import of stdint.h	2014-09-13 00:25:12 -07:00
Max Brunsfeld	141cbcfa02	Read unicode characters using utf8proc	2014-09-13 00:24:10 -07:00
Max Brunsfeld	68d6e242ee	Fix parsing of wildcard patterns at the ends of documents. - Remove special EOF handling from lexer. - Explicitly exclude the EOF character from all-inclusive character sets.	2014-09-11 13:10:23 -07:00
Max Brunsfeld	2e7ffb4d14	Tweak auto-format settings. Prefer lines that exceed 80 characters by a small margin to line breaks in argument lists.	2014-09-09 13:15:40 -07:00
Max Brunsfeld	c0a3f8d39c	Remove some macros from public parser header	2014-09-05 23:47:38 -07:00
Max Brunsfeld	9c0b5b5571	clang-format	2014-09-03 18:53:38 -07:00

136 commits