tree-sitter

Author	SHA1	Message	Date
Max Brunsfeld	47918070f6	Add a single-source file way of building the runtime library	2018-11-13 15:36:21 -08:00
Max Brunsfeld	508499bab1	Fix bug where missing token was inserted outside of any included range	2018-09-11 17:41:23 -07:00
Max Brunsfeld	acc937b7d7	Handle input chunks that end within multi-byte characters	2018-08-02 15:43:30 -07:00
Max Brunsfeld	87c992a7f0	Add lexer API for detecting boundaries of included ranges. Co-Authored-By: Ashi Krishnan <queerviolet@github.com>	2018-07-17 13:58:26 -07:00
Max Brunsfeld	83f88164aa	Fix end positions of tokens at the end of included ranges. Co-Authored-By: Ashi Krishnan <queerviolet@github.com>	2018-07-09 10:23:25 -07:00
Max Brunsfeld	3169620ce4	Fix ranges of tokens at the beginnings of included ranges	2018-07-06 17:08:36 -07:00
Max Brunsfeld	80cab8fd8a	Make the empty chunk 2 bytes long, for UTF16 support	2018-06-25 17:46:23 -07:00
Max Brunsfeld	a6451f9b4f	Add `ts_parser_set_include_ranges` function. Co-Authored-By: Ashi Krishnan <queerviolet@github.com>	2018-06-20 13:37:43 -07:00
Max Brunsfeld	d7c1f84d7b	Remove `resume` method, make `parse` resume by default. Also, add a `reset` method to explicitly discard an outstanding parse. Co-Authored-By: Ashi Krishnan <queerviolet@github.com>	2018-06-19 15:33:29 -07:00
Max Brunsfeld	b0b3b2e5f3	Consolidate TSInput interface down to one function	2018-06-19 09:34:40 -07:00
Max Brunsfeld	35510a612d	Rename Tree -> Subtree	2018-05-10 15:11:14 -07:00
Max Brunsfeld	facafcd6e4	Pass row/column position to input seek method	2018-02-14 07:31:49 -08:00
Max Brunsfeld	0e69da37a5	Return a character count from the lexer's get_column method	2017-12-20 16:26:38 -08:00
Max Brunsfeld	fcff16cb86	Add get_column method to lexer	2017-12-19 17:54:15 -08:00
Max Brunsfeld	36c2b685b9	Always invalidate old chunk of text when parsing after an edit	2017-10-04 15:09:46 -07:00
Max Brunsfeld	f3977ec213	Always call deserialize on external scanner before scanning. Remembering the last token that the external scanner produced is not worth the complexity.	2017-08-29 14:41:55 -07:00
Max Brunsfeld	9a04231ab1	Remove length restriction in external scanner serialization API	2017-07-17 17:12:36 -07:00
Max Brunsfeld	0143bfdad4	Avoid use-after-free of external token states. Previously, it was possible for references to external token states to outlive the trees to which those states belonged. Now, instead of storing references to external token states in the Stack and in the Lexer, we store references to the external token trees themselves, and we retain the trees to prevent use-after-free.	2017-06-27 14:54:27 -07:00
Max Brunsfeld	f62ee5a0f3	Fix OOB reads at ends of chunks. Signed-off-by: Philip Turnbull <philipturnbull@github.com>	2017-06-23 12:09:16 -07:00
Max Brunsfeld	c66fddd3aa	Add TSInput option to measure columns in bytes not characters	2017-06-15 16:35:34 -07:00
Max Brunsfeld	a98d449d88	Add an option to immediately halt on syntax error	2017-05-01 13:50:49 -07:00
Timothy Clem	91558f0a0e	utf8proc_iterate can set codepoint_ref to -1 and returns negative error	2017-04-27 14:46:36 -07:00
Max Brunsfeld	d222dbb9fd	Allow lexer to accept tokens that ended at previous positions: (1) track lookahead in each tree; (2) add 'mark_end' API that external scanners can use	2017-03-13 17:06:52 -07:00
Max Brunsfeld	36608180d2	Store external token states in the parse stack	2017-01-08 22:06:05 -08:00
Max Brunsfeld	2fa7b453c8	Restore external scanner's state only after repositioning lexer. Also, properly identify the leaf node with the external token state	2016-12-21 13:59:56 -08:00
Max Brunsfeld	0e595346be	Make lexer log output easier to read	2016-12-09 13:33:37 -08:00
Max Brunsfeld	c4fe8ded95	Remove state argument to Lexer advance method	2016-12-05 16:36:34 -08:00
Max Brunsfeld	0f8e130687	Call external scanner functions when lexing	2016-12-02 22:03:48 -08:00
Max Brunsfeld	5332fd3418	Fix build warnings	2016-11-19 20:47:43 -08:00
Max Brunsfeld	535879a2bd	Represent byte, char and tree counts as 32 bit numbers. The parser spends the majority of its time allocating and freeing trees and stack nodes. Also, the memory footprint of the AST is a significant concern when using tree-sitter with large files. This library is already unlikely to work very well with source files larger than 4GB, so representing rows, columns, byte lengths and child indices as unsigned 32 bit integers seems like the right choice.	2016-11-14 12:19:13 -08:00
Max Brunsfeld	c9dcb29c6f	Remove the TS prefix from some internal type/function names	2016-11-09 20:59:05 -08:00
Max Brunsfeld	eed54d95e1	Merge branch 'master' into changed-ranges	2016-10-16 21:10:25 -07:00
Max Brunsfeld	e149d94ff5	Remove generated parsers' dependency on runtime.h	2016-10-05 14:02:49 -07:00
Max Brunsfeld	cc62fe0375	Represent Lengths in terms of Points	2016-09-09 21:11:02 -07:00
Max Brunsfeld	38241d466b	Rename .read_fn, .seek_fn -> .read, .seek	2016-09-06 21:39:10 -07:00
Max Brunsfeld	096ac2d4b6	Rename ts_document_set_debugger -> ts_document_set_logger	2016-09-06 17:40:26 -07:00
Max Brunsfeld	e2ca55c918	Avoid unnecessary TSInput calls when resetting lexer within an existing chunk	2016-09-06 10:23:07 -07:00
Max Brunsfeld	4f0c83ba01	Move logic for lexical error handling outside of lexer functions. This way, less logic needs to be exposed in parser.h	2016-09-03 23:40:57 -07:00
Max Brunsfeld	1c52c30111	Fix unexpected EOF errors getting lost	2016-09-03 22:46:14 -07:00
Max Brunsfeld	38c144b4a3	Refine logic for deciding when tokens need to be re-lexed. (1) While generating the lex table, note which tokens can match the same string. A token needs to be relexed when it has possible homonyms in the current state. (2) Also note which tokens can match substrings of each other. A token needs to be relexed when there are viable tokens that could match longer strings in the current state and the next token has been edited. (3) Remove the logic for marking tokens as fragile on creation. (4) Move the reusability/non-reusability of symbols off of individual actions and onto the entire entry for the state & symbol.	2016-06-21 07:28:04 -07:00
Max Brunsfeld	1e353381ff	Don't create error node in lexer unless token is completely invalid. Before, any syntax error would cause the lexer to create an error leaf node. This could happen even with a valid input, if the parse stack had split and one particular version of the parse stack failed to parse. Now, an error leaf node is only created when the lexer cannot understand part of the input stream at all. When a normal syntax error occurs, the lexer just returns a token that is outside of the expected token set, and the parser handles the unexpected token.	2016-05-26 14:15:10 -07:00
Max Brunsfeld	a3679fbb1f	Distinguish separators from main tokens via a property on transitions. It was incorrect to store it as a property on the lexical states themselves	2016-05-19 16:27:25 -07:00
Max Brunsfeld	c96c4a08e6	Use an object pool for stack nodes, to reduce allocations. Also, fix some leaks in the case where memory allocation failed during parsing	2016-02-04 11:19:42 -08:00
Max Brunsfeld	3dde0a6f39	Handle allocation failures during parsing	2016-01-19 18:08:01 -08:00
Max Brunsfeld	f2e7058ad9	Support UTF16 directly. This makes the API easier to use from javascript	2015-12-28 13:53:22 -08:00
Max Brunsfeld	da1bc038e5	Remove nested options structs in Tree	2015-12-22 14:20:58 -08:00
Max Brunsfeld	2bcd2e4d00	Reuse fragile tokens that came from the current lex state	2015-12-21 16:04:11 -08:00
Max Brunsfeld	d713054d61	Record which tokens are fragile when lexing	2015-12-10 21:05:54 -08:00
Max Brunsfeld	08d50c25ae	clang-format	2015-12-04 20:56:33 -08:00
Max Brunsfeld	d2bf88d5fe	Include rows and columns in TSLength. This way, we don't have to have separate 1D and 2D versions for so many values	2015-12-04 20:20:29 -08:00

115 commits