Commit graph

25 commits

Author SHA1 Message Date
Max Brunsfeld
508499bab1 Fix bug where missing token was inserted outside of any included range 2018-09-11 17:41:23 -07:00
Max Brunsfeld
a6451f9b4f Add ts_parser_set_include_ranges function
Co-Authored-By: Ashi Krishnan <queerviolet@github.com>
2018-06-20 13:37:43 -07:00
Max Brunsfeld
3c01382b95 Avoid warnings about repeated typedefs 2018-05-17 17:59:50 -07:00
Max Brunsfeld
35510a612d Rename Tree -> Subtree 2018-05-10 15:11:14 -07:00
Max Brunsfeld
f3977ec213 Always call deserialize on external scanner before scanning
Remembering the last token that the external scanner produced is
not worth the complexity.
2017-08-29 14:41:55 -07:00
Max Brunsfeld
9a04231ab1 Remove length restriction in external scanner serialization API 2017-07-17 17:12:36 -07:00
Max Brunsfeld
0143bfdad4 Avoid use-after-free of external token states
Previously, it was possible for references to external token states to
outlive the trees to which those states belonged.

Now, instead of storing references to external token states in the Stack
and in the Lexer, we store references to the external token trees
themselves, and we retain the trees to prevent use-after-free.
2017-06-27 14:54:27 -07:00
Max Brunsfeld
a98d449d88 Add an option to immediately halt on syntax error 2017-05-01 13:50:49 -07:00
Max Brunsfeld
d222dbb9fd Allow lexer to accept tokens that ended at previous positions
* Track lookahead in each tree
* Add 'mark_end' API that external scanners can use
2017-03-13 17:06:52 -07:00
Max Brunsfeld
36608180d2 Store external token states in the parse stack 2017-01-08 22:06:05 -08:00
Max Brunsfeld
2fa7b453c8 Restore external scanner's state only after repositioning lexer
Also, properly identify the leaf node with the external token state
2016-12-21 13:59:56 -08:00
Max Brunsfeld
0f8e130687 Call external scanner functions when lexing 2016-12-02 22:03:48 -08:00
Max Brunsfeld
535879a2bd Represent byte, char and tree counts as 32 bit numbers
The parser spends the majority of its time allocating and freeing trees and stack nodes.
Also, the memory footprint of the AST is a significant concern when using tree-sitter
with large files. This library is already unlikely to work very well with source files
larger than 4GB, so representing rows, columns, byte lengths and child indices as
unsigned 32 bit integers seems like the right choice.
2016-11-14 12:19:13 -08:00
Max Brunsfeld
c9dcb29c6f Remove the TS prefix from some internal type/function names 2016-11-09 20:59:05 -08:00
Max Brunsfeld
e149d94ff5 Remove generated parsers' dependency on runtime.h 2016-10-05 14:02:49 -07:00
Max Brunsfeld
4f0c83ba01 Move logic for lexical error handling outside of lexer functions
This way, less logic needs to be exposed in parser.h
2016-09-03 23:40:57 -07:00
Max Brunsfeld
38c144b4a3 Refine logic for deciding when tokens need to be re-lexed
* While generating the lex table, note which tokens can match the
  same string. A token needs to be relexed when it has possible
  homonyms in the current state.
* Also note which tokens can match substrings of each other tokens.
  A token needs to be relexed when there are viable tokens that
  could match longer strings in the current state and the next
  token has been edited.
* Remove the logic for marking tokens as fragile on creation.
* Store the reusability/non-reusability of symbols off of individual
  actions and onto the entire entry for the state & symbol.
2016-06-21 07:28:04 -07:00
Max Brunsfeld
1e353381ff Don't create error node in lexer unless token is completely invalid
Before, any syntax error would cause the lexer to create an error
leaf node. This could happen even with a valid input, if the parse
stack had split and one particular version of the parse stack
failed to parse.

Now, an error leaf node is only created when the lexer cannot understand
part of the input stream at all. When a normal syntax error occurs,
the lexer just returns a token that is outside of the expected token
set, and the parser handles the unexpected token.
2016-05-26 14:15:10 -07:00
Max Brunsfeld
c96c4a08e6 Use an object pool for stack nodes, to reduce allocations
Also, fix some leaks in the case where memory allocation failed during parsing
2016-02-04 11:19:42 -08:00
Max Brunsfeld
d2bf88d5fe Include rows and columns in TSLength
This way, we don't have to have separate 1D and 2D versions for so many values
2015-12-04 20:20:29 -08:00
Max Brunsfeld
8a146a9bef Reset lexer correctly when old input was blank 2015-12-03 10:00:39 -08:00
joshvera
b0f6bac3ab replace start and end with padding and size 2015-11-18 16:34:50 -08:00
Max Brunsfeld
af7f57a80e Fix sizing of error nodes after edits 2014-10-05 16:56:50 -07:00
Max Brunsfeld
e23f11b7c4 Allow lexical debug mode to be enabled on documents
- `ts_document_set_debug(doc, 1)` implies parse debug mode
- `ts_document_set_debug(doc, > 1)` implies parse and lex debug mode
2014-09-11 13:12:06 -07:00
Max Brunsfeld
eecbcccee0 Remove generated parsers' dependency on the runtime library
Generated parsers no longer export a parser constructor function.
They now export an opaque Language object which can be set on
Documents directly. This way, the logic for constructing parsers
lives entirely in the runtime. The Languages are just structs which
have no load-time dependency on the runtime
2014-07-30 23:40:02 -07:00