Max Brunsfeld
47918070f6
Add a single-source file way of building the runtime library
2018-11-13 15:36:21 -08:00
Max Brunsfeld
508499bab1
Fix bug where missing token was inserted outside of any included range
2018-09-11 17:41:23 -07:00
Max Brunsfeld
acc937b7d7
Handle input chunks that end within multi-byte characters
2018-08-02 15:43:30 -07:00
Max Brunsfeld
87c992a7f0
Add lexer API for detecting boundaries of included ranges
Co-Authored-By: Ashi Krishnan <queerviolet@github.com>
2018-07-17 13:58:26 -07:00
Max Brunsfeld
83f88164aa
Fix end positions of tokens at the end of included ranges
Co-Authored-By: Ashi Krishnan <queerviolet@github.com>
2018-07-09 10:23:25 -07:00
Max Brunsfeld
3169620ce4
Fix ranges of tokens at the beginnings of included ranges
2018-07-06 17:08:36 -07:00
Max Brunsfeld
80cab8fd8a
Make the empty chunk 2 bytes long, for UTF16 support
2018-06-25 17:46:23 -07:00
Max Brunsfeld
a6451f9b4f
Add ts_parser_set_include_ranges function
Co-Authored-By: Ashi Krishnan <queerviolet@github.com>
2018-06-20 13:37:43 -07:00
Max Brunsfeld
d7c1f84d7b
Remove resume method, make parse resume by default
Also, add a `reset` method to explicitly discard an outstanding parse.
Co-Authored-By: Ashi Krishnan <queerviolet@github.com>
2018-06-19 15:33:29 -07:00
Max Brunsfeld
b0b3b2e5f3
Consolidate TSInput interface down to one function
2018-06-19 09:34:40 -07:00
Max Brunsfeld
35510a612d
Rename Tree -> Subtree
2018-05-10 15:11:14 -07:00
Max Brunsfeld
facafcd6e4
Pass row/column position to input seek method
2018-02-14 07:31:49 -08:00
Max Brunsfeld
0e69da37a5
Return a character count from the lexer's get_column method
2017-12-20 16:26:38 -08:00
Max Brunsfeld
fcff16cb86
Add get_column method to lexer
2017-12-19 17:54:15 -08:00
Max Brunsfeld
36c2b685b9
Always invalidate old chunk of text when parsing after an edit
2017-10-04 15:09:46 -07:00
Max Brunsfeld
f3977ec213
Always call deserialize on external scanner before scanning
Remembering the last token that the external scanner produced is
not worth the complexity.
2017-08-29 14:41:55 -07:00
Max Brunsfeld
9a04231ab1
Remove length restriction in external scanner serialization API
2017-07-17 17:12:36 -07:00
Max Brunsfeld
0143bfdad4
Avoid use-after-free of external token states
Previously, it was possible for references to external token states to
outlive the trees to which those states belonged.
Now, instead of storing references to external token states in the Stack
and in the Lexer, we store references to the external token trees
themselves, and we retain the trees to prevent use-after-free.
2017-06-27 14:54:27 -07:00
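A minimal C sketch of the idea in the commit above: a holder (standing in for the Stack or Lexer) stores a retained reference to the tree that owns an external token state, rather than a raw pointer into that state, so the state cannot be freed while the holder still uses it. The `Tree`, `Holder`, `tree_new`, `tree_retain`, `tree_release`, and `holder_set` names are hypothetical illustrations, not tree-sitter's actual API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted tree that owns an external scanner state. */
typedef struct Tree {
  unsigned ref_count;
  char *external_state; /* owned by this tree, freed with it */
} Tree;

/* Allocate a tree with a zeroed state buffer and one reference. */
static Tree *tree_new(size_t state_len) {
  Tree *t = malloc(sizeof(Tree));
  if (!t) return NULL;
  t->external_state = calloc(state_len ? state_len : 1, 1);
  if (!t->external_state) {
    free(t);
    return NULL;
  }
  t->ref_count = 1;
  return t;
}

static void tree_retain(Tree *t) {
  assert(t->ref_count > 0);
  t->ref_count++;
}

/* Drop one reference; the tree and its state are freed with the last one. */
static void tree_release(Tree *t) {
  assert(t->ref_count > 0);
  if (--t->ref_count == 0) {
    free(t->external_state);
    free(t);
  }
}

/* Hypothetical holder: keeps the tree alive instead of borrowing its state. */
typedef struct {
  Tree *last_external_token;
} Holder;

/* Retain the new tree before releasing the old one, so setting the
   same tree twice never frees it. */
static void holder_set(Holder *h, Tree *t) {
  if (t) tree_retain(t);
  if (h->last_external_token) tree_release(h->last_external_token);
  h->last_external_token = t;
}
```

Because the holder owns a reference, the original owner can release its own reference and the state stays valid until `holder_set(&h, NULL)` drops the last one.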
Max Brunsfeld
f62ee5a0f3
Fix OOB reads at ends of chunks
Signed-off-by: Philip Turnbull <philipturnbull@github.com>
2017-06-23 12:09:16 -07:00
Max Brunsfeld
c66fddd3aa
Add TSInput option to measure columns in bytes not characters
2017-06-15 16:35:34 -07:00
Max Brunsfeld
a98d449d88
Add an option to immediately halt on syntax error
2017-05-01 13:50:49 -07:00
Timothy Clem
91558f0a0e
utf8proc_iterate can set codepoint_ref to -1 and returns negative error
2017-04-27 14:46:36 -07:00
Max Brunsfeld
d222dbb9fd
Allow lexer to accept tokens that ended at previous positions
* Track lookahead in each tree
* Add 'mark_end' API that external scanners can use
2017-03-13 17:06:52 -07:00
Max Brunsfeld
36608180d2
Store external token states in the parse stack
2017-01-08 22:06:05 -08:00
Max Brunsfeld
2fa7b453c8
Restore external scanner's state only after repositioning lexer
Also, properly identify the leaf node with the external token state
2016-12-21 13:59:56 -08:00
Max Brunsfeld
0e595346be
Make lexer log output easier to read
2016-12-09 13:33:37 -08:00
Max Brunsfeld
c4fe8ded95
Remove state argument to Lexer advance method
2016-12-05 16:36:34 -08:00
Max Brunsfeld
0f8e130687
Call external scanner functions when lexing
2016-12-02 22:03:48 -08:00
Max Brunsfeld
5332fd3418
Fix build warnings
2016-11-19 20:47:43 -08:00
Max Brunsfeld
535879a2bd
Represent byte, char and tree counts as 32 bit numbers
The parser spends the majority of its time allocating and freeing trees and stack nodes.
Also, the memory footprint of the AST is a significant concern when using tree-sitter
with large files. This library is already unlikely to work very well with source files
larger than 4GB, so representing rows, columns, byte lengths and child indices as
unsigned 32 bit integers seems like the right choice.
2016-11-14 12:19:13 -08:00
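A small C sketch of the size trade-off described in the commit above. The `WideLength` and `NarrowLength` structs and `narrow_length` helper are hypothetical and only illustrate why 32-bit counters halve the footprint of a length record on a 64-bit target, at the cost of a roughly 4 GiB cap per count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical length record using platform-sized counters.
   On a typical 64-bit target this occupies 24 bytes. */
typedef struct {
  size_t bytes;
  size_t rows;
  size_t columns;
} WideLength;

/* The same record with unsigned 32-bit counters: 12 bytes,
   but every count is capped at UINT32_MAX (about 4 GiB). */
typedef struct {
  uint32_t bytes;
  uint32_t rows;
  uint32_t columns;
} NarrowLength;

/* Convert to the compact form. Inputs beyond the 32-bit range are
   treated as out of scope, which the assertion makes explicit. */
static NarrowLength narrow_length(WideLength w) {
  assert(w.bytes <= UINT32_MAX && w.rows <= UINT32_MAX &&
         w.columns <= UINT32_MAX);
  NarrowLength n = {(uint32_t)w.bytes, (uint32_t)w.rows,
                    (uint32_t)w.columns};
  return n;
}
```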
Max Brunsfeld
c9dcb29c6f
Remove the TS prefix from some internal type/function names
2016-11-09 20:59:05 -08:00
Max Brunsfeld
eed54d95e1
Merge branch 'master' into changed-ranges
2016-10-16 21:10:25 -07:00
Max Brunsfeld
e149d94ff5
Remove generated parsers' dependency on runtime.h
2016-10-05 14:02:49 -07:00
Max Brunsfeld
cc62fe0375
Represent Lengths in terms of Points
2016-09-09 21:11:02 -07:00
Max Brunsfeld
38241d466b
Rename .read_fn, .seek_fn -> .read, .seek
2016-09-06 21:39:10 -07:00
Max Brunsfeld
096ac2d4b6
Rename ts_document_set_debugger -> ts_document_set_logger
2016-09-06 17:40:26 -07:00
Max Brunsfeld
e2ca55c918
Avoid unnecessary TSInput calls when resetting lexer within an existing chunk
2016-09-06 10:23:07 -07:00
Max Brunsfeld
4f0c83ba01
Move logic for lexical error handling outside of lexer functions
This way, less logic needs to be exposed in parser.h
2016-09-03 23:40:57 -07:00
Max Brunsfeld
1c52c30111
Fix unexpected EOF errors getting lost
2016-09-03 22:46:14 -07:00
Max Brunsfeld
38c144b4a3
Refine logic for deciding when tokens need to be re-lexed
* While generating the lex table, note which tokens can match the
same string. A token needs to be relexed when it has possible
homonyms in the current state.
* Also note which tokens can match substrings of each other tokens.
A token needs to be relexed when there are viable tokens that
could match longer strings in the current state and the next
token has been edited.
* Remove the logic for marking tokens as fragile on creation.
* Store the reusability/non-reusability of symbols off of individual
actions and onto the entire entry for the state & symbol.
2016-06-21 07:28:04 -07:00
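The two relexing rules in the commit above can be sketched as a single predicate in C. This is a minimal illustration under assumptions: token symbols are small integers (below 64) so a token set fits in a 64-bit mask, and `TokenSet`, `TokenInfo`, and `token_needs_relex` are hypothetical names, not tree-sitter's real data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: a set of token symbols (each < 64) as a bit mask. */
typedef uint64_t TokenSet;

/* Hypothetical per-token facts recorded while generating the lex table. */
typedef struct {
  TokenSet homonyms;       /* tokens that can match the same string */
  TokenSet longer_matches; /* tokens that can match a longer string
                              containing this token's text */
} TokenInfo;

/* Decide whether a previously lexed token must be relexed in a parse
   state whose valid tokens are `valid_in_state`. */
static bool token_needs_relex(const TokenInfo *info, TokenSet valid_in_state,
                              bool next_token_edited) {
  assert(info != NULL);
  /* Rule 1: a possible homonym is valid in the current state. */
  if (info->homonyms & valid_in_state) return true;
  /* Rule 2: a valid token could match a longer string, and the next
     token has been edited, so the longest match may now differ. */
  if (next_token_edited && (info->longer_matches & valid_in_state))
    return true;
  return false;
}
```

Keeping both facts as precomputed sets makes the per-token check two bitwise ANDs, which suits a decision that runs for every candidate token during incremental reparsing.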
Max Brunsfeld
1e353381ff
Don't create error node in lexer unless token is completely invalid
Before, any syntax error would cause the lexer to create an error
leaf node. This could happen even with a valid input, if the parse
stack had split and one particular version of the parse stack
failed to parse.
Now, an error leaf node is only created when the lexer cannot understand
part of the input stream at all. When a normal syntax error occurs,
the lexer just returns a token that is outside of the expected token
set, and the parser handles the unexpected token.
2016-05-26 14:15:10 -07:00
Max Brunsfeld
a3679fbb1f
Distinguish separators from main tokens via a property on transitions
It was incorrect to store it as a property on the lexical states themselves
2016-05-19 16:27:25 -07:00
Max Brunsfeld
c96c4a08e6
Use an object pool for stack nodes, to reduce allocations
Also, fix some leaks in the case where memory allocation failed during parsing
2016-02-04 11:19:42 -08:00
Max Brunsfeld
3dde0a6f39
Handle allocation failures during parsing
2016-01-19 18:08:01 -08:00
Max Brunsfeld
f2e7058ad9
Support UTF16 directly
This makes the API easier to use from javascript
2015-12-28 13:53:22 -08:00
Max Brunsfeld
da1bc038e5
Remove nested options structs in Tree
2015-12-22 14:20:58 -08:00
Max Brunsfeld
2bcd2e4d00
Reuse fragile tokens that came from the current lex state
2015-12-21 16:04:11 -08:00
Max Brunsfeld
d713054d61
Record which tokens are fragile when lexing
2015-12-10 21:05:54 -08:00
Max Brunsfeld
08d50c25ae
clang-format
2015-12-04 20:56:33 -08:00
Max Brunsfeld
d2bf88d5fe
Include rows and columns in TSLength
This way, we don't have to have separate 1D and 2D versions for so many values
2015-12-04 20:20:29 -08:00
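A minimal C sketch of a length that carries both a 1D byte count and a 2D row/column extent, as the commit above describes, so a single addition function serves both. The `Point`, `Length`, `point_add`, and `length_add` names are hypothetical stand-ins, not the actual `TSLength` definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 2D extent: zero-based row and column. */
typedef struct {
  uint32_t row;
  uint32_t column;
} Point;

/* Hypothetical combined length: byte count plus 2D extent. */
typedef struct {
  uint32_t bytes;
  Point extent;
} Length;

/* Appending text of extent b after text of extent a: if b contains a
   newline (row > 0), the column restarts at b's column; otherwise the
   columns accumulate on the same row. */
static Point point_add(Point a, Point b) {
  if (b.row > 0) return (Point){a.row + b.row, b.column};
  return (Point){a.row, a.column + b.column};
}

/* One addition updates both the 1D and the 2D measurements. */
static Length length_add(Length a, Length b) {
  assert(a.bytes <= UINT32_MAX - b.bytes);
  Length result = {a.bytes + b.bytes, point_add(a.extent, b.extent)};
  return result;
}
```

For example, appending `"de\nf"` (4 bytes, extent row 1, column 1) to `"abc"` (3 bytes, extent row 0, column 3) yields 7 bytes with extent row 1, column 1.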