Commit graph

282 commits

Author SHA1 Message Date
Max Brunsfeld
f653f2b3bb Add ts_node_first_{child,named_child}_for_byte methods 2018-01-09 13:44:59 -08:00
Max Brunsfeld
d3c85f288d Start work on repairing errors by inserting missing tokens 2017-12-29 15:11:00 -08:00
Max Brunsfeld
0e69da37a5 Return a character count from the lexer's get_column method 2017-12-20 16:26:38 -08:00
Max Brunsfeld
fcff16cb86 Add get_column method to lexer 2017-12-19 17:54:15 -08:00
Max Brunsfeld
b0fdc33f73 Remove 'extra' and 'structural' booleans from symbol metadata 2017-09-14 12:07:46 -07:00
Max Brunsfeld
037933ffc5 Bump LANGUAGE_VERSION constant due to incompatible parse table change 2017-09-14 11:09:26 -07:00
Max Brunsfeld
99d048e016 Simplify error recovery; eliminate recovery states
The previous approach to error recovery relied on special error-recovery
states in the parse table. For each token T, there was an error recovery
state in which the parser looked for *any* token that could follow T.
Unfortunately, sometimes the set of tokens that could follow T contained
conflicts. For example, in JS, the token '}' can be followed by the
open-ended 'template_chars' token, but also by ordinary tokens like
'identifier'. So with the old algorithm, when recovering from an
unexpected '}' token, the lexer had no way to distinguish identifiers
from template_chars.

This commit drops the error recovery states. Instead, when we encounter
an unexpected token T, we recover from the error by finding a previous
state S in the stack in which T would be valid, popping all of the nodes
after S, and wrapping them in an error.

This way, the lexer is always invoked in a normal parse state, in which
it is looking for a non-conflicting set of tokens. Eliminating the error
recovery states also shrinks the lex state machine significantly.

Signed-off-by: Rick Winfrey <rewinfrey@github.com>
2017-09-11 15:22:52 -07:00
Max Brunsfeld
e6b43700b9 Get generated parsers compiling and loading properly on windows 2017-08-08 16:47:51 -07:00
Max Brunsfeld
94dc703bfc Require that grammars' start rules be visible 2017-08-04 17:07:37 -07:00
Max Brunsfeld
cb5fe80348 Rename RENAME rule to ALIAS, allow it to create anonymous nodes 2017-07-31 16:41:11 -07:00
Max Brunsfeld
1df41a9107 Avoid anonymous struct to silence gcc's override-init warning (again) 2017-07-21 10:17:54 -07:00
Max Brunsfeld
afb499bf2e Handle rename symbols in ts_language APIs 2017-07-18 12:01:52 -07:00
Max Brunsfeld
9a04231ab1 Remove length restriction in external scanner serialization API 2017-07-17 17:12:36 -07:00
Max Brunsfeld
1a195d44bb Whoops, dynamic precedence needs a sign 2017-07-14 11:06:16 -07:00
Max Brunsfeld
b3a72954ff Introduce RENAME rule type 2017-07-13 17:17:22 -07:00
Max Brunsfeld
107feb7960 Bump the language version number after adding dynamic precedences 2017-07-06 15:58:29 -07:00
Max Brunsfeld
d8e9d04fe7 Add PREC_DYNAMIC rule for resolving runtime ambiguities 2017-07-06 15:24:45 -07:00
Max Brunsfeld
17bc3dfaf7 Add a benchmark command
This command measures the speed of parsing each grammar's examples.
It also uses each grammar to parse all of the *other* grammars' examples
in order to measure error recovery performance with fairly large files.
2017-07-05 14:14:38 -07:00
Max Brunsfeld
c66fddd3aa Add TSInput option to measure columns in bytes not characters 2017-06-15 16:35:34 -07:00
Max Brunsfeld
a98d449d88 Add an option to immediately halt on syntax error 2017-05-01 13:50:49 -07:00
Rob Rix
3a888b1623 Define a function providing the type of a given symbol. 2017-04-12 09:47:51 -04:00
Rob Rix
4b1f69142d Define a symbol type enum. 2017-04-12 09:46:01 -04:00
Max Brunsfeld
db4b9ebc7c Implement Rule as a union rather than an abstract base class 2017-03-17 13:29:31 -07:00
Max Brunsfeld
d222dbb9fd Allow lexer to accept tokens that ended at previous positions
* Track lookahead in each tree
* Add 'mark_end' API that external scanners can use
2017-03-13 17:06:52 -07:00
Rob Rix
c230658bae Add public API to set the input string with explicit length. 2017-02-10 09:10:31 -05:00
Max Brunsfeld
4131e1c16e Return an error when external token name matches non-terminal rule 2017-01-31 11:36:51 -08:00
Max Brunsfeld
d853b6504d Add version number to TSLanguage structs 2017-01-31 10:21:47 -08:00
Max Brunsfeld
3706678b89 Pass const TSExternalTokenState to external scanner deserialize hook 2016-12-21 13:58:18 -08:00
Max Brunsfeld
34a65f588d Tweak naming and organization of external-scanner related language fields 2016-12-21 11:24:41 -08:00
Max Brunsfeld
2b3da512a4 Add serialize, deserialize and reset callbacks to external scanners
Signed-off-by: Nathan Sobo <nathan@github.com>
2016-12-20 13:12:01 -08:00
Max Brunsfeld
c4fe8ded95 Remove state argument to Lexer advance method 2016-12-05 16:36:34 -08:00
Max Brunsfeld
0f8e130687 Call external scanner functions when lexing 2016-12-02 22:03:48 -08:00
Max Brunsfeld
c966af0412 Start work on external tokens 2016-12-02 16:24:19 -08:00
Max Brunsfeld
996ca91e70 Disallow syntax rules that match the empty string (for now) 2016-11-30 23:19:54 -08:00
Max Brunsfeld
535879a2bd Represent byte, char and tree counts as 32 bit numbers
The parser spends the majority of its time allocating and freeing trees and stack nodes.
Also, the memory footprint of the AST is a significant concern when using tree-sitter
with large files. This library is already unlikely to work very well with source files
larger than 4GB, so representing rows, columns, byte lengths and child indices as
unsigned 32 bit integers seems like the right choice.
2016-11-14 12:19:13 -08:00
Max Brunsfeld
fad7294ba4 Store shift states for non-terminals directly in the main parse table 2016-11-14 08:36:06 -08:00
Max Brunsfeld
4106ecda43 Remove logic for recovering from OOM 2016-11-04 09:18:38 -07:00
Max Brunsfeld
e53beb66c9 Avoid anonymous nested struct to silence override-init warnings 2016-10-26 11:10:56 -07:00
Max Brunsfeld
eed54d95e1 Merge branch 'master' into changed-ranges 2016-10-16 21:10:25 -07:00
Max Brunsfeld
e149d94ff5 Remove generated parsers' dependency on runtime.h 2016-10-05 14:02:49 -07:00
Max Brunsfeld
00528e50ce Change edit API to be byte-based 2016-09-13 13:08:52 -07:00
Max Brunsfeld
cc62fe0375 Represent Lengths in terms of Points 2016-09-09 21:11:02 -07:00
Max Brunsfeld
131bbee160 Rename parse_and_diff -> parse_and_get_changed_ranges
Signed-off-by: Nathan Sobo <nathan@github.com>
2016-09-08 17:51:34 -07:00
Max Brunsfeld
fce8d57152 Start work on document_parse_and_diff API 2016-09-08 17:51:20 -07:00
Max Brunsfeld
a6a08dde31 Rename ts_node_name -> ts_node_type 2016-09-06 21:43:59 -07:00
Max Brunsfeld
38241d466b Rename .read_fn, .seek_fn -> .read, .seek 2016-09-06 21:39:10 -07:00
Max Brunsfeld
f6da44fdbb Add ts_node_descendant_for_byte_range 2016-09-06 21:33:19 -07:00
Max Brunsfeld
70756034f1 Allow descendant queries by both 1D and 2D coordinates 2016-09-06 21:17:26 -07:00
Max Brunsfeld
096ac2d4b6 Rename ts_document_set_debugger -> ts_document_set_logger 2016-09-06 17:40:26 -07:00
Max Brunsfeld
64a6c9db0e Rename ts_document_make -> ts_document_new 2016-09-06 17:26:18 -07:00