Commit graph

271 commits

Author SHA1 Message Date
Max Brunsfeld
afb499bf2e Handle rename symbols in ts_language APIs 2017-07-18 12:01:52 -07:00
Max Brunsfeld
9a04231ab1 Remove length restriction in external scanner serialization API 2017-07-17 17:12:36 -07:00
Max Brunsfeld
1a195d44bb Whoops, dynamic precedence needs a sign 2017-07-14 11:06:16 -07:00
Max Brunsfeld
b3a72954ff Introduce RENAME rule type 2017-07-13 17:17:22 -07:00
Max Brunsfeld
107feb7960 Bump the language version number after adding dynamic precedences 2017-07-06 15:58:29 -07:00
Max Brunsfeld
d8e9d04fe7 Add PREC_DYNAMIC rule for resolving runtime ambiguities 2017-07-06 15:24:45 -07:00
Max Brunsfeld
17bc3dfaf7 Add a benchmark command
This command measures the speed of parsing each grammar's examples.
It also uses each grammar to parse all of the *other* grammars' examples
in order to measure error recovery performance with fairly large files.
2017-07-05 14:14:38 -07:00
Max Brunsfeld
c66fddd3aa Add TSInput option to measure columns in bytes not characters 2017-06-15 16:35:34 -07:00
Max Brunsfeld
a98d449d88 Add an option to immediately halt on syntax error 2017-05-01 13:50:49 -07:00
Rob Rix
3a888b1623 Define a function providing the type of a given symbol. 2017-04-12 09:47:51 -04:00
Rob Rix
4b1f69142d Define a symbol type enum. 2017-04-12 09:46:01 -04:00
Max Brunsfeld
db4b9ebc7c Implement Rule as a union rather than an abstract base class 2017-03-17 13:29:31 -07:00
Max Brunsfeld
d222dbb9fd Allow lexer to accept tokens that ended at previous positions
* Track lookahead in each tree
* Add 'mark_end' API that external scanners can use
2017-03-13 17:06:52 -07:00
Rob Rix
c230658bae Add public API to set the input string with explicit length. 2017-02-10 09:10:31 -05:00
Max Brunsfeld
4131e1c16e Return an error when external token name matches non-terminal rule 2017-01-31 11:36:51 -08:00
Max Brunsfeld
d853b6504d Add version number to TSLanguage structs 2017-01-31 10:21:47 -08:00
Max Brunsfeld
3706678b89 Pass const TSExternalTokenState to external scanner deserialize hook 2016-12-21 13:58:18 -08:00
Max Brunsfeld
34a65f588d Tweak naming and organization of external-scanner related language fields 2016-12-21 11:24:41 -08:00
Max Brunsfeld
2b3da512a4 Add serialize, deserialize and reset callbacks to external scanners
Signed-off-by: Nathan Sobo <nathan@github.com>
2016-12-20 13:12:01 -08:00
Max Brunsfeld
c4fe8ded95 Remove state argument to Lexer advance method 2016-12-05 16:36:34 -08:00
Max Brunsfeld
0f8e130687 Call external scanner functions when lexing 2016-12-02 22:03:48 -08:00
Max Brunsfeld
c966af0412 Start work on external tokens 2016-12-02 16:24:19 -08:00
Max Brunsfeld
996ca91e70 Disallow syntax rules that match the empty string (for now) 2016-11-30 23:19:54 -08:00
Max Brunsfeld
535879a2bd Represent byte, char and tree counts as 32 bit numbers
The parser spends the majority of its time allocating and freeing trees and stack nodes.
Also, the memory footprint of the AST is a significant concern when using tree-sitter
with large files. This library is already unlikely to work very well with source files
larger than 4GB, so representing rows, columns, byte lengths and child indices as
unsigned 32 bit integers seems like the right choice.
2016-11-14 12:19:13 -08:00
Max Brunsfeld
fad7294ba4 Store shift states for non-terminals directly in the main parse table 2016-11-14 08:36:06 -08:00
Max Brunsfeld
4106ecda43 Remove logic for recovering from OOM 2016-11-04 09:18:38 -07:00
Max Brunsfeld
e53beb66c9 Avoid anonymous nested struct to silence override-init warnings 2016-10-26 11:10:56 -07:00
Max Brunsfeld
eed54d95e1 Merge branch 'master' into changed-ranges 2016-10-16 21:10:25 -07:00
Max Brunsfeld
e149d94ff5 Remove generated parsers' dependency on runtime.h 2016-10-05 14:02:49 -07:00
Max Brunsfeld
00528e50ce Change edit API to be byte-based 2016-09-13 13:08:52 -07:00
Max Brunsfeld
cc62fe0375 Represent Lengths in terms of Points 2016-09-09 21:11:02 -07:00
Max Brunsfeld
131bbee160 Rename parse_and_diff -> parse_and_get_changed_ranges
Signed-off-by: Nathan Sobo <nathan@github.com>
2016-09-08 17:51:34 -07:00
Max Brunsfeld
fce8d57152 Start work on document_parse_and_diff API 2016-09-08 17:51:20 -07:00
Max Brunsfeld
a6a08dde31 Rename ts_node_name -> ts_node_type 2016-09-06 21:43:59 -07:00
Max Brunsfeld
38241d466b Rename .read_fn, .seek_fn -> .read, .seek 2016-09-06 21:39:10 -07:00
Max Brunsfeld
f6da44fdbb Add ts_node_descendant_for_byte_range 2016-09-06 21:33:19 -07:00
Max Brunsfeld
70756034f1 Allow descendant queries by both 1D and 2D coordinates 2016-09-06 21:17:26 -07:00
Max Brunsfeld
096ac2d4b6 Rename ts_document_set_debugger -> ts_document_set_logger 2016-09-06 17:40:26 -07:00
Max Brunsfeld
64a6c9db0e Rename ts_document_make -> ts_document_new 2016-09-06 17:26:18 -07:00
Max Brunsfeld
b76574e01c Handle ambiguities between extra and non-extra tokens using normal GLR splitting 2016-09-06 10:22:16 -07:00
Max Brunsfeld
4f0c83ba01 Move logic for lexical error handling outside of lexer functions
This way, less logic needs to be exposed in parser.h
2016-09-03 23:40:57 -07:00
Max Brunsfeld
1c52c30111 Fix unexpected EOF errors getting lost 2016-09-03 22:46:14 -07:00
Max Brunsfeld
8c26d99353 Store error recovery actions in the normal parse table 2016-06-27 14:07:47 -07:00
Max Brunsfeld
43ae8235fd Remove the error action; a lack of actions implies an error. 2016-06-21 22:53:48 -07:00
Max Brunsfeld
6a7a5cfc3f Remove nesting in parse action struct 2016-06-21 21:36:33 -07:00
Max Brunsfeld
38c144b4a3 Refine logic for deciding when tokens need to be re-lexed
* While generating the lex table, note which tokens can match the
  same string. A token needs to be relexed when it has possible
  homonyms in the current state.
* Also note which tokens can match substrings of each other tokens.
  A token needs to be relexed when there are viable tokens that
  could match longer strings in the current state and the next
  token has been edited.
* Remove the logic for marking tokens as fragile on creation.
* Store the reusability/non-reusability of symbols off of individual
  actions and onto the entire entry for the state & symbol.
2016-06-21 07:28:04 -07:00
Max Brunsfeld
45f7cee0c8 Handle extra tokens properly during error recovery 2016-06-18 20:46:25 -07:00
Max Brunsfeld
94721c7ec0 Rewind and re-tokenize in error mode after detecting an error 2016-06-17 21:26:03 -07:00
Max Brunsfeld
1e353381ff Don't create error node in lexer unless token is completely invalid
Before, any syntax error would cause the lexer to create an error
leaf node. This could happen even with a valid input, if the parse
stack had split and one particular version of the parse stack
failed to parse.

Now, an error leaf node is only created when the lexer cannot understand
part of the input stream at all. When a normal syntax error occurs,
the lexer just returns a token that is outside of the expected token
set, and the parser handles the unexpected token.
2016-05-26 14:15:10 -07:00
Max Brunsfeld
a3679fbb1f Distinguish separators from main tokens via a property on transitions
It was incorrect to store it as a property on the lexical states themselves
2016-05-19 16:27:25 -07:00