tree-sitter

Author	SHA1	Message	Date
Max Brunsfeld	d8e9d04fe7	Add PREC_DYNAMIC rule for resolving runtime ambiguities	2017-07-06 15:24:45 -07:00
Max Brunsfeld	20982fdcb9	Mark tokens as non-reusable in states where shorter tokens take precedence. This fixes some randomized test failures in the C grammar relating to object-like macros. The object-like macro rule relies on a whitespace token to distinguish object-like macros whose values begin with a '(' from function-like macros. The presence of that whitespace token means that other nodes should not be reusable in that state.	2017-06-22 16:04:42 -07:00
Max Brunsfeld	8517313a45	🎨	2017-06-22 15:33:07 -07:00
Max Brunsfeld	8157b81b68	Improve logic for short-circuiting trivial lexing conflict detection	2017-06-22 15:33:01 -07:00
Max Brunsfeld	2c043803f1	Be more conservative about avoiding lexing conflicts when merging states. This fixes a bug in the C++ grammar where the `>>` token was merged into a state where it was previously not valid, but the `>` token was valid. This caused nested templates like `std::vector<std::pair<int, int>>` to not parse correctly.	2017-06-22 15:32:13 -07:00
Phil Turnbull	fdd8792ebc	Correctly set is_first. From scan-build: Value stored to 'is_first' is never read	2017-06-14 11:12:06 -04:00
Phil Turnbull	c58f6401d0	Non-terminal entries always have valid state-ids	2017-06-14 08:49:38 -04:00
Phil Turnbull	577e43f653	shift-extra actions do not have valid state_ids	2017-06-09 16:26:01 -04:00
Phil Turnbull	18ba6ebbd7	Move state_id check into each_referenced_state	2017-06-09 16:25:59 -04:00
Phil Turnbull	6897530c47	Check for invalid state indexes Some ParseActions have a state-id of -1 which can cause an out-of-bounds read when removing duplicate parse states. This was found by AddressSanitizer: ==90699==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6320000187f8 at pc 0x0001071220a9 bp 0x7fff595fd440 sp 0x7fff595fd438 READ of size 8 at 0x6320000187f8 thread T0 #0 0x1071220a8 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long)::operator()(unsigned long) const build_parse_table.cc:398 #1 0x107121fa5 in void std::__1::__invoke_void_return_wrapper<void>::__call<tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long)&, unsigned long>(tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long)&&&, unsigned long&&) __functional_base:416 ... 0x6320000187f8 is located 8 bytes to the left of 88264-byte region [0x632000018800,0x63200002e0c8) allocated by thread T0 here: #0 0x107b1576b in wrap__Znwm (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x6076b) #1 0x10711da2c in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::allocate(unsigned long) new:169 #2 0x10711d8fb in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::vector(unsigned long) vector:1074 #3 0x107112f5c in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::vector(unsigned long) vector:1068 #4 0x1070af381 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states() build_parse_table.cc:378 #5 0x10709d827 in tree_sitter::build_tables::ParseTableBuilder::build() build_parse_table.cc:85 ... 
SUMMARY: AddressSanitizer: heap-buffer-overflow build_parse_table.cc:398 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long)::operator()(unsigned long) const Shadow bytes around the buggy address: 0x1c64000030a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x1c64000030b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x1c64000030c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x1c64000030d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x1c64000030e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa =>0x1c64000030f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa[fa] 0x1c6400003100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1c6400003110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1c6400003120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1c6400003130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1c6400003140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00	2017-06-07 17:23:44 -04:00
Phil Turnbull	dee86f908a	Correctly check type is ParseActionTypeRecover	2017-06-07 17:05:39 -04:00
joshvera	f76935cc7e	just make it static	2017-03-24 18:38:21 -04:00
joshvera	6938b288a5	Make external scanner symbol map unique	2017-03-24 14:51:37 -04:00
Max Brunsfeld	ed8fbff175	Allow anonymous tokens to be used in grammars' external token lists	2017-03-17 16:31:29 -07:00
Max Brunsfeld	b3edd8f749	Remove use of shared_ptr in choice, repeat, and seq factories	2017-03-17 14:28:13 -07:00
Max Brunsfeld	d9fb863bea	Fix build errors w/ gcc	2017-03-17 14:03:49 -07:00
Max Brunsfeld	416cbb9def	Add missing cassert includes	2017-03-17 13:54:40 -07:00
Max Brunsfeld	90d21adf3b	Format make_visitor helper consistently w/ project	2017-03-17 13:37:26 -07:00
Max Brunsfeld	db4b9ebc7c	Implement Rule as a union rather than an abstract base class	2017-03-17 13:29:31 -07:00
Max Brunsfeld	d222dbb9fd	Allow lexer to accept tokens that ended at previous positions: track lookahead in each tree; add 'mark_end' API that external scanners can use	2017-03-13 17:06:52 -07:00
Max Brunsfeld	f04d7c5860	Handle unused tokens	2017-03-09 21:16:37 -08:00
Max Brunsfeld	c79fae6d21	Clean up extract_tokens function	2017-03-09 21:16:20 -08:00
Max Brunsfeld	f049d5d94c	Make ParseItem a struct, not a class	2017-03-08 21:06:30 -08:00
Max Brunsfeld	64e9230071	Use LexTableBuilder to detect conflicts between tokens more correctly	2017-03-08 12:47:38 -08:00
Max Brunsfeld	abf8a4f2c2	🎨	2017-03-01 22:15:26 -08:00
Max Brunsfeld	686dc0997c	Avoid introducing certain lexical conflicts during parse state merging. The current, fairly conservative approach is to avoid merging parse states when doing so would cause a pair of tokens to co-exist for the first time in any parse state, where the two tokens can start with the same character and at least one of the tokens can contain a character which is part of the grammar's separators.	2017-02-27 22:54:38 -08:00
Max Brunsfeld	3c8e6f9987	Restructure parse state merging logic: remove remnants of templatized remove_duplicate_states function; rename recovery_tokens function to get_compatible_tokens and augment it to also compute pairs of tokens which could potentially be incompatible	2017-02-26 12:23:48 -08:00
Timothy Clem	ab00f1b0da	Add support for \W and \D negated character classes too	2017-01-31 15:03:48 -08:00
Timothy Clem	902b7f9745	Allow \S for negated whitespace regex shorthand	2017-01-31 14:45:28 -08:00
Max Brunsfeld	0a6e5f9ee6	Fix some build warnings on gcc	2017-01-31 11:46:28 -08:00
Max Brunsfeld	4131e1c16e	Return an error when external token name matches non-terminal rule	2017-01-31 11:36:51 -08:00
Max Brunsfeld	60f6998485	Rename generated language functions to e.g. `tree_sitter_python`. They used to be called e.g. `ts_language_python`. Now that there are APIs that deal with the `TSLanguage` objects themselves, such as `ts_language_symbol_count`, the old names were a little confusing.	2017-01-31 10:29:31 -08:00
Max Brunsfeld	d853b6504d	Add version number to TSLanguage structs	2017-01-31 10:21:47 -08:00
Max Brunsfeld	3706678b89	Pass const TSExternalTokenState to external scanner deserialize hook	2016-12-21 13:58:18 -08:00
Max Brunsfeld	34a65f588d	Tweak naming and organization of external-scanner related language fields	2016-12-21 11:24:41 -08:00
Max Brunsfeld	42c41c158c	Refactor logic for handling shared internal/external tokens	2016-12-21 10:49:55 -08:00
Max Brunsfeld	e6c82ead2c	Start work toward maintaining external scanner's state during incremental parses	2016-12-20 17:06:20 -08:00
Max Brunsfeld	2b3da512a4	Add serialize, deserialize and reset callbacks to external scanners. Signed-off-by: Nathan Sobo <nathan@github.com>	2016-12-20 13:12:01 -08:00
Max Brunsfeld	a1770ce844	Allow external tokens to be used as extras	2016-12-12 22:06:01 -08:00
Max Brunsfeld	10b51a05a1	Allow external scanners to refer to (and return) internally-defined tokens. Tokens that are defined in the grammar's rules may now also be included in the externals list, so that external scanners can check whether they are valid lookaheads and, if so, return them to the parser if needed.	2016-12-09 13:32:58 -08:00
Max Brunsfeld	83514293b5	Allow external tokens to be either visible or hidden	2016-12-05 17:26:11 -08:00
Max Brunsfeld	1251ff2e30	Consider externals to be named, not anonymous	2016-12-05 17:09:22 -08:00
Max Brunsfeld	c16b6b2059	Run external scanners during error recovery	2016-12-05 11:50:24 -08:00
Max Brunsfeld	49d25bd0f8	Remove EXTERNAL_TOKEN grammar rule type	2016-12-04 15:02:32 -08:00
Max Brunsfeld	d72b49316b	Handle external tokens in apply_transitive_closure	2016-12-04 10:40:32 -08:00
Max Brunsfeld	0f8e130687	Call external scanner functions when lexing	2016-12-02 22:03:48 -08:00
Max Brunsfeld	c966af0412	Start work on external tokens	2016-12-02 16:24:19 -08:00
Max Brunsfeld	be9e79db1b	Avoid incorrect application of precedence	2016-12-01 10:24:06 -08:00
Max Brunsfeld	996ca91e70	Disallow syntax rules that match the empty string (for now)	2016-11-30 23:19:54 -08:00
Max Brunsfeld	101e304a8a	Avoid unnecessary lookahead set mutations in ParseItemSetBuilder	2016-11-20 21:41:36 -08:00

529 commits