tree-sitter

Author	SHA1	Message	Date
Max Brunsfeld	d646889922	Simplify flatten_rule function	2017-07-13 09:59:23 -07:00
Max Brunsfeld	7293e6f0cc	Fix compile warnings	2017-07-12 22:08:36 -07:00
Max Brunsfeld	62c577af33	Remove unnecessary using statements	2017-07-12 21:41:37 -07:00
Max Brunsfeld	a3006bc2b5	Represent LookaheadSet using vectors of bool	2017-07-12 16:02:01 -07:00
Max Brunsfeld	65bf1389e1	Add a way to automatically inline rules	2017-07-11 23:13:44 -07:00
Max Brunsfeld	26a25278cd	When comparing parse items, ignore consumed part of their productions. This speeds up parser generation by increasing the likelihood that we'll recognize parse item sets as equivalent in advance, rather than having to merge their states after the fact.	2017-07-11 17:30:32 -07:00
Max Brunsfeld	a199b217f3	Optimize ParseTableBuilder for non-terminals w/ many productions	2017-07-11 12:54:29 -07:00
Max Brunsfeld	68c3ba1b8b	🎨 merge_parse_state	2017-07-10 16:46:11 -07:00
Max Brunsfeld	5bd5b4bb05	Replace <cctype> -> <cwctype>	2017-07-10 14:35:14 -07:00
Max Brunsfeld	59236d2ed1	Avoid redundant character comparisons in generated lex function	2017-07-10 14:09:31 -07:00
Max Brunsfeld	2755b07222	Don't store unfinished item signature on ParseStates	2017-07-10 10:47:38 -07:00
Max Brunsfeld	1586d70cbe	Compute conflicting tokens more precisely. While generating the parse table, keep track of which tokens can follow one another. Then use this information to evaluate token conflicts more precisely. This will result in a smaller parse table than the previous, overly conservative approach.	2017-07-07 17:54:24 -07:00
Max Brunsfeld	a98abde529	Provide all preceding symbols as context when reporting conflicts	2017-07-07 14:52:56 -07:00
Max Brunsfeld	c91ceaaa8d	🎨 build_parse_table	2017-07-07 14:52:45 -07:00
Max Brunsfeld	0de93b3bf2	Allow negative dynamic precedences	2017-07-06 22:21:59 -07:00
Max Brunsfeld	d8e9d04fe7	Add PREC_DYNAMIC rule for resolving runtime ambiguities	2017-07-06 15:24:45 -07:00
Max Brunsfeld	20982fdcb9	Mark tokens as non-reusable in states where shorter tokens take precedence. This fixes some randomized test failures in the C grammar relating to object-like macros. The object-like macro rule relies on a whitespace token to distinguish object-like macros whose values begin with a '(' from function-like macros. The presence of that whitespace token means that other nodes should not be reusable in that state.	2017-06-22 16:04:42 -07:00
Max Brunsfeld	8517313a45	🎨	2017-06-22 15:33:07 -07:00
Max Brunsfeld	8157b81b68	Improve logic for short-circuiting trivial lexing conflict detection	2017-06-22 15:33:01 -07:00
Max Brunsfeld	2c043803f1	Be more conservative about avoiding lexing conflicts when merging states. This fixes a bug in the C++ grammar where the `>>` token was merged into a state where it was previously not valid, but the `>` token was valid. This caused nested templates like `std::vector<std::pair<int, int>>` to not parse correctly.	2017-06-22 15:32:13 -07:00
Phil Turnbull	fdd8792ebc	Correctly set is_first. From scan-build: Value stored to 'is_first' is never read	2017-06-14 11:12:06 -04:00
Phil Turnbull	c58f6401d0	Non-terminal entries always have valid state-ids	2017-06-14 08:49:38 -04:00
Phil Turnbull	577e43f653	shift-extra actions do not have valid state_ids	2017-06-09 16:26:01 -04:00
Phil Turnbull	18ba6ebbd7	Move state_id check into each_referenced_state	2017-06-09 16:25:59 -04:00
Phil Turnbull	6897530c47	Check for invalid state indexes. Some ParseActions have a state-id of -1, which can cause an out-of-bounds read when removing duplicate parse states. This was found by AddressSanitizer: ==90699==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6320000187f8 at pc 0x0001071220a9 bp 0x7fff595fd440 sp 0x7fff595fd438 READ of size 8 at 0x6320000187f8 thread T0 #0 0x1071220a8 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long)::operator()(unsigned long) const build_parse_table.cc:398 #1 0x107121fa5 in void std::__1::__invoke_void_return_wrapper<void>::__call<tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long)&, unsigned long>(tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long)&&&, unsigned long&&) __functional_base:416 ... 0x6320000187f8 is located 8 bytes to the left of 88264-byte region [0x632000018800,0x63200002e0c8) allocated by thread T0 here: #0 0x107b1576b in wrap__Znwm (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x6076b) #1 0x10711da2c in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::allocate(unsigned long) new:169 #2 0x10711d8fb in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::vector(unsigned long) vector:1074 #3 0x107112f5c in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::vector(unsigned long) vector:1068 #4 0x1070af381 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states() build_parse_table.cc:378 #5 0x10709d827 in tree_sitter::build_tables::ParseTableBuilder::build() build_parse_table.cc:85 ... 
SUMMARY: AddressSanitizer: heap-buffer-overflow build_parse_table.cc:398 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long)::operator()(unsigned long) const Shadow bytes around the buggy address: 0x1c64000030a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x1c64000030b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x1c64000030c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x1c64000030d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x1c64000030e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa =>0x1c64000030f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa[fa] 0x1c6400003100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1c6400003110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1c6400003120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1c6400003130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1c6400003140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00	2017-06-07 17:23:44 -04:00
Phil Turnbull	dee86f908a	Correctly check type is ParseActionTypeRecover	2017-06-07 17:05:39 -04:00
joshvera	f76935cc7e	just make it static	2017-03-24 18:38:21 -04:00
joshvera	6938b288a5	Make external scanner symbol map unique	2017-03-24 14:51:37 -04:00
Max Brunsfeld	ed8fbff175	Allow anonymous tokens to be used in grammars' external token lists	2017-03-17 16:31:29 -07:00
Max Brunsfeld	b3edd8f749	Remove use of shared_ptr in choice, repeat, and seq factories	2017-03-17 14:28:13 -07:00
Max Brunsfeld	d9fb863bea	Fix build errors w/ gcc	2017-03-17 14:03:49 -07:00
Max Brunsfeld	416cbb9def	Add missing cassert includes	2017-03-17 13:54:40 -07:00
Max Brunsfeld	90d21adf3b	Format make_visitor helper consistently w/ project	2017-03-17 13:37:26 -07:00
Max Brunsfeld	db4b9ebc7c	Implement Rule as a union rather than an abstract base class	2017-03-17 13:29:31 -07:00
Max Brunsfeld	d222dbb9fd	Allow lexer to accept tokens that ended at previous positions: track lookahead in each tree; add 'mark_end' API that external scanners can use	2017-03-13 17:06:52 -07:00
Max Brunsfeld	f04d7c5860	Handle unused tokens	2017-03-09 21:16:37 -08:00
Max Brunsfeld	c79fae6d21	Clean up extract_tokens function	2017-03-09 21:16:20 -08:00
Max Brunsfeld	f049d5d94c	Make ParseItem a struct, not a class	2017-03-08 21:06:30 -08:00
Max Brunsfeld	64e9230071	Use LexTableBuilder to detect conflicts between tokens more correctly	2017-03-08 12:47:38 -08:00
Max Brunsfeld	abf8a4f2c2	🎨	2017-03-01 22:15:26 -08:00
Max Brunsfeld	686dc0997c	Avoid introducing certain lexical conflicts during parse state merging. The current, fairly conservative approach is to avoid merging parse states that would cause a pair of tokens to co-exist for the first time in any parse state, where the two tokens can start with the same character and at least one of the tokens can contain a character that is part of the grammar's separators.	2017-02-27 22:54:38 -08:00
Max Brunsfeld	3c8e6f9987	Restructure parse state merging logic: remove remnants of the templatized remove_duplicate_states function; rename the recovery_tokens function to get_compatible_tokens and augment it to also compute pairs of tokens that could potentially be incompatible	2017-02-26 12:23:48 -08:00
Timothy Clem	ab00f1b0da	Add support for \W and \D negated character classes too	2017-01-31 15:03:48 -08:00
Timothy Clem	902b7f9745	Allow \S for negated whitespace regex shorthand	2017-01-31 14:45:28 -08:00
Max Brunsfeld	0a6e5f9ee6	Fix some build warnings on gcc	2017-01-31 11:46:28 -08:00
Max Brunsfeld	4131e1c16e	Return an error when external token name matches non-terminal rule	2017-01-31 11:36:51 -08:00
Max Brunsfeld	60f6998485	Rename generated language functions to e.g. `tree_sitter_python`. They used to be called e.g. `ts_language_python`. Now that there are APIs that deal with the `TSLanguage` objects themselves, such as `ts_language_symbol_count`, the old names were a little confusing.	2017-01-31 10:29:31 -08:00
Max Brunsfeld	d853b6504d	Add version number to TSLanguage structs	2017-01-31 10:21:47 -08:00
Max Brunsfeld	3706678b89	Pass const TSExternalTokenState to external scanner deserialize hook	2016-12-21 13:58:18 -08:00
Max Brunsfeld	34a65f588d	Tweak naming and organization of external-scanner related language fields	2016-12-21 11:24:41 -08:00
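
Commit 6897530c47 and its follow-ups (18ba6ebbd7, 577e43f653, c58f6401d0, dee86f908a) deal with parse actions whose state id is -1 being used as an index while duplicate parse states are removed. The sketch below illustrates that kind of guard under stated assumptions: the `ParseAction`/`ParseState` types, the `Type` enumerators, the `each_referenced_state` signature and the `remap_states` helper are all hypothetical simplifications, not tree-sitter's actual code. It skips shift-extra actions and any action carrying the -1 sentinel before indexing into the old-to-new state mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical, simplified stand-ins for parse table types; not
// tree-sitter's actual definitions.
struct ParseAction {
  enum Type { Shift, Reduce, Accept, Recover };
  Type type;
  bool extra;        // shift-extra actions carry no target state
  long state_index;  // -1 when the action has no target state
};

struct ParseState {
  std::vector<ParseAction> actions;

  // Calls fn only for actions whose state_index refers to a real state:
  // non-extra shifts and recovers, and never the -1 sentinel.
  void each_referenced_state(const std::function<void(long *)> &fn) {
    for (ParseAction &action : actions) {
      bool has_target =
          (action.type == ParseAction::Shift && !action.extra) ||
          action.type == ParseAction::Recover;
      if (has_target && action.state_index >= 0) fn(&action.state_index);
    }
  }
};

// Rewrites every referenced state id through an old-to-new mapping, as a
// duplicate-state removal pass might after merging equivalent states.
// Without the guard above, the -1 sentinel would be used as an index,
// which is out of bounds.
void remap_states(std::vector<ParseState> &states,
                  const std::vector<long> &new_ids) {
  for (ParseState &state : states) {
    state.each_referenced_state([&](long *id) {
      assert(static_cast<std::size_t>(*id) < new_ids.size());
      *id = new_ids[static_cast<std::size_t>(*id)];
    });
  }
}
```

Putting the check inside `each_referenced_state`, rather than at each call site, means every pass that walks state references gets it automatically; this follows the direction of 18ba6ebbd7 ("Move state_id check into each_referenced_state"), though the real function's shape may differ. The ASan report's "8 bytes to the left" of a `vector<unsigned long>` allocation is consistent with reading element -1 of such a mapping.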


544 commits