Commit graph

544 commits

Author SHA1 Message Date
Max Brunsfeld
d646889922 Simplify flatten_rule function 2017-07-13 09:59:23 -07:00
Max Brunsfeld
7293e6f0cc Fix compile warnings 2017-07-12 22:08:36 -07:00
Max Brunsfeld
62c577af33 Remove unnecessary using statements 2017-07-12 21:41:37 -07:00
Max Brunsfeld
a3006bc2b5 Represent LookaheadSet using vectors of bool 2017-07-12 16:02:01 -07:00
Max Brunsfeld
65bf1389e1 Add a way to automatically inline rules 2017-07-11 23:13:44 -07:00
Max Brunsfeld
26a25278cd When comparing parse items, ignore consumed part of their productions
This speeds up parser generation by increasing the likelihood that we'll recognize
parse item sets as equivalent in advance, rather than having to merge their states
after the fact.
2017-07-11 17:30:32 -07:00
Max Brunsfeld
a199b217f3 Optimize ParseTableBuilder for non-terminals w/ many productions 2017-07-11 12:54:29 -07:00
Max Brunsfeld
68c3ba1b8b 🎨 merge_parse_state 2017-07-10 16:46:11 -07:00
Max Brunsfeld
5bd5b4bb05 Replace <cctype> -> <cwctype> 2017-07-10 14:35:14 -07:00
Max Brunsfeld
59236d2ed1 Avoid redundant character comparisons in generated lex function 2017-07-10 14:09:31 -07:00
Max Brunsfeld
2755b07222 Don't store unfinished item signature on ParseStates 2017-07-10 10:47:38 -07:00
Max Brunsfeld
1586d70cbe Compute conflicting tokens more precisely
While generating the parse table, keep track of which tokens can follow one another.
Then use this information to evaluate token conflicts more precisely. This will
result in a smaller parse table than the previous, overly-conservative approach.
2017-07-07 17:54:24 -07:00
Max Brunsfeld
a98abde529 Provide all preceding symbols as context when reporting conflicts 2017-07-07 14:52:56 -07:00
Max Brunsfeld
c91ceaaa8d 🎨 build_parse_table 2017-07-07 14:52:45 -07:00
Max Brunsfeld
0de93b3bf2 Allow negative dynamic precedences 2017-07-06 22:21:59 -07:00
Max Brunsfeld
d8e9d04fe7 Add PREC_DYNAMIC rule for resolving runtime ambiguities 2017-07-06 15:24:45 -07:00
Max Brunsfeld
20982fdcb9 Mark tokens as non-reusable in states where shorter takes take precedence
This fixes some randomized test failures in the C grammar, relating to Object-like macros.
The object-like macro rule relies on a whitespace token in order to distinguish object-like
macros whose values begin with a '(' from function-like macros. The presence of that
whitespace token means that other nodes should not be reusable in that state.
2017-06-22 16:04:42 -07:00
Max Brunsfeld
8517313a45 🎨 2017-06-22 15:33:07 -07:00
Max Brunsfeld
8157b81b68 Improve logic for short-circuiting trivial lexing conflict detection 2017-06-22 15:33:01 -07:00
Max Brunsfeld
2c043803f1 Be more conservative about avoiding lexing conflicts when merging states
This fixes a bug in the C++ grammar where the `>>` token was merged into
a state where it was previously not valid, but the `>` token *was*
valid. This caused nested templates like -

std::vector<std::pair<int, int>>

to not parse correctly.
2017-06-22 15:32:13 -07:00
Phil Turnbull
fdd8792ebc Correctly set is_first
From scan-build: Value stored to 'is_first' is never read
2017-06-14 11:12:06 -04:00
Phil Turnbull
c58f6401d0 Non-terminal entries always have valid state-ids 2017-06-14 08:49:38 -04:00
Phil Turnbull
577e43f653 shift-extra actions do not have valid state_ids 2017-06-09 16:26:01 -04:00
Phil Turnbull
18ba6ebbd7 Move state_id check into each_referenced_state 2017-06-09 16:25:59 -04:00
Phil Turnbull
6897530c47 Check for invalid state indexes
Some ParseActions have a state-id of -1 which can cause an out-of-bounds read
when removing duplicate parse states. This was found by AddressSanitizer:

==90699==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6320000187f8 at pc 0x0001071220a9 bp 0x7fff595fd440 sp 0x7fff595fd438
READ of size 8 at 0x6320000187f8 thread T0
    #0 0x1071220a8 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)::operator()(unsigned long*) const build_parse_table.cc:398
    #1 0x107121fa5 in void std::__1::__invoke_void_return_wrapper<void>::__call<tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)&, unsigned long*>(tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)&&&, unsigned long*&&) __functional_base:416
...
0x6320000187f8 is located 8 bytes to the left of 88264-byte region [0x632000018800,0x63200002e0c8)
allocated by thread T0 here:
    #0 0x107b1576b in wrap__Znwm (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x6076b)
    #1 0x10711da2c in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::allocate(unsigned long) new:169
    #2 0x10711d8fb in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::vector(unsigned long) vector:1074
    #3 0x107112f5c in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::vector(unsigned long) vector:1068
    #4 0x1070af381 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states() build_parse_table.cc:378
    #5 0x10709d827 in tree_sitter::build_tables::ParseTableBuilder::build() build_parse_table.cc:85
...
SUMMARY: AddressSanitizer: heap-buffer-overflow build_parse_table.cc:398 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)::operator()(unsigned long*) const
Shadow bytes around the buggy address:
  0x1c64000030a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c64000030b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c64000030c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c64000030d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c64000030e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x1c64000030f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]
  0x1c6400003100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c6400003110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c6400003120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c6400003130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c6400003140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2017-06-07 17:23:44 -04:00
Phil Turnbull
dee86f908a Correctly check type is ParseActionTypeRecover 2017-06-07 17:05:39 -04:00
joshvera
f76935cc7e just make it static 2017-03-24 18:38:21 -04:00
joshvera
6938b288a5 Make external scanner symbol map unique 2017-03-24 14:51:37 -04:00
Max Brunsfeld
ed8fbff175 Allow anonymous tokens to be used in grammars' external token lists 2017-03-17 16:31:29 -07:00
Max Brunsfeld
b3edd8f749 Remove use of shared_ptr in choice, repeat, and seq factories 2017-03-17 14:28:13 -07:00
Max Brunsfeld
d9fb863bea Fix build errors w/ gcc 2017-03-17 14:03:49 -07:00
Max Brunsfeld
416cbb9def Add missing cassert includes 2017-03-17 13:54:40 -07:00
Max Brunsfeld
90d21adf3b Format make_visitor helper consistently w/ project 2017-03-17 13:37:26 -07:00
Max Brunsfeld
db4b9ebc7c Implement Rule as a union rather than an abstract base class 2017-03-17 13:29:31 -07:00
Max Brunsfeld
d222dbb9fd Allow lexer to accept tokens that ended at previous positions
* Track lookahead in each tree
* Add 'mark_end' API that external scanners can use
2017-03-13 17:06:52 -07:00
Max Brunsfeld
f04d7c5860 Handle unused tokens 2017-03-09 21:16:37 -08:00
Max Brunsfeld
c79fae6d21 Clean up extract_tokens function 2017-03-09 21:16:20 -08:00
Max Brunsfeld
f049d5d94c Make ParseItem a struct, not a class 2017-03-08 21:06:30 -08:00
Max Brunsfeld
64e9230071 Use LexTableBuilder to detect conflicts between tokens more correctly 2017-03-08 12:47:38 -08:00
Max Brunsfeld
abf8a4f2c2 🎨 2017-03-01 22:15:26 -08:00
Max Brunsfeld
686dc0997c Avoid introducing certain lexical conflicts during parse state merging
The current pretty conservative approach is to avoid merging parse states which
would cause a pair tokens to co-exist for the first time in any parse state,
where the two tokens can start with the same character and at least one of the
tokens can contain a character which is part of the grammar's separators.
2017-02-27 22:54:38 -08:00
Max Brunsfeld
3c8e6f9987 Restructure parse state merging logic
* Remove remnants of templatized remove_duplicate_states function
* Rename recovery_tokens function to get_compatible_tokens and augment it
  also compute pairs of tokens which could potentially be incompatible
2017-02-26 12:23:48 -08:00
Timothy Clem
ab00f1b0da Add support for \W and \D negated character classes too 2017-01-31 15:03:48 -08:00
Timothy Clem
902b7f9745 Allow \S for negated whitespace regex shorthand 2017-01-31 14:45:28 -08:00
Max Brunsfeld
0a6e5f9ee6 Fix some build warnings on gcc 2017-01-31 11:46:28 -08:00
Max Brunsfeld
4131e1c16e Return an error when external token name matches non-terminal rule 2017-01-31 11:36:51 -08:00
Max Brunsfeld
60f6998485 Rename generated language functions to e.g. tree_sitter_python
They used to be called e.g. `ts_language_python`. Now that there
are APIs that deal with the `TSLanguage` objects themselves, such
as `ts_language_symbol_count`, the old names were a little confusing.
2017-01-31 10:29:31 -08:00
Max Brunsfeld
d853b6504d Add version number to TSLanguage structs 2017-01-31 10:21:47 -08:00
Max Brunsfeld
3706678b89 Pass const TSExternalTokenState to external scanner deserialize hook 2016-12-21 13:58:18 -08:00
Max Brunsfeld
34a65f588d Tweak naming and organization of external-scanner related language fields 2016-12-21 11:24:41 -08:00