Commit graph

557 commits

Author SHA1 Message Date
Max Brunsfeld
84e4114f79 Allow conflicts involving repeat rules to be whitelisted, via their parent rule 2017-08-03 15:18:29 -07:00
Max Brunsfeld
119c67dd78 Fix conflict reporting for shift/reduce conflicts w/ multiple reductions
We were failing to rule out shift actions with lower precedence.

Signed-off-by: Philip Turnbull <philipturnbull@github.com>
2017-08-02 15:13:30 -07:00
Max Brunsfeld
cb5fe80348 Rename RENAME rule to ALIAS, allow it to create anonymous nodes 2017-07-31 16:41:11 -07:00
Max Brunsfeld
b5f421cafb Fix name collision that gcc didn't tolerate 2017-07-21 16:28:39 -07:00
Max Brunsfeld
cbdfd89675 Mark reductions as fragile based on their final properties
We previously maintained a set of individual productions that were
involved in conflicts, but that was subtly incorrect because
we don't compare productions themselves when comparing parse items;
we only compare the parse items properties that could affect the
final reduce actions.
2017-07-21 09:54:24 -07:00
Max Brunsfeld
7d9d8bce79 Handle inlined rules that contain other inlined rules 2017-07-20 15:29:06 -07:00
Max Brunsfeld
4649c3a37f Avoid creating redundant rename sequences 2017-07-18 15:29:06 -07:00
Max Brunsfeld
afb499bf2e Handle rename symbols in ts_language APIs 2017-07-18 12:01:52 -07:00
Max Brunsfeld
9a04231ab1 Remove length restriction in external scanner serialization API 2017-07-17 17:12:36 -07:00
Max Brunsfeld
66dc12587a Call the external scanner whenever an external token is valid
For some reason, there was previously some extra logic that prevented
the external scanner from being invoked if the only valid external
token also had an internal definition.

It's surprising to not call the external scanner if an external
token is valid.
2017-07-17 10:28:59 -07:00
Max Brunsfeld
b3a72954ff Introduce RENAME rule type 2017-07-13 17:17:22 -07:00
Max Brunsfeld
0b94e9d814 Don't include preceding production steps in ParseItem hash 2017-07-13 13:42:28 -07:00
Max Brunsfeld
561821d011 Remove precedence and associativity methods from ParseAction 2017-07-13 13:41:56 -07:00
Max Brunsfeld
d646889922 Simplify flatten_rule function 2017-07-13 09:59:23 -07:00
Max Brunsfeld
7293e6f0cc Fix compile warnings 2017-07-12 22:08:36 -07:00
Max Brunsfeld
62c577af33 Remove unnecessary using statements 2017-07-12 21:41:37 -07:00
Max Brunsfeld
a3006bc2b5 Represent LookaheadSet using vectors of bool 2017-07-12 16:02:01 -07:00
Max Brunsfeld
65bf1389e1 Add a way to automatically inline rules 2017-07-11 23:13:44 -07:00
Max Brunsfeld
26a25278cd When comparing parse items, ignore consumed part of their productions
This speeds up parser generation by increasing the likelihood that we'll recognize
parse item sets as equivalent in advance, rather than having to merge their states
after the fact.
2017-07-11 17:30:32 -07:00
Max Brunsfeld
a199b217f3 Optimize ParseTableBuilder for non-terminals w/ many productions 2017-07-11 12:54:29 -07:00
Max Brunsfeld
68c3ba1b8b 🎨 merge_parse_state 2017-07-10 16:46:11 -07:00
Max Brunsfeld
5bd5b4bb05 Replace <cctype> -> <cwctype> 2017-07-10 14:35:14 -07:00
Max Brunsfeld
59236d2ed1 Avoid redundant character comparisons in generated lex function 2017-07-10 14:09:31 -07:00
Max Brunsfeld
2755b07222 Don't store unfinished item signature on ParseStates 2017-07-10 10:47:38 -07:00
Max Brunsfeld
1586d70cbe Compute conflicting tokens more precisely
While generating the parse table, keep track of which tokens can follow one another.
Then use this information to evaluate token conflicts more precisely. This will
result in a smaller parse table than the previous, overly-conservative approach.
2017-07-07 17:54:24 -07:00
Max Brunsfeld
a98abde529 Provide all preceding symbols as context when reporting conflicts 2017-07-07 14:52:56 -07:00
Max Brunsfeld
c91ceaaa8d 🎨 build_parse_table 2017-07-07 14:52:45 -07:00
Max Brunsfeld
0de93b3bf2 Allow negative dynamic precedences 2017-07-06 22:21:59 -07:00
Max Brunsfeld
d8e9d04fe7 Add PREC_DYNAMIC rule for resolving runtime ambiguities 2017-07-06 15:24:45 -07:00
Max Brunsfeld
20982fdcb9 Mark tokens as non-reusable in states where shorter takes take precedence
This fixes some randomized test failures in the C grammar, relating to Object-like macros.
The object-like macro rule relies on a whitespace token in order to distinguish object-like
macros whose values begin with a '(' from function-like macros. The presence of that
whitespace token means that other nodes should not be reusable in that state.
2017-06-22 16:04:42 -07:00
Max Brunsfeld
8517313a45 🎨 2017-06-22 15:33:07 -07:00
Max Brunsfeld
8157b81b68 Improve logic for short-circuiting trivial lexing conflict detection 2017-06-22 15:33:01 -07:00
Max Brunsfeld
2c043803f1 Be more conservative about avoiding lexing conflicts when merging states
This fixes a bug in the C++ grammar where the `>>` token was merged into
a state where it was previously not valid, but the `>` token *was*
valid. This caused nested templates like -

std::vector<std::pair<int, int>>

to not parse correctly.
2017-06-22 15:32:13 -07:00
Phil Turnbull
fdd8792ebc Correctly set is_first
From scan-build: Value stored to 'is_first' is never read
2017-06-14 11:12:06 -04:00
Phil Turnbull
c58f6401d0 Non-terminal entries always have valid state-ids 2017-06-14 08:49:38 -04:00
Phil Turnbull
577e43f653 shift-extra actions do not have valid state_ids 2017-06-09 16:26:01 -04:00
Phil Turnbull
18ba6ebbd7 Move state_id check into each_referenced_state 2017-06-09 16:25:59 -04:00
Phil Turnbull
6897530c47 Check for invalid state indexes
Some ParseActions have a state-id of -1 which can cause an out-of-bounds read
when removing duplicate parse states. This was found by AddressSanitizer:

==90699==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6320000187f8 at pc 0x0001071220a9 bp 0x7fff595fd440 sp 0x7fff595fd438
READ of size 8 at 0x6320000187f8 thread T0
    #0 0x1071220a8 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)::operator()(unsigned long*) const build_parse_table.cc:398
    #1 0x107121fa5 in void std::__1::__invoke_void_return_wrapper<void>::__call<tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)&, unsigned long*>(tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)&&&, unsigned long*&&) __functional_base:416
...
0x6320000187f8 is located 8 bytes to the left of 88264-byte region [0x632000018800,0x63200002e0c8)
allocated by thread T0 here:
    #0 0x107b1576b in wrap__Znwm (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x6076b)
    #1 0x10711da2c in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::allocate(unsigned long) new:169
    #2 0x10711d8fb in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::vector(unsigned long) vector:1074
    #3 0x107112f5c in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::vector(unsigned long) vector:1068
    #4 0x1070af381 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states() build_parse_table.cc:378
    #5 0x10709d827 in tree_sitter::build_tables::ParseTableBuilder::build() build_parse_table.cc:85
...
SUMMARY: AddressSanitizer: heap-buffer-overflow build_parse_table.cc:398 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)::operator()(unsigned long*) const
Shadow bytes around the buggy address:
  0x1c64000030a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c64000030b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c64000030c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c64000030d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c64000030e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x1c64000030f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]
  0x1c6400003100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c6400003110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c6400003120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c6400003130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c6400003140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2017-06-07 17:23:44 -04:00
Phil Turnbull
dee86f908a Correctly check type is ParseActionTypeRecover 2017-06-07 17:05:39 -04:00
joshvera
f76935cc7e just make it static 2017-03-24 18:38:21 -04:00
joshvera
6938b288a5 Make external scanner symbol map unique 2017-03-24 14:51:37 -04:00
Max Brunsfeld
ed8fbff175 Allow anonymous tokens to be used in grammars' external token lists 2017-03-17 16:31:29 -07:00
Max Brunsfeld
b3edd8f749 Remove use of shared_ptr in choice, repeat, and seq factories 2017-03-17 14:28:13 -07:00
Max Brunsfeld
d9fb863bea Fix build errors w/ gcc 2017-03-17 14:03:49 -07:00
Max Brunsfeld
416cbb9def Add missing cassert includes 2017-03-17 13:54:40 -07:00
Max Brunsfeld
90d21adf3b Format make_visitor helper consistently w/ project 2017-03-17 13:37:26 -07:00
Max Brunsfeld
db4b9ebc7c Implement Rule as a union rather than an abstract base class 2017-03-17 13:29:31 -07:00
Max Brunsfeld
d222dbb9fd Allow lexer to accept tokens that ended at previous positions
* Track lookahead in each tree
* Add 'mark_end' API that external scanners can use
2017-03-13 17:06:52 -07:00
Max Brunsfeld
f04d7c5860 Handle unused tokens 2017-03-09 21:16:37 -08:00
Max Brunsfeld
c79fae6d21 Clean up extract_tokens function 2017-03-09 21:16:20 -08:00