Commit graph

1053 commits

Author SHA1 Message Date
Max Brunsfeld
8ee3f96960 Fix formatting of non-ascii unexpected characters
Signed-off-by: Philip Turnbull <philipturnbull@github.com>
2017-06-23 12:08:50 -07:00
Max Brunsfeld
513edec7c1 Merge pull request #77 from philipturnbull/scan-build-fixes
Fix errors found by scan-build
2017-06-20 10:15:20 -07:00
Max Brunsfeld
599367d36d Always recur into error nodes when reporting changed ranges 2017-06-15 17:06:48 -07:00
Max Brunsfeld
c66fddd3aa Add TSInput option to measure columns in bytes not characters 2017-06-15 16:35:34 -07:00
Phil Turnbull
cfca764d48 Root can never be NULL in this context 2017-06-15 07:47:16 -04:00
Max Brunsfeld
b862db766e Merge remote-tracking branch 'origin/master' into update-fixture-grammars 2017-06-14 17:11:44 -07:00
Phil Turnbull
d1b19e8196 Prevent NULL pointer dereference in parser__accept
parser__select_tree can return true if 'left != NULL' and 'right == NULL' which
will later cause a NULL ptr deref:

src/runtime/parser.c:842:14: warning: Access to field 'ref_count' results in a dereference of a null pointer (loaded from variable 'root')
      assert(root->ref_count > 0);
             ^~~~~~~~~~~~~~~
2017-06-14 11:12:06 -04:00
Phil Turnbull
da099d0bbe Prevent NULL pointer dereference in parser__repair_error_callback
Because repair_reduction_count is unsigned, the default of '-1' is 0xffffffff
and will cause the loop to be entered if repair_reduction_count is NULL:

src/runtime/parser.c:691:11: warning: Dereference of null pointer
      if (repair_reductions[j].params.symbol == repair->symbol) {
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2017-06-14 11:12:06 -04:00
Phil Turnbull
fdd8792ebc Correctly set is_first
From scan-build: Value stored to 'is_first' is never read
2017-06-14 11:12:06 -04:00
Phil Turnbull
c58f6401d0 Non-terminal entries always have valid state-ids 2017-06-14 08:49:38 -04:00
Phil Turnbull
577e43f653 shift-extra actions do not have valid state_ids 2017-06-09 16:26:01 -04:00
Phil Turnbull
18ba6ebbd7 Move state_id check into each_referenced_state 2017-06-09 16:25:59 -04:00
Phil Turnbull
6897530c47 Check for invalid state indexes
Some ParseActions have a state-id of -1 which can cause an out-of-bounds read
when removing duplicate parse states. This was found by AddressSanitizer:

==90699==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6320000187f8 at pc 0x0001071220a9 bp 0x7fff595fd440 sp 0x7fff595fd438
READ of size 8 at 0x6320000187f8 thread T0
    #0 0x1071220a8 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)::operator()(unsigned long*) const build_parse_table.cc:398
    #1 0x107121fa5 in void std::__1::__invoke_void_return_wrapper<void>::__call<tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)&, unsigned long*>(tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)&&&, unsigned long*&&) __functional_base:416
...
0x6320000187f8 is located 8 bytes to the left of 88264-byte region [0x632000018800,0x63200002e0c8)
allocated by thread T0 here:
    #0 0x107b1576b in wrap__Znwm (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x6076b)
    #1 0x10711da2c in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::allocate(unsigned long) new:169
    #2 0x10711d8fb in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::vector(unsigned long) vector:1074
    #3 0x107112f5c in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::vector(unsigned long) vector:1068
    #4 0x1070af381 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states() build_parse_table.cc:378
    #5 0x10709d827 in tree_sitter::build_tables::ParseTableBuilder::build() build_parse_table.cc:85
...
SUMMARY: AddressSanitizer: heap-buffer-overflow build_parse_table.cc:398 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)::operator()(unsigned long*) const
Shadow bytes around the buggy address:
  0x1c64000030a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c64000030b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c64000030c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c64000030d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c64000030e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x1c64000030f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]
  0x1c6400003100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c6400003110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c6400003120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c6400003130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c6400003140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2017-06-07 17:23:44 -04:00
Phil Turnbull
dee86f908a Correctly check type is ParseActionTypeRecover 2017-06-07 17:05:39 -04:00
Max Brunsfeld
7b401de5a6 Don't use pointer equality to compare external token states 2017-05-03 09:57:09 -07:00
Max Brunsfeld
e8a9bb7a51 🎨 Extract parser__halt_parse function 2017-05-01 14:41:55 -07:00
Max Brunsfeld
74f5ceddf7 Fix parsing of valid code with halt_on_error flag set
Signed-off-by: Tim Clem <timothy.clem@gmail.com>
2017-05-01 14:25:25 -07:00
Max Brunsfeld
a98d449d88 Add an option to immediately halt on syntax error 2017-05-01 13:50:49 -07:00
Max Brunsfeld
704c2d5907 Fix lookahead_char type in ts_tree_make_error function 2017-04-27 14:49:04 -07:00
Timothy Clem
91558f0a0e utf8proc_iterate can set codepoint_ref to -1 and returns negative error 2017-04-27 14:46:36 -07:00
Rob Rix
3a888b1623 Define a function providing the type of a given symbol. 2017-04-12 09:47:51 -04:00
joshvera
f76935cc7e just make it static 2017-03-24 18:38:21 -04:00
joshvera
6938b288a5 Make external scanner symbol map unique 2017-03-24 14:51:37 -04:00
Max Brunsfeld
1f908324dc Prevent infinite loop in skip_preceding_trees error recovery strategy 2017-03-21 12:14:44 -07:00
Max Brunsfeld
7e13eac296 Fix lookahead_char type in ts_tree_make_error function 2017-03-21 11:05:48 -07:00
Max Brunsfeld
63fb041961 Merge remote-tracking branch 'origin/check-utf8proc_iterate-return' into update-fixture-grammars 2017-03-21 09:59:35 -07:00
Timothy Clem
f394a48c0b utf8proc_iterate can set codepoint_ref to -1 and returns negative error 2017-03-20 16:54:19 -07:00
Max Brunsfeld
ed31e82ee6 Skip empty tokens when recovering from errors 2017-03-19 22:20:59 -07:00
Max Brunsfeld
ed8fbff175 Allow anonymous tokens to be used in grammars' external token lists 2017-03-17 16:31:29 -07:00
Max Brunsfeld
b3edd8f749 Remove use of shared_ptr in choice, repeat, and seq factories 2017-03-17 14:28:13 -07:00
Max Brunsfeld
d9fb863bea Fix build errors w/ gcc 2017-03-17 14:03:49 -07:00
Max Brunsfeld
416cbb9def Add missing cassert includes 2017-03-17 13:54:40 -07:00
Max Brunsfeld
90d21adf3b Format make_visitor helper consistently w/ project 2017-03-17 13:37:26 -07:00
Max Brunsfeld
db4b9ebc7c Implement Rule as a union rather than an abstract base class 2017-03-17 13:29:31 -07:00
Max Brunsfeld
d222dbb9fd Allow lexer to accept tokens that ended at previous positions
* Track lookahead in each tree
* Add 'mark_end' API that external scanners can use
2017-03-13 17:06:52 -07:00
Max Brunsfeld
f04d7c5860 Handle unused tokens 2017-03-09 21:16:37 -08:00
Max Brunsfeld
c79fae6d21 Clean up extract_tokens function 2017-03-09 21:16:20 -08:00
Max Brunsfeld
f049d5d94c Make ParseItem a struct, not a class 2017-03-08 21:06:30 -08:00
Max Brunsfeld
64e9230071 Use LexTableBuilder to detect conflicts between tokens more correctly 2017-03-08 12:47:38 -08:00
Max Brunsfeld
abf8a4f2c2 🎨 2017-03-01 22:15:26 -08:00
Max Brunsfeld
686dc0997c Avoid introducing certain lexical conflicts during parse state merging
The current pretty conservative approach is to avoid merging parse states which
would cause a pair tokens to co-exist for the first time in any parse state,
where the two tokens can start with the same character and at least one of the
tokens can contain a character which is part of the grammar's separators.
2017-02-27 22:54:38 -08:00
Max Brunsfeld
3c8e6f9987 Restructure parse state merging logic
* Remove remnants of templatized remove_duplicate_states function
* Rename recovery_tokens function to get_compatible_tokens and augment it
  also compute pairs of tokens which could potentially be incompatible
2017-02-26 12:23:48 -08:00
Max Brunsfeld
df520635c6 Prevent crash due to huge number of possible paths through parse stack 2017-02-20 14:34:10 -08:00
Max Brunsfeld
cefc57fe86 Move error cost comparisons into their own source file 2017-02-19 21:54:06 -08:00
Max Brunsfeld
5b4e6df3ff Don't mark error nodes created in the error state as extras 2017-02-19 21:54:06 -08:00
Max Brunsfeld
c14a776a3d Avoid including trailing extra tokens within error nodes unnecessarily 2017-02-19 21:21:54 -08:00
Max Brunsfeld
135d8ef4e0 Merge pull request #58 from tree-sitter/reduce-error-recovery-branching
Reduce the branching factor of the parse stack during error recovery
2017-02-18 11:34:09 -08:00
Rob Rix
638aa87e42 Pass through to ts_string_input_make_with_length. 2017-02-10 09:27:21 -05:00
Rob Rix
eab518e5da Semicolon shame. 2017-02-10 09:20:58 -05:00
Rob Rix
c230658bae Add public API to set the input string with explicit length. 2017-02-10 09:10:31 -05:00