1078 commits

Author SHA1 Message Date
Max Brunsfeld
8f028ebf68 Avoid deep tree comparison when both trees have errors 2017-07-05 17:33:35 -07:00
Max Brunsfeld
782bf48772 Don't do skip_preceding_subtrees recovery when there are lots of versions 2017-07-05 15:34:19 -07:00
Max Brunsfeld
17bc3dfaf7 Add a benchmark command
This command measures the speed of parsing each grammar's examples.
It also uses each grammar to parse all of the *other* grammars' examples
in order to measure error recovery performance with fairly large files.
2017-07-05 14:14:38 -07:00
Max Brunsfeld
d322f0b6a7 🎨 2017-07-04 21:59:54 -07:00
Max Brunsfeld
e7ccd9c17c Put back check for better existing versions during recover action
When checking for better existing versions, only kill the current
version if there is a better version earlier in the rotation.
2017-07-03 12:27:23 -07:00
Max Brunsfeld
f93f78ef2d Remove version-pruning criteria based on pushed node count 2017-07-02 23:42:23 -07:00
Max Brunsfeld
f722923493 Limit the search depth for skip_preceding_trees recovery 2017-06-30 17:49:09 -07:00
Max Brunsfeld
89e5037f01 Manually tail-call-optimize stack_node_release function 2017-06-30 17:49:09 -07:00
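Note on 89e5037f01: the commit gives no detail beyond its subject line, so the following is only a sketch of what manually replacing a tail call with a loop can look like for a reference-counted release function. The `ChainNode` type, its single `next` link, and both function names are hypothetical and are not taken from the tree-sitter sources.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node: each node owns one reference to its successor. */
typedef struct ChainNode {
  uint32_t ref_count;
  struct ChainNode *next;
} ChainNode;

/* Allocate a node holding one reference; it takes over the caller's
 * reference to `next`. Returns NULL if allocation fails. */
ChainNode *chain_node_new(ChainNode *next) {
  ChainNode *node = malloc(sizeof *node);
  if (node == NULL) return NULL;
  node->ref_count = 1;
  node->next = next;
  return node;
}

/* Drop one reference. A recursive version would end with
 * `chain_node_release(node->next)`; here that tail call is replaced by
 * a loop, so long chains do not consume stack space.
 * Returns the number of nodes that were freed. */
uint32_t chain_node_release(ChainNode *node) {
  uint32_t freed = 0;
  while (node != NULL) {
    node->ref_count--;
    if (node->ref_count > 0) break; /* someone else still holds it */
    ChainNode *next = node->next;
    free(node);
    freed++;
    node = next; /* continue with the successor instead of recursing */
  }
  return freed;
}
```

Releasing the head of a chain in which every node has a single owner frees the whole chain in one loop; the walk stops at the first node that is still referenced elsewhere.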
Max Brunsfeld
eccb3893eb Prune unneeded stack versions based on a depth criterion 2017-06-30 17:49:09 -07:00
Max Brunsfeld
d6579956f5 Enforce a hard version count limit during the recovery action 2017-06-30 17:49:09 -07:00
Max Brunsfeld
061fba6b92 🎨 Just call it 'inline' 2017-06-29 16:49:59 -07:00
Max Brunsfeld
a89322c5f1 Remove unneeded parameters from public interface of stack_iterate callback 2017-06-29 16:43:56 -07:00
Max Brunsfeld
009d6d1534 Improve heuristics for pruning parse versions based on errors
* Rewrite the error cost comparison in terms of explicit, discrete
conditions.
* Allow merging versions that have different error costs.
* Store the depth of each stack version since the last error. Use this
state to prevent incorrect merging.
* Sort the stack versions in order of preference and put a hard limit on
the version count.
2017-06-29 15:00:20 -07:00
Max Brunsfeld
445be0736a Clean up ts_stack_push function 2017-06-29 15:00:20 -07:00
Max Brunsfeld
66be393b78 Stack - consider empty external token state identical to NULL 2017-06-29 15:00:20 -07:00
Phil Turnbull
5ef9c4d6aa Increase size of ref_counts and add assertions
This bumps the size of the reference counts from 16- to 32-bit counters to make
it less likely to overflow. Also assert in the retain function that the
reference count didn't overflow.

32 bits seems big enough for non-pathological examples, but a more foolproof
fix may be to bump it to 64 bits.
2017-06-29 06:39:01 -07:00
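Note on 5ef9c4d6aa: a minimal sketch of the idea described in the message, a 32-bit reference count whose retain function asserts that the increment did not wrap. The `Node` type and the function names are hypothetical; the real structures in the runtime carry many more fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference-counted object with a 32-bit counter. */
typedef struct {
  uint32_t ref_count;
} Node;

/* Take one more reference and assert that the counter did not
 * wrap around to zero. */
void node_retain(Node *node) {
  assert(node->ref_count > 0);  /* must already be alive */
  node->ref_count++;
  assert(node->ref_count != 0); /* overflow check */
}

/* Shows the failure mode with a 16-bit counter: incrementing the
 * maximum value wraps to zero, which looks like "no references left". */
uint16_t increment_16bit(uint16_t count) {
  return (uint16_t)(count + 1);
}
```

With a 16-bit counter the wrap happens after 65,535 retains; a 32-bit counter pushes that to roughly 4.29 billion, which is why the message calls it sufficient for non-pathological inputs.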
Max Brunsfeld
dfd7b1f5f6 Consolidate memory management logic in Stack 2017-06-27 16:17:24 -07:00
Max Brunsfeld
0143bfdad4 Avoid use-after-free of external token states
Previously, it was possible for references to external token states to
outlive the trees to which those states belonged.

Now, instead of storing references to external token states in the Stack
and in the Lexer, we store references to the external token trees
themselves, and we retain the trees to prevent use-after-free.
2017-06-27 14:54:27 -07:00
Max Brunsfeld
f678018d3d Avoid use-after-free when copying stack iterators 2017-06-27 14:54:27 -07:00
Max Brunsfeld
076002a01e Merge pull request #78 from philipturnbull/update-utf8proc
Out of bounds read in utf8proc
2017-06-23 12:18:21 -07:00
Max Brunsfeld
f62ee5a0f3 Fix OOB reads at ends of chunks
Signed-off-by: Philip Turnbull <philipturnbull@github.com>
2017-06-23 12:09:16 -07:00
Max Brunsfeld
8ee3f96960 Fix formatting of non-ascii unexpected characters
Signed-off-by: Philip Turnbull <philipturnbull@github.com>
2017-06-23 12:08:50 -07:00
Max Brunsfeld
20982fdcb9 Mark tokens as non-reusable in states where shorter tokens take precedence
This fixes some randomized test failures in the C grammar, relating to object-like macros.
The object-like macro rule relies on a whitespace token in order to distinguish object-like
macros whose values begin with a '(' from function-like macros. The presence of that
whitespace token means that other nodes should not be reusable in that state.
2017-06-22 16:04:42 -07:00
Max Brunsfeld
8517313a45 🎨 2017-06-22 15:33:07 -07:00
Max Brunsfeld
8157b81b68 Improve logic for short-circuiting trivial lexing conflict detection 2017-06-22 15:33:01 -07:00
Max Brunsfeld
2c043803f1 Be more conservative about avoiding lexing conflicts when merging states
This fixes a bug in the C++ grammar where the `>>` token was merged into
a state where it was previously not valid, but the `>` token *was*
valid. This caused nested templates like -

std::vector<std::pair<int, int>>

to not parse correctly.
2017-06-22 15:32:13 -07:00
Max Brunsfeld
513edec7c1 Merge pull request #77 from philipturnbull/scan-build-fixes
Fix errors found by scan-build
2017-06-20 10:15:20 -07:00
Max Brunsfeld
599367d36d Always recur into error nodes when reporting changed ranges 2017-06-15 17:06:48 -07:00
Max Brunsfeld
c66fddd3aa Add TSInput option to measure columns in bytes not characters 2017-06-15 16:35:34 -07:00
Phil Turnbull
cfca764d48 Root can never be NULL in this context 2017-06-15 07:47:16 -04:00
Max Brunsfeld
b862db766e Merge remote-tracking branch 'origin/master' into update-fixture-grammars 2017-06-14 17:11:44 -07:00
Phil Turnbull
d1b19e8196 Prevent NULL pointer dereference in parser__accept
parser__select_tree can return true if 'left != NULL' and 'right == NULL' which
will later cause a NULL ptr deref:

src/runtime/parser.c:842:14: warning: Access to field 'ref_count' results in a dereference of a null pointer (loaded from variable 'root')
      assert(root->ref_count > 0);
             ^~~~~~~~~~~~~~~
2017-06-14 11:12:06 -04:00
Phil Turnbull
da099d0bbe Prevent NULL pointer dereference in parser__repair_error_callback
Because repair_reduction_count is unsigned, the default of '-1' is 0xffffffff
and will cause the loop to be entered if repair_reduction_count is NULL:

src/runtime/parser.c:691:11: warning: Dereference of null pointer
      if (repair_reductions[j].params.symbol == repair->symbol) {
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2017-06-14 11:12:06 -04:00
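Note on da099d0bbe: a small sketch of the pitfall the message describes. Storing -1 in an unsigned 32-bit count yields 0xffffffff, so a loop bounded by that count runs even when the array pointer is NULL. The `Reduction` type and `count_matching` function are hypothetical, and the sketch assumes, following the scan-build warning, that the array pointer is what can be NULL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an entry in the reductions array. */
typedef struct {
  uint32_t symbol;
} Reduction;

/* Count the entries whose symbol matches. The NULL guard matters
 * because a count initialized to -1 becomes 0xffffffff when stored in
 * an unsigned variable, so `j < count` alone would not stop the loop
 * from dereferencing a NULL array. */
uint32_t count_matching(const Reduction *reductions, uint32_t count,
                        uint32_t symbol) {
  if (reductions == NULL) return 0;
  uint32_t matches = 0;
  for (uint32_t j = 0; j < count; j++) {
    if (reductions[j].symbol == symbol) matches++;
  }
  return matches;
}
```

An alternative design is to initialize the count to 0 rather than -1, which makes the loop condition itself safe.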
Phil Turnbull
fdd8792ebc Correctly set is_first
From scan-build: Value stored to 'is_first' is never read
2017-06-14 11:12:06 -04:00
Phil Turnbull
c58f6401d0 Non-terminal entries always have valid state-ids 2017-06-14 08:49:38 -04:00
Phil Turnbull
577e43f653 shift-extra actions do not have valid state_ids 2017-06-09 16:26:01 -04:00
Phil Turnbull
18ba6ebbd7 Move state_id check into each_referenced_state 2017-06-09 16:25:59 -04:00
Phil Turnbull
6897530c47 Check for invalid state indexes
Some ParseActions have a state-id of -1 which can cause an out-of-bounds read
when removing duplicate parse states. This was found by AddressSanitizer:

==90699==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6320000187f8 at pc 0x0001071220a9 bp 0x7fff595fd440 sp 0x7fff595fd438
READ of size 8 at 0x6320000187f8 thread T0
    #0 0x1071220a8 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)::operator()(unsigned long*) const build_parse_table.cc:398
    #1 0x107121fa5 in void std::__1::__invoke_void_return_wrapper<void>::__call<tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)&, unsigned long*>(tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)&&&, unsigned long*&&) __functional_base:416
...
0x6320000187f8 is located 8 bytes to the left of 88264-byte region [0x632000018800,0x63200002e0c8)
allocated by thread T0 here:
    #0 0x107b1576b in wrap__Znwm (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x6076b)
    #1 0x10711da2c in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::allocate(unsigned long) new:169
    #2 0x10711d8fb in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::vector(unsigned long) vector:1074
    #3 0x107112f5c in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::vector(unsigned long) vector:1068
    #4 0x1070af381 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states() build_parse_table.cc:378
    #5 0x10709d827 in tree_sitter::build_tables::ParseTableBuilder::build() build_parse_table.cc:85
...
SUMMARY: AddressSanitizer: heap-buffer-overflow build_parse_table.cc:398 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)::operator()(unsigned long*) const
Shadow bytes around the buggy address:
  0x1c64000030a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c64000030b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c64000030c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c64000030d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c64000030e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x1c64000030f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]
  0x1c6400003100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c6400003110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c6400003120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c6400003130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c6400003140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2017-06-07 17:23:44 -04:00
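Note on 6897530c47: the affected code is C++ (build_parse_table.cc), but the check itself can be sketched in C. The function below is hypothetical and assumes state ids are stored as `size_t`, so that an id of -1 becomes the largest possible value and is rejected by a single upper-bound comparison.

```c
#include <stdbool.h>
#include <stddef.h>

/* Look up the replacement for `state_id` in a table of `state_count`
 * entries. Returns false, leaving `*out` untouched, when the id is not
 * a valid index. An id of (size_t)-1 is SIZE_MAX, so it fails the
 * comparison instead of causing an out-of-bounds read. */
bool lookup_replacement(const size_t *replacements, size_t state_count,
                        size_t state_id, size_t *out) {
  if (state_id >= state_count) return false;
  *out = replacements[state_id];
  return true;
}
```

The AddressSanitizer report above places the bad read 8 bytes to the left of the allocation, which is what indexing an array of 8-byte elements with -1 would produce.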
Phil Turnbull
dee86f908a Correctly check type is ParseActionTypeRecover 2017-06-07 17:05:39 -04:00
Max Brunsfeld
7b401de5a6 Don't use pointer equality to compare external token states 2017-05-03 09:57:09 -07:00
Max Brunsfeld
e8a9bb7a51 🎨 Extract parser__halt_parse function 2017-05-01 14:41:55 -07:00
Max Brunsfeld
74f5ceddf7 Fix parsing of valid code with halt_on_error flag set
Signed-off-by: Tim Clem <timothy.clem@gmail.com>
2017-05-01 14:25:25 -07:00
Max Brunsfeld
a98d449d88 Add an option to immediately halt on syntax error 2017-05-01 13:50:49 -07:00
Max Brunsfeld
704c2d5907 Fix lookahead_char type in ts_tree_make_error function 2017-04-27 14:49:04 -07:00
Timothy Clem
91558f0a0e utf8proc_iterate can set codepoint_ref to -1 and return a negative error 2017-04-27 14:46:36 -07:00
Rob Rix
3a888b1623 Define a function providing the type of a given symbol. 2017-04-12 09:47:51 -04:00
joshvera
f76935cc7e just make it static 2017-03-24 18:38:21 -04:00
joshvera
6938b288a5 Make external scanner symbol map unique 2017-03-24 14:51:37 -04:00
Max Brunsfeld
1f908324dc Prevent infinite loop in skip_preceding_trees error recovery strategy 2017-03-21 12:14:44 -07:00
Max Brunsfeld
7e13eac296 Fix lookahead_char type in ts_tree_make_error function 2017-03-21 11:05:48 -07:00