1351 commits

Author SHA1 Message Date
Phil Turnbull
e7662c2213 Handle out-of-bound read in utf16_iterate
Also simplify the test so we call `utf16_iterate` directly. Calling
`utf16_iterate` via `SpyInput` and `ts_document_parse` doesn't seem to reliably
trigger the problem using valgrind.

valgrind also doesn't detect the problem if we use a string literal like:
  `utf16_iterate("", 1, &code_point);`
2017-07-17 13:57:12 -07:00
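
Note on the entry above: the commit does not show the fix itself, so the following is only a minimal sketch of the general technique (checking the remaining length before every read). `decode_utf16le` is a hypothetical helper, not tree-sitter's `utf16_iterate`; its name, signature, and return convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the library's utf16_iterate): decode one code
 * point from a little-endian UTF-16 byte buffer without reading past
 * `length` bytes. Returns the number of bytes consumed, or 0 when the
 * input is empty or truncated. On any failure *code_point is set to -1. */
size_t decode_utf16le(const uint8_t *bytes, size_t length, int32_t *code_point) {
  if (length < 2) {            /* not even one full code unit available */
    *code_point = -1;
    return 0;
  }
  uint32_t unit = (uint32_t)bytes[0] | ((uint32_t)bytes[1] << 8);

  if (unit >= 0xD800 && unit <= 0xDBFF) {   /* high surrogate */
    if (length < 4) {          /* the low surrogate would be out of bounds */
      *code_point = -1;
      return 0;
    }
    uint32_t low = (uint32_t)bytes[2] | ((uint32_t)bytes[3] << 8);
    if (low >= 0xDC00 && low <= 0xDFFF) {
      *code_point = (int32_t)(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
      return 4;
    }
    *code_point = -1;          /* unpaired high surrogate: skip one unit */
    return 2;
  }

  if (unit >= 0xDC00 && unit <= 0xDFFF) {   /* stray low surrogate */
    *code_point = -1;
    return 2;
  }

  *code_point = (int32_t)unit;
  return 2;
}
```

Passing a length shorter than the encoded character (for example 3 bytes of a 4-byte surrogate pair) returns 0 instead of touching the missing bytes.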
Max Brunsfeld
66dc12587a Call the external scanner whenever an external token is valid
For some reason, there was previously some extra logic that prevented
the external scanner from being invoked if the only valid external
token also had an internal definition.

It's surprising to not call the external scanner if an external
token is valid.
2017-07-17 10:28:59 -07:00
Max Brunsfeld
99885788bc 🎨 2017-07-14 10:41:09 -07:00
Max Brunsfeld
4b40a1ed6c Support anonymous tokens inside of RENAME rules 2017-07-14 10:19:58 -07:00
Max Brunsfeld
b3a72954ff Introduce RENAME rule type 2017-07-13 17:17:22 -07:00
Max Brunsfeld
0b94e9d814 Don't include preceding production steps in ParseItem hash 2017-07-13 13:42:28 -07:00
Max Brunsfeld
561821d011 Remove precedence and associativity methods from ParseAction 2017-07-13 13:41:56 -07:00
Max Brunsfeld
d646889922 Simplify flatten_rule function 2017-07-13 09:59:23 -07:00
Max Brunsfeld
7293e6f0cc Fix compile warnings 2017-07-12 22:08:36 -07:00
Max Brunsfeld
62c577af33 Remove unnecessary using statements 2017-07-12 21:41:37 -07:00
Max Brunsfeld
a3006bc2b5 Represent LookaheadSet using vectors of bool 2017-07-12 16:02:01 -07:00
Max Brunsfeld
65bf1389e1 Add a way to automatically inline rules 2017-07-11 23:13:44 -07:00
Max Brunsfeld
26a25278cd When comparing parse items, ignore consumed part of their productions
This speeds up parser generation by increasing the likelihood that we'll recognize
parse item sets as equivalent in advance, rather than having to merge their states
after the fact.
2017-07-11 17:30:32 -07:00
Max Brunsfeld
a199b217f3 Optimize ParseTableBuilder for non-terminals w/ many productions 2017-07-11 12:54:29 -07:00
Max Brunsfeld
68c3ba1b8b 🎨 merge_parse_state 2017-07-10 16:46:11 -07:00
Max Brunsfeld
5bd5b4bb05 Replace <cctype> -> <cwctype> 2017-07-10 14:35:14 -07:00
Max Brunsfeld
59236d2ed1 Avoid redundant character comparisons in generated lex function 2017-07-10 14:09:31 -07:00
Max Brunsfeld
2755b07222 Don't store unfinished item signature on ParseStates 2017-07-10 10:47:38 -07:00
Max Brunsfeld
1586d70cbe Compute conflicting tokens more precisely
While generating the parse table, keep track of which tokens can follow one another.
Then use this information to evaluate token conflicts more precisely. This will
result in a smaller parse table than the previous, overly-conservative approach.
2017-07-07 17:54:24 -07:00
Max Brunsfeld
a98abde529 Provide all preceding symbols as context when reporting conflicts 2017-07-07 14:52:56 -07:00
Max Brunsfeld
c91ceaaa8d 🎨 build_parse_table 2017-07-07 14:52:45 -07:00
Max Brunsfeld
0de93b3bf2 Allow negative dynamic precedences 2017-07-06 22:21:59 -07:00
Max Brunsfeld
d8e9d04fe7 Add PREC_DYNAMIC rule for resolving runtime ambiguities 2017-07-06 15:24:45 -07:00
Max Brunsfeld
8f028ebf68 Avoid deep tree comparison when both trees have errors 2017-07-05 17:33:35 -07:00
Max Brunsfeld
782bf48772 Don't do skip_preceding_subtrees recovery when there are lots of versions 2017-07-05 15:34:19 -07:00
Max Brunsfeld
17bc3dfaf7 Add a benchmark command
This command measures the speed of parsing each grammar's examples.
It also uses each grammar to parse all of the *other* grammars' examples
in order to measure error recovery performance with fairly large files.
2017-07-05 14:14:38 -07:00
Max Brunsfeld
d322f0b6a7 🎨 2017-07-04 21:59:54 -07:00
Max Brunsfeld
e7ccd9c17c Put back check for better existing versions during recover action
When checking for better existing versions, only kill the current
version if there is a better version earlier in the rotation.
2017-07-03 12:27:23 -07:00
Max Brunsfeld
f93f78ef2d Remove version-pruning criteria based on pushed node count 2017-07-02 23:42:23 -07:00
Max Brunsfeld
f722923493 Limit the search depth for skip_preceding_trees recovery 2017-06-30 17:49:09 -07:00
Max Brunsfeld
89e5037f01 Manually tail-call-optimize stack_node_release function 2017-06-30 17:49:09 -07:00
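
Note on the entry above: a rough sketch of what manually eliminating a tail call in a release function can look like. It assumes a simplified node with a single predecessor link; `ChainNode`, `chain_node_new`, and `chain_node_release` are hypothetical names, and the real stack nodes may have a different shape.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified node: one predecessor and a reference count. */
typedef struct ChainNode {
  struct ChainNode *previous;
  uint32_t ref_count;
} ChainNode;

/* Create a node that takes over the caller's reference to `previous`. */
ChainNode *chain_node_new(ChainNode *previous) {
  ChainNode *node = malloc(sizeof(ChainNode));
  if (!node) return NULL;
  node->previous = previous;
  node->ref_count = 1;
  return node;
}

/* Release one reference. A recursive version would end with
 * `chain_node_release(node->previous)`; here that tail call is replaced by
 * a loop, so a very long chain cannot overflow the C call stack.
 * Returns how many nodes were freed. */
size_t chain_node_release(ChainNode *node) {
  size_t freed = 0;
  while (node) {
    node->ref_count--;
    if (node->ref_count > 0) break;   /* someone else still holds it */
    ChainNode *previous = node->previous;
    free(node);
    freed++;
    node = previous;                  /* continue with the predecessor */
  }
  return freed;
}
```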
Max Brunsfeld
eccb3893eb Prune unneeded stack versions based on a depth criterion 2017-06-30 17:49:09 -07:00
Max Brunsfeld
d6579956f5 Enforce a hard version count limit during the recovery action 2017-06-30 17:49:09 -07:00
Max Brunsfeld
061fba6b92 🎨 Just call it 'inline' 2017-06-29 16:49:59 -07:00
Max Brunsfeld
a89322c5f1 Remove unneeded parameters from public interface of stack_iterate callback 2017-06-29 16:43:56 -07:00
Max Brunsfeld
009d6d1534 Improve heuristics for pruning parse versions based on errors
* Rewrite the error cost comparison in terms of explicit, discrete
conditions.
* Allow merging versions that have different error costs.
* Store the depth of each stack version since the last error. Use this
state to prevent incorrect merging.
* Sort the stack versions in order of preference and put a hard limit on
the version count.
2017-06-29 15:00:20 -07:00
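
Note on the entry above: the last bullet (sort versions by preference, then enforce a hard limit) can be sketched as follows. This is an illustration only: the `Version` struct, the use of error cost as the sole ordering key, and the `MAX_VERSION_COUNT` value are all assumptions, and the commit's other criteria (discrete comparison conditions, depth since the last error) are not modeled.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_VERSION_COUNT 3   /* hypothetical limit, chosen for the example */

/* Hypothetical stand-in for a stack version; only the error cost is kept. */
typedef struct {
  unsigned error_cost;
} Version;

/* Order versions so that the lowest error cost (most preferred) comes first. */
static int compare_versions(const void *a, const void *b) {
  const Version *left = a;
  const Version *right = b;
  return (left->error_cost > right->error_cost) -
         (left->error_cost < right->error_cost);
}

/* Sort in place by preference and return how many versions survive. The
 * caller treats everything past the returned count as discarded. */
size_t prune_versions(Version *versions, size_t count) {
  qsort(versions, count, sizeof(Version), compare_versions);
  return count > MAX_VERSION_COUNT ? MAX_VERSION_COUNT : count;
}
```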
Max Brunsfeld
445be0736a Clean up ts_stack_push function 2017-06-29 15:00:20 -07:00
Max Brunsfeld
66be393b78 Stack - consider empty external token state identical to NULL 2017-06-29 15:00:20 -07:00
Phil Turnbull
5ef9c4d6aa Increase size of ref_counts and add assertions
This bumps the size of the reference counts from 16- to 32-bit counters to make
it less likely to overflow. Also assert in the retain function that the
reference count didn't overflow.

32 bits seems big enough for non-pathological examples, but a more fool-proof
fix may be to bump it to 64 bits.
2017-06-29 06:39:01 -07:00
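
Note on the entry above: a minimal sketch of a 32-bit reference count with an overflow assertion in the retain path, as the message describes. `Node`, `node_retain`, and `node_release` are hypothetical names; the actual structs and functions touched by the commit are not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ref-counted object. A 16-bit counter wraps to 0 after
 * 65535 retains; a 32-bit counter makes that far less likely. */
typedef struct {
  uint32_t ref_count;
} Node;

void node_retain(Node *node) {
  assert(node->ref_count > 0);    /* must already be alive */
  node->ref_count++;
  assert(node->ref_count != 0);   /* a wrap to 0 would mean overflow */
}

/* Drops one reference; returns true when the caller should free the node. */
bool node_release(Node *node) {
  assert(node->ref_count > 0);
  node->ref_count--;
  return node->ref_count == 0;
}
```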
Max Brunsfeld
dfd7b1f5f6 Consolidate memory management logic in Stack 2017-06-27 16:17:24 -07:00
Max Brunsfeld
0143bfdad4 Avoid use-after-free of external token states
Previously, it was possible for references to external token states to
outlive the trees to which those states belonged.

Now, instead of storing references to external token states in the Stack
and in the Lexer, we store references to the external token trees
themselves, and we retain the trees to prevent use-after-free.
2017-06-27 14:54:27 -07:00
Max Brunsfeld
f678018d3d Avoid use-after-free when copying stack iterators 2017-06-27 14:54:27 -07:00
Max Brunsfeld
076002a01e Merge pull request #78 from philipturnbull/update-utf8proc
Out of bounds read in utf8proc
2017-06-23 12:18:21 -07:00
Max Brunsfeld
f62ee5a0f3 Fix OOB reads at ends of chunks
Signed-off-by: Philip Turnbull <philipturnbull@github.com>
2017-06-23 12:09:16 -07:00
Max Brunsfeld
8ee3f96960 Fix formatting of non-ascii unexpected characters
Signed-off-by: Philip Turnbull <philipturnbull@github.com>
2017-06-23 12:08:50 -07:00
Max Brunsfeld
20982fdcb9 Mark tokens as non-reusable in states where shorter tokens take precedence
This fixes some randomized test failures in the C grammar, relating to object-like macros.
The object-like macro rule relies on a whitespace token in order to distinguish object-like
macros whose values begin with a '(' from function-like macros. The presence of that
whitespace token means that other nodes should not be reusable in that state.
2017-06-22 16:04:42 -07:00
Max Brunsfeld
8517313a45 🎨 2017-06-22 15:33:07 -07:00
Max Brunsfeld
8157b81b68 Improve logic for short-circuiting trivial lexing conflict detection 2017-06-22 15:33:01 -07:00
Max Brunsfeld
2c043803f1 Be more conservative about avoiding lexing conflicts when merging states
This fixes a bug in the C++ grammar where the `>>` token was merged into
a state where it was previously not valid, but the `>` token *was*
valid. This caused nested templates like:
  `std::vector<std::pair<int, int>>`
to not parse correctly.
2017-06-22 15:32:13 -07:00
Max Brunsfeld
513edec7c1 Merge pull request #77 from philipturnbull/scan-build-fixes
Fix errors found by scan-build
2017-06-20 10:15:20 -07:00