Commit graph

85 commits

Author SHA1 Message Date
Max Brunsfeld
230f89d0ff Fix build warnings in tests 2017-08-07 12:19:10 -07:00
Max Brunsfeld
94dc703bfc Require that grammars' start rules be visible 2017-08-04 17:07:37 -07:00
Max Brunsfeld
1dca3a0b58 Simplify parse version reordering 2017-08-04 14:51:14 -07:00
Max Brunsfeld
e5c3bf742d Update fixture grammars 2017-08-03 16:32:39 -07:00
Max Brunsfeld
84e4114f79 Allow conflicts involving repeat rules to be whitelisted, via their parent rule 2017-08-03 15:18:29 -07:00
Max Brunsfeld
119c67dd78 Fix conflict reporting for shift/reduce conflicts w/ multiple reductions
We were failing to rule out shift actions with lower precedence.

Signed-off-by: Philip Turnbull <philipturnbull@github.com>
2017-08-02 15:13:30 -07:00
Max Brunsfeld
09f4796f6b Get tests passing w/ new alias API 2017-08-01 14:35:34 -07:00
Max Brunsfeld
cb5fe80348 Rename RENAME rule to ALIAS, allow it to create anonymous nodes 2017-07-31 16:41:11 -07:00
Max Brunsfeld
cbdfd89675 Mark reductions as fragile based on their final properties
We previously maintained a set of individual productions that were
involved in conflicts, but that was subtly incorrect because
we don't compare productions themselves when comparing parse items;
we only compare the parse items properties that could affect the
final reduce actions.
2017-07-21 09:54:24 -07:00
Max Brunsfeld
7d9d8bce79 Handle inlined rules that contain other inlined rules 2017-07-20 15:29:06 -07:00
Max Brunsfeld
f33421c53e Fix incorrect node renames in the presence of extra tokens 2017-07-18 21:24:34 -07:00
Max Brunsfeld
10d28d4b56 Merge pull request #92 from tree-sitter/utf16-oob
Add test for UTF16 out-of-bound read
2017-07-18 17:24:31 -07:00
Phil Turnbull
52cec9ed39 Rework SpyInput buffer handling
SpyInput uses a fixed-size buffer and explicitly zeros memory which is good for
catching logic errors but defeats valgrind's memory tracking. Use a separate
buffer of exactly the correct size for each request. This correctly catches the
problem under valgrind:

```
==8694== Invalid read of size 2
==8694==    at 0x54EFFB: utf16_iterate (utf16.c:10)
==8694==    by 0x551126: ts_lexer__get_lookahead (lexer.c:54)
==8694==    by 0x5515CD: ts_lexer_start (lexer.c:154)
==8694==    by 0x54699F: parser(long,...)(long long) (parser.c:297)
==8694==    by 0x54788A: parser__get_lookahead (parser.c:439)
==8694==    by 0x54B2D3: parser__advance (parser.c:1150)
==8694==    by 0x54C2AA: parser_parse (parser.c:1348)
==8694==    by 0x53F063: ts_document_parse_with_options (document.c:136)
==8694==    by 0x53EF43: ts_document_parse (document.c:107)
==8694==    by 0x4AED11: {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}::operator()() const (document_test.cc:82)
==8694==    by 0x4B56B6: std::_Function_handler<void (), {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}>::_M_invoke(std::_Any_data const&) (functional:1871)
==8694==    by 0x40F8C5: std::function<void ()>::operator()() const (functional:2267)
==8694==  Address 0x5d08be0 is 0 bytes inside a block of size 1 alloc'd
==8694==    at 0x4C2E80F: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8694==    by 0x507C3E: SpyInput::read(void*, unsigned int*) (spy_input.cc:66)
==8694==    by 0x55103D: ts_lexer__get_chunk (lexer.c:29)
==8694==    by 0x5515B6: ts_lexer_start (lexer.c:152)
==8694==    by 0x54699F: parser(long,...)(long long) (parser.c:297)
==8694==    by 0x54788A: parser__get_lookahead (parser.c:439)
==8694==    by 0x54B2D3: parser__advance (parser.c:1150)
==8694==    by 0x54C2AA: parser_parse (parser.c:1348)
==8694==    by 0x53F063: ts_document_parse_with_options (document.c:136)
==8694==    by 0x53EF43: ts_document_parse (document.c:107)
==8694==    by 0x4AED11: {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}::operator()() const (document_test.cc:82)
==8694==    by 0x4B56B6: std::_Function_handler<void (), {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}>::_M_invoke(std::_Any_data const&) (functional:1871)
```
2017-07-18 12:16:37 -07:00
Max Brunsfeld
afb499bf2e Handle rename symbols in ts_language APIs 2017-07-18 12:01:52 -07:00
Max Brunsfeld
de17c92462 Fix setup in stack test 2017-07-18 08:21:35 -07:00
Max Brunsfeld
085d96d89d Add bash examples to benchmarks 2017-07-17 17:50:04 -07:00
Max Brunsfeld
45c40c8742 Update test grammars to use new serialization API 2017-07-17 17:46:46 -07:00
Max Brunsfeld
9a04231ab1 Remove length restriction in external scanner serialization API 2017-07-17 17:12:36 -07:00
Phil Turnbull
e7662c2213 Handle out-of-bound read in utf16_iterate
Also simplify the test so we call `utf16_iterate` directly. Calling
`utf16_iterate` via `SpyInput` and `ts_document_parse` doesn't seem to reliably
trigger the problem using valgrind.

valgrind also doesn't detect the problem if we use a string literal like:
  `utf16_iterate("", 1, &code_point);`
2017-07-17 13:57:12 -07:00
Phil Turnbull
035abc1e15 Add test for UTF16 out-of-bound read
utf16_iterate does not check that 'length' is a multiple of two which leads to
an out-of-bound read:

==105293== Conditional jump or move depends on uninitialised value(s)
==105293==    at 0x54F014: utf16_iterate (utf16.c:7)
==105293==    by 0x539251: string_iterate(TSInputEncoding, unsigned char const*, unsigned long, int*) (encoding_helpers.cc:15)
==105293==    by 0x53939D: string_byte_for_character(TSInputEncoding, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, unsigned long) (encoding_helpers.cc:43)
==105293==    by 0x507BAD: SpyInput::read(void*, unsigned int*) (spy_input.cc:47)
==105293==    by 0x551049: ts_lexer__get_chunk (lexer.c:29)
==105293==    by 0x5515C2: ts_lexer_start (lexer.c:152)
==105293==    by 0x5469AB: parser(long,...)(long long) (parser.c:297)
==105293==    by 0x547896: parser__get_lookahead (parser.c:439)
==105293==    by 0x54B2DF: parser__advance (parser.c:1150)
==105293==    by 0x54C2B6: parser_parse (parser.c:1348)
==105293==    by 0x53F06F: ts_document_parse_with_options (document.c:136)
==105293==    by 0x53EF4F: ts_document_parse (document.c:107)
2017-07-17 12:34:39 -07:00
Max Brunsfeld
34279257f9 Merge pull request #91 from tree-sitter/libFuzzer
Add support for fuzzing with libFuzzer
2017-07-17 11:43:01 -07:00
Phil Turnbull
798ef5e4dc Add libFuzzer support
This adds support for fuzzing tree-sitter grammars with libFuzzer. This
currently only works on Linux because of linking issues on macOS. Breifly, the
AddressSanitizer library is dynamically linked into the fuzzer binary and
cannot be found at runtime if built with a compiler that wasn't provided by
Xcode(?). The runtime library is statically linked on Linux so this isn't a
problem.
2017-07-14 13:50:41 -07:00
Max Brunsfeld
a22386e408 Fix compiler warnings in flatten_grammar_test 2017-07-14 10:26:34 -07:00
Max Brunsfeld
4b40a1ed6c Support anonymous tokens inside of RENAME rules 2017-07-14 10:19:58 -07:00
Max Brunsfeld
b3a72954ff Introduce RENAME rule type 2017-07-13 17:17:22 -07:00
Max Brunsfeld
7293e6f0cc Fix compile warnings 2017-07-12 22:08:36 -07:00
Max Brunsfeld
a3006bc2b5 Represent LookaheadSet using vectors of bool 2017-07-12 16:02:01 -07:00
Max Brunsfeld
e4f57d6fee Test more cases in fixture grammar with inline rules 2017-07-12 10:12:42 -07:00
Max Brunsfeld
5c8f7c035e Add stream operator for ParseItemSet 2017-07-12 09:42:56 -07:00
Max Brunsfeld
65bf1389e1 Add a way to automatically inline rules 2017-07-11 23:13:44 -07:00
Max Brunsfeld
26a25278cd When comparing parse items, ignore consumed part of their productions
This speeds up parser generation by increasing the likelihood that we'll recognize
parse item sets as equivalent in advance, rather than having to merge their states
after the fact.
2017-07-11 17:30:32 -07:00
Max Brunsfeld
59236d2ed1 Avoid redundant character comparisons in generated lex function 2017-07-10 14:09:31 -07:00
Max Brunsfeld
1586d70cbe Compute conflicting tokens more precisely
While generating the parse table, keep track of which tokens can follow one another.
Then use this information to evaluate token conflicts more precisely. This will
result in a smaller parse table than the previous, overly-conservative approach.
2017-07-07 17:54:24 -07:00
Max Brunsfeld
a98abde529 Provide all preceding symbols as context when reporting conflicts 2017-07-07 14:52:56 -07:00
Max Brunsfeld
d8e9d04fe7 Add PREC_DYNAMIC rule for resolving runtime ambiguities 2017-07-06 15:24:45 -07:00
Max Brunsfeld
cb652239f6 Add missing semicolons in flatten_grammar test 2017-07-06 12:48:50 -07:00
Max Brunsfeld
96068bbacb Represent all speeds as size_t in benchmarks 2017-07-06 12:24:41 -07:00
Max Brunsfeld
8c005cddc6 Print average and worst speeds in benchmark command 2017-07-06 10:12:22 -07:00
Max Brunsfeld
8f028ebf68 Avoid deep tree comparison when both trees have errors 2017-07-05 17:33:35 -07:00
Max Brunsfeld
c53f9bcbd9 Build benchmarks in Test mode for now 2017-07-05 17:27:50 -07:00
Max Brunsfeld
17bc3dfaf7 Add a benchmark command
This command measures the speed of parsing each grammar's examples.
It also uses each grammar to parse all of the *other* grammars' examples
in order to measure error recovery performance with fairly large files.
2017-07-05 14:14:38 -07:00
Max Brunsfeld
298228d8de Clean up test_grammars file 2017-07-05 11:39:33 -07:00
Max Brunsfeld
d322f0b6a7 🎨 2017-07-04 21:59:54 -07:00
Max Brunsfeld
f93f78ef2d Remove version-pruning criteria based on pushed node count 2017-07-02 23:42:23 -07:00
Max Brunsfeld
fcffd4b732 Add test for an example found during fuzzing 2017-06-30 21:55:50 -07:00
Max Brunsfeld
eccb3893eb Prune unneeded stack versions based on a depth criteria 2017-06-30 17:49:09 -07:00
Max Brunsfeld
a89322c5f1 Remove unneeded parameters from public interface of stack_iterate callback 2017-06-29 16:43:56 -07:00
Max Brunsfeld
009d6d1534 Improve heuristics for pruning parse versions based on errors
* Rewrite the error cost comparison in terms of explicit, discrete
conditions.
* Allow merging versions have different error costs.
* Store the depth of each stack version since the last error. Use this
state to prevent incorrect merging.
* Sort the stack versions in order of preference and put a hard limit on
the version count.
2017-06-29 15:00:20 -07:00
Max Brunsfeld
66be393b78 Stack - consider empty external token state identical to NULL 2017-06-29 15:00:20 -07:00
Max Brunsfeld
0143bfdad4 Avoid use-after-free of external token states
Previously, it was possible for references to external token states to
outlive the trees to which those states belonged.

Now, instead of storing references to external token states in the Stack
and in the Lexer, we store references to the external token trees
themselves, and we retain the trees to prevent use-after-free.
2017-06-27 14:54:27 -07:00