Commit graph

111 commits

Author SHA1 Message Date
Max Brunsfeld
36c2b685b9 Always invalidate old chunk of text when parsing after an edit 2017-10-04 15:09:46 -07:00
Max Brunsfeld
c0073c5b72 Update error corpus to reflect C grammar changes 2017-10-04 15:06:12 -07:00
Max Brunsfeld
d342b61ede Re-enable JS fuzzing example test 2017-09-14 11:39:08 -07:00
Max Brunsfeld
9d67a98510 Merge pull request #103 from tree-sitter/python-assertion-failure
Assertion failure in parser__advance
2017-09-14 11:38:22 -07:00
Max Brunsfeld
d291af9a31 Refactor error comparisons
* Deal with mergeability outside of error comparison function
* Make `better_version_exists` function pure (don't halt other versions
as a side effect).
* Tweak error comparison logic

Signed-off-by: Rick Winfrey <rewinfrey@github.com>
2017-09-13 16:38:15 -07:00
Phil Turnbull
d9a0fbc210 Add testcase for parser__advance assertion failure
The python testcase decodes to:
```
00000000  35 63 6f 6e 88 2c 29 33  2c 2c 2c 2c 63 6f 6e 88  |5con.,)3,,,,con.|
00000010  2c 2a 2c 3a 35 63 6f 6e  2c                       |,*,:5con,|
```

which triggers:
```
Assertion failed: ((uint32_t)0 < (&reduction.slices)->size), function parser__advance, file src/runtime/parser.c, line 1202.
```
2017-09-13 13:25:31 -04:00
Max Brunsfeld
65ed4281d4 Exclude zeros from speeds reported in benchmarks 2017-09-12 16:30:38 -07:00
Max Brunsfeld
99d048e016 Simplify error recovery; eliminate recovery states
The previous approach to error recovery relied on special error-recovery
states in the parse table. For each token T, there was an error recovery
state in which the parser looked for *any* token that could follow T.
Unfortunately, sometimes the set of tokens that could follow T contained
conflicts. For example, in JS, the token '}' can be followed by the
open-ended 'template_chars' token, but also by ordinary tokens like
'identifier'. So with the old algorithm, when recovering from an
unexpected '}' token, the lexer had no way to distinguish identifiers
from template_chars.

This commit drops the error recovery states. Instead, when we encounter
an unexpected token T, we recover from the error by finding a previous
state S in the stack in which T would be valid, popping all of the nodes
after S, and wrapping them in an error.

This way, the lexer is always invoked in a normal parse state, in which
it is looking for a non-conflicting set of tokens. Eliminating the error
recovery states also shrinks the lex state machine significantly.

Signed-off-by: Rick Winfrey <rewinfrey@github.com>
2017-09-11 15:22:52 -07:00
Max Brunsfeld
8b3941764f Make outstanding_allocation_indices return a vector, not a set 2017-09-07 17:48:44 -07:00
Max Brunsfeld
9d668c5004 Move incompatible token map into LexTableBuilder 2017-08-31 15:46:37 -07:00
Max Brunsfeld
4daf22ba0c Read files in binary mode in tests 2017-08-09 10:07:03 -07:00
Max Brunsfeld
90eef13aeb Fix windows directory listing function 2017-08-08 22:21:39 -07:00
Max Brunsfeld
d4b2c58dc8 Fix another hard-coded / as a path separator 2017-08-08 22:08:27 -07:00
Max Brunsfeld
f6325746aa Provide symbol metadata with dummy language in stack test 2017-08-08 17:47:24 -07:00
Max Brunsfeld
d0dc164013 Disable the default behavior of printing a logo, Microsoft. 2017-08-08 17:35:50 -07:00
Max Brunsfeld
0919f5588b Disable fuzzing example test for now 2017-08-08 17:35:16 -07:00
Max Brunsfeld
0ba6188bab Construct paths portably in test_grammars.cc 2017-08-08 17:25:43 -07:00
Max Brunsfeld
7587353ab6 Avoid unicode literals in tests
MSVC tries to normalize them based on the current locale.
2017-08-08 17:10:08 -07:00
Max Brunsfeld
17b58f41e1 Disable optimization when compiling grammars during tests 2017-08-08 17:09:34 -07:00
Max Brunsfeld
17310769a4 Fix usage of .so instead of .dll on windows 2017-08-08 15:20:38 -07:00
Max Brunsfeld
bb7889fac5 Don't rely on PWD env var on windows 2017-08-08 15:04:27 -07:00
Max Brunsfeld
34b5340d71 Fix paths to corpus files on windows 2017-08-08 14:06:11 -07:00
Max Brunsfeld
cc7277fd7d Avoid using IsNull bandit assertion 2017-08-08 12:52:35 -07:00
Max Brunsfeld
b43ae2826b Use C++ stdlib for random number generation 2017-08-08 12:42:49 -07:00
Max Brunsfeld
fc0f49e4ee Add windows implementations of some IO-related test helpers 2017-08-08 12:21:17 -07:00
Max Brunsfeld
947c161c2f Use a constructor rather than aggregate initialization for Production 2017-08-08 10:41:54 -07:00
Max Brunsfeld
230f89d0ff Fix build warnings in tests 2017-08-07 12:19:10 -07:00
Max Brunsfeld
94dc703bfc Require that grammars' start rules be visible 2017-08-04 17:07:37 -07:00
Max Brunsfeld
1dca3a0b58 Simplify parse version reordering 2017-08-04 14:51:14 -07:00
Max Brunsfeld
e5c3bf742d Update fixture grammars 2017-08-03 16:32:39 -07:00
Max Brunsfeld
84e4114f79 Allow conflicts involving repeat rules to be whitelisted, via their parent rule 2017-08-03 15:18:29 -07:00
Max Brunsfeld
119c67dd78 Fix conflict reporting for shift/reduce conflicts w/ multiple reductions
We were failing to rule out shift actions with lower precedence.

Signed-off-by: Philip Turnbull <philipturnbull@github.com>
2017-08-02 15:13:30 -07:00
Max Brunsfeld
09f4796f6b Get tests passing w/ new alias API 2017-08-01 14:35:34 -07:00
Max Brunsfeld
cb5fe80348 Rename RENAME rule to ALIAS, allow it to create anonymous nodes 2017-07-31 16:41:11 -07:00
Max Brunsfeld
cbdfd89675 Mark reductions as fragile based on their final properties
We previously maintained a set of individual productions that were
involved in conflicts, but that was subtly incorrect because
we don't compare productions themselves when comparing parse items;
we only compare the parse items properties that could affect the
final reduce actions.
2017-07-21 09:54:24 -07:00
Max Brunsfeld
7d9d8bce79 Handle inlined rules that contain other inlined rules 2017-07-20 15:29:06 -07:00
Max Brunsfeld
f33421c53e Fix incorrect node renames in the presence of extra tokens 2017-07-18 21:24:34 -07:00
Max Brunsfeld
10d28d4b56 Merge pull request #92 from tree-sitter/utf16-oob
Add test for UTF16 out-of-bound read
2017-07-18 17:24:31 -07:00
Phil Turnbull
52cec9ed39 Rework SpyInput buffer handling
SpyInput uses a fixed-size buffer and explicitly zeros memory which is good for
catching logic errors but defeats valgrind's memory tracking. Use a separate
buffer of exactly the correct size for each request. This correctly catches the
problem under valgrind:

```
==8694== Invalid read of size 2
==8694==    at 0x54EFFB: utf16_iterate (utf16.c:10)
==8694==    by 0x551126: ts_lexer__get_lookahead (lexer.c:54)
==8694==    by 0x5515CD: ts_lexer_start (lexer.c:154)
==8694==    by 0x54699F: parser(long,...)(long long) (parser.c:297)
==8694==    by 0x54788A: parser__get_lookahead (parser.c:439)
==8694==    by 0x54B2D3: parser__advance (parser.c:1150)
==8694==    by 0x54C2AA: parser_parse (parser.c:1348)
==8694==    by 0x53F063: ts_document_parse_with_options (document.c:136)
==8694==    by 0x53EF43: ts_document_parse (document.c:107)
==8694==    by 0x4AED11: {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}::operator()() const (document_test.cc:82)
==8694==    by 0x4B56B6: std::_Function_handler<void (), {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}>::_M_invoke(std::_Any_data const&) (functional:1871)
==8694==    by 0x40F8C5: std::function<void ()>::operator()() const (functional:2267)
==8694==  Address 0x5d08be0 is 0 bytes inside a block of size 1 alloc'd
==8694==    at 0x4C2E80F: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8694==    by 0x507C3E: SpyInput::read(void*, unsigned int*) (spy_input.cc:66)
==8694==    by 0x55103D: ts_lexer__get_chunk (lexer.c:29)
==8694==    by 0x5515B6: ts_lexer_start (lexer.c:152)
==8694==    by 0x54699F: parser(long,...)(long long) (parser.c:297)
==8694==    by 0x54788A: parser__get_lookahead (parser.c:439)
==8694==    by 0x54B2D3: parser__advance (parser.c:1150)
==8694==    by 0x54C2AA: parser_parse (parser.c:1348)
==8694==    by 0x53F063: ts_document_parse_with_options (document.c:136)
==8694==    by 0x53EF43: ts_document_parse (document.c:107)
==8694==    by 0x4AED11: {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}::operator()() const (document_test.cc:82)
==8694==    by 0x4B56B6: std::_Function_handler<void (), {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}>::_M_invoke(std::_Any_data const&) (functional:1871)
```
2017-07-18 12:16:37 -07:00
Max Brunsfeld
afb499bf2e Handle rename symbols in ts_language APIs 2017-07-18 12:01:52 -07:00
Max Brunsfeld
de17c92462 Fix setup in stack test 2017-07-18 08:21:35 -07:00
Max Brunsfeld
085d96d89d Add bash examples to benchmarks 2017-07-17 17:50:04 -07:00
Max Brunsfeld
45c40c8742 Update test grammars to use new serialization API 2017-07-17 17:46:46 -07:00
Max Brunsfeld
9a04231ab1 Remove length restriction in external scanner serialization API 2017-07-17 17:12:36 -07:00
Phil Turnbull
e7662c2213 Handle out-of-bound read in utf16_iterate
Also simplify the test so we call `utf16_iterate` directly. Calling
`utf16_iterate` via `SpyInput` and `ts_document_parse` doesn't seem to reliably
trigger the problem using valgrind.

valgrind also doesn't detect the problem if we use a string literal like:
  `utf16_iterate("", 1, &code_point);`
2017-07-17 13:57:12 -07:00
Phil Turnbull
035abc1e15 Add test for UTF16 out-of-bound read
utf16_iterate does not check that 'length' is a multiple of two which leads to
an out-of-bound read:

==105293== Conditional jump or move depends on uninitialised value(s)
==105293==    at 0x54F014: utf16_iterate (utf16.c:7)
==105293==    by 0x539251: string_iterate(TSInputEncoding, unsigned char const*, unsigned long, int*) (encoding_helpers.cc:15)
==105293==    by 0x53939D: string_byte_for_character(TSInputEncoding, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, unsigned long) (encoding_helpers.cc:43)
==105293==    by 0x507BAD: SpyInput::read(void*, unsigned int*) (spy_input.cc:47)
==105293==    by 0x551049: ts_lexer__get_chunk (lexer.c:29)
==105293==    by 0x5515C2: ts_lexer_start (lexer.c:152)
==105293==    by 0x5469AB: parser(long,...)(long long) (parser.c:297)
==105293==    by 0x547896: parser__get_lookahead (parser.c:439)
==105293==    by 0x54B2DF: parser__advance (parser.c:1150)
==105293==    by 0x54C2B6: parser_parse (parser.c:1348)
==105293==    by 0x53F06F: ts_document_parse_with_options (document.c:136)
==105293==    by 0x53EF4F: ts_document_parse (document.c:107)
2017-07-17 12:34:39 -07:00
Max Brunsfeld
34279257f9 Merge pull request #91 from tree-sitter/libFuzzer
Add support for fuzzing with libFuzzer
2017-07-17 11:43:01 -07:00
Phil Turnbull
798ef5e4dc Add libFuzzer support
This adds support for fuzzing tree-sitter grammars with libFuzzer. This
currently only works on Linux because of linking issues on macOS. Breifly, the
AddressSanitizer library is dynamically linked into the fuzzer binary and
cannot be found at runtime if built with a compiler that wasn't provided by
Xcode(?). The runtime library is statically linked on Linux so this isn't a
problem.
2017-07-14 13:50:41 -07:00
Max Brunsfeld
a22386e408 Fix compiler warnings in flatten_grammar_test 2017-07-14 10:26:34 -07:00
Max Brunsfeld
4b40a1ed6c Support anonymous tokens inside of RENAME rules 2017-07-14 10:19:58 -07:00