Commit graph

150 commits

Author SHA1 Message Date
Max Brunsfeld
1ca261c79b Fix some regex parsing bugs
* Allow escape sequences to be used in ranges
* Don't give special meaning to dashes outside of character classes
2018-04-06 12:46:06 -07:00
Axel Hecht
345e344377 Tests for issue 158 2018-04-05 14:39:25 +02:00
Max Brunsfeld
dbe77e7199 Simplify testing-only ts_stack_iterate function 2018-03-29 17:50:07 -07:00
Max Brunsfeld
5520983144 Clean up Stack API
* Remove StackPopResult
* Rename top_state() -> state()
* Rename top_position() -> position()
* Improve docs
2018-03-29 17:37:54 -07:00
Max Brunsfeld
ee995c3d6b Avoid redundant retains/releases by giving ts_stack_push move semantics 2018-03-29 17:18:43 -07:00
Max Brunsfeld
186f70649c Consolidate the unify for detecting conflicting tokens 2018-03-28 10:03:09 -07:00
Max Brunsfeld
a8bc67ac42 Allow LookaheadSet::for_each to terminate early 2018-03-28 10:03:09 -07:00
Max Brunsfeld
43e14332ed Avoid creating duplicate metadata rules 2018-03-28 10:03:09 -07:00
Max Brunsfeld
b7d0606fbd Be less conservative in merging parse states with external tokens
Also, clean up the internal representation of external tokens
2018-03-16 16:00:40 -07:00
Max Brunsfeld
fe29173d5f
Merge pull request #142 from tree-sitter/fuzz-halt-recover
Add 'halt' and 'recover' modes to fuzzer
2018-03-14 09:28:58 -07:00
Phil Turnbull
269b1a0864 Update repo for libFuzzer
libFuzzer has now been broken out from LLVM and can be built separately
2018-03-12 13:08:58 -07:00
Max Brunsfeld
7183f8d3e7 Fix unit reduction elimination bugs
* Handle 'chains' of unit reductions starting in a single state
* Avoid eliminating rules which will later receive aliases
2018-03-12 07:54:18 -07:00
Max Brunsfeld
df2430b94c Remove a C error recovery test temporarily 2018-03-08 14:18:12 -08:00
Max Brunsfeld
0810971f3e 🔥 symbol iterator API
This idea was never fully baked.
2018-03-08 14:16:37 -08:00
Max Brunsfeld
e927d02f43 Allow reusing leaf nodes unless the next leaf has changes 2018-03-07 17:44:54 -08:00
Max Brunsfeld
52087de4f0 Remove the concept of fragile reductions
They were a vestige of when Tree-sitter did sentential form-based
incremental parsing (as opposed to simply state matching). This was
elegant but not compatible with GLR as far as I could tell.
2018-03-02 14:51:54 -08:00
Max Brunsfeld
07fa3eb386 Fix capture of corpus descriptions in integration tests 2018-03-02 11:27:41 -08:00
Max Brunsfeld
a8d539023d Handle subdirectories existing in parsers' examples folders 2018-03-02 11:04:08 -08:00
Phil Turnbull
bc192d95ca Build fuzzer in 'halt' and 'recover' modes
Build each language fuzzer in two modes (halt_on_error=true and
halt_on_error=false) and use different timeouts for each fuzzer.

Also merge the run-fuzzer and reproduce scripts so they use identical
values of ASAN_OPTIONS/UBSAN_OPTIONS/etc0
2018-03-02 10:13:13 -08:00
Max Brunsfeld
d3ac345644 When parsing corpus, anchor header pattern to line start 2018-03-02 09:46:33 -08:00
Max Brunsfeld
82c7e170b3 Fix case where loop was created in the parse stack
Fixes #133
2018-03-02 09:05:20 -08:00
Max Brunsfeld
2daae48fe0 Handle conflicts in repeat rules after external tokens
Signed-off-by: Rick Winfrey <rewinfrey@github.com>
2018-02-14 11:24:51 -08:00
Max Brunsfeld
facafcd6e4 Pass row/column position to input seek method 2018-02-14 07:31:49 -08:00
Max Brunsfeld
299a146b66 Balance repetition trees after parsing 2018-02-12 11:41:56 -08:00
Max Brunsfeld
8c29841adf Represent repetitions with associative structure 2018-02-12 11:41:56 -08:00
Max Brunsfeld
46dcd53090 Do not insert missing tokens if halt_on_error option is passed 2018-01-24 14:04:55 -08:00
Max Brunsfeld
b520bdd2d5
Merge pull request #126 from tree-sitter/mb-fix-epsilon-rule-loophole
Don't allow an epsilon start rule if it is used in other rules
2018-01-23 17:19:55 -08:00
Max Brunsfeld
2e4f76c164 Don't allow an epsilon start rule if it is used in other rules 2018-01-23 17:05:28 -08:00
Max Brunsfeld
315dff3285 Add an API for getting a node's child index 2018-01-09 14:01:36 -08:00
Max Brunsfeld
f653f2b3bb Add ts_node_first_{child,named_child}_for_byte methods 2018-01-09 13:44:59 -08:00
Max Brunsfeld
13adfe4927 Update error corpus 2017-12-29 16:21:04 -08:00
Max Brunsfeld
6304a3bcd1 Make it easier to run tests with debug graphs 2017-12-28 12:41:23 -08:00
Max Brunsfeld
addeb6c4c1 Allocate and free trees using an object pool 2017-12-27 10:34:29 -08:00
Max Brunsfeld
0e69da37a5 Return a character count from the lexer's get_column method 2017-12-20 16:26:38 -08:00
Max Brunsfeld
e5851fd9b9 Don't use non-existent \a syntax in test grammars 2017-12-13 12:21:28 -08:00
Max Brunsfeld
f426b61e7c Fix expectation around preproc directive in C error test 2017-12-13 12:21:13 -08:00
Max Brunsfeld
fbcefe25f7 Avoid creating external tokens that start after they end 2017-12-07 11:50:27 -08:00
Max Brunsfeld
90629bd45a Add some assertions to the fixture grammar tests 2017-12-07 11:50:27 -08:00
Max Brunsfeld
493db39363 Never move the start rule of a grammar into the lexical grammar
This preserves a useful invariant that the root node of the AST is never
a token.
2017-12-07 11:50:27 -08:00
Max Brunsfeld
36c2b685b9 Always invalidate old chunk of text when parsing after an edit 2017-10-04 15:09:46 -07:00
Max Brunsfeld
c0073c5b72 Update error corpus to reflect C grammar changes 2017-10-04 15:06:12 -07:00
Max Brunsfeld
d342b61ede Re-enable JS fuzzing example test 2017-09-14 11:39:08 -07:00
Max Brunsfeld
9d67a98510 Merge pull request #103 from tree-sitter/python-assertion-failure
Assertion failure in parser__advance
2017-09-14 11:38:22 -07:00
Max Brunsfeld
d291af9a31 Refactor error comparisons
* Deal with mergeability outside of error comparison function
* Make `better_version_exists` function pure (don't halt other versions
as a side effect).
* Tweak error comparison logic

Signed-off-by: Rick Winfrey <rewinfrey@github.com>
2017-09-13 16:38:15 -07:00
Phil Turnbull
d9a0fbc210 Add testcase for parser__advance assertion failure
The python testcase decodes to:
```
00000000  35 63 6f 6e 88 2c 29 33  2c 2c 2c 2c 63 6f 6e 88  |5con.,)3,,,,con.|
00000010  2c 2a 2c 3a 35 63 6f 6e  2c                       |,*,:5con,|
```

which triggers:
```
Assertion failed: ((uint32_t)0 < (&reduction.slices)->size), function parser__advance, file src/runtime/parser.c, line 1202.
```
2017-09-13 13:25:31 -04:00
Max Brunsfeld
65ed4281d4 Exclude zeros from speeds reported in benchmarks 2017-09-12 16:30:38 -07:00
Max Brunsfeld
99d048e016 Simplify error recovery; eliminate recovery states
The previous approach to error recovery relied on special error-recovery
states in the parse table. For each token T, there was an error recovery
state in which the parser looked for *any* token that could follow T.
Unfortunately, sometimes the set of tokens that could follow T contained
conflicts. For example, in JS, the token '}' can be followed by the
open-ended 'template_chars' token, but also by ordinary tokens like
'identifier'. So with the old algorithm, when recovering from an
unexpected '}' token, the lexer had no way to distinguish identifiers
from template_chars.

This commit drops the error recovery states. Instead, when we encounter
an unexpected token T, we recover from the error by finding a previous
state S in the stack in which T would be valid, popping all of the nodes
after S, and wrapping them in an error.

This way, the lexer is always invoked in a normal parse state, in which
it is looking for a non-conflicting set of tokens. Eliminating the error
recovery states also shrinks the lex state machine significantly.

Signed-off-by: Rick Winfrey <rewinfrey@github.com>
2017-09-11 15:22:52 -07:00
Max Brunsfeld
8b3941764f Make outstanding_allocation_indices return a vector, not a set 2017-09-07 17:48:44 -07:00
Max Brunsfeld
9d668c5004 Move incompatible token map into LexTableBuilder 2017-08-31 15:46:37 -07:00
Max Brunsfeld
4daf22ba0c Read files in binary mode in tests 2017-08-09 10:07:03 -07:00