tree-sitter

Author	SHA1	Message	Date
Max Brunsfeld	36c2b685b9	Always invalidate old chunk of text when parsing after an edit	2017-10-04 15:09:46 -07:00
Max Brunsfeld	c0073c5b72	Update error corpus to reflect C grammar changes	2017-10-04 15:06:12 -07:00
Max Brunsfeld	d342b61ede	Re-enable JS fuzzing example test	2017-09-14 11:39:08 -07:00
Max Brunsfeld	9d67a98510	Merge pull request #103 from tree-sitter/python-assertion-failure Assertion failure in parser__advance	2017-09-14 11:38:22 -07:00
Max Brunsfeld	d291af9a31	Refactor error comparisons * Deal with mergeability outside of error comparison function * Make `better_version_exists` function pure (don't halt other versions as a side effect). * Tweak error comparison logic Signed-off-by: Rick Winfrey <rewinfrey@github.com>	2017-09-13 16:38:15 -07:00
Phil Turnbull	d9a0fbc210	Add testcase for parser__advance assertion failure The python testcase decodes to: ``` 00000000 35 63 6f 6e 88 2c 29 33 2c 2c 2c 2c 63 6f 6e 88 \|5con.,)3,,,,con.\| 00000010 2c 2a 2c 3a 35 63 6f 6e 2c \|,*,:5con,\| ``` which triggers: ``` Assertion failed: ((uint32_t)0 < (&reduction.slices)->size), function parser__advance, file src/runtime/parser.c, line 1202. ```	2017-09-13 13:25:31 -04:00
Max Brunsfeld	65ed4281d4	Exclude zeros from speeds reported in benchmarks	2017-09-12 16:30:38 -07:00
Max Brunsfeld	99d048e016	Simplify error recovery; eliminate recovery states The previous approach to error recovery relied on special error-recovery states in the parse table. For each token T, there was an error recovery state in which the parser looked for any token that could follow T. Unfortunately, sometimes the set of tokens that could follow T contained conflicts. For example, in JS, the token '}' can be followed by the open-ended 'template_chars' token, but also by ordinary tokens like 'identifier'. So with the old algorithm, when recovering from an unexpected '}' token, the lexer had no way to distinguish identifiers from template_chars. This commit drops the error recovery states. Instead, when we encounter an unexpected token T, we recover from the error by finding a previous state S in the stack in which T would be valid, popping all of the nodes after S, and wrapping them in an error. This way, the lexer is always invoked in a normal parse state, in which it is looking for a non-conflicting set of tokens. Eliminating the error recovery states also shrinks the lex state machine significantly. Signed-off-by: Rick Winfrey <rewinfrey@github.com>	2017-09-11 15:22:52 -07:00
Max Brunsfeld	8b3941764f	Make outstanding_allocation_indices return a vector, not a set	2017-09-07 17:48:44 -07:00
Max Brunsfeld	9d668c5004	Move incompatible token map into LexTableBuilder	2017-08-31 15:46:37 -07:00
Max Brunsfeld	4daf22ba0c	Read files in binary mode in tests	2017-08-09 10:07:03 -07:00
Max Brunsfeld	90eef13aeb	Fix windows directory listing function	2017-08-08 22:21:39 -07:00
Max Brunsfeld	d4b2c58dc8	Fix another hard-coded / as a path separator	2017-08-08 22:08:27 -07:00
Max Brunsfeld	f6325746aa	Provide symbol metadata with dummy language in stack test	2017-08-08 17:47:24 -07:00
Max Brunsfeld	d0dc164013	Disable the default behavior of printing a logo, Microsoft.	2017-08-08 17:35:50 -07:00
Max Brunsfeld	0919f5588b	Disable fuzzing example test for now	2017-08-08 17:35:16 -07:00
Max Brunsfeld	0ba6188bab	Construct paths portably in test_grammars.cc	2017-08-08 17:25:43 -07:00
Max Brunsfeld	7587353ab6	Avoid unicode literals in tests MSVC tries to normalize them based on the current locale.	2017-08-08 17:10:08 -07:00
Max Brunsfeld	17b58f41e1	Disable optimization when compiling grammars during tests	2017-08-08 17:09:34 -07:00
Max Brunsfeld	17310769a4	Fix usage of .so instead of .dll on windows	2017-08-08 15:20:38 -07:00
Max Brunsfeld	bb7889fac5	Don't rely on PWD env var on windows	2017-08-08 15:04:27 -07:00
Max Brunsfeld	34b5340d71	Fix paths to corpus files on windows	2017-08-08 14:06:11 -07:00
Max Brunsfeld	cc7277fd7d	Avoid using IsNull bandit assertion	2017-08-08 12:52:35 -07:00
Max Brunsfeld	b43ae2826b	Use C++ stdlib for random number generation	2017-08-08 12:42:49 -07:00
Max Brunsfeld	fc0f49e4ee	Add windows implementations of some IO-related test helpers	2017-08-08 12:21:17 -07:00
Max Brunsfeld	947c161c2f	Use a constructor rather than aggregate initialization for Production	2017-08-08 10:41:54 -07:00
Max Brunsfeld	230f89d0ff	Fix build warnings in tests	2017-08-07 12:19:10 -07:00
Max Brunsfeld	94dc703bfc	Require that grammars' start rules be visible	2017-08-04 17:07:37 -07:00
Max Brunsfeld	1dca3a0b58	Simplify parse version reordering	2017-08-04 14:51:14 -07:00
Max Brunsfeld	e5c3bf742d	Update fixture grammars	2017-08-03 16:32:39 -07:00
Max Brunsfeld	84e4114f79	Allow conflicts involving repeat rules to be whitelisted, via their parent rule	2017-08-03 15:18:29 -07:00
Max Brunsfeld	119c67dd78	Fix conflict reporting for shift/reduce conflicts w/ multiple reductions We were failing to rule out shift actions with lower precedence. Signed-off-by: Philip Turnbull <philipturnbull@github.com>	2017-08-02 15:13:30 -07:00
Max Brunsfeld	09f4796f6b	Get tests passing w/ new alias API	2017-08-01 14:35:34 -07:00
Max Brunsfeld	cb5fe80348	Rename RENAME rule to ALIAS, allow it to create anonymous nodes	2017-07-31 16:41:11 -07:00
Max Brunsfeld	cbdfd89675	Mark reductions as fragile based on their final properties We previously maintained a set of individual productions that were involved in conflicts, but that was subtly incorrect because we don't compare productions themselves when comparing parse items; we only compare the parse items properties that could affect the final reduce actions.	2017-07-21 09:54:24 -07:00
Max Brunsfeld	7d9d8bce79	Handle inlined rules that contain other inlined rules	2017-07-20 15:29:06 -07:00
Max Brunsfeld	f33421c53e	Fix incorrect node renames in the presence of extra tokens	2017-07-18 21:24:34 -07:00
Max Brunsfeld	10d28d4b56	Merge pull request #92 from tree-sitter/utf16-oob Add test for UTF16 out-of-bound read	2017-07-18 17:24:31 -07:00
Phil Turnbull	52cec9ed39	Rework SpyInput buffer handling SpyInput uses a fixed-size buffer and explicitly zeros memory which is good for catching logic errors but defeats valgrind's memory tracking. Use a separate buffer of exactly the correct size for each request. This correctly catches the problem under valgrind: ``` ==8694== Invalid read of size 2 ==8694== at 0x54EFFB: utf16_iterate (utf16.c:10) ==8694== by 0x551126: ts_lexer__get_lookahead (lexer.c:54) ==8694== by 0x5515CD: ts_lexer_start (lexer.c:154) ==8694== by 0x54699F: parser(long,...)(long long) (parser.c:297) ==8694== by 0x54788A: parser__get_lookahead (parser.c:439) ==8694== by 0x54B2D3: parser__advance (parser.c:1150) ==8694== by 0x54C2AA: parser_parse (parser.c:1348) ==8694== by 0x53F063: ts_document_parse_with_options (document.c:136) ==8694== by 0x53EF43: ts_document_parse (document.c:107) ==8694== by 0x4AED11: {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}::operator()() const (document_test.cc:82) ==8694== by 0x4B56B6: std::_Function_handler<void (), {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}>::_M_invoke(std::_Any_data const&) (functional:1871) ==8694== by 0x40F8C5: std::function<void ()>::operator()() const (functional:2267) ==8694== Address 0x5d08be0 is 0 bytes inside a block of size 1 alloc'd ==8694== at 0x4C2E80F: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==8694== by 0x507C3E: SpyInput::read(void, unsigned int) (spy_input.cc:66) ==8694== by 0x55103D: ts_lexer__get_chunk (lexer.c:29) ==8694== by 0x5515B6: ts_lexer_start (lexer.c:152) ==8694== by 0x54699F: parser(long,...)(long long) (parser.c:297) ==8694== by 0x54788A: parser__get_lookahead (parser.c:439) ==8694== by 0x54B2D3: parser__advance (parser.c:1150) ==8694== by 0x54C2AA: parser_parse (parser.c:1348) ==8694== by 0x53F063: ts_document_parse_with_options (document.c:136) ==8694== by 0x53EF43: ts_document_parse (document.c:107) ==8694== by 0x4AED11: {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}::operator()() const (document_test.cc:82) ==8694== by 0x4B56B6: std::_Function_handler<void (), {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}>::_M_invoke(std::_Any_data const&) (functional:1871) ```	2017-07-18 12:16:37 -07:00
Max Brunsfeld	afb499bf2e	Handle rename symbols in ts_language APIs	2017-07-18 12:01:52 -07:00
Max Brunsfeld	de17c92462	Fix setup in stack test	2017-07-18 08:21:35 -07:00
Max Brunsfeld	085d96d89d	Add bash examples to benchmarks	2017-07-17 17:50:04 -07:00
Max Brunsfeld	45c40c8742	Update test grammars to use new serialization API	2017-07-17 17:46:46 -07:00
Max Brunsfeld	9a04231ab1	Remove length restriction in external scanner serialization API	2017-07-17 17:12:36 -07:00
Phil Turnbull	e7662c2213	Handle out-of-bound read in utf16_iterate Also simplify the test so we call `utf16_iterate` directly. Calling `utf16_iterate` via `SpyInput` and `ts_document_parse` doesn't seem to reliably trigger the problem using valgrind. valgrind also doesn't detect the problem if we use a string literal like: `utf16_iterate("", 1, &code_point);`	2017-07-17 13:57:12 -07:00
Phil Turnbull	035abc1e15	Add test for UTF16 out-of-bound read utf16_iterate does not check that 'length' is a multiple of two which leads to an out-of-bound read: ==105293== Conditional jump or move depends on uninitialised value(s) ==105293== at 0x54F014: utf16_iterate (utf16.c:7) ==105293== by 0x539251: string_iterate(TSInputEncoding, unsigned char const, unsigned long, int) (encoding_helpers.cc:15) ==105293== by 0x53939D: string_byte_for_character(TSInputEncoding, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, unsigned long) (encoding_helpers.cc:43) ==105293== by 0x507BAD: SpyInput::read(void, unsigned int) (spy_input.cc:47) ==105293== by 0x551049: ts_lexer__get_chunk (lexer.c:29) ==105293== by 0x5515C2: ts_lexer_start (lexer.c:152) ==105293== by 0x5469AB: parser(long,...)(long long) (parser.c:297) ==105293== by 0x547896: parser__get_lookahead (parser.c:439) ==105293== by 0x54B2DF: parser__advance (parser.c:1150) ==105293== by 0x54C2B6: parser_parse (parser.c:1348) ==105293== by 0x53F06F: ts_document_parse_with_options (document.c:136) ==105293== by 0x53EF4F: ts_document_parse (document.c:107)	2017-07-17 12:34:39 -07:00
Max Brunsfeld	34279257f9	Merge pull request #91 from tree-sitter/libFuzzer Add support for fuzzing with libFuzzer	2017-07-17 11:43:01 -07:00
Phil Turnbull	798ef5e4dc	Add libFuzzer support This adds support for fuzzing tree-sitter grammars with libFuzzer. This currently only works on Linux because of linking issues on macOS. Briefly, the AddressSanitizer library is dynamically linked into the fuzzer binary and cannot be found at runtime if built with a compiler that wasn't provided by Xcode(?). The runtime library is statically linked on Linux so this isn't a problem.	2017-07-14 13:50:41 -07:00
Max Brunsfeld	a22386e408	Fix compiler warnings in flatten_grammar_test	2017-07-14 10:26:34 -07:00
Max Brunsfeld	4b40a1ed6c	Support anonymous tokens inside of RENAME rules	2017-07-14 10:19:58 -07:00
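
Commit 99d048e016 above describes recovering from an unexpected token T by finding a previous state S on the stack in which T would be valid, popping everything after S, and wrapping the popped nodes in an error. The following is a minimal sketch of that search-and-pop step only, not tree-sitter's actual code: `ToyStack`, `toy_is_valid`, and `toy_recover` are hypothetical names, the validity rule is a placeholder, and the real parser also has to deal with multiple stack versions and build the error node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_DEPTH 64

/* Hypothetical toy stack of parse-state ids, bottom of the stack first.
 * This is NOT tree-sitter's real stack structure. */
typedef struct {
  int states[TOY_MAX_DEPTH];
  size_t size;
} ToyStack;

/* Placeholder validity rule (an assumption for illustration):
 * token T is accepted only in the state whose id equals T.
 * A real parser would consult its parse table here. */
bool toy_is_valid(int state, int token) {
  return state == token;
}

/* Walk down from the top of the stack to the most recent state in which
 * `token` is valid. Pop every entry above that state and return how many
 * entries were popped; in a real parser those would be wrapped in a single
 * error node. Returns -1 and leaves the stack unchanged if no state on the
 * stack accepts the token. */
int toy_recover(ToyStack *stack, int token) {
  assert(stack->size <= TOY_MAX_DEPTH);
  for (size_t i = stack->size; i > 0; i--) {
    if (toy_is_valid(stack->states[i - 1], token)) {
      int popped = (int)(stack->size - i);
      stack->size = i; /* discard everything after state S */
      return popped;
    }
  }
  return -1;
}
```

With a stack holding states 1, 2, 3, 4 and an unexpected token 2, `toy_recover` pops the two entries above state 2 and returns 2. Because the parser resumes in an ordinary state, the lexer is only ever asked for that state's normal token set, which is the property the commit message highlights.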


111 commits