Commit graph

130 commits

Author SHA1 Message Date
Max Brunsfeld
31f3e866cf 📝 Add comment for non-terminal extra edge case 2020-03-02 14:21:03 -08:00
Max Brunsfeld
ee46218a73 Fix incremental parsing problem with non-terminal extras
Also add PHP grammar as a fixture to test against.
2020-03-02 14:17:12 -08:00
Tuấn-Anh Nguyễn
23261c4f6f Make ts_language_symbol_name return NULL for out-of-bound ids 2020-02-27 22:24:00 +07:00
Tuấn-Anh Nguyễn
c719e24a45 Make ts_language_field_name_for_id return NULL for out-of-bound id 2020-02-27 21:19:08 +07:00
Max Brunsfeld
570b83e2b2 query: Add immediate child operator 2020-02-19 11:47:52 -08:00
Max Brunsfeld
950a89a525 query: Differentiate between wildcard '*' and named wildcard '(*)' 2020-02-19 09:42:29 -08:00
Max Brunsfeld
1d6ea51b63 query: Make * operator only match named nodes 2020-02-18 21:32:52 -08:00
Max Brunsfeld
de8e3ee188 query: Allow multiple captures on a single node 2020-02-11 16:02:32 -08:00
Max Brunsfeld
d8c3f472d2 Fix fallout from ts_language_next_state fix 2020-02-10 12:00:58 -08:00
Max Brunsfeld
096014cb3e Clean up ts_language_next_state 2020-02-07 14:06:14 -08:00
Max Brunsfeld
ee7c29346a Small cleanup 2020-01-29 16:48:36 -08:00
Max Brunsfeld
7de36a33eb Remove halt_on_error API 2020-01-27 15:36:09 -08:00
Max Brunsfeld
9ffcb16392 Fix tree-balancing logic
Remove incorrect condition that would prevent balancing of repeating 
structures containing only tokens (nodes w/ no children).

Co-Authored-By: Rob Rix <robrix@github.com>
Co-Authored-By: Patrick Thomson <patrickt@users.noreply.github.com>
2020-01-23 10:26:53 -08:00
Max Brunsfeld
9f63139a10 Fix error when set_included_ranges is called with an invalid range list 2020-01-17 10:31:28 -08:00
Max Brunsfeld
f3747863df Add ts_query_disable_pattern API 2020-01-15 17:08:55 -08:00
Max Brunsfeld
3c4a24752b Tweak naming of TSQuery's pattern map variables 2020-01-15 17:08:07 -08:00
Patrick Thomson
39bfcdf595 Fix build with MinGW tooling. (#514)
Courtesy of @Eli-Zaretskii, these fixes should unblock people from
building tree-sitter with MinGW.

I don't think this is an unreasonable maintenance burden, especially
given the Emacs project's interest in using tree-sitter, but
@maxbrunsfeld gets the final call.
2020-01-06 09:21:40 -08:00
Maxim Sukharev
edb5693100 include language.h in query.c (#507)
Building `query.c` requires `TREE_SITTER_LANGUAGE_VERSION_WITH_SYMBOL_DEDUPING` which is defined in `language.h`.

It produces an error:
```
query.c:744:40: error: use of undeclared identifier 'TREE_SITTER_LANGUAGE_VERSION_WITH_SYMBOL_DEDUPING'
```

when building with cgo.
2019-12-16 09:38:18 -08:00
Max Brunsfeld
0cb2ef1082 Fix code paths that still conflated null characters with EOF 2019-12-06 15:29:03 -08:00
Max Brunsfeld
6d1d8cc217 query: Skip workaround code path when using new symbol map field 2019-12-06 12:11:45 -08:00
Max Brunsfeld
56c620c005 Store a mapping to ensure no two symbols map to the same metadata 2019-12-05 17:21:46 -08:00
Maxim Sukharev
a647de1ef5 add missing unicode include to query.c
it causes problems with building tree-sitter with cgo
2019-11-28 01:32:41 +01:00
Max Brunsfeld
e3f6b1a1af Query - If too many states, kill the one w/ the earliest capture 2019-11-22 11:54:12 -08:00
Damien Guard
599e4f0ec4 Fix a few compiler warnings 2019-11-20 10:21:10 -08:00
Max Brunsfeld
ce633a85c6 Improve ts_language_symbol_for_name function 2019-11-15 14:21:13 -08:00
Max Brunsfeld
967da88371 Avoid unnecessary recompiles between debug & test builds
This makes development much quicker when switching back and forth
between compiling with RLS while editing and running tests with
`cargo test`.
2019-11-14 13:34:25 -08:00
Max Brunsfeld
d3b7caa565 Add a TSLexer.eof() API, use it in generated parsers 2019-10-31 14:11:52 -07:00
Max Brunsfeld
a62b7a70f3 Lexer: track EOF state without relying on null character as lookahead 2019-10-31 14:11:52 -07:00
Max Brunsfeld
5a3a672e30 Expand on query docs 2019-10-30 10:26:10 -07:00
Max Brunsfeld
077cd4970c Handle empty list of included ranges w/ non-null pointer 2019-10-29 13:45:04 -07:00
Max Brunsfeld
42dfba29c6 Make ts_tree_get_changed_ranges less confusing 2019-10-28 15:33:41 -07:00
Björn Linse
124ae30138 fix invalid docs for ts_tree_get_changed_ranges 2019-10-25 21:19:33 +02:00
Max Brunsfeld
fcaabea0cf Allow non-terminal extras 2019-10-21 16:08:59 -07:00
Max Brunsfeld
e14e285a10 cli: Check queries when running tree-sitter test 2019-10-18 14:44:16 -07:00
Max Brunsfeld
64c6cf4473 Implicitly reset parser's state if language is changed after a timeout 2019-10-18 11:28:59 -07:00
Max Brunsfeld
fa43ce01a6 Allow queries to capture ERROR nodes 2019-10-16 11:54:32 -07:00
Max Brunsfeld
f490befcde Add ts_query_disable_capture API 2019-10-14 12:30:22 -07:00
Max Brunsfeld
4c17af3ecd Allow queries with no patterns 2019-10-14 12:30:22 -07:00
Max Brunsfeld
c153711539 query: Avoid splitting states on nodes that don't contain captures 2019-10-14 12:30:22 -07:00
Matthew Krupcale
ee9a3c0ebb lib: remove utf8proc dependency (#436)
* Remove dependency on utf8proc

This removes the only external dependency on utf8proc for UTF-8 decoding. It does so by implementing its own UTF-8 decoder. This decoder is both faster and has a simpler API.

 * .gitmodules: remove utf8proc submodule
 * docs/section-2-using-parsers.md: remove requirement for utf8proc submodule
 * docs/section-6-contributing.md: likewise
 * lib/Cargo.toml: remove utf8proc subdirectory package include
 * lib/README.md: remove utf8proc subdirectory description
 * lib/binding_rust/build.rs: remove utf8proc compiler include directory
 * lib/src/lexer.c: remove utf8proc dependencies and types
 * lib/src/lib.c: remove utf8proc dependency
 * lib/src/unicode.h: define types for Unicode decoders
 * lib/src/utf16.{c,h}: implement more readable UTF-16 decoder
 * lib/src/utf8.{c,h}: implement fast UTF-8 decoder
 * lib/utf8proc: remove utf8proc submodule directory
 * script/build-lib: remove utf8proc compiler include directory
 * script/build-wasm: likewise

* Optimize ts_lexer__get_lookahead.

Try to favor non-failure code path and assign lookahead values directly to lexer

 * lib/src/lexer.c: optimize for non-failure code path

* Fix some compiler errors

 * lib/src/lexer.c: cast from signed to unsigned for decode_next result
 * lib/src/utf16.c: fix non-constant initializers for older compilers

* Remove some missed remnants of utf8proc

 * docs/section-2-using-parsers.md: only two include paths necessary now
 * lib/src/lib.c: no need to define UTF8PROC_STATIC

* Use ICU's utf8 and utf16 decoding routines

* Remove unnecessary casts when calling icu macros

* Check buffer length before attempting to decode a unicode character

* Use new unicode function when parsing Queries

Co-Authored-By: Matthew Krupcale <mkrupcale@matthewkrupcale.com>

* Mark libicu files as vendored for GitHub's stats
2019-10-14 11:18:39 -07:00
Max Brunsfeld
9872a083b7 rust: Change QueryCursor::captures to expose the full match 2019-10-03 12:45:58 -07:00
Max Brunsfeld
cb87b7b76e Fix invalid read by query cursor on error nodes
🎩 @bfredl

Refs https://github.com/tree-sitter/tree-sitter/pull/448#issuecomment-536337749
2019-10-01 11:28:51 -07:00
Björn Linse
1d2d043390 fix compiler warning with comparing char with `TSSymbolType` 2019-09-30 19:24:40 +02:00
Max Brunsfeld
b15e90bd26 Handle set! predicate function in queries 2019-09-24 11:54:24 -07:00
Max Brunsfeld
ff9a2c1f53 Make queries work in languages with simple aliases 2019-09-24 11:54:24 -07:00
Björn Linse
15e3bc7fd2 Fix some compiler warnings regarding function prototypes 2019-09-22 11:49:44 +02:00
Max Brunsfeld
a6b6a681ec Fix a bug that prevented early termination of query matches 2019-09-18 16:13:10 -07:00
Max Brunsfeld
186b08381c Terminate failed query matches before descending whenever possible
When iterating over captures, this prevents reasonable queries from 
forcing the tree cursor to buffer matches unnecessarily.
2019-09-18 11:37:49 -07:00
Max Brunsfeld
374a7ac81e Ensure that duplicate captures are ordered by pattern index 2019-09-17 16:27:16 -07:00
Max Brunsfeld
82955759c0 Add an API for getting a pattern's start offset in the source code 2019-09-17 16:19:58 -07:00