Due to an oversight in #1589, I added `primary_field_ids` into the
`TSLanguage` struct in a place that wasn't the end. This is not actually
backwards compatible and causes downstream failures :(
This change exposes a new `primary_state_ids` field on the `TSLanguage`
struct, and populates it by tracking the first encountered state with a
given `core_id`. (For posterity: the initial change just exposed
`core_id` and deduplicated within `ts_analyze_query`).
With this `primary_state_ids` field in place, the
`ts_query__analyze_patterns` function only needs to populate its
subgraphs with starting states that are _primary_, since non-primary
states behave identically to primary ones. This leads to large savings
across the board, since most states are not primary.
- Use a proper enum type for quantifiers.
- Drop quantifiers from `TSQueryStep`, which was not used.
- Keep track of the captures introduced during a pattern parse, and
apply the quantifier for the pattern to the captures that were
introduced by the pattern or any sub patterns.
- Use 'quantifier' instead of 'suffix'.
The default is now a whopping 64K matches, which "should be enough for
everyone". You can use the new `ts_query_cursor_set_match_limit`
function to set this to a lower limit, such as the previous default of
32.
This function (and the similar `ts_tree_cursor_goto_first_child_for_byte`)
allows you to efficiently seek the tree cursor to a given position,
exploiting the tree's internal balancing, without having to visit
all of the preceding siblings of each node.
This restores the original signatures of the `set_byte_range` and
`set_point_range` functions. Now, the QueryCursor will properly report
matches that intersect, but are not fully contained by its range.
Co-Authored-By: Nathan Sobo <nathan@zed.dev>
This is a follow-up to my previous commit 1badd131f9 .
I've made this an extra patch as it requires a minor
API change in <tree_sitter/parser.h>.
This commit moves the remaining generated tables into
the read-only segment.
Before:
$ for f in bash c cpp go html java javascript jsdoc json php python ruby rust; do \
gcc -o $f.o -O2 -Ilib/include -c test/fixtures/grammars/$f/src/parser.c; \
done
$ size --totals *.o
text data bss dec hex filename
5353477 24472 0 5377949 520f9d (TOTALS)
After:
$ for f in bash c cpp go html java javascript jsdoc json php python ruby rust; do \
gcc -o $f.o -O2 -Ilib/include -c test/fixtures/grammars/$f/src/parser.c; \
done
$ size --totals *.o
5378147 0 0 5378147 521063 (TOTALS)
This PR adds an API to get name of the field of TSNode's child.
It uses same set of arguments as that of ts_node_child, but returns
field name if it is found, otherwise it returns NULL.
This API is useful to implement custom printing of S-expressions such
as following:
"(binary_expression
(binary_expression_left (identifier))
(binary_expression_operator ("+"))
(binary_expression_right (identifier)
)"
Currently, ts_node_string does not allow any customization for printing.