Don't start with duplicate states in ts_query__analyze_patterns

This change exposes a new `primary_state_ids` field on the `TSLanguage`
struct and populates it by recording, for each state, the first state
encountered with the same `core_id`. (For posterity: the initial change
only exposed `core_id` and deduplicated within `ts_analyze_query`.)

With the `primary_state_ids` field in place,
`ts_query__analyze_patterns` only needs to seed its subgraphs with
starting states that are _primary_, since each non-primary state behaves
identically to the primary state that shares its `core_id`. This yields
large savings across the board, because most states are not primary.
Alex Pinkus 2022-01-11 21:57:06 -08:00
parent bf210f0c9e
commit eaf9b170f1
6 changed files with 64 additions and 23 deletions


@@ -21,7 +21,7 @@ extern "C" {
  * The Tree-sitter library is generally backwards-compatible with languages
  * generated using older CLI versions, but is not forwards-compatible.
  */
-#define TREE_SITTER_LANGUAGE_VERSION 13
+#define TREE_SITTER_LANGUAGE_VERSION 14
 
 /**
  * The earliest ABI version that is supported by the current version of the


@@ -110,6 +110,7 @@ struct TSLanguage {
   const TSSymbol *public_symbol_map;
   const uint16_t *alias_map;
   const TSSymbol *alias_sequences;
+  const TSStateId *ts_primary_state_ids;
   const TSLexMode *lex_modes;
   bool (*lex_fn)(TSLexer *, TSStateId);
   bool (*keyword_lex_fn)(TSLexer *, TSStateId);