An example of an error cycle in a `parser.c`:
```
static const TSSymbol ts_symbol_map[] = {
...
[anon_sym_RBRACE] = anon_sym_RBRACE2,
[anon_sym_RBRACE2] = anon_sym_RBRACE,
...
};
```
Due to an oversight in #1589, I added `primary_field_ids` into the
`TSLanguage` struct in a place that wasn't the end. This is not actually
backwards compatible and causes downstream failures :(
This change exposes a new `primary_state_ids` field on the `TSLanguage`
struct, and populates it by tracking the first encountered state with a
given `core_id`. (For posterity: the initial change just exposed
`core_id` and deduplicated within `ts_analyze_query`).
With this `primary_state_ids` field in place, the
`ts_query__analyze_patterns` function only needs to populate its
subgraphs with starting states that are _primary_, since non-primary
states behave identically to primary ones. This leads to large savings
across the board, since most states are not primary.
This is a follow-up to my previous commit 1badd131f9 .
I've made this an extra patch as it requires a minor
API change in <tree_sitter/parser.h>.
This commit moves the remaining generated tables into
the read-only segment.
Before:
$ for f in bash c cpp go html java javascript jsdoc json php python ruby rust; do \
gcc -o $f.o -O2 -Ilib/include -c test/fixtures/grammars/$f/src/parser.c; \
done
$ size --totals *.o
text data bss dec hex filename
5353477 24472 0 5377949 520f9d (TOTALS)
After:
$ for f in bash c cpp go html java javascript jsdoc json php python ruby rust; do \
gcc -o $f.o -O2 -Ilib/include -c test/fixtures/grammars/$f/src/parser.c; \
done
$ size --totals *.o
5378147 0 0 5378147 521063 (TOTALS)
This moves most of the generated tables from the data segment into
the text segment (read-only memory) so that it can be shared between
different processes.
As a bonus side effect we can also remove all casts in the generated parsers.
Before:
size --totals target/scratch/*.so
text data bss dec hex filename
853623 4684560 2160 5540343 5489f7 (TOTALS)
After:
size --totals target/scratch/*.so
text data bss dec hex filename
5472086 68616 480 5541182 548d3e (TOTALS)
No need to restrict it to char sets used in multiple places.
This is important because the helper functions are now implemented
more efficiently than the inline comparisons (using a binary search).
Previously, we attempted to completely separate the parse states
for item sets with non-terminal extras from the parse states
for other rules. But there was not a complete separation.
It actually isn't necessary to separate the parse states in this way.
The only special behavior for parse states with non-terminal extra rules
is what happens at the *end* of the rule: these parse states need to
perform an unconditional reduction.
Luckily, it's possible to distinguish these *non-terminal extra ending*
states from other states just based on their normal structure, with
no additional state.