An example of an error cycle in a `parser.c`:
```c
static const TSSymbol ts_symbol_map[] = {
...
[anon_sym_RBRACE] = anon_sym_RBRACE2,
[anon_sym_RBRACE2] = anon_sym_RBRACE,
...
};
```
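A cycle like the one above can be detected mechanically. The following is a minimal sketch, assuming that a symbol which needs no remapping maps to itself; `Symbol`, `symbol_resolves`, and `symbol_map_has_cycle` are hypothetical names for illustration and are not part of the generated parser or the library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t Symbol; /* hypothetical stand-in for TSSymbol */

/* Follow map[] from `start`. Returns true if a fixed point (a symbol that
 * maps to itself) is reached within `count` steps, false otherwise. */
static bool symbol_resolves(const Symbol *map, size_t count, Symbol start) {
  assert(start < count);
  Symbol current = start;
  for (size_t step = 0; step < count; step++) {
    Symbol next = map[current];
    assert(next < count);
    if (next == current) return true;
    current = next;
  }
  return false; /* more steps than symbols: we must be looping */
}

/* Returns true if any symbol is part of, or leads into, a cycle. */
static bool symbol_map_has_cycle(const Symbol *map, size_t count) {
  for (size_t i = 0; i < count; i++) {
    if (!symbol_resolves(map, count, (Symbol)i)) return true;
  }
  return false;
}
```

With a map such as `{0, 2, 1, 3}`, entries 1 and 2 point at each other, which is the same shape as the `anon_sym_RBRACE` / `anon_sym_RBRACE2` pair.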
Due to an oversight in #1589, I added `primary_state_ids` into the
`TSLanguage` struct in a place that wasn't the end. This is not actually
backwards compatible and causes downstream failures :(
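To illustrate why the position of a new field matters, here is a small sketch using made-up structs. These are not the real `TSLanguage` layout; all three struct names and their fields are assumptions chosen for brevity. Code compiled against the old layout reads existing fields at fixed offsets, so inserting a field in the middle shifts everything after it, while appending leaves the old offsets alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "old" layout. */
typedef struct {
  uint32_t version;
  const uint16_t *parse_table;
} LanguageV1;

/* New field inserted in the middle: parse_table moves. */
typedef struct {
  uint32_t version;
  const uint16_t *primary_state_ids;
  const uint16_t *parse_table;
} LanguageMiddle;

/* New field appended at the end: existing offsets are unchanged. */
typedef struct {
  uint32_t version;
  const uint16_t *parse_table;
  const uint16_t *primary_state_ids;
} LanguageAppended;

/* Appending keeps the old field where old binaries expect it. */
static_assert(offsetof(LanguageV1, parse_table) ==
              offsetof(LanguageAppended, parse_table),
              "appending must not move existing fields");

static int middle_insert_keeps_offset(void) {
  return offsetof(LanguageV1, parse_table) ==
         offsetof(LanguageMiddle, parse_table);
}

static int append_keeps_offset(void) {
  return offsetof(LanguageV1, parse_table) ==
         offsetof(LanguageAppended, parse_table);
}
```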
This change exposes a new `primary_state_ids` field on the `TSLanguage`
struct, and populates it by tracking the first encountered state with a
given `core_id`. (For posterity: the initial change just exposed
`core_id` and deduplicated within `ts_analyze_query`).
With this `primary_state_ids` field in place, the
`ts_query__analyze_patterns` function only needs to populate its
subgraphs with starting states that are _primary_, since non-primary
states behave identically to primary ones. This leads to large savings
across the board, since most states are not primary.
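The "first encountered state with a given `core_id`" rule can be sketched as follows. This is an illustration under assumptions, not the generator's actual code: the flat `core_ids` array, the scratch buffer, and the function names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t StateId;
#define NO_STATE UINT16_MAX

/* core_ids[s] is the core id of state s (each value < core_count).
 * first_state_for_core is a scratch buffer of length core_count.
 * On return, primary_state_ids[s] is the first state sharing s's core id. */
static void compute_primary_state_ids(const uint16_t *core_ids,
                                      size_t state_count,
                                      StateId *first_state_for_core,
                                      size_t core_count,
                                      StateId *primary_state_ids) {
  for (size_t c = 0; c < core_count; c++) {
    first_state_for_core[c] = NO_STATE;
  }
  for (size_t s = 0; s < state_count; s++) {
    uint16_t core = core_ids[s];
    assert(core < core_count);
    if (first_state_for_core[core] == NO_STATE) {
      first_state_for_core[core] = (StateId)s; /* first time we see this core */
    }
    primary_state_ids[s] = first_state_for_core[core];
  }
}

/* A state is primary when it is its own primary state. */
static int is_primary_state(const StateId *primary_state_ids, StateId state) {
  return primary_state_ids[state] == state;
}
```

An analysis pass could then skip any starting state for which `is_primary_state` returns 0, since its primary counterpart is analyzed anyway.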
This is a follow-up to my previous commit 1badd131f9.
I've made this an extra patch as it requires a minor
API change in <tree_sitter/parser.h>.
This commit moves the remaining generated tables into
the read-only segment.
Before:
$ for f in bash c cpp go html java javascript jsdoc json php python ruby rust; do \
gcc -o $f.o -O2 -Ilib/include -c test/fixtures/grammars/$f/src/parser.c; \
done
$ size --totals *.o
text data bss dec hex filename
5353477 24472 0 5377949 520f9d (TOTALS)
After:
$ for f in bash c cpp go html java javascript jsdoc json php python ruby rust; do \
gcc -o $f.o -O2 -Ilib/include -c test/fixtures/grammars/$f/src/parser.c; \
done
$ size --totals *.o
text data bss dec hex filename
5378147 0 0 5378147 521063 (TOTALS)
This moves most of the generated tables from the data segment into
the text segment (read-only memory) so that they can be shared between
different processes.
As a bonus side effect we can also remove all casts in the generated parsers.
Before:
$ size --totals target/scratch/*.so
text data bss dec hex filename
853623 4684560 2160 5540343 5489f7 (TOTALS)
After:
$ size --totals target/scratch/*.so
text data bss dec hex filename
5472086 68616 480 5541182 548d3e (TOTALS)
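As a rough illustration of the effect measured above, the sketch below contrasts writable and `const` tables. The table names and contents are invented, and the exact section each object ends up in depends on the compiler, the linker, and whether the code is position-independent, so treat the comments as typical behavior rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Initialized and writable: typically counted under "data". */
static uint16_t mutable_table[] = {1, 2, 3, 4};

/* const-qualified: typically placed in read-only memory, which
 * size(1) reports under "text". */
static const uint16_t const_table[] = {1, 2, 3, 4};

/* Here only the characters are const; the pointers themselves
 * are still writable objects. */
static const char *mutable_names[] = {"end", "identifier"};

/* Both the characters and the pointers are const. */
static const char *const const_names[] = {"end", "identifier"};

static uint32_t table_sum(const uint16_t *table, size_t count) {
  assert(table != NULL);
  uint32_t sum = 0;
  for (size_t i = 0; i < count; i++) sum += table[i];
  return sum;
}

static const char *name_at(const char *const *names, size_t index) {
  return names[index];
}
```

Running `size` on an object file built from a file like this is one way to check where each table actually landed on a given toolchain.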
No need to restrict it to char sets used in multiple places.
This is important because the helper functions now use a binary search,
which makes them more efficient than the inline comparisons.
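For readers unfamiliar with the technique, a character-set helper based on binary search might look like the sketch below. The `CharRange` type, the function name, and the example set are assumptions made for illustration; the generated helpers may be shaped differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An inclusive range of code points. */
typedef struct {
  int32_t start;
  int32_t end;
} CharRange;

/* Binary search over ranges that are sorted by start and do not overlap. */
static bool char_set_contains(const CharRange *ranges, size_t count, int32_t c) {
  assert(ranges != NULL || count == 0);
  size_t lo = 0, hi = count;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (c < ranges[mid].start) {
      hi = mid;          /* c lies before this range */
    } else if (c > ranges[mid].end) {
      lo = mid + 1;      /* c lies after this range */
    } else {
      return true;       /* start <= c <= end */
    }
  }
  return false;
}

/* Example set: ASCII letters and underscore, in ascending order. */
static const CharRange identifier_start[] = {
  {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
};
```

A chain of inline comparisons checks ranges one by one, while this lookup needs a number of steps that grows only logarithmically with the number of ranges, which matters most for large Unicode sets.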