Commit graph

73 commits

Author SHA1 Message Date
Andrew Hlynskyi
4a007259fc Fix warning from #2454 in more clear way 2023-08-10 03:59:34 +03:00
Amaan Qureshi
b8fe5fe21b fix: do not allow eof to advance states if the new state is the same state 2023-08-02 10:47:27 +01:00
Max Brunsfeld
4b93326898 Don't generate primary states array if it will be unused due to abi version setting 2022-03-02 14:57:59 -08:00
Alex Pinkus
858ea5782b Fix back compat by moving primary_field_ids to the end
Due to an oversight in #1589, I added `primary_field_ids` into the
`TSLanguage` struct in a place that wasn't the end. This is not actually
backwards compatible and causes downstream failures :(
2022-01-17 17:23:02 -08:00
Max Brunsfeld
516fd6f6de Add --abi flag to generate command, generate version 13 by default 2022-01-17 14:50:47 -08:00
Alex Pinkus
eaf9b170f1 Don't start with duplicate states in ts_query__analyze_patterns
This change exposes a new `primary_state_ids` field on the `TSLanguage`
struct, and populates it by tracking the first encountered state with a
given `core_id`. (For posterity: the initial change just exposed
`core_id` and deduplicated within `ts_analyze_query`).

With this `primary_state_ids` field in place, the
`ts_query__analyze_patterns` function only needs to populate its
subgraphs with starting states that are _primary_, since non-primary
states behave identically to primary ones. This leads to large savings
across the board, since most states are not primary.
2022-01-16 11:17:47 -08:00
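The "first encountered state with a given `core_id`" idea in the commit above can be sketched roughly as follows. This is only an illustrative sketch, not the generator's actual code: the function names and the flat `core_ids` slice are assumptions made for the example.

```rust
use std::collections::HashMap;

/// For each parse state, return the id of the first state that shares its
/// core id. `core_ids[i]` is the (hypothetical) core id of state `i`.
fn primary_state_ids(core_ids: &[usize]) -> Vec<usize> {
    // Remembers the first state id seen for each core id.
    let mut first_seen: HashMap<usize, usize> = HashMap::new();
    core_ids
        .iter()
        .enumerate()
        .map(|(state_id, &core_id)| *first_seen.entry(core_id).or_insert(state_id))
        .collect()
}

/// A state is primary when it maps to itself.
fn is_primary(primary_ids: &[usize], state_id: usize) -> bool {
    primary_ids[state_id] == state_id
}
```

Under these assumptions, an analysis pass would only seed its work list with states for which `is_primary` returns true.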
Paul Gey
965e3c9e5e Generator::add_parse_table: Store entries in hash map
This avoids a quadratic behaviour due to repeatedly using `find` on a
growing `Vec`.
2021-08-08 21:45:43 +02:00
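A minimal sketch of the general technique the commit above describes: keeping a hash map next to the `Vec` so that a repeated linear `find` becomes an expected constant-time lookup. The `EntryTable` type and its `String` entries are hypothetical stand-ins, not the real `Generator::add_parse_table` data structures.

```rust
use std::collections::HashMap;

/// Hypothetical interning table: each distinct entry gets a stable index.
struct EntryTable {
    entries: Vec<String>,
    index_by_entry: HashMap<String, usize>,
}

impl EntryTable {
    fn new() -> Self {
        EntryTable {
            entries: Vec::new(),
            index_by_entry: HashMap::new(),
        }
    }

    /// Return the index of `entry`, appending it if it is not present yet.
    /// The map lookup replaces a linear scan over `entries`.
    fn add(&mut self, entry: &str) -> usize {
        if let Some(&index) = self.index_by_entry.get(entry) {
            return index;
        }
        let index = self.entries.len();
        self.entries.push(entry.to_string());
        self.index_by_entry.insert(entry.to_string(), index);
        index
    }
}
```

With a plain `Vec` and `find`, adding n entries costs O(n²) comparisons in the worst case; with the map, each `add` is expected O(1), at the cost of storing each key twice in this simplified version.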
Andrew Hlynskyi
3c0152a331 chore(fmt): Apply 'cargo fmt' to the whole code base 2021-05-19 23:21:43 +03:00
Markus F.X.J. Oberhumer
cc519b3121 cli: Improve const-correctness of the generated parsers (part 2 of 2).
This is a follow-up to my previous commit 1badd131f9.

I've made this an extra patch as it requires a minor
API change in <tree_sitter/parser.h>.

This commit moves the remaining generated tables into
the read-only segment.

Before:
  $ for f in bash c cpp go html java javascript jsdoc json php python ruby rust; do \
       gcc -o $f.o -O2 -Ilib/include -c test/fixtures/grammars/$f/src/parser.c; \
    done
  $ size --totals *.o
      text    data     bss     dec     hex filename
   5353477   24472       0 5377949  520f9d (TOTALS)

After:
  $ for f in bash c cpp go html java javascript jsdoc json php python ruby rust; do \
       gcc -o $f.o -O2 -Ilib/include -c test/fixtures/grammars/$f/src/parser.c; \
    done
  $ size --totals *.o
      text    data     bss     dec     hex filename
   5378147       0       0 5378147  521063 (TOTALS)
2021-05-19 12:49:57 +02:00
Markus F.X.J. Oberhumer
1badd131f9 cli: Improve const-correctness of the generated parsers.
This moves most of the generated tables from the data segment into
the text segment (read-only memory) so that it can be shared between
different processes.

As a bonus side effect we can also remove all casts in the generated parsers.

Before:
  size --totals target/scratch/*.so
      text    data     bss     dec     hex filename
    853623 4684560    2160 5540343  5489f7 (TOTALS)

After:
  size --totals target/scratch/*.so
      text    data     bss     dec     hex filename
   5472086   68616     480 5541182  548d3e (TOTALS)
2021-04-27 09:22:18 +02:00
Max Brunsfeld
57036b4f8a Extract lexer helper functions for all large char sets
No need to restrict it to char sets used in multiple places.
This is important because the helper functions are now implemented
more efficiently than the inline comparisons (using a binary search).
2021-03-11 11:48:48 -08:00
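To illustrate why a binary search helps for large character sets, here is a rough sketch of a membership test over sorted, non-overlapping inclusive ranges. The generated helpers are C; this Rust version, the function name `char_set_contains`, and the `(start, end)` tuple layout are assumptions made only for the example.

```rust
use std::cmp::Ordering;

/// Test whether code point `c` falls in any of `ranges`.
/// `ranges` must be sorted and non-overlapping; both bounds are inclusive.
fn char_set_contains(ranges: &[(u32, u32)], c: u32) -> bool {
    ranges
        .binary_search_by(|&(start, end)| {
            if end < c {
                // This range lies entirely below `c`.
                Ordering::Less
            } else if start > c {
                // This range lies entirely above `c`.
                Ordering::Greater
            } else {
                Ordering::Equal
            }
        })
        .is_ok()
}
```

A chain of inline comparisons checks every range in the worst case, whereas this lookup needs O(log n) comparisons for n ranges.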
Max Brunsfeld
592fd8678d Organize TSLanguage fields
Due to the breaking ABI change in #943, this is our chance
to reorder the fields in a more logical way.
2021-03-01 10:27:22 -08:00
Max Brunsfeld
d56f9ebe4e Re-enable --prev-abi flag to generate command 2021-02-26 14:51:01 -08:00
Max Brunsfeld
c1639cc456 Add production_id_count field to Language objects
I think this is the last additional field that's needed so
that every array member of TSLanguage has a length that
can be calculated at runtime.
2021-02-25 16:32:05 -08:00
Max Brunsfeld
29bc26ecd5 Fix test failure after non-terminal extras change 2021-02-18 15:43:01 -08:00
Max Brunsfeld
86a891fa63 Fix bugs in parser generation for non-terminal extras
Previously, we attempted to completely separate the parse states
for item sets with non-terminal extras from the parse states
for other rules. But there was not a complete separation.

It actually isn't necessary to separate the parse states in this way.
The only special behavior for parse states with non-terminal extra rules
is what happens at the *end* of the rule: these parse states need to
perform an unconditional reduction.

Luckily, it's possible to distinguish these *non-terminal extra ending*
states from other states just based on their normal structure, with
no additional state.
2021-02-18 14:14:22 -08:00
Max Brunsfeld
6ae04051e7 Tweak whitespace in generated character set functions 2021-02-17 16:32:49 -08:00
Max Brunsfeld
dad8546776 Generate more compact code for character set binary search 2021-02-17 13:52:23 -08:00
Max Brunsfeld
6132a10b1c Use binary search in generated character set functions 2021-02-17 13:08:56 -08:00
Max Brunsfeld
ab78ab3f9b Represent CharacterSet internally as a vector of ranges 2021-01-28 16:10:39 -08:00
Max Brunsfeld
3497f34dd7 Fix parser-generation bugs introduced in #782 2020-11-02 13:43:28 -08:00
Max Brunsfeld
071f4e40f1 Fix generate error when there are aliases in unused rules 2020-10-28 12:34:16 -07:00
Max Brunsfeld
a2d760e426 Ensure nodes are aliased consistently within syntax error nodes
Co-Authored-By: Rick Winfrey <rewinfrey@github.com>
2020-10-27 15:46:09 -07:00
Max Brunsfeld
8bb8e9b8b3 Initialize TSLanguage fields in order of their declaration
This makes parser.c valid under the C++20 standard
2020-10-15 07:20:12 -07:00
Max Brunsfeld
ffd3bdc4c1 Escape ? in C string literals
Fixes #714
2020-09-23 13:06:06 -07:00
Max Brunsfeld
b5a9adb555 Allow queries to match on supertypes
Co-authored-by: Ayman Nadeem <aymannadeem@github.com>
2020-09-21 12:34:48 -07:00
Max Brunsfeld
ff488f89c9 Make the --prev-abi flag work w/ the newest abi change 2020-09-08 10:58:20 -07:00
Max Brunsfeld
2eb04094f8 Handle aliased parent nodes in query analysis 2020-08-21 14:12:04 -07:00
Max Brunsfeld
4c2f36a07b Mark steps as definite on query construction
* Add a ts_query_pattern_is_definite API, just for debugging this
* Store state_count on TSLanguage structs, to allow for scanning parse tables
2020-06-25 15:06:27 -07:00
Max Brunsfeld
ec870e9e66 Avoid extracting helpers for char sets that are only used once 2020-05-26 16:37:45 -07:00
Max Brunsfeld
911fb7f1b2 Extract helper functions to reduce the code size of the lexer function (#626)
* Extract helper functions to reduce code size of ts_lex

* Name char set helper functions based on token name
2020-05-26 13:39:11 -07:00
Max Brunsfeld
b66d149b74 Fix inconsistent whitespace after '{' in generated parser 2020-05-13 15:56:49 -07:00
Max Brunsfeld
cdc973866f Fix build-wasm command on latest emscripten 2020-05-12 15:42:11 -07:00
Riccardo Schirone
780e9cecc9 Do not use multiple unnamed structs inside of unions 2020-04-29 20:42:45 +02:00
Max Brunsfeld
a003e5f6bd generate: Avoid duplicate string tokens in unique symbol map 2020-03-20 11:35:11 -07:00
Alyssa Verkade
0e689657b7 Add a language linkage declaration to parsers
Previously, in order to compile a `tree-sitter` grammar that contained
C++ source in the parser (i.e. the `scanner.cc` file), you would have to
compile the `parser.c` file separately from the C++ files. For example,
in Rust this would result in a `build.rs` close to the following:
```
extern crate cc;

use std::path::PathBuf;

fn main() {
  let dir: PathBuf = ["tree-sitter-ruby", "src"].iter().collect();

  cc::Build::new()
    .include(&dir)
    .cpp(true)
    .file(dir.join("scanner.cc"))
    // NOTE: must have a name that differs from the c static lib
    .compile("tree-sitter-ruby-scanner");

  cc::Build::new()
    .include(&dir)
    .file(dir.join("parser.c"))
    // NOTE: must have a name that differs from the c++ static lib
    .compile("tree-sitter-ruby-parser");
}
```

This was necessary at the time for the following grammars: `ruby`,
`php`, `python`, `embedded-template`, `html`, `cpp`, `ocaml`,
`bash`, `agda`, and `haskell`.

To solve this, we add an `extern "C"` language linkage declaration
to the functions that must be linked against to compile a parser with the
scanner, making parsers linkable against C++ source.
On all major compilers (gcc, clang, and msvc) this should be the only
change needed: clang and gcc have both supported designated
initialization for years, and msvc 2019 adopted designated
initializers as part of its C++20 conformance push.

Subsequently, for Rust projects, the necessary `build.rs` would become
(which also brings these parsers into sync with the current docs):
```
extern crate cc;

use std::path::PathBuf;

fn main() {
  let dir: PathBuf = ["tree-sitter-ruby", "src"].iter().collect();

  cc::Build::new()
    .include(&dir)
    .cpp(true)
    .file(dir.join("scanner.cc"))
    .file(dir.join("parser.c"))
    .compile("tree-sitter-ruby");
}
```
2020-02-18 19:46:59 -08:00
Max Brunsfeld
8dd68c360a Fix logic for generating unique symbol map
Previously, this didn't correctly handle the case where *multiple* 
symbols were all simply-aliased to the same *other* symbol.

Refs #500
2020-01-27 12:06:48 -08:00
Max Brunsfeld
fc19312913 Fix node-types bugs involving aliases and external tokens 2019-12-12 10:06:18 -08:00
Max Brunsfeld
a5a9000e29 generate: Ensure that field_map_slices array is long enough 2019-12-09 11:46:32 -08:00
Max Brunsfeld
7032dae4f6 Include alias symbols in unique symbol map 2019-12-06 12:11:09 -08:00
Max Brunsfeld
56c620c005 Store a mapping to ensure no two symbols map to the same metadata 2019-12-05 17:21:46 -08:00
Max Brunsfeld
5767bbc806 Avoid generating C char literals with control characters
Fixes #487
2019-11-13 10:54:34 -08:00
Max Brunsfeld
d765332c61 Don't rely on new eof ABI in parsers unless --next-abi is passed 2019-10-31 14:32:50 -07:00
Max Brunsfeld
d3b7caa565 Add a TSLexer.eof() API, use it in generated parsers 2019-10-31 14:11:52 -07:00
Max Brunsfeld
fcaabea0cf Allow non-terminal extras 2019-10-21 16:08:59 -07:00
Max Brunsfeld
69ab405325 In next ABI, group symbols by action in small parse state table
This is a more compact representation because in most states, many 
symbols share the same actions.
2019-08-30 20:29:55 -07:00
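The compaction described in the commit above can be pictured with a small sketch that regroups one state's `(symbol, action)` pairs into one record per distinct action. The `u16` ids and the function name are hypothetical; the real small parse table layout is not shown in this log.

```rust
use std::collections::BTreeMap;

/// Regroup `(symbol, action)` pairs for one parse state so that each
/// distinct action is stored once, followed by the symbols that share it.
fn group_symbols_by_action(entries: &[(u16, u16)]) -> Vec<(u16, Vec<u16>)> {
    // BTreeMap keeps the groups ordered by action id for stable output.
    let mut groups: BTreeMap<u16, Vec<u16>> = BTreeMap::new();
    for &(symbol, action) in entries {
        groups.entry(action).or_default().push(symbol);
    }
    groups.into_iter().collect()
}
```

When many symbols share an action, the grouped form stores that action once instead of once per symbol, which is where the space saving would come from.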
Max Brunsfeld
8037607583 Only generate the new parse table format if --next-abi flag is used 2019-08-29 17:37:33 -07:00
Max Brunsfeld
82ff542d3b Appease MSVC by avoiding empty arrays 2019-08-29 17:31:44 -07:00
Max Brunsfeld
09a2755399 Store parse states with few lookahead symbols in a more compact way 2019-08-29 15:52:23 -07:00
Max Brunsfeld
48a883c1d4 Move external token state id computation out of render module 2019-08-29 15:48:22 -07:00