Commit graph

257 commits

Author SHA1 Message Date
Jake Sarjeant
61b70943b1 feat(cli): add option to select JS runtime other than node 2023-08-03 21:34:47 +03:00
Amaan Qureshi
b8fe5fe21b fix: do not allow eof to advance states if the new state is the same state 2023-08-02 10:47:27 +01:00
Andrew Hlynskyi
a2f834d846 More error contexts + conv panics to errors with context 2023-07-30 21:16:45 +03:00
Amaan Qureshi
f4e788b28e
feat: warn when unused conflicts are present in a grammar 2023-07-28 00:23:28 -04:00
Amaan Qureshi
c521e9c18e
chore: improve error message in some spots loading grammar.json 2023-07-24 00:44:44 -04:00
Amaan Qureshi
5fba369c4a
fix: disallow inlining the first rule
This prevents a panic when indexing symbol_ids during the generation process
2023-07-19 16:14:58 -04:00
Andrew Hlynskyi
0b0cc6c429 Fix rustc 1.71.0 warnings 2023-07-13 17:50:04 +03:00
Andreas Deininger
0751736d17 docs: convert various links to https protocol 2023-04-04 18:05:46 +03:00
Max Brunsfeld
6b87326470
Merge pull request #1787 from kianmeng/fix-typos
Fix typos
2022-08-25 10:25:39 -07:00
Nat Mote
4e3179fbc0
Avoid extracting default alias for extras
Fixes #1834
2022-08-10 07:27:34 -07:00
Kian-Meng Ang
b8552ec6c4 Fix typos 2022-06-28 19:57:42 +08:00
Max Brunsfeld
4b93326898 Don't generate primary states array if it will be unused due to abi version setting 2022-03-02 14:57:59 -08:00
Max Brunsfeld
9866674cf8
Merge pull request #1660 from alex-pinkus/expanded-regex-support
Expand regex support to include emojis and binary ops
2022-02-24 17:14:23 -08:00
Alex Pinkus
8fadf18655 Expand regex support to include emojis and binary ops
The `Emoji` property alias is already present, but the actual property
is not available since it lives in a new file. This adds that file to
the `generate-unicode-categories-json`.

The `emoji-data` file follows the same format as the ones we already
consume in `generate-unicode-categories-json`, so adding emoji support
is fairly easy. Without this, grammars would need to hard-code a set of
unicode ranges in their own regex. The Javascript library `emoji-regex`
cannot be used because of #451.

For unclear reasons, the characters #, *, and 0-9 are marked as
`Emoji=Yes` by `emoji-data.txt`. Because of this, a grammar that wishes
to use emojis is likely to want to exclude those characters. For that
reason, this change also adds support for binary operations in regexes,
e.g. `[\p{Emoji}&&[^#*0-9]]`.

Lastly (and perhaps controversially), this change introduces new
variables available at grammar compile time, for the major, minor, and
patch versions of the tree-sitter CLI used to compile the grammar. This
will allow grammars to conditionally adopt these new regex features
while remaining backward compatible with older versions of the CLI.
Without this part of the change, grammar authors who do not precompile
and check-in their `grammar.json` would need to wait for downstream
systems to adopt a newer tree-sitter CLI version before they could begin
to use these features.
2022-02-19 11:41:36 -08:00
Max Brunsfeld
994cb61f2c Always generate parser.h, regardless of chosen ABI version
For some ABI changes, we may need to make changes to the parser.h in order
to restore a previous binary format, but for the current range of supported
ABI versions (13 + 14), the current parser.h is fine.

Refs #1599
2022-01-23 10:29:52 -08:00
Max Brunsfeld
82ceebc10d 🎨 Use base struct syntax to clean up grammar expectations 2022-01-20 17:17:46 -08:00
Alex Pinkus
858ea5782b Fix back compat by moving primary_field_ids to the end
Due to an oversight in #1589, I added `primary_field_ids` into the
`TSLanguage` struct in a place that wasn't the end. This is not actually
backwards compatible and causes downstream failures :(
2022-01-17 17:23:02 -08:00
Max Brunsfeld
516fd6f6de Add --abi flag to generate command, generate version 13 by default 2022-01-17 14:50:47 -08:00
Alex Pinkus
eaf9b170f1 Don't start with duplicate states in ts_query__analyze_patterns
This change exposes a new `primary_state_ids` field on the `TSLanguage`
struct, and populates it by tracking the first encountered state with a
given `core_id`. (For posterity: the initial change just exposed
`core_id` and deduplicated within `ts_analyze_query`).

With this `primary_state_ids` field in place, the
`ts_query__analyze_patterns` function only needs to populate its
subgraphs with starting states that are _primary_, since non-primary
states behave identically to primary ones. This leads to large savings
across the board, since most states are not primary.
2022-01-16 11:17:47 -08:00
Max Brunsfeld
86b408412c Use serde's derive feature everywhere 2021-11-21 13:39:30 -08:00
Max Brunsfeld
a0c085bbec Return an error when trying to inline a token
Fixes #1420
2021-11-19 13:02:04 -08:00
Max Brunsfeld
d05c665863 Convert some of the fixture grammars from JSON to JS
These tests are easier to write and maintain if the grammars are just JS,
like grammars normally are. It doesn't slow the tests down significantly
to shell out to `node` for each of these grammars.
2021-10-22 18:47:23 -06:00
Razze
956705a23d Update to unicode standard 14 2021-10-10 16:40:31 +02:00
FnControlOption
e030434ca7 Handle aliases in unicode property escapes in regexes 2021-08-18 22:22:46 -07:00
Paul Gey
a533e4d7bb Remove unnecessary borrows
This produces an `unused_must_use` warning on nightly:
https://github.com/rust-lang/rust/pull/86426
2021-08-14 15:44:24 +02:00
Max Brunsfeld
c6dd5da5e6
Merge pull request #1329 from narpfel/improve-performance
Improve performance of `tree-sitter generate`
2021-08-11 16:08:23 -07:00
Paul Gey
965e3c9e5e Generator::add_parse_table: Store entries in hash map
This avoids a quadratic behaviour due to repeatedly using `find` on a
growing `Vec`.
2021-08-08 21:45:43 +02:00
Paul Gey
cf69a2c94c Use IndexMap and FxHash for some hot hash maps 2021-08-08 21:45:43 +02:00
Andrew Hlynskyi
533073cdb5 fix(cli): Remove tree-sitter grammar ./... call limitation 2021-08-06 02:11:35 +03:00
Max Brunsfeld
c512a0eed7
Merge pull request #1194 from ahlinc/fix/1032
Close #1032 - fix all weirdness in the generated Cargo.toml
2021-06-29 16:48:23 -07:00
Andrew Hlynskyi
f22d62393b fix(cli): actual Rust binding version in generated Cargo.toml 2021-06-30 00:36:11 +03:00
Andrew Hlynskyi
d3527109a8 Updating of binding.gyp should depend on its content instead of bindings/node folder 2021-06-23 02:42:48 +03:00
Andrew Hlynskyi
22d63338a2 Use double quoted patterns for more precise pattern matching in the binding.gyp files 2021-06-23 02:41:30 +03:00
Andrew Hlynskyi
86b8137457 Add create_path_else fn to handle creation or modification 2021-06-23 02:40:32 +03:00
Andrew Hlynskyi
797c7668c1 feat(cli): Independent language binding files generation 2021-06-23 02:39:38 +03:00
Andrew Hlynskyi
4578e58794 fix(cli): close #1032 - fix repository template url generation in cargo.toml 2021-06-23 01:02:29 +03:00
Douglas Creager
d2d01e77e3 cli: Use anyhow and thiserror for errors
This patch updates the CLI to use anyhow and thiserror for error
management.  The main feature that our custom `Error` type was providing
was a _list_ of messages, which would allow us to annotate "lower-level"
errors with more contextual information.  This is exactly what's
provided by anyhow's `Context` trait.

(This is setup work for a future PR that will pull the `config` and
`loader` modules out into separate crates; by using `anyhow` we wouldn't
have to deal with a circular dependency between the new crates.)
2021-06-09 16:17:23 -04:00
Andrew Hlynskyi
3c0152a331 chore(fmt): Apply 'cargo fmt' to the whole code base 2021-05-19 23:21:43 +03:00
Markus F.X.J. Oberhumer
cc519b3121 cli: Improve const-correctness of the generated parsers (part 2 of 2).
This is a follow-up to my previous commit 1badd131f9 .

I've made this an extra patch as it requires a minor
API change in <tree_sitter/parser.h>.

This commit moves the remaining generated tables into
the read-only segment.

Before:
  $ for f in bash c cpp go html java javascript jsdoc json php python ruby rust; do \
       gcc -o $f.o -O2 -Ilib/include -c test/fixtures/grammars/$f/src/parser.c; \
    done
  $ size --totals *.o
      text    data     bss     dec     hex filename
   5353477   24472       0 5377949  520f9d (TOTALS)

After:
  $ for f in bash c cpp go html java javascript jsdoc json php python ruby rust; do \
       gcc -o $f.o -O2 -Ilib/include -c test/fixtures/grammars/$f/src/parser.c; \
    done
  $ size --totals *.o
   5378147       0       0 5378147  521063 (TOTALS)
2021-05-19 12:49:57 +02:00
Andrew Hlynskyi
b856f7e1bd Remove unneeded dead_code annotations 2021-04-30 06:55:00 +03:00
Markus F.X.J. Oberhumer
1badd131f9 cli: Improve const-correctness of the generated parsers.
This moves most of the generated tables from the data segment into
the text segment (read-only memory) so that it can be shared between
different processes.

As a bonus side effect we can also remove all casts in the generated parsers.

Before:
  size --totals target/scratch/*.so
      text    data     bss     dec     hex filename
    853623 4684560    2160 5540343  5489f7 (TOTALS)

After:
  size --totals target/scratch/*.so
      text    data     bss     dec     hex filename
   5472086   68616     480 5541182  548d3e (TOTALS)
2021-04-27 09:22:18 +02:00
Andrew Hlynskyi
7aa538dd97 fix(cli): use dashed language name in generated package.json and Cargo.toml files 2021-04-22 16:29:48 +03:00
Andrew Hlynskyi
9416f975d3 fix(cli): set actual cli version in generated package.json 2021-04-22 16:29:48 +03:00
an-kumar
aabe6100d0
Update generated Cargo.toml's tree-sitter dependency
tree-sitter 0.19.0 bumped the language version from 12 to 13. `npm install tree-sitter-cli` gets a recent version of tree-sitter, which generates languages with language version 13. However, the Cargo.toml generated from `tree-sitter generate` still has an old tree-sitter as a dependency. This causes the Rust bindings to not work out of the box, as the tree-sitter library expects language version 12.

It would be nice to add a test for this in CI.  `tree-sitter generate` already creates a test for the rust binding, and that test fails out of the box due to the language mismatch.
2021-04-09 10:59:51 -07:00
Max Brunsfeld
c3eb5daa31 Include has_preceding_inherited_fields in Item's hash impl 2021-03-27 10:08:24 -07:00
Max Brunsfeld
57036b4f8a Extract lexer helper functions for all large char sets
No need to restrict it to char sets used in multiple places.
This is important because the helper functions are now implemented
more efficiently than the inline comparisons (using a binary search).
2021-03-11 11:48:48 -08:00
Andrew Hlynskyi
a331607f4e dsl.js: Reuse sym() in RuleBuilder 2021-03-10 23:06:53 +02:00
Max Brunsfeld
9e50befcf8 For node-types.json, process supertypes in a stable order 2021-03-08 12:02:01 -08:00
Max Brunsfeld
8e894ff3f1 Add --no-bindings flag to generate subcommand 2021-03-08 12:01:45 -08:00
Max Brunsfeld
7300249d20 Fix incorrect merging of states with different inherited fields
Co-authored-by: Douglas Creager <dcreager@dcreager.net>
2021-03-05 14:49:28 -08:00