The `Emoji` property alias is already present, but the actual property
is not available since it lives in a new file. This change adds that
file to `generate-unicode-categories-json`.
The `emoji-data` file follows the same format as the ones we already
consume in `generate-unicode-categories-json`, so adding emoji support
is fairly easy. Without this, grammars would need to hard-code a set of
Unicode ranges in their own regex. The JavaScript library `emoji-regex`
cannot be used because of #451.
For unclear reasons, the characters #, *, and 0-9 are marked as
`Emoji=Yes` by `emoji-data.txt`. Because of this, a grammar that wishes
to use emojis is likely to want to exclude those characters. For that
reason, this change also adds support for binary operations in regexes,
e.g. `[\p{Emoji}&&[^#*0-9]]`.
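To make the semantics of the intersection concrete, here is a minimal sketch that models `[\p{Emoji}&&[^#*0-9]]` as "in the emoji set and not in the excluded set". It is not the code in this change: `EMOJI_SUBSET`, `EXCLUDED`, `in_any`, and `matches_emoji_class` are hypothetical names, and the emoji table is a tiny illustrative subset rather than the full contents of `emoji-data.txt`.

```rust
use std::ops::RangeInclusive;

/// Illustrative subset of code points with `Emoji=Yes`.
/// Assumption: this is NOT the full table from `emoji-data.txt`.
const EMOJI_SUBSET: &[RangeInclusive<char>] = &[
    '#'..='#',
    '*'..='*',
    '0'..='9',
    '\u{1F600}'..='\u{1F64F}', // emoticons block
];

/// The characters a grammar would likely exclude: `[#*0-9]`.
const EXCLUDED: &[RangeInclusive<char>] = &['#'..='#', '*'..='*', '0'..='9'];

/// Returns true if `c` falls inside any of the given ranges.
fn in_any(ranges: &[RangeInclusive<char>], c: char) -> bool {
    ranges.iter().any(|r| r.contains(&c))
}

/// Models `[\p{Emoji}&&[^#*0-9]]`: intersecting with a negated class
/// is the same as subtracting the excluded characters from the set.
fn matches_emoji_class(c: char) -> bool {
    in_any(EMOJI_SUBSET, c) && !in_any(EXCLUDED, c)
}
```

With this model, `matches_emoji_class('\u{1F600}')` holds, while `'#'`, `'7'`, and `'a'` are all rejected: the first two by the exclusion, the last because it is not in the emoji set at all.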
Lastly (and perhaps controversially), this change introduces new
variables available at grammar compile time, for the major, minor, and
patch versions of the tree-sitter CLI used to compile the grammar. This
will allow grammars to conditionally adopt these new regex features
while remaining backward compatible with older versions of the CLI.
Without this part of the change, grammar authors who do not precompile
and check in their `grammar.json` would need to wait for downstream
systems to adopt a newer tree-sitter CLI version before they could begin
to use these features.
This patch adds the `tree-sitter-config` crate, which manages
tree-sitter's configuration file. This new setup allows different
components to define their own serializable configuration types, instead
of having to create a single monolithic configuration type. But the
configuration itself is still stored in a single JSON file.
Before, the default location for the configuration file was
`~/.tree-sitter/config.json`. This patch updates the default location
to follow the XDG Base Directory spec (or other relevant platform-
specific spec). So on Linux, for instance, the new default location is
`~/.config/tree-sitter/config.json`. We will look in the new location
_first_, and fall back on reading from the legacy location if we can't
find anything.
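The lookup order described above can be sketched roughly as follows. This is an illustration only, not the code in `tree-sitter-config`: `config_candidates` and `first_existing` are hypothetical helpers, the paths assume the Linux layout mentioned above, and the sketch ignores environment overrides such as `XDG_CONFIG_HOME`.

```rust
use std::path::{Path, PathBuf};

/// Builds the lookup order for a Linux-style layout (an assumption):
/// the new XDG-style location first, the legacy location second.
fn config_candidates(home: &Path) -> Vec<PathBuf> {
    vec![
        home.join(".config/tree-sitter/config.json"),
        home.join(".tree-sitter/config.json"),
    ]
}

/// Returns the first candidate that exists as a file, or `None`
/// if no configuration file is present in any location.
fn first_existing(candidates: &[PathBuf]) -> Option<PathBuf> {
    candidates.iter().find(|p| p.is_file()).cloned()
}
```

Because the new location comes first in the list, a legacy file is only picked up when nothing exists at the new path.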
This patch adds a new `tree-sitter-loader` crate, which holds the CLI's
logic for finding and building local grammar definitions at runtime.
This allows other command-line tools to use this logic too!
This patch updates the CLI to use `anyhow` and `thiserror` for error
management. The main feature that our custom `Error` type was providing
was a _list_ of messages, which would allow us to annotate "lower-level"
errors with more contextual information. This is exactly what's
provided by `anyhow`'s `Context` trait.
(This is setup work for a future PR that will pull the `config` and
`loader` modules out into separate crates; by using `anyhow` we wouldn't
have to deal with a circular dependency between the new crates.)
We were only walking one level of depth into the `queries/` folder
during invocations of `test`, which made us attempt to open folders
rather than recurse into them.
We have to pull in the `walkdir` crate, which is required for
cross-platform walking of directories.
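For illustration, the difference between a one-level listing and a recursive walk looks roughly like this. Note that this sketch deliberately uses only `std::fs` rather than `walkdir`, so it is not the code in this change, and `collect_files` is a hypothetical helper name.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects every non-directory entry under `dir`.
/// Directories are descended into instead of being returned, which is
/// the behavior the one-level walk was missing.
/// Caveat: `is_dir` follows symlinks, so a symlink cycle would recurse forever.
fn collect_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_files(&path, out)?;
        } else {
            out.push(path);
        }
    }
    Ok(())
}
```

Calling `collect_files(Path::new("queries"), &mut files)` would then yield query files in nested folders as well as those at the top level.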
Fixes #938.